Big Data News – 18 Jan 2016

Top Stories
Europe's top antitrust authority is on the lookout for companies using big data to stifle competition, although it hasn't spotted any problems yet, according to Competition Commissioner Margrethe Vestager. It's good news when companies use data to cut costs and offer better service, the European Commission's competition chief said at the DLD conference in Munich on Sunday. "But if just a few companies control the data you need to satisfy customers and cut costs, that could give them the power to drive their rivals out of the market. And with less competition, there's a risk that there won't be enough incentive for companies to keep using big data to serve customers better," she said.

Above the Trend Line, machine learning industry rumor central, is a new recurring feature of insideBIGDATA. In this column, we'll present a variety of short news items such as people movements, funding news, rumors and general scuttlebutt floating around the big data, data science and machine learning industries, including behind-the-scenes anecdotes and curious buzz.

One good point to keep in mind is that betas have problems. That's what they are released to find.

Nothing of quality in life is free. Most would agree with this statement. When it comes to the Internet, we have a wealth of knowledge available at our fingertips, information that would have required the investment of time and money to acquire just a few short decades ago. In order for us to be able to maintain this availability, sometimes we have to pay a price.

In 2025, a shopper will walk into a store, try on clothes, pick the colors and a clerk will use a handheld device to have those items shipped to the customer's home address. The store won't have a big inventory of clothes so shoppers won't walk out with big bags. The inventory will be held in a warehouse and then shipped out as if the items were purchased online. This, according to a report from IBM, is the future of retail. "We're on the cusp of a lot of change," said Stephen Laughlin, a vice president and general manager at IBM. "It's not just about the technology, but it's about business models… You need brick and click together, but the roles are going to evolve in terms of how they work together. We'll see the role of the store becoming a showroom."

News: IRIS' advanced machine learning helps clients adapt and react against sophisticated attacks.

This is the second part in our "Data Management Platform" series of articles. In this part, we will outline the 10 key benefits of using a Data Management Platform. You can find the first part here. Are you as a marketer starting to wake up and smell the coffee about the connection between 'Big Data' and 'Real-time Advertising'? Have you finally noticed how these terms have come to dominate the lexicon of the marketing press? Well, if you haven't, we've got something to say about it. This all got started by consumers' increasing levels of digital media consumption across an ever-widening plethora of devices and touch points. Aside from its positive impact on the digital advertising industry, the adoption of a DMP above all solves a range of different business challenges, and thus its channel-agnostic role continues to expand beyond the remit of just the digital world. Let me show you why in this day and age it is crucial to use a Data Management Platform and why you can't go on without one.

GUEST: Even for the smallest, leanest startup, bigger is better when it comes to data mining. The more data sources you have and the more data you can analyze, the better your conclusions, and the more successfully you can serve your specific customer base. The biggest lesson my company (GameSalad) learned in designing our own data-driven startup is that it's better to wire in analytics and gather intelligence before you build, and to focus obsessively on the customer from day one. Here are five simple steps that will help you mine with the big dogs. 1. Gather customer data before you build: The first step before building is to gather data from your customers. Challenge your assumptions about who you think they are and who you want them to be. It might be as simple as surveying the people who visit your site.

This week in New York City is the National Retail Federation (NRF) show, where hospitality and retail organizations from around the globe gather to learn about the next big thing in their industries, share best practices and identify ideas for taking advantage of technology to drive better business decisions, create business opportunities and capture efficiencies as they face rapidly changing customer dynamics. Hospitality and retail organizations constantly face fierce competition, and the ability to capitalize on changes in customer behavior to capture profit opportunities is critical. Caesars Entertainment is recognized as one of the world's most geographically diverse casino entertainment operators. However, it may be surprising to learn that Caesars' customers are spending the majority of their personal income on non-gaming entertainment, lodging and dining experiences. The change in customer demographics and spending patterns is a significant change to Caesars Entertainment's original business model.

In our big data roundup for the past week, we've got news from Microsoft about what it's been doing with statistical modeling language R, IBM's acquisition of a real-time fraud analytics company, Baidu's donation of some machine learning efforts to open source, and a management shakeup at Apache Spark company Databricks.

At OdinText we've found that the best way to identify all key drivers in any analysis, especially in customer experience management (including but not limited to KPIs such as OSAT, Net Promoter Score, Likelihood to Return or other real behavior), is through a dual process combining a theory-driven (aka "top-down") approach and a data-exploratory or data-driven (aka "bottom-up") approach.

As big data and related technologies are coming on strong in 2016, many members of the vendor ecosystem are weighing in with predictions for what we'll see in the next 12 months. Courtesy of Narrative Science, CEO Stuart Frankel made the list below of 2016 predictions for artificial intelligence and big data. I do beg to differ about data scientists not being as sexy as once thought!

In the last decade, a range of digital technologies and services, like Facebook and smartphones, have hit the market and moved quickly from niche use to the mainstream. In the next…

GUEST: The annual Consumer Electronics Show in Las Vegas has just wrapped up. With 160,000 attendees, 3,800 exhibitors, and almost 2.5 million square feet of exhibit space, this is the largest electronics show in the world. The array of new products and technologies has been featured on virtually every media outlet from Good Morning America to Conan. There is something for every interest on display, from self-driving cars and drones to video games, virtual reality, and smart appliances. I saw products for exercise, infant care, elder care, food preparation, smart trash cans, and toilets. It is really exciting to see the extremes of modern day technology.

This week brought Skype and Outlook updates, a new Windows 10 preview build, and the end of support for Internet Explorer 8, 9, and 10.

NewVantage Partners, a management consulting firm whose exclusive focus is the delivery of world-class expertise in Data Strategy and Big Data Execution for Fortune 1000 clients, has released the results of its 4th annual Big Data Executive Survey, entitled "An Update on the Adoption of Big Data in the Fortune 1000".

In this special guest feature, Miles Johnson and Sam Hochgraf of IBB Consulting Group discuss how to build small, highly specialized teams of experts that can work collaboratively to support the data science pipeline.

Good news for businesses using Microsoft's Azure cloud platform: their infrastructure bills may be shrinking come February. Microsoft announced that it will be permanently reducing the prices for its Dv2 compute instances by up to 17 percent next month, depending on the type of instance and what it's being used for. Users will see the greatest savings if they're running higher performance Linux instances, which will cost up to 17 percent less than they've been paying previously. Windows instance discounts top out at a 13 percent reduction compared to current prices. Right now, the exact details of the discount are a little bit vague, but Microsoft says that it will publish full pricing details in February when they go into effect. Dv2 instances are designed for applications that require more compute power and temporary disk performance than Microsoft's A series instances.

IBM made waves at the 2016 Consumer Electronics Show with a variety of innovations in cognitive computing. From IBM CEO Ginni Rometty's keynote address to conversations with dinosaurs, CES 2016 offered innovation at every turn.

News this week included a decline in PC sales, five modems getting certified compliant, and a fake IoT data black market.

There's been no end in sight to the advance of machine learning into the world of enterprise software, but this week a new online tool debuted for the purpose of sheer fun: predicting the winner of the Super Bowl. Built on WSO2's open-source Machine Learner technology for predictive data analysis, BigDataGame uses Apache Spark and random forest regression to compare teams and make predictions.

Think you know your Adabas from your Azurill? Then try the hilarious Pokemon or Big Data quiz by Tim Carry (and don't miss the captions)! (With thanks to Bob Muenchen.) If you're having trouble with…

Companies must accommodate the explosion of data, support a healthy data storage process, and apply data backup and recovery best practices.

Dedicated time for education can be elusive for even the best time managers among us. Find out how you can take charge of your learning by scheduling specific educational opportunities for yourself, and discover how a cohesive analytics platform can free you to focus on getting the education you need.

Contending with unstructured data is no longer a priority reserved for the most well-financed, IT-savvy organizations, like Google and Facebook. As the world's data continues to increase at nearly exponential rates, the reality is the majority of that data is unstructured and incongruent–in its native form–with time-honored tables and SQL-based modeling. The prominence of a critical confluence of technological forces including mobile, social, and the cloud has produced a situation in which the majority of unstructured data is created by consumers. Consequently, traditional methods of managing, transforming, analyzing and, in some instances, even applying that data are no longer sufficient when incorporating such external, unstructured new data into the enterprise. Time-sensitive data simply cannot wait for conventional preparation processes and ETL for historic business intelligence analysis.

Cambridge Semantics, a leading provider of Smart Data analytic and data management solutions driven by semantic web technology, announced that it has acquired the intellectual property of SPARQL City and hired former CEO and founder Barry Zane and other top executives from the firm.

At some point we should all take stock of the way we use our valuable data resources and decide if the benefit is really worth the cost.

The IRIS Analytics software is designed to increase the number of fraudulent transactions detected by as much as 40 percent right out of the box.

The clicks and searches of 20 million anonymous Yahoo users could help researchers in a number of different academic institutions expand the boundaries of machine learning and deep learning.

List: Whether you are checking your ski break weather or staying put for the big freeze, these supercomputers are here to help.

As telecommunications companies offer a wider range of services, the amount of data they must process is increasing exponentially. This podcast discusses how telcos can use Apache Hadoop to keep up with rapid data growth.

Rather than move data between the ECM and a third-party BPM app, it makes more sense to layer BPM templates on the ECM platform to create those apps.

Even though PCs have undergone changes, buyers and investors still conceptually see them much like they were in the 80s and 90s.

Learn how to improve Apache HBase usability by creating a custom formatter for viewing binary data types in the HBase shell. Cloudera customers are looking to store complex data types in Apache HBase to provide fast retrieval of complex information such as banking transactions, web analytics records, and related metadata associated with those records. Serialization formats such as Apache Avro, Thrift, and Protocol Buffers greatly assist in meeting this goal… The post How-to: Create and Use a Custom Formatter in the Apache HBase Shell appeared first on Cloudera Engineering Blog.

Big data and analytics are set for a big 2016, as more devices and types of software are connected and exchange information. Here are some considerations for businesses as they look to maximize the value of all this data.

Hortonworks, Inc. (NASDAQ: HDP) announced Hortonworks Partnerworks, a comprehensive global program to support and enable partners selling, implementing and innovating with Hortonworks solutions. The Hortonworks ecosystem is strong and thriving with over 1,500 partners involved today.

In anticipation of his upcoming conference presentation, Predicting Online Marketing Success: Five Lessons Learned, at Predictive Analytics World San Francisco, April 3-7, 2016, we asked Matt Bentley, Founder of CanIRank.com, a few questions about his work in predictive analytics. Q: In your work with predictive analytics, what behavior or outcome do your models predict? A: We're… The post Wise Practitioner – Predictive Analytics Interview Series: Matt Bentley at CanIRank.com appeared first on Predictive Analytics Times.

In case you missed them, here are some articles from December of particular interest to R users. A look back at accomplishments of the R Project and community in 2015. Segmented regression with the "segmented" package, applied to long-distance running records. Creating multi-tab reports in R with knitr and jQuery UI. New version 2.0 update to ggplot2 adds extensibility and many improvements. A circle diagram of translations of "Merry Christmas". Upcoming R events and conferences, and sponsorship for R user groups. How to embed images in R help pages. An Azure ML Studio fraud detection template…

How fast can compiled Python be compared to, say, C? You'd be surprised by the answer. The study below contradicts the common wisdom that you cannot get close to C for matrix-oriented computation. A good example of a study supporting the common wisdom is Sebastian F. Walter's Speed comparision Numba vs C vs pure Python at the example of the LU factorization. He has shown that Numba, a recent compiler that can be used with Python, is between 2x and 3x slower than C code on a naive implementation of LU factorization.
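For readers unfamiliar with the benchmark, a naive LU factorization is essentially a triple loop over the matrix entries. The sketch below is an illustrative pure-Python Doolittle variant without pivoting; it is not the code from the cited study, and the names lu_factor_naive and matmul are made up for this example. A Numba or C comparison would typically run the same loops over contiguous arrays rather than nested lists.

```python
def lu_factor_naive(a):
    """Doolittle LU factorization without pivoting (illustrative sketch).

    Returns (lower, upper), where lower has a unit diagonal.
    Assumes a is a square list of lists and no zero pivot occurs.
    """
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [[float(x) for x in row] for row in a]  # work on a copy
    for k in range(n - 1):
        pivot = upper[k][k]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot; this sketch does no pivoting")
        for i in range(k + 1, n):
            factor = upper[i][k] / pivot
            lower[i][k] = factor
            # Eliminate column k from row i.
            for j in range(k, n):
                upper[i][j] -= factor * upper[k][j]
    return lower, upper


def matmul(x, y):
    """Multiply two square matrices given as lists of lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


lower, upper = lu_factor_naive([[4, 3], [6, 3]])
print(lower)  # unit lower-triangular factor
print(upper)  # upper-triangular factor
```

Multiplying the two factors back together with matmul(lower, upper) reproduces the input matrix, which is a quick sanity check before timing anything.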

Today we announce the acquisition of IRIS Analytics GmbH, an award-winning provider of real-time payments fraud prevention software. This strategic acquisition aims to strengthen our entire Safer Planet portfolio with the addition of IRIS's advanced machine learning techniques and automated model generation to quickly anticipate, adapt and alter operations to help protect institutions and their customers.

Figuring out a corporate-wide PC upgrade timetable has never been easy for enterprise IT managers. These days, cloud-based applications and evolving Bring Your Own Device (BYOD) policies are among the factors complicating your decision-making process. Here are four tips to help you determine the best PC refresh strategy for your enterprise users.

Do you know what your customers want when they visit your website? And do you know what makes them bounce away to visit another site instead of choosing products or services and continuing to checkout on your site? All too often, business owners and executives accept customer behaviour on their websites as something natural and unavoidable. This is particularly unfortunate when they could be using big data and analytics to find out what obstacles or challenges are preventing them from completing more sales, getting higher overall values per sale, and increasing customer retention. All of this is possible with the help of big data and a digital transformation. What Is a Digital Transformation and Why Is It Important? Before we go any further, we must be clear: big data and analytics are nothing more or less than tools.

This infographic is based on independent research and surveys conducted by Ventana Research and the Aberdeen Group, and supports the notion that we have mentioned here more than once, i.e., that data complexity is on the rise. The size and number of data sources are growing exponentially with each passing year, driven by technological advancements in data collection, storage and querying technologies, along with increased adoption of business analytics tools.

Foursquare CEO Dennis Crowley boots himself upstairs. In a probably related development, the company gets a new funding round, that's said to peg Foursquare at far less than its previous valuation. It's a Series E round of $45 million from Union Square, Morgan Stanley, DFJ, Andreessen Horowitz, and Spark. Presumably the VCs weren't too keen on Crowley's handling of the location-based app company. In IT Blogwatch, bloggers check out the checkins. Your humble blogwatcher curated these bloggy bits for your entertainment.

Dennis Crowley, Foursquare's remaining co-founder, gets kicked up to the boardroom, "of his own volition." In news that may or may not be connected, 4sq gets yet another honking chunk of change to burn through, but the price is said to value Foursquare at way less than it was worth previously. This will be the Series E round, worth $45 million, led by Union Square. Other investors include Morgan Stanley, DFJ, Andreessen Horowitz, and Spark. Presumably one or more of these VCs weren't keen on Crowley's performance at the location-based service, so asked him to tag in the old COO and CRO. Meanwhile, he's been asked to "make something awesome."

Although financial criminals exploit a wide range of channels in their attempts to defraud banks, check deposit fraud takes a notable toll on financial institutions. But even this type of fraud is only one of several particularly acute threats to customers' accounts. For example, banks must also guard against account takeovers, which cost financial institutions billions of dollars each year. However, you can help prevent financial loss at your institution by using advanced analytics to anticipate, prevent and adapt to threats while remaining compliant.

By: Abdul Razack, SVP & head of platforms, big data and analytics at Infosys. Artificial intelligence is rapidly changing the business and consumer landscape, but has it reached its full potential? With the wide adoption of automation and machine learning among organizations, we're closer than ever before. And in 2016, we will continue this momentum and see artificial intelligence redefine the future of work, deliver on the promise of big data, and even transform consumers' lives. Artificial intelligence will define the future of work: The pace at which large-scale organizations more widely adopt artificial intelligence to replace manual, repetitive tasks will rapidly increase.

Many observers assume data and analytics can improve the way governments deliver services, but how can governments do so? The first step is to set realistic and attainable goals, rally stakeholders around meeting those goals, and then use data and analytics to measure and monitor progress.

Weather, terrain and access challenges can be especially damaging to forestry environments. Check out how the University of Washington, Seattle's College of the Environment, in collaboration with the Washington State Department of Natural Resources, applied advanced analytics to forestry management approaches that helped address these challenges.

Retailers may struggle against low margins and resource scarcity, but that can't keep them away from the National Retail Federation's BIG Show. Learn how attending this year's BIG Show can help you drive your retail business to new heights–and find out why thousands of other people from around the world are already packing their bags for the centerpiece of what's happening in retail.

Banks are seeking better fraud protection with deep analytics. These tools go beyond simple account monitoring in an attempt to mitigate the billions of dollars fraudsters steal each year through online and mobile channels.

It is interesting to read what some statisticians write about data science on the American Statistical Association (ASA) blog. Most of us don't care about our job title, since there are so many breeds of statisticians and data scientists after all, and they do overlap to some extent. While I was once a statistician, I now call myself a data scientist or business scientist. Anyway, below are some extracts from very lively and interesting discussions taking place on the ASA blog. Tommy Jones posted The Identity of Statistics in Data Science on the American Statistical Association (ASA) website in December 2015. In his long and very interesting article, he wrote (this is just a tiny extract): Judging by current statistics curricula, statistics is more closely tied to the mathematics of probability than to fundamentals of data management.[…]

In this third blog post, we look at Apache Spark's history and its unified platform for big data; a quick read of blogs 1 and 2 is recommended first. Spark was started by Matei Zaharia at UC Berkeley's AMPLab in 2009 and open sourced in 2010 under a BSD license.
