Big Data News – 12 May 2016

Top Stories
Salesforce experienced an outage and service disruption to the NA14 instance, sending customers to Twitter to complain and organizations to evaluate the best way to work with cloud software providers.

Date: May 6, 2016. Location: SVDS HQ. Speaker(s): Harrison Mebane and Christian Perez. Overview: On May 6th, SVDS was honored to host an Open Data Science Conference (ODSC) Meetup in our Mountain View headquarters. The goal of ODSC is to bring together the global data science community to help foster the exchange of innovative ideas and encourage the growth of open source software.

The Opera Web browser has a new 'power-saving' feature. Opera claims you can get 'up to' 50% more battery life — but is that likely? Err, NO! [Developing story: Updated 8:42 am PT with Opera response] Yes, the actual software tweaks will make a difference, but the tests Opera's quoting are skewed, unscientific, and compare apples to oranges. But what do you expect from a company that's trying to get bought by a Chinese consortium for more than $1.2 billion? Your humble blogwatcher curated these bloggy bits for your entertainment.

Thanks to Pedro Boado and Abel Fernandez Alfonso from Santander's engineering team for their collaboration on this post about how Santander UK is using Apache HBase as a near real-time serving engine to power its innovative Spendlytics app. The Spendlytics iOS app is designed to help Santander's personal debit and credit-card customers keep on top of their spending, including payments made via Apple Pay. It uses real-time transaction data to enable customers to analyze their card spend across time periods (weekly, … The post: Inside Santander's Near Real-Time Data Ingest Architecture

Better predictions. Faster! Making sense of the mountains of data collected on a daily basis requires specialized data science skills that are hard to come by and hard to keep. But what if some of these specialized tasks could be augmented or even eliminated by machine learning? For example: simplified and accelerated data preparation, with no need to spend 80% of your time wrangling data before analysis begins; automated selection and optimization of predictive models, eliminating guesswork and manual processes; and deployment of the best models into production with a single click.

Whereas more emphasis is being put on Android security, iOS is seeing an increasing number of vulnerabilities posing security risks to users.

This excerpt focuses on leveraging stakeholders to prepare an organization for change, an essential practice to ensure a successful product delivery.

Apache Hadoop® exists within a broader ecosystem of enterprise analytical packages. This includes ETL tools, ERP and CRM systems, enterprise data warehouses, data marts and others. Modern workloads flow from these various traditional analytical sources into Hadoop and then often back out again. What dataset came from which system, when and how did it change over…

News: The company will provide IoT solutions on Lumada – a new, open and adaptable platform.

What makes for a good R package? With over 8,000 packages up on CRAN the quantity of packages is clearly not an issue for R users. Developing an instinct to recognize quality, however, both requires and deserves some effort. I regularly spend time on Dirk Eddelbuettel's CRANberries site investigating new packages and monitoring changes in old favorites in order to recommend packages for inclusion in MRAN's Package Spotlight page.

Will data scientists disappear soon? I am asking the question as I see more and more papers about why data scientists may be a parenthesis in history. See Will The 'Best Job Of 2016' Soon Become Redundant? by Bernard Marr for a recent example. To Marr's point, there are indeed a number of software and cloud services aiming to automate data science. Marr cites IBM Watson Analytics as a great example of this. I tend to agree, and not only because I am an IBM employee. Watson Analytics does indeed automate data science: you upload data to it, and it analyzes it automatically for you.

In the age of Big Data, companies are pushing to collect and analyze as many data sources as possible so they can enrich the customer experience and drive sales. One way to gather rich data is by integrating Customer Relationship Management (CRM) tools with business intelligence (BI) platforms.

Organizations are facing a digital transformation, as I have written, that is rapidly changing the applications and services that businesses use to operate and deliver information. This new digital generation addresses the expectations of consumers and business partners for information and service in real time. One example of it is enterprise messaging.

The videoconferencing industry is growing at an unprecedented rate. Experts from IDC predict it will be worth $6.2 billion within the next four years. Why is videoconferencing becoming so popular? A number of global factors are driving the trend. It is one of the many ways that cloud technology is improving scalability for many businesses.

SAP SE (NYSE: SAP) unveiled new offerings for customers and partners on the comprehensive SAP HANA® Cloud Platform for the Internet of Things (IoT).

UCHealth, a nationally-recognized healthcare system based in Colorado, announced that it has partnered with LeanTaaS, the leading lean and predictive analytics company based in Silicon Valley, to improve utilization of its operating rooms.

The pace of innovation, vendor lock-in, production sustainability, cost-effectiveness, and managing risk… In his session at 18th Cloud Expo, Dan Choquette, Founder of RackN, will discuss how CIOs are challenged to find the right balance of tools, technology and operational model to serve the business best. He will discuss how clouds, open source software and infrastructure solutions have benefits but also drawbacks, and how workload and operational portability between vendors and platforms gives control back to the users and drives innovation.

Download the CIO May 2016 Digital Magazine.

In his final keynote speech at the 2011 Apple Worldwide Developers Conference, Steve Jobs remarked that, "If the hardware is the brain and the sinew of our products, the software is its soul." Jobs' intimate understanding of and vision for his products stand out as one of the key reasons behind Apple's success.

Having lost faith in industry standards groups that are dominated by vendors, ONUG intends to set the IT network integration agenda from here on out.

Our CTO, Anders Wallgren, recently sat down to take part in the "B2B Nation: IT" podcast — the series dedicated to serving the IT professional community with expert opinions and advice on the world of information technology. Listen to the great conversation, where Anders shares his thoughts on DevOps lessons from large enterprises, the growth of microservices and containers, and more.

Late in 2015 PwC and Iron Mountain published a report on "How organizations can unlock value and insight from the information they hold". The report was based on an exhaustive survey of 1800 top executives at medium and large companies. The results were quite deflating for the supporters of big data and analytics. 43% of the organizations surveyed said they got "little tangible benefit from their information" and a further 23% said they "derived no benefit whatsoever". That's 2 out of every 3 organizations reporting disappointing impact from their data-related initiatives.

The journey of big data doesn't end with analytics. It continues up to the level of actionable intelligence, where the analysis of big data is put to use in decision making.

Opera Software today brings a new power-saving mode. The browser-maker claims it can add 50% to your laptop's battery life — but is that credible? Err, NO! Certainly the underlying software tweaks will make some difference, but the tests Opera's quoting are skewed and unscientific. I can't help thinking this is just part of the company's strategy to get bought by the Chinese for more than a billion dollars.

Retailers now swim in more data than they know what to do with. And they're working overtime to digest that data — collected from e-commerce transactions and via merchandising, CRM and POS systems — to glean useful insights. Many are turning to predictive analytics in an effort to use cutting-edge data science to forecast trends and personalize messaging. Data even plays a role in brick-and-mortar stores, where new metrics allow retailers to study in-store behavior at a level of detail never before possible, says Andy Wong, a partner at digital retail consultancy Kurt Salmon Digital.

The Force awakens: is it Agile, or are we just going through Scrum motions? Michael Nir, speaker and Agile coach, shares expert best practices. Too much Scrum might lead us to the dark side of the Force. Being Agile rather than doing Scrum means focusing on what we want to achieve: getting the right products that our users want quickly, using fast feedback loops, and continuously removing waste.

We are less than a week away from SAPPHIRE NOW, excitement is mounting, and the SAP Cloud for Analytics demo station is an area on the show floor you won't want to miss! You'll hear a lot about SAP Cloud for Analytics and may want to learn more or have specific questions about the product. Whichever…

Over the past 12 months, I've been digging in the data trenches. OK, mostly I've been sitting next to the smarter people digging through the trenches and oversimplifying what they were doing in reports to management. Very few IT projects are truly unique — and the ones that sound unique often fall into relatively predictable buckets. Lucky for you, I've decided to come up for air and share the top eight types of projects I've seen over the past 12 months.

"Disruption" isn't the same as "stupid," but they sometimes sound similar. At least, they do when uttered by a certain strain of Silicon Valley entrepreneur. This thought struck me while listening to a Valley exec at an enterprise software conference. He stumbled through PowerPoint ("How do you people use this app? I'm a Keynote guy"), agonized over how he could "possibly get used to Exchange after running his startup on Gmail" (his company had recently been acquired by a large software vendor), and generally made it clear that he had no idea how real companies work.

Executive Editor Ken Mingis, Apple expert Michael deAgonia and Multimedia Content Editor Keith Shaw mostly fawn over Apple's just-updated MacBook, then debate whether trade shows like Interop and CES are on their way out.

Cloud-based NCLC (no-code/low-code) application builder platforms empower everyone in the organization to quickly build applications and executable processes that broaden access, deepen collaboration, and enhance transparency for all team members. Line of business owners (LOBOs) and operations managers know their part of the business and their processes best. IT departments are beginning to leverage NCLC platforms to empower and enable LOBOs to lead the innovation, transform the organization, and build the infrastructure for lasting productivity and agility.

InfoQ interviewed Adam Tornhill, author of Your Code as a Crime Scene, about software evolution, how mining social information from code can increase the understanding of large codebases, how to create a geographical profile of code, and the benefits that both techniques can bring.

Company aims to drive adoption of modern and analytic database platforms with the new SharePlex release

The Azure Storage Type Provider brings statically typed access to Azure storage data sources: Blob, Table and Queue. Isaac Abraham, maintainer of the project, recently presented how to interact with these data sources using the type provider.

Aaron Bedra focuses on describing a system as a series of models that can be used to systematically and automatically generate input data and ensure that code behaves as expected. Bedra discusses property-based testing and how it can help one build more resilient systems and even reduce the time needed to maintain a test suite. By Aaron Bedra
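To make the idea of property-based testing concrete, here is a minimal hand-rolled sketch in Python. It is not taken from Bedra's talk: the run-length codec, the `random_text` input model and the `check_property` helper are all hypothetical names chosen for illustration, and real libraries add features such as shrinking that this sketch leaves out.

```python
import random

def run_length_encode(text):
    """Collapse runs of repeated characters into (char, count) pairs."""
    pairs = []
    for ch in text:
        if pairs and pairs[-1][0] == ch:
            pairs[-1] = (ch, pairs[-1][1] + 1)
        else:
            pairs.append((ch, 1))
    return pairs

def run_length_decode(pairs):
    """Expand (char, count) pairs back into a string."""
    return "".join(ch * count for ch, count in pairs)

def random_text(rng, max_len=30, alphabet="ab"):
    """Input model: short strings over a tiny alphabet, so runs are common."""
    return "".join(rng.choice(alphabet) for _ in range(rng.randint(0, max_len)))

def check_property(prop, generator, trials=500, seed=0):
    """Return the first generated input that violates prop, or None."""
    rng = random.Random(seed)
    for _ in range(trials):
        value = generator(rng)
        if not prop(value):
            return value
    return None

def round_trip(text):
    """Property: decoding an encoded string gives back the original."""
    return run_length_decode(run_length_encode(text)) == text

def no_adjacent_duplicates(text):
    """Property: neighbouring pairs never share the same character."""
    encoded = run_length_encode(text)
    return all(a[0] != b[0] for a, b in zip(encoded, encoded[1:]))

counterexample = check_property(round_trip, random_text)
print("round trip holds" if counterexample is None else f"fails on {counterexample!r}")
```

Instead of listing individual test cases, the test states what must hold for every input and lets the generator look for a violation. Changing the input model changes the coverage without touching the properties.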

A cursory look at some of the numbers on data breaches can be misleading. For instance, while there were just 400 reported cases of data breaches over the course of 2014, compared to 500 per year from 2010 through 2013, this is not indicative of a better cyber security environment. Experts believe that the trend is toward fewer worse incidents and more moderately severe data breaches. Furthermore, attacks are growing in terms of sophistication and funding, meaning many breaches are not even discovered, or are discovered only after the fact. Here are the realities regarding today's data breaches.

This list was started in 2012, updated in 2014 and also very recently according to the author. It was compiled by, and broken down by degree (master / bachelor / certificate / doctorate) and location (online / on-site). You can find other classes/training here; some of them are free. We will soon publish a list of data science and machine learning courses and data camps, so sign up with us to receive the most up-to-date information. Below is a small selection. You can find the most comprehensive and updated version here, or download an older version (Excel spreadsheet).

Jonathan Graham presents how to implement our own versions of the Clojure functions reduce, count, filter, map and pmap. The pace starts gently for those with little Clojure experience to follow, but then dives deep to provide a full understanding. By Jonathan Graham
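As a rough illustration of the exercise, the sketch below rebuilds reduce, count, map and filter in Python rather than Clojure. It is not Graham's code: the `my_*` names are hypothetical, the results are eager lists rather than Clojure's lazy sequences, and pmap is left out.

```python
def my_reduce(fn, items, initial):
    """Fold items into one value by repeatedly applying fn(accumulator, item)."""
    acc = initial
    for item in items:
        acc = fn(acc, item)
    return acc

def my_count(items):
    """Count items by adding one to the accumulator per element."""
    return my_reduce(lambda acc, _: acc + 1, items, 0)

def my_map(fn, items):
    """Apply fn to each item, collecting the results in a new list."""
    return my_reduce(lambda acc, x: acc + [fn(x)], items, [])

def my_filter(pred, items):
    """Keep only the items for which pred(item) is true."""
    return my_reduce(lambda acc, x: acc + [x] if pred(x) else acc, items, [])

print(my_count("abc"))
print(my_map(lambda x: x * 2, [1, 2, 3]))
print(my_filter(lambda x: x % 2 == 0, range(6)))
```

Once reduce exists, the other three are one-liners, which is a large part of why reduce is usually implemented first.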

InfoQ is currently attending the Integrate 2016 event in London, where Microsoft Integration technologies take center stage. The event is hosted by BizTalk360 in collaboration with integration-related Microsoft Product Groups.

Dell announced a major new release of its award-winning SharePlex database replication and near real-time data integration solution. Continuing its evolution beyond traditional Oracle-to-Oracle replication capabilities, the latest release of Dell SharePlex enables users to replicate Oracle data directly to SAP HANA, Teradata, or EnterpriseDB Postgres.

I had the opportunity to speak at TDWI in Chicago today. It was a tremendous venue and a well organized event. Thanks to the TDWI team. I spoke on the topic of machine learning and the big data…

How does your organization work to prevent insider threats? In this episode of Cyber Beat Live, listen as leading cybersecurity experts discuss pressing data security questions while describing how companies can reorient their security posture to thrive in an age in which trust seems inadequate.

MIT Lincoln Laboratory has been a world leader in interactive supercomputing since the 1950s. In 1955, TX-0, the first fully transistor-based computer, was built to support a wide range of research at the laboratory and the MIT campus, and became the basis for the second-largest computing company in the world, Digital Equipment Corporation. In 2001, the laboratory developed Parallel Matlab, which enabled thousands of researchers worldwide to use interactive supercomputing for high-performance data analysis. In 2008, the laboratory demonstrated the largest single problem ever run on a computer, using its TX-2500 supercomputer, a part of the system called LLGrid.

Hello Everyone, The Chief Data Officer role is disruptive and becoming more essential across all industries. As such, the Chief Data Officer Summit, San Francisco, will be discussing the ever-evolving challenges and opportunities presented by data. Are you looking for networking opportunities? Want to be energized by fresh ideas? Looking for an insight into how other companies in the market are utilizing data? Join us on May 26 & 27 to hear use cases and benchmarking practices from those leading data initiatives at The World Bank, Amazon, Bing, PayPal, P&G, U.S. Department of Commerce and Kaiser Permanente. See the full schedule here.

Guest blog post by Jason O'Rawe, ODSC data science team contributor. Data science is an interdisciplinary endeavor, and it serves the purpose of extracting insight from varying sources of information. Various communities come together at data science conferences to share their knowledge and promote innovation. It is not surprising, then, that the tools showcased by data scientists at ODSC East are myriad, but what are the most valued and popular programming languages in a data scientist's toolbox? A 2014 KDnuggets poll (1) suggests R, Python, SAS and SQL are among the top contenders.

There are many ways to choose features from given data, and it is always a challenge to pick the ones with which a particular algorithm will work best. Here I will consider data from monitoring the performance of physical exercises with wearable accelerometers, for example, wrist bands. The data for this project come from this source: In this project, researchers used data from accelerometers on the belt, forearm, arm, and dumbbell of a few participants. They were asked to perform barbell lifts correctly, marked as "A", and incorrectly with four typical mistakes, marked as "B", "C", "D" and "E". The goal of the project is to predict the manner in which they did the exercise.
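One common way to choose features for a multi-class problem like this is a filter that scores each feature on its own. The sketch below ranks features by the one-way ANOVA F statistic. This is an assumption made for illustration, not the method used in the post, and both the data and the feature names `belt_roll` and `forearm_noise` are synthetic stand-ins for the real accelerometer readings.

```python
import random
from statistics import mean

def f_score(values, labels):
    """One-way ANOVA F statistic: between-class over within-class variance."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    grand_mean = mean(values)
    k, n = len(groups), len(values)
    between = 0.0
    within = 0.0
    for group in groups.values():
        group_mean = mean(group)
        between += len(group) * (group_mean - grand_mean) ** 2
        within += sum((v - group_mean) ** 2 for v in group)
    between /= k - 1
    within /= n - k
    return between / within if within > 0 else float("inf")

def rank_features(columns, labels):
    """Return feature names ordered from most to least class-separating."""
    scores = {name: f_score(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Synthetic data: 40 samples for each of the five classes "A".."E".
rng = random.Random(42)
classes = "ABCDE"
labels = [c for c in classes for _ in range(40)]
columns = {
    # Hypothetical informative feature: its mean shifts with the class.
    "belt_roll": [rng.gauss(classes.index(c) * 2.0, 1.0) for c in labels],
    # Hypothetical uninformative feature: same distribution for every class.
    "forearm_noise": [rng.gauss(0.0, 1.0) for _ in labels],
}
print(rank_features(columns, labels))
```

A per-feature filter like this is cheap and independent of the classifier, but it ignores interactions between features, so it is best treated as a first pass before model-specific selection.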

Get your questions answered in real-time during this one hour event. Register here  We create, interpret, and experience stories every day, whether we realize it or not. Our brains are constantly receiving input and stringing things together in order for us to make sense of the world. While our brains create countless stories, only the few great ones stay with us. These make us cry, laugh, or embrace a new perspective. Understanding how our brains interpret the world can help us become better storytellers.

Dell helps create a showcase of successful women who can become positive examples to young women in a variety of fields.

Two of the most risk-averse industries are health care and financial services. Yesterday I wrote about how banks are increasingly using public IaaS cloud services. A new study this week finds that health care providers are warming to the cloud too. Two years ago HIMSS Analytics and Level 3 found that 22% of health care providers planned to use cloud for back-office functions. This year, the number more than doubled to 46.7%. Just over one in three respondents said they have some sort of patient engagement tools hosted in the cloud. Another popular use case for cloud was Health Information Exchanges.

Whether we're discussing the impact that the tsunami of big data is having on organizations or the cloud application takeover of traditional on-premises applications, the common foundation of such trends is an increasing demand for data. More accurately, there is a need for data that has been integrated and translated into a business context for analysis. That demand is making effective data integration — already a key component of data warehouse environments — even more important to business success. Data integration involves taking data — often from multiple sources — and transforming it into meaningful information for business executives, data analysts and other enterprise users.

Finding insights in an ocean of data has become one of today's most pressing business challenges, and software vendors are rushing to help. The latest is Adobe, which has added a host of algorithms in its cloud services to help brands uncover patterns and put them to work. Adobe's Creative, Document and Marketing Cloud services already use data science to help brands hone their message to customers, and the algorithms announced Wednesday add more capabilities.

There's no question about it: open source drives big data. Some may forget how the big data software we use or write about every day actually gets made. But here at the Apache: Big Data North America conference in Vancouver, British Columbia, the well of innovation that is the open source community is on full display. Open source software is fundamental to big data, says Roman Shaposhnik, who runs the Apache Incubator project for the Apache Software Foundation (ASF), the main sponsor of this event. "In a way, open source has won in the enterprise," says Shaposhnik, whose day job is director of open source at Pivotal. 

Businesses can benefit enormously from analysis-derived rules that enable understanding why certain events occur and the corresponding actions to take. Learn more about a widely used six-phase methodology for building predictive analytics models that can reveal hidden rules for meaningful business impact.

Heating-fuel delivery to rural customers in the US can be greatly inefficient when deliveries are made to half-full tanks. The practice can be costly for both the supplier and the customer. Take a look at one solution that utilized Internet of Things sensors and analytics to monitor fuel levels and enhance delivery schedules.

While the initial results are encouraging, there is very little data as to how containers perform in real-world production environments.

April 27th was the anniversary of the death of Karl Pearson, who contributed to statistics the correlation coefficient, principal components, the (increasingly-maligned) p-value, and much more. Pearson was one of a trio of founding fathers of modern statistics, the others being Francis Galton and Ronald A. Fisher. Galton, Pearson and Fisher were deeply involved with… The post Eugenics Journey to the Dark Side at the Dawn of Statistics

Forrester Research, Inc. has just released the Forrester Wave for Big Data Hadoop Cloud Solutions. IBM placed in the leaders category with IBM BigInsights on Cloud.

Deriving business value from data, then presenting it to key stakeholders, is an exercise in merging data and design for an audience. You see, an analyst's real job is not to find key insights, but to effectively persuade her audience to buy into the recommendations. After doing deep research and analysis, finding the right story is vital to effective communications. All this pressure to "be creative" with how you approach data may draw up feelings of impending dread, but that need not be part of the process.

The rift between the Open Data Platform Initiative (ODPi) and the Apache Software Foundation (ASF) is on the mend, thanks in part to a peace offering by ODPi, an admission of being indelicate, and a $40,000 check. It may not pacify everybody in the Apache Hadoop community who feel threatened by ODPi's presence, but at least it's a start. With its financial commitment, ODPi becomes a gold sponsor in the ASF, which manages 350 open source projects, about 10 percent of which could be considered "big data" projects.

Flight is perhaps humanity's oldest dream, brought from concept to reality over the course of millennia. Today, businesses can chart a flight plan for their data, taking it where it needs to be even in cloudy skies and uncertain conditions. Find out how modern approaches to data and data architecture are helping companies deliver high-value cargoes safely and securely.

R is a powerful system for creating data visualizations. In fact, R gives you so many options for creating charts that it can be hard to know the best way to communicate effectively. To help you present your data as effectively as possible using R, there's a new (and free) e-book available to download: Effective Graphs with Microsoft R Open. Written by the mother-daughter team of Naomi Robbins (author of Creating More Effective Graphs and Forbes contributor) and Joyce Robbins (sociologist and R programmer specializing in data visualization), the book gives plenty of examples of common (and some not-so-common) data visualizations, with suggestions on how to customize them for the best effect.

Did you know that Big Data can help us predict who the future soccer stars will be? This is precisely what happened in Riyad Mahrez's case, PFA Player of the Year, who has conquered the English Premier League with Leicester City.

Linux has become a dominant OS for application back ends and micro-services in the cloud. Usage limits (aka ulimits) are a critical Linux application performance tuning tool. Docker is now the leading mechanism for application deployment and distribution and AWS ECS is one of the top Docker container services. It's more important than ever for developers to understand ulimits and how to use them in Linux, Docker and a service like AWS ECS. The purpose of ulimits is to limit a program's resource utilization to prevent a run-away bug or security breach from bringing the whole system down.
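As a small illustration of what a ulimit looks like from inside a process, the sketch below uses Python's standard `resource` module (Unix only) to read the open-file limit and lower its soft value. The `describe` and `cap_soft_limit` helpers and the ceiling of 1024 are assumptions chosen for the example, not recommendations from the article.

```python
import resource

def describe(limit):
    """Render a limit value, mapping RLIM_INFINITY to a readable word."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)

def cap_soft_limit(kind, ceiling):
    """Lower the soft limit for `kind` to at most `ceiling`; never raise it."""
    soft, hard = resource.getrlimit(kind)
    if soft == resource.RLIM_INFINITY or soft > ceiling:
        # An unprivileged process may always lower its own soft limit.
        resource.setrlimit(kind, (ceiling, hard))
    return resource.getrlimit(kind)

# RLIMIT_NOFILE is the maximum number of open file descriptors.
soft, hard = cap_soft_limit(resource.RLIMIT_NOFILE, 1024)
print("open files: soft =", describe(soft), "hard =", describe(hard))
```

The soft limit is the value the kernel enforces, while the hard limit is the ceiling up to which an unprivileged process may raise its soft limit again. Container runtimes expose the same pair at launch time, for example through the `--ulimit` option of `docker run`.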

Dell announced a new release of SharePlex, its database replication and near real-time data integration product. The new release will be generally available on May 24. Among its features are added support for SAP HANA, Teradata, and EnterpriseDB Postgres. It enables users to replicate Oracle data directly to all three.

The really cool thing about big data and multiple data sources for today's advanced, web-based applications is that a variety of open source databases can provide specialized support for different application components. If you're involved in application development, discover why achieving polyglot persistence is the way to go.

Is your business able to manage and analyze your data? Learn how this IBM platform can allow you access to an expansive set of data and analytics services to help drive business decisions in an open and managed process.

Drone security is complicated because it encompasses so many different threats. This is creating challenges for programmers, lawmakers and the public.

GitHub has announced the Electron 1.0 milestone and a new pricing model including unlimited private repositories for paid plans.

By: Bala Deshpande, Conference Co-Chair, Predictive Analytics World for Manufacturing 2016. In anticipation of his upcoming Predictive Analytics World for Manufacturing conference keynote presentation, Changing the Way we Make Things: The Brilliant Factory, we interviewed Dr. Matteo Bellucci, Manager, Process System Lab at General Electric. View the Q-and-A below for a glimpse of what’s in store at the PAW Manufacturing conference. Q: What are the challenges in translating the lessons of predictive analytics from other verticals into manufacturing?

One of the biggest challenges that IT organizations face today is moving data between their own data center and public cloud computing services.

News: The global report by EY revealed a decline in deal value despite an increase in deal volume.

A SolarWinds study found that 55 percent of companies didn't experience a data breach last year.

Open source is a disruptor that never quits, and it is seemingly penetrating and transforming every aspect of established data, analytics and application ecosystems. Give this podcast, recorded at IBM InterConnect 2016, a listen to learn how open source initiatives are transforming machine learning.

Fashion Forward Fiona. Sales Savvy Susie. No, these are not the names of the new Barbies. Buyer personas such as these have been the foundation that marketing plans are built on for a long time. But times have changed since marketing first started creating these buyer personas. In this age of big data, it makes sense that we should let the data shape more of our buyer personas. Who are You? That doesn't mean that intuition doesn't have a place, as each business has a clear idea of just who their customer or target is. But what we need to recognise is that our customers are human!

XOR Data Exchange launched a new platform called the Compromised Identity Exchange which enables breached entities to share compromised data and ongoing analysis on that data for increased fraud and identity theft detection across several entities. Credit bureau giant TransUnion is already onboard.

R. Kent Dybvig, professor emeritus of Computer Science at Indiana University now with Cisco, has recently open sourced version 9.4 of his formerly commercial Scheme compiler Chez Scheme.

Did you know that Big Data can help us predict who the future football stars will be? This is precisely what happened in Riyad Mahrez's case, PFA Player of the Year, who has conquered the English Premier League with Leicester City. During last year's Big Data Week Conference in London, Malta-based entrepreneur Valery Bollier correctly foresaw that the Leicester midfielder would be the player to keep an eye on. "I'll give you a prediction for this season: there is a footballer, named (Riyad) Mahrez.

Oracle has announced that the and forges will be closed in approximately one year; project administrators have been advised to request all their project data so they can continue to operate elsewhere. The move seems to be aligned to other similar decisions in the market, after sites like Codehaus and Google Code also announced previous closures.

The Office of the National Coordinator for Health Information Technology (ONC), a U.S. Department of Health & Human Services agency, has launched its Move Health Data Forward Challenge for developers and IT experts. The contest is to create an API to the implementation specifications developed by the HEART Workgroup to allow individuals to share their health data anywhere they choose, such as with clinicians, clinical researchers, hospitals, or family members.

The FCC is attempting, with President Obama's backing, to force cable and satellite TV companies to allow subscribers to use any set-top box they want. If that happens, cable and satellite providers lose their ability to gather viewer data. But now FourthWall Media has a newly patented data collection technique that enables these companies to gather viewer data "from any set-top and DVR on the market."

Zoomdata's new application uses microservices and Docker containers to make it simpler for end users to interrogate large amounts of data.

Almost every company is using at least some cloud services today, and they're not just using packaged SaaS apps, PaaS services and IaaS virtual machines. Websites and custom apps are built using application programming interfaces (API) for everything from mapping and messaging, to analytics, fraud detection and speech recognition. Software-as-a-service (SaaS) offerings often offer APIs that let you work with them through third-party apps and services, or even build your own.

Accenture announced its plans to acquire OPS Rules, a boutique analytics consulting company led by David Simchi-Levi, a professor of engineering systems at MIT known for his work in supply chain and operations analytics. The acquisition will beef up Accenture's machine learning and optimization analytics and help it develop new analytics.

I can't stress enough the importance of treating your technology and equipment properly. While we recommend that our clients always have equipment that's within warranty, when you aren't taking proper care of it or are neglecting it in certain ways, it can absolutely fail sooner. Your server is your lifeline. It's where you store all your important documents and data. All of the stuff that you work so hard on is stored on your server. It's the center of your network. When your server fails, work comes to a halt.
