Big Data News – 5 Oct 2015

Top Stories
The Strata+Hadoop World 2015 conference in New York this week was subtitled "Make Data Work," but given how Hadoop's world has evolved over the past year (even over the past six months), another apt subtitle might have been "See Hadoop Change." Here are three of the most significant recent trends in Hadoop, as reflected by the show's roster of breakout sessions, vendors, and technologies.

IBM says it has come up with a method to fuse a metal contact point to a carbon nanotube transistor, overcoming a barrier to nanotube CPUs.

Much of our lives already revolve around smart devices. But our reliance is about to become much stronger. The Internet of Things' impact is set to expand as the market for network-enabled devices is expected to hit $7.1 trillion by 2020, according to an IDC study reported on ZDNet. The Internet of Things (IoT) is partially defined as the wireless connectivity of humans and objects to the Internet without the need for computer interaction. Frequent examples include wearable devices and lights that can be operated from a smartphone.

Intel continued with the soft launch of its open source big data application development platform TAP, adding system integrator and cloud service provider partners as well as a handful of healthcare system pilot programs.

What are the key factors to ensure the success of a big data project, and what are the factors that can contribute to failure? EMC's Global Services Big Data chief shares his perspective, gained from working with customers in the field.

Qubole, the big data-as-a-service company, announced that Station X, a leading developer of technologies that make large-scale human genome management and analysis easier, is using Presto on Qubole's cloud-based big data platform to power GenePool™, a powerful software-as-a-service solution for real-time analytics of genomic and medical information.

IT pros are concerned that unstructured data moving through its lifecycle will yield more data privacy breaches and complicate advanced content management systems.

NEW YORK CITY — At Strata + Hadoop World here yesterday, Hadoop distribution specialist Hortonworks unveiled a new tool called the Hortonworks Big Data Scorecard designed to help organizations develop a plan for jumpstarting big data projects. "Hortonworks has always been committed to partnering with customers to make their big data projects as successful as possible," Herb Cunitz, president of Hortonworks, said in a statement yesterday. "We are leveraging our expertise to simplify big data and help our customers transform into data-driven enterprises. The new Hortonworks Big Data Scorecard will accelerate our customers big data vision and greatly propel business transformation."

The purchase could give Apple the software it needs to take Siri, and potentially its automotive project, to the next level.

Predictive analytics sounds almost mystical, and in a way, it is. By studying complex, almost impermeable data about the past, you can chart trends that course well into the future. While predictive analytics can't predict the future with a degree of certainty, it can help you predict certain patterns of behavior across large groups of…

Our friends over at Narrative Science put together the infographic below to look at AI's history and how it has impacted various industries over time.

Conair's CIO has forged a strong partnership with marketing and sales to enable easier and faster analytics of customer preferences to drive sales.

Global Business Intelligence (BI) and analytics software vendor, Yellowfin, has launched DashXML — a browser-based Java application that makes it fast and easy to create customized analytical functionality and applications.

Organisations are running into growing volumes of Big Data, which can become very costly and about as easy to manage as herding elephants! There's constant pressure on organisations to make better and faster decisions and it's important that data is easily manageable so that it's providing real value to the business.

Find out how Day 3 of the conference offered insight into how data scientists have benefited from the latest approaches to web-scale analytics, including open sourcing of the SystemML machine learning library to help the Spark community.

Centrify Corporation, a leader in securing identities from cyberthreats, announced the expansion of its big data security solution with support for NoSQL — reinforcing Centrify's status as the only vendor to comprehensively address identity management for big data.

In weather prediction, the past holds the key to the future, allowing businesses to predict future changes in both the short term and long term. In particular, organizations can use market-level analytics to take stock of how weather affects consumer behavior, helping drive the business forward.

Paxata, provider of the Adaptive Data Preparation™ platform for the enterprise, announced a strategic relationship with Cisco which includes a jointly-developed solution, Cisco Data Preparation (CDP).

Hadley Wickham, RStudio's Chief Scientist and prolific author of R books and packages, conducted an AMA (Ask Me Anything) session on Reddit this past Monday. The session was tremendously popular, generating more than 500 questions/comments and promoting the AMA to the front page of Reddit. If you're not familiar with Hadley's work (which would be a surprise if you're an R user), his own introduction in the Reddit AMA post will fill you in: Broadly, I'm interested in the process of data analysis/science and how to make it easier, faster, and more fun. That's what has led to the development of my most popular packages like ggplot2, dplyr, tidyr, stringr.

Carnegie Mellon University and The Boeing Company (NYSE: BA) announced plans this week to establish the Boeing/Carnegie Mellon Aerospace Data Analytics Lab, a new academic research initiative that will leverage the university's leadership in machine learning, language technologies and data analytics. This is more evidence of the collision between big data and HPC spurring academic-industry collaboration. The goal…

The IBM i2 Summit for a Safer Planet illuminated how fraud and crime in the financial sector are highly profitable and very damaging to the industry. Organizations facing the challenges of staying one step ahead of perpetrators can apply an advanced solution that places layers of analytics across multiple channels.

Today the world generates vast quantities of data each day that can be used to enhance the quality of living of virtually anyone in the world. Information is power but also a tool for supporting development, knowledge sharing and social initiatives. Tracking natural disasters, crowdsourcing rainfall data and mapping out the night's sky are amongst a diverse range of open data initiatives.

Internet of Things deployments don't have to take a rocket scientist. Make IoT deployment as easy as 1, 2, 3 by using the IBM IoT Foundation to take advantage of the wealth of insights hidden in your data.

I recently caught up with John K. Thompson, General Manager of Advanced Analytics at Dell Software to discuss Dell's migration of its entire internal analytics system to Statistica (its own platform).

In this special guest feature, Supreet Oberoi, Vice President of Field Engineering of Concurrent, Inc. talks about how companies should change their perspective on their data strategies, and look at the process as building a data library as opposed to a data lake.

Nest has made available to developers the Weave protocol used to connect various IoT devices.

No one keeps a weather eye like the property and casualty industry does. In their incessant quest to mitigate and prevent natural disasters, insurance providers can be among the chief beneficiaries of weather analytics, gauging weather-related risks to protect insured assets and avoid incurring loss.

With respect to capitalizing on analysis of Internet of Things data for innovative industry solutions, the insurance industry is ripe with opportunity. And yet many C-suite insurance executives seem tentative when considering Internet of Things solutions. Now more than ever is a suitable time for insurers to make effective use of analytics, Internet of Things data and mobility.

"It's not a race!" I hear you cry. Well perhaps not, but cycling in London certainly is competitive, whether that's with other cyclists or against the thousands of other commuters who tackle London's roads and rail networks every day in a quest to shave a minute from their journey time. It was on one of these days during my regular jaunt down the A3 that I considered how the (unofficial) rules for cycling in London could be used as an aid to performing successful data 1. Never get overtaken by a Boris* bike = Leverage your technology and all data. I don't care if it's Chris Froome or the Terminator riding a Boris bike, you simply can't get overtaken.

Predictive analytics isn't complete without geospatial analytics, which offers data dimensions that can provide a holistic view of business problems. At IBM Insight 2015, discover how geospatial analytics can help you understand your customers and your operations through time and space components.

Purchasing pre-populated lists for marketing is not a new concept in the business world. For decades marketers have been purchasing email and direct mail lists based on a variety of "check-the-box" criteria. For instance, all single family households in a 3 mile radius with $150k+ annual income, or all email addresses of attendees to a conference. Check, check, check – and out comes a list with the most recent information made available.

DevOps Enterprise, a conference focused on DevOps as it applies to the enterprise, will be held in San Francisco from October 19 to 21. This 3-day conference is unusual in the DevOps community, as most of the speakers hold senior positions at very large enterprises such as Bank of America, ING, Target or GM. InfoQ spoke with Gene Kim to learn more about this year's edition. By Joao Miranda

A new publication model for Universal Windows Apps reduces deployment sizes by up to 75% for small applications. And for some cases, build times have also been reduced by 30%.

InfoQ interviewed the authors of Fifty Quick Ideas to Improve Your Retrospectives about why they wrote the book and how ideas are described, when you can do retrospectives, what facilitators can do to establish safety, why facilitators should not be the ones who solve problems, celebrating successes, good practices for getting actions done, and the value that teams get from doing retrospectives.

It's more important than ever to think about how we can employ predictive maintenance to keep everything, from our air conditioners to our cars, in working order. Most people only know of two ways to maintain machines — reactive maintenance or preventative maintenance. But these aren't always the best or most cost-efficient processes.

Ahmed Sidky and Shannon Ewan talk about the goals of ICAgile, the design of the learning pathways, and the difference between knowledge-based and competency-based certification programs. They explore the goal of helping people deepen their Agile knowledge and pursue sustainable agility by scaling people, not just processes and structures, and discuss how the expert pathways were developed.

The calendar said October 1, but the tone of Maciej Ceglowski's keynote at Strata + Hadoop today was perhaps better suited for the holiday at the end of the month. Some may disagree with his apocalyptical vision of a big data world gone mad, but it's tough to ignore. In Ceglowski's view, the widespread abuse of big data technology across all levels of business and government threatens to bring about a Three Mile Island moment. While the data industry may couch itself in bucolic terms like streams, logs, lakes, silos, and clouds, Ceglowski equates data to "evil radioactive waste" that nobody knows how to contain. "A singular problem that nuclear power had was it generated these deadly waste products," the operator of the bookmarking site Pinboard said.

Attendees on Day 2 of Strata + Hadoop World were treated to a range of speakers from various industries, as well as important keynotes and interviews that focused on critical data-related topics. Read about these and other highlights from that day's sessions.

It cites statistics as one of three foundational communities in data science and emphasizes the importance of collaboration among all the field's key disciplines. ALEXANDRIA, VA, OCTOBER 1, 2015 – In a policy statement issued today, the American Statistical Association (ASA) stated statistics is "foundational to data science"–along with database management and distributed and parallel systems–and its use in this emerging field empowers researchers to extract knowledge and obtain better results from Big Data and other analytics projects. The statement also encourages "maximum and multifaceted collaboration" between statisticians and data scientists to maximize the full potential of Big Data and data science.

In less than a couple of weeks, a new edition of JAX London will be held at the Business Design Centre. Running from 12th to 14th October, this year's edition has 12 tracks, covering topics from Agile and Craftsmanship to Enterprise Development through DevOps, Cloud and deep-down Java. InfoQ talked to some of the speakers at the Enterprise Java track to get a glimpse of the contents of JAX London.

The data science revolution is bringing a shift in the direction of the cloud–and with it a sea change in the roles of IT professionals. Discover how focusing on data science can help you keep your career afloat in a changing business environment.

As September draws to a close, the anticipation for Teradata's PARTNERS 2015 in Anaheim, CA, October 18-22, is ramping up to a fever pitch. And since the theme is "Breaking Big," it's only fitting that we just broke some really big news: Monday's keynote speaker will be Robert Herjavec, and Wednesday's keynote speaker will be Mike Rowe.

First Insight, a leading provider of solutions that empower brands to incorporate the voice of the customer into the design and merchandising of new products, today announced the launch of InsightTargeting, a solution enabling retailers and brands to create highly targeted products and promotions through predictive analytics.

Data preparation specialist Paxata announced a partnership this week with Cisco Systems designed to advance the networking giant's data preparation capabilities on its emerging big data platform. The partnership, in which Paxata's technology will underpin the Cisco (CSCO) data prep offering, was unveiled this week at Strata + Hadoop World. The deal underscores the growing momentum of the data prep market as data sets grow in size and complexity. The Cisco platform will use Paxata's machine intelligence algorithms along with an Excel-like interface.

The Big Data and Analytics Hub is home to multitudes of great public sector blogs, podcasts, infographics and more. Here are some highlights from the month of September 2015.

Companies that insist on offering high-quality services and products have discovered how analytics can help realize this goal. This installment of Answers in Analytics describes how a leading medical technology company uses analytics to realize multiple business benefits.

Thirty states are using education data mining techniques to identify students in crisis and employ specialized programs to intervene before they drop out.

Puppet Application Orchestration makes it possible to model complicated apps and application stacks as Puppet code.

Metro Transit of St. Louis (MTL) operates the public transportation system for the St. Louis metropolitan region. The organization's mission is "Meeting the region's transit needs by providing safe, reliable, accessible, customer-focused service in a fiscally responsible manner." Meeting the Challenge to Provide Safe, Reliable Public Transport: To ensure the safety of passengers and the proper use of public funds, MTL has always performed regular maintenance on its bus fleet. But lacking detailed data on how bus components were actually performing, the agency maintained vehicles retroactively. It replaced parts after they failed, or simply bought new buses.

Predictive product and customer profitability analysis techniques get a significant boost from big data and analytics. So much so that the lines blur between the phases of the traditional "Plan, Do, Study, Act" cycle outlined by Dr. W. Edwards Deming.

The Personalized Medicine Initiative (PMI), based out of the Life Sciences Institute of the University of BC, has deployed HDP and PHEMI Central Big Data Warehouse to collect, store and manage genomic and clinical data for Molecular You (MY). PHEMI is a Hortonworks Technology partner and in this blog, Richard Proctor, General Manager, Global Healthcare at Hortonworks interviews PHEMI's Roy Wilds, Dir. of Product Management, along with PMI's Chief Operating Officer and Co-founder of Molecular You, Rob Fraser, to discuss this groundbreaking work. Join the webinar on Thursday, October 8 at 10 am Pacific time to learn more. Richard Proctor, Hortonworks: Tell us more about Molecular You.

Why is Spark so badly needed by the data science community? Primarily, it offers an open platform for fast, powerful data access that is vital for organizations because they are increasingly using a wide variety of technologies to deliver analytics, and they are tied to a variety of workloads. Discover why data scientists, data engineers and application developers, in particular, are flocking to the Spark community.

Data analytics provide companies with the timely information necessary to develop dynamic infrastructure financing models that enable them to make quick, informed revisions to their strategies in response to market changes.

Today, advanced banking analytics offer financial institutions a range of powerful technological tools to help stop fraud and financial crimes.

The idea that data-at-rest (historical) and data-in-motion (immediate) are mutually exclusive no longer applies, thanks to a new toolset that handles both. Discover how organizations can have it all: the ability to stream data in real time as well as process historical data to highlight patterns in context.

Spark is taking over big data, but it's getting some help from its MemSQL friends.