Big Data News – 08 Dec 2015

Top Stories
Guest blog post by Mirko Krivanek: This is an interesting listing created by Bernard Marr. I would add the following great sources: the DataScienceCentral selection of big data sets (check out the first itemized bullet list after clicking the link); the data sets used in our data science apprenticeship, which include both real and simulated data along with tips for creating artificial, rich, big data sets for testing models; the KDnuggets repository; and the data sets used in Kaggle competitions.

After more than 40 years of focusing primarily on software for large businesses, SAP is taking a bold step in a new direction: precision medicine. Targeting healthcare organizations, life sciences companies and research institutions, the German software giant on Tuesday rolled out SAP Foundation for Health, a brand-new platform based on its Hana in-memory engine that's aimed at helping such organizations uncover insights from patient data in real time. "Our strategy is very simple but very ambitious," said Dinesh Vandayar, vice president of personalized medicine for SAP. "Our vision is to create a health network enabling personalized medicine."

In an effort to better enable its cloud users to access video from any device at any time, IBM today said it is buying Clearleap, a 7-year-old Georgia-based company. Clearleap has focused on creating a technology platform that can securely deliver massive video libraries to traditional TV systems and multiscreen devices, according to the company's website. Financial terms of the deal were not disclosed. IBM said that Clearleap will be integrated into its cloud platform to provide enterprises with a fast and easy way to "manage, monetize and grow user video experiences."

It's been a busy year for MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL). Researchers won the Turing Award, created groundbreaking algorithms to fix code and detect disease, and developed exciting new robots and artificial-intelligence systems. As 2015 comes to a close, here are a few highlights in seven key areas from the past 12 months…

Verily, formerly known as Google's Life Sciences division, aims to use technology to understand and improve human health.

At the core of the OCI initiative is a draft specification for the base container format and run time submitted by Docker.

According to a just-released Ovum survey, the U.S. is the "least-trusted nation" with regard to data privacy and secure data storage practices. This bodes ill for U.S.-based companies and data centers, both in attracting and retaining global customers and in the spending increases required to meet data sovereignty and privacy requirements.

No doubt a lot of good can come from smart grid projects, whether they are part of a fleshed-out bigger picture vision or not.

Listen to Kim Minor and Andy Rice discuss how combining analytics and weather data could transform the insurance industry, and learn how a timely alert from your insurance company could help you protect your family and assets from damaging weather events.

At an SAP event, Under Armour's Ken Kendall explained how his company mines big data to create a better consumer experience. Learn how in this TechRepublic video clip.

Guest blog by Khushbu Shah at DeZyre.com: With demand for big data technologies expanding rapidly, Apache Hadoop is at the heart of the big data revolution. It is labelled the next-generation platform for data processing because of its low cost and ultimately scalable data processing capabilities. The open source Hadoop framework is somewhat immature, and big data analytics companies are now eyeing Hadoop vendors: a growing community that delivers robust capabilities, tools and innovations for improved commercial Hadoop big data solutions.

Recently I had the pleasure of sitting down with Paul Kent, SAS vice president of Big Data, to get his thoughts on the vibrant Hadoop ecosystem and discuss how SAS and Cloudera customers are benefiting from our joint engineering work. The following blog includes excerpts from that conversation: David: When you think about Hadoop, what's the first thing that pops into your mind?

Enterprise IT needs to repurpose itself for a dynamic, commoditized infrastructure environment if it hopes to remain relevant to the business process.

by Edward Ma and Vishrut Gupta (Hewlett Packard Enterprise) A few weeks ago, we revealed ddR (Distributed Data-structures in R), an exciting new project started by R-Core, Hewlett Packard Enterprise, and others that provides a fresh new set of computational primitives for distributed and parallel computing in R. The package sets the seed for what may become a standardized and easy way to write parallel algorithms in R, regardless of the computational engine of choice.

DataHero, a leading provider of Cloud BI, is launching its AdWords connector, giving marketers, digital marketing agencies and advertisers the ability to create a rich set of visualizations for each available Google AdWords report in only a few clicks.

How one of America's most respected newspapers is approaching digital design

Organizations should be aware of the hidden charges that must be considered when researching the best cloud backup solution for their business.

Cisco Systems is rethinking its collaboration tools from the bottom up, turning the lowly Spark text-messaging app into a cloud-based platform that includes videoconferencing and the core features of an enterprise phone system. Spark, which debuted last year as an app called Project Square, is central to the company's strategy of using cloud computing to deliver collaboration to everyone in an organization.

Questionable sources are replacing the mass media

SPONSORED: This sponsored post is produced by SessionM. The holidays are upon us — a great time of year for bad sweaters, chestnuts roasting on an open fire, and parties with friends and family. It's also a time when those friends and family might ask you (if you've been "good") what you want for a gift. Or when it comes to my wife, sometimes she doesn't ask me what I want — she just watches what I say and do leading up to the holidays, and invariably arrives at the thing I want or need most. So why is it that 'Holidays 2015' still features so many brands and retailers that do neither?

MapR is again beefing up its real-time efforts for big data with the release of MapR Streams. Here's how it works.

VB WEBINAR: Join Mark Sullivan as he talks to Doug Roberge from Kahuna and Michael Becker from mCordis. Don't miss out: access it on-demand right here. When it comes to sharing their message, marketers have an endless buffet of paid and owned platforms to choose from, but making sure they're selecting the right ones in the best ways can create indigestion if they don't forge long-term relationships. From Facebook Notify to Twitter to traditional advertising mediums, cross-platform publishers serve up an overwhelming smorgasbord of options.

Is your company trying to integrate Hadoop into its data ecosystem? Discover how you can use the power of 1 (one project, one developer) to bring value to the entire organization using Hadoop.

Steady advances in chip technology coupled with ubiquitous wireless connectivity have made it easier to deploy cheap sensors nearly anywhere. However, enterprise-scale sensor systems sweeping up huge data volumes still require relatively complex compute, networking and storage infrastructure.

Recognizing that IT security by definition is a team sport, IBM is adding a set of open APIs to the IBM Security QRadar platform.

Is there a stronger fuel driving the travel industry than data? Consider how IoT powers the engines of travel and transport.

CEOs aren't always right; Yahoo's history certainly proves it

MapR, one of the three big vendors of the Hadoop open-source big data software, is today announcing MapR Streams, a new piece of software for sending many kinds of data around a company. MapR Streams is a type of publish-subscribe messaging system. It's comparable to a tool like Apache Kafka. "Performance is basically very similar to Kafka," MapR cofounder and chief executive John Schroeder told VentureBeat in an interview.
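Publish-subscribe systems like MapR Streams and Kafka decouple producers from consumers through named topics: producers publish messages to a topic, and every subscriber to that topic receives them. The pattern itself can be sketched in a few lines of Python; this is an illustrative in-memory model, not the MapR or Kafka API:

```python
from collections import defaultdict

class Broker:
    """Minimal in-memory publish-subscribe broker (illustration only)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        """Register a callback to receive every message on a topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        """Deliver a message to all subscribers of the topic."""
        for callback in self._subscribers[topic]:
            callback(message)

broker = Broker()
received = []
broker.subscribe("sensor-data", received.append)
broker.publish("sensor-data", {"temp": 21.5})
```

Real systems add the pieces this sketch omits: durable storage, partitioning for parallelism, and consumer offsets so subscribers can replay history.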

What wisdom lies in crowds, and how can we gain access to it? The Internet of Things may help us take advantage of emerging interactions between people, devices and data. To learn more, discover how the Internet of Things is connecting not only things, but also people and ideas.

SAP's update is an attempt to make data more accessible and simpler to manage.

As organizations adapt to the Cognitive Era, a new solution offers definitive answers to questions relating to vital issues such as anticipating customer needs and generating business insights.

Dr. Patrick explores how cultural attitudes toward consumer-initiated health care are changing and how technology is enabling this transition.

Salience and polarity. These words sound like complex chemistry terms. In reality, salience and polarity help our team at BrightPlanet determine which pieces of content from the Internet are most relevant to your business. We leverage our technology partner Rosoka to help access salience and polarity in the data we harvest. When we harvest data…
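In text analytics, polarity measures whether a passage reads as positive or negative, while salience measures how prominent an entity or topic is within it. A toy lexicon-based sketch in Python shows the idea; this is purely illustrative and not how Rosoka's engine actually works:

```python
# Tiny hand-picked sentiment lexicons (illustrative, not a real lexicon).
POSITIVE = {"great", "excellent", "love", "reliable"}
NEGATIVE = {"poor", "broken", "hate", "slow"}

def polarity(text):
    """Score text from -1.0 (all negative) to +1.0 (all positive)."""
    words = text.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total

def salience(text, entity):
    """Toy salience: fraction of words that mention the entity."""
    words = text.lower().split()
    return words.count(entity.lower()) / len(words) if words else 0.0
```

Production systems replace the word lists with large, weighted lexicons or trained models and handle negation, phrases and entity resolution, but the two scores answer the same questions: how favorable is the text, and how central is the entity to it?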

If, as Niels Bohr maintained, an expert is a person who has made all the mistakes that can be made in a narrow field, we consider ourselves expert data scientists. After twenty years of doing what's been variously called statistics, data mining, analytics and data science, we have probably made every mistake in the book: bad assumptions about… The post Four Ways Data Science Goes Wrong and How Test-Driven Data Analysis Can Help appeared first on Predictive Analytics Times.

Seven companies make it into the Leaders quadrant in the Gartner Magic Quadrant report for Data Quality Tools 2015. See who's in, and why the research firm considers them ahead of the competition.

Confluent today unveiled a major new release of its commercial product based on Kafka, the open source data messaging platform that's quickly gaining momentum. With better security, new data connectors, and simplified integration, Confluent 2.0 promises to attract even more attention from large enterprises that are moving big data with Kafka. The adoption rate for Kafka has soared by 7x in just the past 11 months, according to Confluent…

In order to give our valued readers a pulse on important new trends leading into 2016, we here at insideBIGDATA reached out to all our friends across the big data industry to get their insights and predictions for what may be coming. We were very encouraged to hear such exciting perspectives.

News: Claims that combining technology means drastically shorter data crunching times

Analysis: SAP outlined its focus on making HANA the core of business transformation, bringing together the cloud, and big data.

The digital transformation of business via cloud, mobile and big data technology is fueling IT mergers and acquisitions, which are set to rocket past the record level set back in 2000 as the dot-com era crested. Enterprises are adopting cloud and mobile technology to stay competitive, and technology companies are scrambling to keep up with demand.

Microsoft kicked off 2015 with Windows 10, and the updates surrounding hardware, Office, mobile, and cloud continued to snowball throughout the year. Here's a look at 10 notable Microsoft moves in 2015.

MapR Technologies today unveiled MapR Streams, a new component of its integrated platform designed to move large amounts of data. MapR Streams uses the same publish-and-subscribe technique that underlies Apache Kafka, and is fully compatible with real-time streaming analytics applications such as Apache Storm and Spark Streaming. MapR built streaming data capabilities into its core MapR platform in response to its customers' growing needs for real-time analytic functionality.

News: Development of next-generation maps aided by data from Audi, BMW and Daimler.

According to reports, approximately 90% of all data in the world was created in the last two years. The total is expected to keep doubling yearly – it's estimated that the world will eventually have generated about 5,200 gigabytes of data per human being on earth. It's practically impossible to comprehend the amount of data produced every single day – and how little of it is actually used.
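To get a feel for the scale of that per-person figure, it can be multiplied out; the population number below is an assumption, roughly the 2015 world population:

```python
GB_PER_PERSON = 5_200
WORLD_POPULATION = 7_300_000_000  # approximate 2015 world population (assumption)

total_gb = GB_PER_PERSON * WORLD_POPULATION
total_zb = total_gb / 1e12  # 1 zettabyte = 1e12 gigabytes (decimal units)
# Comes out to roughly 38 zettabytes in total.
```

Even small changes in the per-person estimate swing the total by zettabytes, which is part of why such projections should be read as orders of magnitude rather than precise figures.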

News: Extends existing WANdisco deployment to eliminate outages across data centers.

The more analytical tools we try to use, the less time there is to actually benefit from all the information those tools supply. That is why it is essential to structure the data in a clear way, so it is easy to see the whole picture while still being able to take a detailed look. I have tested a few dashboards, and in this article I explain how to grow your online business with structured data.

Corporate IT continues to argue that public cloud security cannot be trusted. IT leaders believe, mistakenly, that they can keep data more secure than the public cloud can.

Anybody who travels to a foreign country or reads a book or newspaper written in a language they don't speak understands the value of a good translation. Yet, in the realm of Big Data, application…

The timeline of statistics: http://www.statslife.org.uk/history-of-stats-science/1190-the-timeline-of-statistics

VB WEBINAR: This webinar is now available on demand. Listen right here for free. "The golden rule for every business man is this: 'Put yourself in your customer's place.'" That fantastic quote from American writer Orison Swett Marden has served as a vital tip for marketers on how to sell. With the volume of data available today, it's becoming easier to understand who buys your product and how to better serve them in the future.

In this podcast from SC15, Larry Jones from Seagate provides an overview of the company's revamped HPC storage product line, including a new 10,000 RPM ClusterStor hard disk drive tailor-made for the HPC market. Watch the video presentation.

In this podcast, Gilad Shainer from Mellanox provides an overview of new products for the SC15 conference. Watch the video presentation.

In this podcast, Ryan Baxter from Micron describes the company's new NVDIMM persistent memory products. Today Micron announced the production of 8GB DDR4 NVDIMM, the company's first commercially available solution in the persistent memory category. Persistent memory delivers a unique balance of latency, bandwidth, capacity and cost, delivering ultra-fast DRAM-like access to critical data and allowing system designers to better manage overall costs. With persistent memory, system architects are no longer forced to sacrifice latency and bandwidth when accessing critical data that must be preserved.

In this podcast, Andrew Jones from NAG discusses the lessons learned from over 40 supercomputing procurements. NAG has announced plans to launch an impartial HPC technology intelligence and analysis subscription service at SC15. "Developed in partnership with Red Oak Consulting, the NAG HPC Technology Intelligence Service will deliver technology insight and risk-reduction to help HPC buyers and users make better decisions and optimize their HPC investments." Watch the video presentation.

How Big Data Could End Up Helping Lower Income Families.

Deploying a lock-in-free data platform is critical for an enterprise. By this, we mean using non-proprietary code and implementing interoperability to eliminate the risk of being dependent on a single vendor for your current or future needs. Over two-thirds of respondents to our survey agreed that maintaining freedom of choice was a key criterion when selecting the Hortonworks Data Platform.

Maarten Hermans is a sociologist and researcher at KU Leuven in Belgium and an avid hiker. He uses an Android app to track his location and elevation on his hikes, which means he can download his hike data in GPS Exchange Format. With this data and a few R packages, he was then able to create interactive topological maps including his route and photos along the way.
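Hermans worked in R, but the first step of such a project, parsing track points out of a GPX file, looks much the same in any language. A minimal Python sketch using only the standard library; the sample document below is invented for illustration:

```python
import xml.etree.ElementTree as ET

# GPX 1.1 documents put all elements in this XML namespace.
GPX_NS = {"gpx": "http://www.topografix.com/GPX/1/1"}

def track_points(gpx_xml):
    """Extract (lat, lon, elevation) tuples from a GPX 1.1 document string."""
    root = ET.fromstring(gpx_xml)
    points = []
    for pt in root.iterfind(".//gpx:trkpt", GPX_NS):
        ele = pt.find("gpx:ele", GPX_NS)
        points.append((float(pt.get("lat")),
                       float(pt.get("lon")),
                       float(ele.text) if ele is not None else None))
    return points

# Hand-written sample GPX fragment (not real hike data).
sample = """<gpx xmlns="http://www.topografix.com/GPX/1/1">
  <trk><trkseg>
    <trkpt lat="50.88" lon="4.70"><ele>35.0</ele></trkpt>
    <trkpt lat="50.89" lon="4.71"><ele>42.5</ele></trkpt>
  </trkseg></trk>
</gpx>"""
```

With the points in hand, plotting the route and elevation profile is a matter of feeding them to whatever mapping or charting library you prefer.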

What type of data should you be collecting?

96% of companies are failing miserably when it comes to marketing data insights. At a time in our history when there is more data than ever before, the overwhelming majority of companies have yet to see the full potential of better data insights. PwC and Iron Mountain recently released a survey on how well companies are gaining value from information. The results showed a huge disconnect between the information that is available to companies and the actual insights being derived from it.

Companies need to have solid small cell backhaul strategies due to the intense timing demands under which these networks labor.

Many organizations are still struggling to feel confident in their ability to manage a breach and execute a response plan.

An IBM retail industry expert discusses the opportunities and challenges associated with improving the use and value of analytics at the IBM Insight 2015 conference, IBM's largest annual data and analytics event.

IDC reported on smartphone results for 2015 last week: it was a down year for the entire category.

The risk difference between what Ballmer suggests and what Microsoft is already doing to provide a fast developer migration path should be negligible.

Psst… hey, can you keep a secret? I heard from a reliable source who shall remain nameless that there's a new Star Wars movie about to hit the theaters! Maybe even as soon as this month! OK, obviously this isn't a secret. In fact, I believe the omni-channel marketing campaign that is generating hype for this movie might be even more impressive than the film itself. Heck, C-3P0 helped me navigate traffic on my way to work this morning.

It's not enough to capture car data. The real challenge is connecting it to the rest of our digital lives.

Concerns still linger about the cloud, most of them related to the lack of visibility that occurs once data and applications cross the firewall.

6 Industries That Need to Get Big Data Down

One of the most time-consuming components of using deep neural networks is training the algorithms to do their jobs. It can often take weeks to train or re-train a network – and that's after you set it up. A startup called minds.ai today announced how it plans to tackle that problem: by allowing customers to train their neural networks on its speedy cloud-based GPU cluster. Deep neural networks have recently emerged as a competitive differentiator for organizations hoping to make sense of big data.

There's a lot of talk these days about transitioning to the cloud, by IT customers and vendors alike. Of course, I have thoughts on the subject, some of which are below. 1. The economies of…

How strategies change in ever-developing markets

Healthcare analytics can help healthcare providers make use of their vast quantities of data, enabling informed decision making and ever more personalized response even in an increasingly hectic healthcare environment. Discover six ways that analytics is making healthcare smarter.

We talk to Mondelez International's R&D Director of Premium Chocolate and Dairy, Asia Pacific

3rd-party data insights are critical to personalization strategies

In a recent interview on WGN, Bob Mariano, the CEO of Roundy's, was asked the question "What makes a great grocery store?" His response focused on customer care: "A great grocery store is made up of great people that care about their customers and go out of their way to make them feel appreciated."

There's an abundance of big data technologies, but the pool of IT professionals with the advanced know-how to handle massive amounts of data is limited.

Designing mobile apps to capture and analyze IoT data is difficult enough, but a greater hurdle can be ensuring consumer privacy laws aren't violated.

Data flowing from the Internet of Things creates opportunities to analyze equipment performance and track the activities of drivers and users of wearable devices. But IoT data analytics requires significant IT provisions.

"Skip the PhD and Learn Spark, Data Science Salary Survey Says," claimed a recent report. On the surface, the data appears to make a case for this approach. Spark, or indeed any technology making waves right now, probably represents a very good way to get in-demand (and therefore well paid) roles. But as someone for whom a PhD has worked very well, I want to put the counter argument for their value in the world of data analytics. One that looks to the longer game. Firstly we should mention that PhDs are not purely about financial value.

Educators and experts tout data storytelling as a way to better engage business managers with data, but others caution about overcrowding visuals.

Counter cyber attacks before they occur by integrating advanced analytics with visual analysis tools. Don't stop at identifying cyber attacks; rather, identify cyber attackers as well to address cybersecurity threats at their source.

HarvestMaster, an agricultural data product provider, introduced a new field treatment applicator that can be controlled from within HarvestMaster's Mirus field data collection software using a plugin. This means farmers and field researchers can use the field applicator to precisely choose and apply treatments according to GPS and agricultural data analysis in the field, in real time.

VB WEBINAR: Join VB analyst Jon Cifuentes as he shares the most important takeaways from VB's recent report on mobile advertising — joined by Ford's former head of global social media Scott Monty who will share critical insights from the field. Register here for free. As our world becomes more smartphone driven (we'll reach 2 billion worldwide in 2016), it's imperative for marketers to take full advantage of mobile to reach maximum success for their brands. But for many marketers, coming up with a surefire mobile strategy is harder than it looks.

According to the A.T. Kearney 2015 Leadership Excellence in Analytic Practices study released last week, "Companies will need 33 percent more talent in analytics over the next five years, across all industries. That's in addition to the positions that remain to be filled today. Of the 430 executives surveyed for the study, 43 percent said that at least 10 percent of their companies' digital and analytics positions remain unfilled."

It's no secret that the race is on in earnest to harvest more innovations. And that super-charged need for improvements, advancements and disruptors is leading to some highly creative ways to find them. The free Ideator, with its advisors and investors at the ready from companies such as Google, Facebook, Salesforce, LinkedIn and Goldman Sachs, is a prime example.

Is your organization stuck at the edge of Hadoop adoption, searching for a path to broad use that doesn't hold back your most proficient users? Big data discovery technology aims to help you bridge the chasm between early adoption and majority use, bringing rank-and-file users into the fold without alienating the users in your leading edge.

As governments increase their reliance on clean energy, provider analytics plays a crucial role in developing global energy solutions. China and Germany are two examples of countries leading the way in renewable energy adoption and in realizing the potential economic impact of clean energy.

When we build predictive models, we often want to understand why the model behaves the way it does, or in other words, which variables are the most influential in the predictions. But how can we tell which are most influential? And more importantly for many applications, which variables are most influential and how influential at… The post In Predictive Analytics, Coefficients are Not the Same as Variable Influence appeared first on Predictive Analytics Times.
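The post's core distinction can be seen with a toy example: rescaling a feature (say, meters to kilometers) multiplies its fitted coefficient by 1,000 without changing the model's predictions or the variable's actual influence. A self-contained illustration with invented data, not code from the post:

```python
def slope(xs, ys):
    """Least-squares slope of y on x for a simple one-variable fit."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

distance_m = [100, 250, 400, 800]   # feature measured in meters
response = [0.2, 0.5, 0.9, 1.6]     # outcome being modeled

coef_m = slope(distance_m, response)
coef_km = slope([x / 1000 for x in distance_m], response)
# coef_km is 1000x coef_m, yet the fitted line's predictions are identical.
```

This is why comparing raw coefficient magnitudes across variables is misleading unless the features share a scale; standardizing features (or using a dedicated influence measure) puts them on comparable footing.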

Hadoop, while beneficial in many IT environments, unfortunately will not work with the massive amounts of data generated in most industrial environments. This contributed article by Alex Clark, founder and Chief Software Architect at Bit Stew Systems, asks: if we remove Hadoop from our industrial planning, what does the future industrial data architecture look like?

In this special guest feature, Rob Patterson, Vice President, Product Marketing at PTC, posits that as manufacturing organizations home in on their IoT strategy, data's critical role within that strategy should be top of mind and central to decision making around how the IoT will affect their business going forward.

The world's a risky place. And yet the human race has become very adept at risk mitigation in the face of every nasty thing it might throw our way – or that we may hurl at each other. Discover how social sentiment, beliefs and practices that perpetuate environmental risk factors can play a significant role in either mitigating the risks or making them worse.

Can we use data to stop the largest global threat?

News: An initiative to encourage women in engineering has been derided as patronising.

Have you struggled in your data science function because of underlying data processing issues? Here is a list of four data processing architectures from top web companies to help you overcome those issues. Nextdoor – Offline Data Composition: The first step was to define our SLA. We decided that the data needed to be fresh, but not up to the second. An SLA of a couple of hours was just fine for what we needed. Once we knew the schedule on which we would need to pull the data, we began defining a pipeline. Nextdoor has partnered with over 1,300 agencies across the US.
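The freshness SLA Nextdoor describes can be expressed as a simple guard in a batch pipeline: record when the data was last refreshed, and check it against the allowed staleness before running downstream jobs. A hypothetical sketch (the names and two-hour threshold are invented, not Nextdoor's code):

```python
from datetime import datetime, timedelta

SLA = timedelta(hours=2)  # data older than this violates the freshness SLA

def is_fresh(last_refresh, now):
    """Return True if the latest batch output still meets the SLA."""
    return now - last_refresh <= SLA

# A scheduler could skip downstream jobs, or page someone, when data goes stale.
now = datetime(2015, 12, 8, 12, 0)
one_hour_old = datetime(2015, 12, 8, 11, 0)
three_hours_old = datetime(2015, 12, 8, 9, 0)
```

Making the SLA an explicit constant like this keeps the freshness contract visible in code rather than implicit in a cron schedule.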

As the hype surrounding big data increases, so too does the number of investments

Conventional wisdom says that predictive modelers need to have an academic background in statistics, mathematics, computer science, or engineering. A degree in one of these fields is best, but without a degree, at a minimum, one should at least have taken statistics or mathematics courses. Historically, one could not get a degree in predictive analytics, data mining, or machine learning.

There are a lot of skills you need to have just to be a competent IT professional. To start off: Apache Hadoop. It's entering its second decade now, but there's still a lot of room for improvement in architecture and development. And Hadoop can be a fussy thing to work with; it requires care and feeding by proficient techs.

I'm aware of how dated this makes me seem, but I still remember a time when information was printed on paper, tucked into folders, and filed into cabinets. I've spoken with Millennials who seem astounded at the fact there was ever a time when files weren't readily available on computers to be accessed and exchanged at the click of a button.