Big Data News – 8 Oct 2015

Top Stories
One of the biggest forces to hit business in the past several years is big data. Leading players in every industry are mining vast amounts of data they amass from vendors and customers in search of new opportunities. But how do companies with fewer resources make the best use of customer and vendor data? How can they use big data to get a competitive edge?

Even though it feels too soon, too many of us already have big data debt or at least see the potential to accumulate it all too clearly. Companies are confusing adopting big data technology with creating a coherent big data strategy and in the process are creating big data debt. How do we get into big data debt?

Andy Rubin, the man behind Google's Android operating system, thinks artificial intelligence will define computing in the future.

These offerings signal for the first time that IBM is getting much more aggressive on the total cost of acquiring Power Systems servers.

The enterprise is being consumerized, and the consumer is being enterprised. Moore's Law no longer matters; the future belongs to business virtualization powered by invisible service architectures, enabled by hyperscale and hyperconvergence, and facilitated by vertical streaming alongside horizontal scaling and consolidation. Both buyers and sellers want instant results, and moving from paperwork to paperless to mindless is the ultimate goal for any seamless transaction. The sweetest spot in innovation is automation.

A statistic published earlier this year caught my eye. According to Gartner, through 2018, 70 percent of Hadoop deployments will fail to meet cost savings and revenue generation objectives due to skills and integration challenges. Skills and integration challenges. With all these vendors claiming to make big data easy and smooth, where do the difficulties remain? Let's look at the first, upstream part of a big data project: collecting data. The whole concept of big data, or total data, and how to collect it and get it to the data lake can sound scary, but it becomes less so if you break the data collection problem into subsets.

September was a whirlwind of activity around Labor Day, back to school and the kickoff of fall conferences. Here is a recap of the top financial stories published on the IBM Big Data & Analytics Hub in September. Catch up on what you may have missed.

Uploading data, ingesting data, getting insights from data — we typically associate all three capabilities with cloud workloads. Amazon's Wednesday keynote announcements at Re:Invent unveiled new services for doing all of the above — with creative wrinkles all around. Why sync data over the wire to Amazon, for instance, when you can instead mail it? And given how much data gets socked away in Amazon for analysis, how about a tool aimed at business folks, not IT personnel, for getting value from that data?

Guest blog post by Bernard Marr Hadoop — the software framework which provides the necessary tools to carry out Big Data analysis — is widely used in industry and commerce for many Big Data related tasks. It is open source, essentially meaning that it is free for anyone to use for any purpose, and can be modified for any use. While designed to be user-friendly, in its "raw" state it still needs considerable specialist knowledge to set up and run. Because of this a large number of commercial versions have come onto the market in recent years, as vendors have created their own versions designed to be more easily used, or supplied alongside consultancy services to get you crunching through your data in no time.

Bimodal Business is a key theme at this year's Gartner Symposium/ITxpo in Orlando. Will it change IT?

Teradata announced that it will host its first database on the Amazon Web Services cloud and it plans to add support for other public clouds soon.

Is Data Right For The Creative Industries? Can it ever work on subjective material? (Published 2015-10-08 15:45:44 UTC; tags: Analytics, Big Data, Data Science, Open Data, Predictive Analytics)

Dell is reportedly in talks to merge with EMC, according to published reports. Rumors and uncertainty have swirled around $50-billion storage behemoth EMC. Facing the unscheduled but certainly upcoming retirement of longtime chairman and CEO Joe Tucci, pundits have wondered what form the EMC federation — including EMC, VMware, and Pivotal — would take without his leadership. The company has been under attack from activist investors such as Elliott Management, which want more value for shareholders. And there have been suggestions, firmly rejected by Tucci…

Amazon Web Services' fourth annual re:Invent conference got underway in Las Vegas this week. Senior vice president Andy Jassy bid directly for Oracle and SQL Server customers.

by Joseph Rickert Early October: somewhere the leaves are turning brilliant colors, temperatures are cooling down and that back to school feeling is in the air. And for more people than ever before,…

Obeya is a management approach that uses war rooms and visualization for managing projects. InfoQ did an interview with Malika Mir about why she decided to implement Obeya, how they are using Obeya to manage project portfolios, their experiences with Obeya and the benefits that they have got from it.

As healthcare providers collect more data on patients than ever and plan to use it to predict care episodes, healthcare organizations need to understand the ethical implications, according to experts speaking at the Predictive Analytics World Healthcare conference in Boston Tuesday. Right now, the lines are blurry. "We had a case where we saved someone's life by looking… The post As Health Apps and Predictive Analytics Take Hold, Standards are Needed appeared first on Predictive Analytics Times.

Recently released Git 2.6 brings many new features, improvements to performance and internals, and bug fixes.

is an open source platform that enables front-end developers to autonomously build backend APIs using forms to drive their apps. The platform provides a single solution for creating both APIs and user interfaces for consumption by a front-end JavaScript framework. InfoQ spoke with the founders to learn more about the platform's capabilities and the future they envision for it.

Microsoft PowerBI applications can now invoke Alteryx analytics applications directly within the Microsoft business intelligence application.

The team of clinicians and medical informatics experts led by Mike Draugelis, chief data scientist at Penn Medicine in Philadelphia, is busy these days. Using insights from a massively parallel computer cluster that stores a huge volume of data, the team is building prototypes of new care pathways, testing them out with patients and feeding the results back into algorithms so that the computer can learn from its mistakes.

GE is going full steam ahead in its efforts to become a digital company. This includes new emphasis on IoT and analytics.

This is the first in a series of blogs that examine how Fortune 500 companies use Attunity Visibility to manage their data more effectively. Part One tells the story of a multinational pharmaceutical firm.

In this special guest feature, Mike Maciag, COO of Altiscale outlines a formula for working to improve ROI for Hadoop deployments.

A commissioned study conducted by Forrester Consulting on behalf of Radius revealed B2B marketing organizations that strategically use predictive analytics outperform other organizations that use more traditional data analytics approaches to improve marketing execution and business results.

Many government agencies are leveraging analytics and big data to gain actionable insights about a range of topics. However, challenges remain as agencies struggle to find qualified staff and projects strain IT infrastructure.

Successful mobile technology leaders come with certain traits that may be overlooked. They understand the reality that even the best solutions–designed and implemented perfectly–are going to face challenges at some point. When you look at mobile solutions, and all of the layers that make up the entire mobile user experience, you may find that technology…

Microservices have been a trending topic for some time now, and while past discussions focused largely on concepts, more and more real-life experience is now entering the conversation. In this interview, Michael Bryzek, co-founder and former CTO of Gilt, shares some of his experience working with microservices: how should we design our architectures and APIs to avoid ending up in dependency hell?

Most Data Scientists like to get their hands dirty with data just as quickly as possible, but it is important to practice some delayed gratification and first dig into the details of the Data Science project before you start modeling. A Data Scientist who has the business in mind will attempt to determine what factors might get in the way of the business experiencing success with the project.

This article describes the concept of Big Memory and concentrates on its applicability to managed execution models like the one used in Microsoft's Common Language Runtime (CLR). A few different approaches are suggested to resolve GC pausing issues that arise when a managed process starts to store over a few million objects.

If you're like many of the people I know, the things you once enjoyed most about the Internet now make you feel overwhelmed or even left behind because you can't keep up–there's simply too much of everything.

In my previous blog post I explained linear regression. In today's post I will explain logistic regression. Consider a scenario where we need to predict a patient's medical condition, high BP or no high BP, based on some observed symptoms: age, weight, smoking status, systolic value, diastolic value, race, etc.
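As a minimal sketch of the idea (not the post's own code), here is a logistic regression on synthetic blood pressure data. The feature values, coefficients, and use of scikit-learn are all illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
# Hypothetical features: age, weight, smoking flag, systolic, diastolic
age = rng.integers(20, 80, n)
weight = rng.normal(80, 15, n)
smoking = rng.integers(0, 2, n)
systolic = rng.normal(125, 15, n)
diastolic = rng.normal(80, 10, n)
X = np.column_stack([age, weight, smoking, systolic, diastolic])

# Synthetic label: high BP driven mainly by systolic/diastolic plus noise
z = (0.04 * (systolic - 125) + 0.05 * (diastolic - 80)
     + 0.3 * smoking + rng.normal(0, 0.5, n))
y = (z > 0).astype(int)

model = LogisticRegression(max_iter=1000).fit(X, y)

# Predicted probability of high BP for one hypothetical patient
patient = [[55, 90, 1, 150, 95]]
print(model.predict_proba(patient)[0, 1])
```

The model outputs a probability between 0 and 1, which is what distinguishes logistic from linear regression for a yes/no outcome like this one.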

Facebook's Lee Byron talks to InfoQ's Rags Srinivas about GraphQL that powers billions of API calls on the site.

The rise of SAP (and later Siebel Systems) was greatly helped by Andersen Consulting, even before it was split off from the accounting firm and renamed Accenture. My main contact in that group…

Tools, Trends, What Pays (and What Doesn't) for Data Professionals. The 2015 Data Science Salary Report is here. O'Reilly Media's third annual report delves deeper into responses from over 600 data professionals from a wide range of industries, two-thirds of whom are based in the United States. The report analyzes data collected on demographics, time spent on various data-related tasks, and the use or non-use of 116 software tools. The 2015 edition features a completely new graphic design and our latest findings.

R. Python. Hadoop. Data Lake. Presto. Spark. What's Next? Discover the data science skills you'll need next at Teradata 2015 PARTNERS with insight into the latest open source technology and real strategies in development practices and methodologies from leading data practitioners. You'll hear from top brands including Target, GlaxoSmithKline, eBay, and Southwest Airlines in more than 200 sessions.

Join us for our latest DSC Webinar series to learn how to better access your data, provide more effective business intelligence and systems integration, and improve distribution management, demand planning and promotional pricing. We will be discussing:
* Multi-source and high-volume data blending
* Consumable data workflows for internal customers
* Predictive analytics through a system that requires no coding
Guests from Sager Creek Vegetable Company, a division of Del Monte Foods, will discuss real-world examples of how variable planning with data analytics provides better answers to planning around inventory management, demand and more.

Don't miss the Predictive Analytics Innovation Summit when it returns to Chicago on November 11 & 12. This critically-acclaimed event will unite 200+ industry experts with 40+ leading speakers to address the key challenges they currently face and to provide delegates with the breadth of information required to develop actionable insight and propel their organizations forward.

Pivotal had a lot going on this year as a sponsor, presenter and exhibitor at Strata + Hadoop World. In case you missed anything, here's a recap:
* The Pivotal session at the event provided a unique look at new developments in the rapidly changing world of the Internet of Things (IoT) and data science.

Predictive analytics and the whole data science realm has really become front and center in the news these days. Somewhere between the hype of Big Data solving the world's problems and data science proving to be the missing link in every analyst's toolbox, there's a little matter of actually making it work in your business…

Ever-observant Jen Q. marks National Cybersecurity Month by questioning why more isn't being done to thwart cyber attacks and why hackers seem to have free rein within private networks. Analytics and intelligence may lead to the answers she's looking for.

Amazon today rolled out QuickSight, a new cloud-based business intelligence tool that can be used to build visualizations and perform ad-hoc analysis. The Web giant says QuickSight is both cheap (pricing starts at $9 per user per month) and fast, thanks to a parallel, in-memory engine dubbed SPICE. QuickSight isn't the first cloud-hosted BI and analytics tool on the market, but because it's backed by Amazon (NASDAQ: AMZN), it may soon become one of the most prominent. The new offering–which is only available in preview at the moment–looks to deliver many of the features of full-blown analytics tools at one-tenth of the cost, the vendor says. According to Amazon's QuickSight Web page, customers can start visualizing their data within a minute of hooking up data sources or loading a file.

Actian Corporation, a high-performing enterprise-grade SQL (Structured Query Language) analytics platform company, announced that Accanto Systems has selected the Actian Analytics Platform to power its well-known Intelligent Customer Experience Management (iCEM) offering.

Nearly three-quarters of U.S. government managers overseeing a growing number of big data projects are concerned they lack adequate computing, storage and networking infrastructure. Hence, federal agencies may find themselves collecting large data volumes while lacking the ability to analyze key data, an industry-sponsored survey found. Unisys Corp. (UIS), which according to the website Washington Technology ranked 39th in 2014 with federal contracts worth more than $529 million, reported this week that 93 percent of U.S. agencies responding to its survey said they had launched big data projects. These respondents also said advanced data analytics have improved the quality and speed of decisions.

Guest blog post by Dr. Vincent Granville. All the regression theory developed by statisticians over the last 200 years (related to the general linear model) is useless. Regression can be performed just as accurately without statistical models, including the computation of confidence intervals (for estimates, predicted values or regression parameters). The non-statistical approach is also more robust than the theory described in statistics textbooks and taught in statistics courses. It does not require MapReduce when data is really big, nor any matrix inversion, maximum likelihood estimation, or mathematical optimization (Newton's algorithm).
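The post doesn't include code, but one well-known model-free way to get confidence intervals in this spirit is the bootstrap: fit by plain least squares and resample the data to get an empirical interval. The data, seed, and interval below are illustrative assumptions, not the author's method:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200
x = rng.uniform(0, 10, n)
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)

def fit(x, y):
    # Plain least squares via numpy; no distributional assumptions used
    A = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(A, y, rcond=None)[0]  # [intercept, slope]

# Bootstrap: refit on resampled rows; the spread of the refits yields
# an empirical confidence interval with no recourse to theory
boots = []
for _ in range(1000):
    idx = rng.integers(0, n, size=n)
    boots.append(fit(x[idx], y[idx]))
boots = np.array(boots)

lo, hi = np.percentile(boots[:, 1], [2.5, 97.5])
print(f"slope 95% interval: [{lo:.3f}, {hi:.3f}]")
```

Whether this fully replaces classical theory is the author's claim to defend; the sketch just shows that interval estimates can be produced by resampling alone.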

Originally posted by Matei Zaharia (creator of Apache Spark) in databricks Today, we're celebrating an important milestone for the Spark project — it's now been five years since Spark was first open sourced. When we first decided to release our research code at UC Berkeley, none of us knew how far Spark would make it, but we believed we had built some really neat technology that we wanted to share with the world. In the five years since, we've been simply awed by the numerous contributors and users that have made Spark the leading-edge computing framework it is today. Indeed, to our knowledge, Spark has now become the most active open source project in big data (looking at either contributors per month or commits per month).

Teradata Corp. (NYSE: TDC), the big data analytics and marketing applications company, announced today it is making its Teradata Database, a leading data warehousing and analytic solution, available for cloud deployment on AWS to support production workloads.

In the increasingly patient-centric world of healthcare, predictive analytics has taken firm hold as a means for healthcare organizations to improve patient outcomes. Predictive models for determining patient responses to medication, lowering hospital readmission rates, assessing risk of disease breakout, and other uses are being implemented for superior disease management and care delivery. Similarly, in…

Next year, Teradata will offer its eponymous database as a service on the Amazon Web Services cloud. It will be the first foray onto a public cloud for Teradata, which has come a long way from its proprietary hardware roots. It wasn't that long ago that all Teradata (NYSE: TDC) deployments required an investment in proprietary hardware by the customer. The company sold its flagship Teradata Database–desired by large companies for its fast analytic processing on large datasets–as a pre-configured appliance.

Tracking 'What is Hadoop?' is getting more complex as the potential components of Hadoop systems increase — and core elements such as HDFS are augmented by possible alternatives.

Get ready for AWS business intelligence (BI): it's real and it packs a punch! Today's BI market is like a perpetual motion machine — an unstoppable engine that never seems to run out of steam….

by Andrie de Vries. In one of John D. Cook's blog posts from 2010 (Parameters and Percentiles), he poses the following problem: the doctor says 10% of patients respond within 30 days of treatment and 80% respond within 90 days of treatment. Now go turn that into a probability distribution. That's a common task in Bayesian statistics: capturing expert opinion in mathematical form to create a prior distribution. John then discusses how this level of information is highly valuable in statistical inference.
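As a sketch of how such a prior might be pinned down (the gamma family and solver choice below are my assumptions, not necessarily those of the posts), one can solve for a gamma distribution whose 10th and 80th percentiles land at 30 and 90 days:

```python
from scipy.stats import gamma
from scipy.optimize import brentq

# Goal: a gamma(shape=k, scale=theta) with CDF(30) = 0.10 and CDF(90) = 0.80.
# For any fixed shape k, the scale is pinned by the first constraint:
def scale_for(k):
    return 30.0 / gamma.ppf(0.10, k)   # gamma quantiles scale linearly in theta

# Then search over k until the second constraint is met as well:
def gap(k):
    return gamma.cdf(90.0, k, scale=scale_for(k)) - 0.80

k = brentq(gap, 0.1, 20.0)   # root-find on the shape parameter
theta = scale_for(k)
print(f"shape={k:.3f}, scale={theta:.2f}")
```

The same two-constraint trick works for any two-parameter family whose quantiles scale in one of the parameters.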

Has your data strategy wandered away? Are you struggling to get it back on track? Discover how two travel and transportation companies reined in their programs, helping them become leaders in the insight economy.

Government use of data came under scrutiny after revelations of extensive information gathering by the NSA. Now that the NSA stories have faded from much of the public consciousness, lingering concerns about how the government uses private data sources continue to dog public officials.

Achieving Success In Retail Banking With Analytics: Analytics could help retail banking get past some of its current challenges (Published 2015-10-07 14:26:05 UTC; tags: Analytics, Banking, Banking & Markets, Data Science, Predictive Analytics)

Big data has been a common phrase in the tech industry as companies of all types collect staggering amounts of customer data. Data is also a high-profile topic, as recent security breaches have leaked sensitive personally identifiable information to the public. Aside from the security issues of protecting all that data, companies are quickly finding that, while they can collect massive amounts of information, it's another thing entirely to organize and analyze it. Data isn't relevant only to IT departments, either.

Big Data Analytics — it's Big but is it Clever? How to have a data strategy that integrates all the Data, without all the hassle (Published 2015-10-07 14:18:13 UTC; tags: Analytics, Big Data, Data Warehousing)

The new release of the Accenture Cloud Platform offers a suite of cloud migration and management tools.

Google asks Germans for too much information about the inner workings of their premium sports car, so Porsche moves to Apple's CarPlay, according to a report.

KANSAS CITY, Mo. — At the request of his probation officer, Tyrone C. Brown came to a community auditorium here in June and sat alongside about 30 other mostly young black men with criminal records — men who were being watched closely by the police, just as he was. He expected to hear an admonition… The post Police Program Aims to Pinpoint Most Likely to Commit Crimes appeared first on Predictive Analytics Times.

Guest blog post by Mirko Krivanek. Below is a Python for Visualization cheat sheet, originally published here as an infographic. Other cheat sheets about Data Science, Python, Visualization, and R can be found here. Additional resources: Infographics, Dashboards, R, Python, Excel, Visualization, Cowplot (see illustration at the bottom). Enjoy!

Visual Studio debug engine documentation is now available online, along with two samples. This debug engine, codenamed Concord, is Visual Studio's new debug engine that originally shipped in Visual Studio 2012.

The 2016 presidential election campaign has been exciting already with major candidate movements in polls on both sides of the party lines. The political system still relies on the use of manual polling. The question remains: can Web data be used as an alternative to understanding public opinion online? To help answer this question, BrightPlanet and Rosoka…

It's not that statisticians and data scientists had a falling out, precisely. Still, the relationship isn't all that it should be to speed big data projects to fruition. To pave the way to better relations between the two fields, and to help both benefit from increased collaboration, the American Statistical Association released a new policy statement.

The Alteryx advanced analytics tools suite comprises three products: Alteryx Designer, Alteryx Server and Alteryx Analytics Gallery.

IBM Cognitive Business Solutions will help clients accelerate their time to value on big data and advanced analytics solutions.

At AWS re:Invent, Splunk announced a new release of its app for Amazon Web Services. The enhanced version converts AWS CloudTrail, AWS Config, Amazon CloudWatch and Amazon Virtual Private Cloud Flow logs into intuitive dashboards for simplified operational and security intelligence. Customers can monitor user activity, resource changes, topology and network traffic flows.

Teradata announced today that it's making its data warehousing and analytics product, Teradata Database, available for cloud deployment on AWS early next year. Initially, Teradata Database on AWS will be offered on Amazon Elastic Compute Cloud (EC2) instances in supported AWS regions. You'll find the listing in the AWS Marketplace in 2016.

In this special guest feature, Chris Surdak, JD of HP discusses observational bias and how it can work to taint the result of big data analytics.

Mass shootings are now, sadly, a very common occurrence in America. This visualization by Andy Kriebel uses Tableau Public to look at gun-violence-related events in the US.

Innography released U.S. patent purchase trends for 2015 through July from the company's Patent Market Tracker, a standalone data set and analytics service. The results show roughly a 10 percent yearly increase over the last several years. Here's what they found in patent sales and purchasing trends.

Sally Elatta talks about the Agility Health Check tool, with examples of where it has been used, the way teams and organisations can use the information collected and how the tool itself is evolving in response to market demand.

All businesses are at the mercy of data quality challenges. From the moment you capture your first lead, you'll be fighting a battle against data decay. The bigger the database gets, the more problems the business can encounter, and it isn't easy to single out a particular cause.

A disruptive technology can be defined as an innovation that helps develop new value networks and markets. However, these new technologies eventually disrupt the existing value networks and markets, displacing earlier technologies. Disruptive technology often forces companies to change the way they approach their businesses, and an organization that cannot adapt risks becoming irrelevant or losing market share.

Big Data and Hadoop skills could mean the difference between having your dream career and getting left behind. Dice has stated, "Technology professionals should be volunteering for Big Data projects, which makes them more valuable to their current employer and more marketable to other employers."

Today a study will come out saying that Spark is eating Hadoop — really! That's like saying SQL is eating RDBMSes or HEMIs are eating trucks. Spark is one more execution engine on an overall platform built of various tools and parts. So, dear pedants, if it makes you feel better, when I say "Hadoop," read "Hadoop and Spark" (and Storm and Tez and Flink and Drill and Avro and Apex and …). The major Hadoop vendors say Hadoop is not an enterprise data warehouse (EDW) solution, nor does it replace EDW solutions. That's because Hadoop providers want to co-sell with Teradata and IBM Netezza, despite hawking products that are increasingly eating into the market established by the big incumbents.
