Big Data News – 13 Aug 2015

During a professional golf tournament, fans know it's not possible to see every golfer, every swing, and every putt in real time. For this week's 2015 PGA Championship at Whistling Straits in Kohler, Wis., the PGA beefed up the fan experience on PGA.com and within its mobile app, so fans can see more of the action whether they're on the course or watching from home. "The way we cover [golf], there's only so much you can cover," says Gary Treater, general manager of business operations for Turner Sports, which operates PGA.com. "As new technology becomes available, it allows you to provide fans with a better experience at home via second screen or using data to tell a story while at the course."

 

Title: How to Employ a More Frictionless Approach to Customer Authentication
Subtitle: Can caller verification really become "frictionless"?
Tags: Customer Insight, Cyber security, Analytics, Predictive Analytics, Technology
Published Date: 2015-08-13 14:28:33 UTC

 

Google brings its own Cloud Dataflow and Cloud Pub/Sub big data services to Compute Engine and App Engine.

 

IBM Watson is opening its big data analytics ecosystem to new ideas and businesses, including a new way to help you win your fantasy football league.

 

There's one thing that is common among all industries: risk management. It doesn't matter if you're in banking or insurance; you want to minimize potential risks and manage situations that arise. We've explained how open source intelligence (OSINT) can strengthen your current efforts. Now, we're going to talk about how OSINT can be specifically used […] The post Using OSINT to Improve Risk Response in Your Risk Management Framework appeared first on BrightPlanet.

 

IBM dropped another cool $1 billion for imaging software company Merge Healthcare last week, continuing a string of acquisitions from the beginning of the year. Last quarter, IBM announced the acquisitions of Cleveland Clinic spinoff Explorys and population health analytics company Phytel on the same day, rocking the annual HIMSS conference with the announcements. Around the same time, IBM also announced the formation of the Watson Health unit and teamed up with Apple, Medtronic, and Johnson & Johnson. The intent was clear. Gain access to as much medical data, as quickly as possible, to feed the Watson engine. The acquisition of Merge Healthcare would appear to be just another tidbit in the grand scheme of things.

 

The British newspaper The Guardian has open sourced Grid, its image management service. Grid utilizes numerous modern web-based technologies, including AngularJS, Amazon Web Services, and Elasticsearch, using ECMAScript 6 and Scala. Built by a small developer team over the past 11 months, it is currently used in production and available under a liberal open source license. By Jeff Martin

 

Hello from the newest analyst serving Forrester Research's CIO role. My name is Paul Miller, and I joined Forrester at the beginning of August. I am attached to Forrester's London office, but it's…

 

Some people stress the need for agile training with certification, as it helps to select candidates and lays a foundation for an agile transformation. Others are against certification; in their opinion, certifications don't reflect people's abilities and skills properly, and candidates without certifications might be better than those who have them. Are you for or against agile certification?

 

At QCon San Francisco, we offer two days of workshops (Nov 19-20). Workshops focus on developing the technical skills to leverage the technologies you heard about from our expert practitioners during the conference sessions. Here is a glimpse at some of the experts you can learn from at QCon SF workshops.

 

There's an old axiom that goes something like this: If you offer someone your full support and financial backing to do something different and innovative, they'll end up doing what everyone else is doing. So it goes with Hadoop, Spark, and Storm. Everyone thinks they're doing something special with these new big data technologies, but it doesn't take long to encounter the same patterns over and over. Specific implementations may differ somewhat, but based on my experience, here are the seven most common projects.

Project No. 1: Data consolidation. Call it an "enterprise data hub" or "data lake." The idea is you have disparate data sources, and you want to perform analysis across them. This type of project consists of getting feeds from all the sources (either real time or as a batch) and shoving them into Hadoop. Sometimes this is step one to becoming a "data-driven company"; sometimes you simply want pretty reports. Data lakes usually materialize as files on HDFS and tables in Hive or Impala. There's a bold, new world where much of this shows up in HBase — and Phoenix, in the future, because Hive is slow.
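
To make the pattern concrete, here is a minimal PySpark sketch of a data-lake consolidation job of the kind described above; it is not from the article, and the paths, database, and table names are invented.

```python
# Minimal data-lake ingestion sketch: land a batch feed on HDFS as a Hive table.
# Assumes a Hadoop cluster with Hive configured; paths and names are made up.
from pyspark.sql import SparkSession

spark = (SparkSession.builder
         .appName("crm-feed-ingest")
         .enableHiveSupport()          # lets us write managed Hive tables
         .getOrCreate())

# Read one disparate source (a CSV export dropped on HDFS by an upstream system).
df = spark.read.csv("hdfs:///landing/crm/customers.csv",
                    header=True, inferSchema=True)

# Shove it into the lake: Parquet files on HDFS, registered as a Hive table
# so analysts can query it from Hive or Impala.
df.write.mode("overwrite").format("parquet").saveAsTable("lake.crm_customers")
```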

 


The Design and Implementation of the FreeBSD Operating System is a long-awaited update to a successful and authoritative guide to the FreeBSD kernel. The second edition covers all major improvements between FreeBSD versions 5 and 11; according to the publisher, about one-third of its content has been extensively rewritten and another one-third is completely new.

 

The React team has released entirely new devtools for the JavaScript library — including a new version for Firefox. Jared Forsyth said: "The current version of the devtools is a fork of Blink's 'Elements' pane, and is imperative, mutation-driven, and tightly integrated with Chrome-specific APIs. The new devtools are much less coupled to Chrome, and easier to reason about thanks to React."

 


Retail is a market that has been growing rapidly. Reports suggest that the world retail market will grow by 5.5% to reach US $23.8 trillion in 2018. As the number of retailers increases and social media gains considerable traction in the marketplace, the need for analytics in retail has grown manifold. Big Data has answered the call effectively with its implementation across many retail outlets in the world. On the other hand, many Indian retailers have stayed away from the forefront of analytical technology for a very long time, and reputed large retail brands have neglected to use this expertise to their advantage. As a result, customer complaints about the unavailability of certain products are frequent, and this has affected labor productivity in Indian retail. According to a McKinsey study conducted in 2010, labor productivity in India is only 6% of that in the United States. The perfect solution to this problem lies in applying Big Data solutions to the marketplace.

 


If you’re like most homeowners, you probably sneak a peek at your ‘Zestimate’ from time to time to see how your home’s value might have changed. Getting a Zestimate is very easy and straightforward for users, but behind the scenes, there’s a hefty amount of data science that goes into the equation. The Zestimate is a core part of Zillow’s offering, and is critical for the company’s business model. The figure is an estimated market value based on a range of public and user-submitted data, including physical attributes like location, lot size, square footage, and number of bedrooms and bathrooms. Historical data like real-estate transfers and tax information is also factored in, as are sales of comparable houses in a neighborhood.
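
Zillow does not publish the Zestimate model, but the general shape of such an estimator is easy to sketch. Below is a toy illustration (not Zillow's actual method) using scikit-learn, with fabricated features and prices:

```python
# Toy home-value estimator in the spirit of the article: regress sale price on
# physical attributes plus a comparable-sales signal. All data is fabricated.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Columns: lot size (sq ft), living area (sq ft), bedrooms, bathrooms,
# median recent sale price of comparable homes in the neighborhood.
X = np.array([
    [5000, 1800, 3, 2.0, 410_000],
    [7200, 2400, 4, 3.0, 520_000],
    [3000, 1100, 2, 1.0, 295_000],
    [6500, 2100, 3, 2.5, 465_000],
])
y = np.array([425_000, 540_000, 300_000, 470_000])  # observed sale prices

model = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05)
model.fit(X, y)

# Estimate a new home's value from its attributes and neighborhood comps.
print(model.predict([[5800, 2000, 3, 2.0, 450_000]]))
```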




With massive amounts of data flooding into businesses all the time, management and analysis of it isn't easy. Data lakes are increasingly being used by enterprises to unify Big Data from various…

 

Google is taking the wraps off its Dataflow hosted cloud service while announcing a batch of partnerships and third-party developers as part of an effort to reduce the operational hurdles associated with traditional data analytics systems. In announcing general availability of Dataflow, its big data pipeline model launched in June 2014, Google revealed four new Dataflow service integrators: ClearStory, Salesforce, SpringML and Tamr. It also announced software development kit (SDK) runners from data Artisans and Cloudera. The latter announced in January it would team with Google to run Dataflow on Apache Spark. Google Cloud Dataflow…
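
The Dataflow programming model was later open sourced as Apache Beam, so a rough feel for the pipeline style the service runs can be given in Python. This is a minimal sketch, not Google's own example; the file names are placeholders, and the 2015-era SDK was Java:

```python
# Minimal Beam-style word-count pipeline sketch. Runs locally with the
# default DirectRunner; on Google Cloud the same pipeline would be
# submitted to the Dataflow runner instead.
import apache_beam as beam

with beam.Pipeline() as pipeline:
    (pipeline
     | "Read" >> beam.io.ReadFromText("input.txt")
     | "Words" >> beam.FlatMap(lambda line: line.split())
     | "Pair" >> beam.Map(lambda word: (word, 1))
     | "Count" >> beam.CombinePerKey(sum)
     | "Format" >> beam.Map(lambda kv: f"{kv[0]}: {kv[1]}")
     | "Write" >> beam.io.WriteToText("counts"))
```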

 

Two Google big data toolsets have finally moved out of beta and into full commercial release, adding to its cloud portfolio a data analysis framework and a service for managing data streams in real time. Google Cloud Dataflow, which could serve as a possible replacement for Hadoop, provides a framework for fusing different sources of data within one processing pipeline. Google Cloud Pub/Sub is the company's service for managing data streams in real time. The two services fill out Google's roster of cloud-based data analysis tools, joining Google BigQuery, a commercial service for analyzing large sets of unstructured data.
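
On the Pub/Sub side, publishing to a topic takes only a few lines with the Python client library. A minimal sketch, assuming a configured Google Cloud project; the project and topic IDs here are made up:

```python
# Minimal Pub/Sub publish sketch using the google-cloud-pubsub client.
# Requires credentials; the project and topic IDs below are placeholders.
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-project", "clickstream")

# Messages are raw bytes; optional attributes go as keyword arguments.
future = publisher.publish(topic_path, b'{"event": "page_view"}', source="web")
print("Published message id:", future.result())  # blocks until the server acks
```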

 

The concept of "liquidity" isn't just restricted to the financial sector. Thanks to the Internet of Things, the physical world can become liquefied as well, making it easier than ever for organizations to enter the IoT ecosystem.

 

To stay on top of the latest trends and technology in the dynamic big data market, check out these recent Pivotal Data webinar replays. Customer Spotlight: How WellCare Accelerated Big Data Delivery to Improve Analytics (http://goo.gl/L9Iqjt) – In this webinar, speakers from WellCare, Attunity and Pivotal discuss how WellCare uses Attunity Replicate to offload data quickly and easily from its SQL Server and Oracle systems into Pivotal Greenplum Database to support real-time reporting and analytics. IoT: How Data Science-Driven Software is Eating the Connected World (https://goo.gl/HEzfCc) – The Internet of Things (IoT) will forever change the way businesses interact with consumers and each other.

 

We're coming to the tipping point for public cloud adoption, and it's going to have big consequences for data warehousing, BI, and analytics. It's no longer a question of if we'll move analytics to the cloud, but rather of when. There's also a big question of what enterprise-class data integration will look like in a hybrid cloud. In terms of the Innovation Curve, we've moved from "early adopters" to the "early majority." According to a recent Gartner survey, this year saw a 50 percent jump in the portion of respondents who said they plan to run mission-critical applications on the cloud, from about 30 percent in each of the previous four years to 45 percent this year. Many companies, even in the Fortune 1000, are mandating that all new infrastructure be in the cloud.

 

The NYU Stern Master of Science in Business Analytics is an advanced business degree, which teaches experienced professionals how to understand the role of evidence-based data in decision-making and to leverage data as a valuable and predictive strategic asset. Those interested in the program should have a minimum of 5 years of professional experience and may come from a broad scope of sectors including financial services, communications, consulting, health and pharmaceuticals, manufacturing, energy, IT and nonprofit. Our part-time format limits office leave to approximately 5 weeks over the course of the program. Between modules, students complete approximately 20-25 hours of work per week on pre- and post-module tasks. These assignments are conducted online through the program's online distance learning platform, adding another layer of flexibility to the course structure.

 

Strata + Hadoop World: 3 days you can't miss. Early Price ends Friday, August 14. Hello DSC Member, Strata + Hadoop World is September 29-October 1 in New York. Selling out last year with 5,500 attendees, the defining event of the big data movement is only getting bigger. The all-new program includes 11 tracks with 200+ sessions in Data-driven Business, Data Innovations, IoT & Real-time, and more than 250 speakers already lined up.

 

Originally posted on Data Science Central. Interesting article posted here. I've listed some of the most popular below. To find out about those not listed here (Redis, RavenDB, Riak, Perst, Voldemort, Terrastore, NeoDatis, MyOODB, OrientDB, InfoGrid, DB4objects), read the original article. Source for picture: 21 NoSQL databases (must read). Open Source NoSQL Databases: MongoDB – This highly scalable and agile NoSQL database delivers amazing performance. This open source database written in C++…
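
To give a feel for MongoDB's document model mentioned above, here is a minimal PyMongo sketch; it assumes a local MongoDB server, and the database and collection names are arbitrary:

```python
# Minimal MongoDB usage sketch with PyMongo: schemaless JSON-like documents.
# Assumes a MongoDB server on localhost; names below are arbitrary.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
events = client["demo"]["events"]

# Insert documents with whatever shape each record needs -- no fixed schema.
events.insert_one({"user": "alice", "action": "login", "tags": ["web"]})
events.insert_one({"user": "bob", "action": "purchase", "amount": 19.99})

# Query by field; secondary indexes keep this fast at scale.
for doc in events.find({"action": "purchase"}):
    print(doc["user"], doc.get("amount"))
```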

 

Organizations may frequently fall into the trap of trying to get an advanced sales performance management (SPM) solution into operation as soon as possible. However, initiating the implementation stage too soon, too hastily, or at the expense of best practices can cause companies to overlook critical considerations. These recommendations from SPM experts can help organizations avoid five common pitfalls during the implementation stage.

 

by Andrie de Vries This week at JSM2015, the annual conference of the American Statistical Association, Joseph Rickert and I gave a presentation on the topic of "The network structure of CRAN and…

 

Title: How Far Away Are We From Data-Driven Smart Cities?
Subtitle: There is a clamour for them, but how near are they?
Tags: Data Science, Analytics, Predictive Analytics, Technology, Big Data
Published Date: 2015-08-12 15:14:08 UTC

 

Frege, named after the German mathematician Gottlob Frege, is a purely functional, strongly typed language for the JVM that is so similar to Haskell that "most idiomatic Haskell code will run unmodified or with only minimal, obvious adaptions". InfoQ has spoken with Ingo Wechsung, Frege's creator. By Sergio De Simone

 

Originally posted on Data Science Central. Orientation: In both semantic model standards, Topic Maps and RDF/OWL, and in many other NoSQL approaches to representing relations and relationships efficiently, one major stumbling block stands beyond all efforts: the namespace. It is a language problem; the babel we have in our civilized world is transferred into our IT systems. But machines do not have to understand our language; we do. Good news for everyone: there is an alternative way of thinking about modelling data: token-based, fully symmetrical, bidirectional linking, single-instance-centric, data-type agnostic, namespace agnostic, fully contextualized, structure-free, and many more novelties. Hence the AtomicDB Data Model, or as I call it, the AIR (Atomic Information Resource) Data Model. The Entity-Attribute 'Silo' Structure: The problem here is that, from a semantic point of view, users who want to express business processes need similar diagrams, but when we reach the implementation stage software engineers have to marry business requirements with the technical constraints of the database system; hence the ER diagram you see. Generally speaking this is known as "The Model," a conceptual view of the user on data. The ER version of the model has several limitations, due to the architecture of RDBMS.
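
The post describes the model only in prose, but the flavor of a single-instance, bidirectionally linked, token-based store can be sketched in a few lines. The following toy class is my own illustration of the general idea, not AtomicDB's actual design:

```python
# Toy illustration (not AtomicDB itself): every atomic value is stored once,
# and links between atoms are symmetric, so traversal works in both directions.
class AtomStore:
    def __init__(self):
        self.atoms = {}   # value -> atom id (single instance per value)
        self.values = {}  # atom id -> value
        self.links = {}   # atom id -> set of linked atom ids (symmetric)

    def atom(self, value):
        """Return the id for value, creating it only if unseen."""
        if value not in self.atoms:
            aid = len(self.atoms)
            self.atoms[value] = aid
            self.values[aid] = value
            self.links[aid] = set()
        return self.atoms[value]

    def associate(self, a, b):
        """Bidirectional link: both atoms know about each other."""
        ia, ib = self.atom(a), self.atom(b)
        self.links[ia].add(ib)
        self.links[ib].add(ia)

    def related(self, value):
        return [self.values[i] for i in self.links[self.atom(value)]]

db = AtomStore()
db.associate("Alice", "alice@example.com")
db.associate("Alice", "Sales")
print(db.related("Alice"))  # -> ['alice@example.com', 'Sales']
print(db.related("Sales"))  # symmetric: -> ['Alice']
```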

 

Guest blog post by Jean Villedieu. Fraud detection is all about connecting the dots. We are going to see how to use graph analysis to identify stolen credit cards and fake identities. For the purpose of this article we have worked with Ralf Becher of irregular.bi. Ralf is a Qlik Luminary, and he provides solutions to integrate the graph approach into Business Intelligence solutions like QlikView and Qlik Sense. Third-party fraud in retail: Third-party fraud occurs when a criminal uses someone else's identity to commit fraud.
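
The graph approach the post describes can be approximated with any graph library. Here is a small sketch using NetworkX (not the authors' actual tooling): identities and the attributes they use, such as card numbers and addresses, become nodes, and components where several identities share an attribute surface as candidate fraud rings. The data is fabricated.

```python
# Fraud-ring sketch with NetworkX: identities and shared attributes as nodes,
# edges meaning "this identity used this attribute". Data is fabricated.
import networkx as nx

G = nx.Graph()
records = [
    ("id:alice",   "card:4111-1111"),
    ("id:bob",     "card:4111-1111"),   # same card, two identities: suspicious
    ("id:bob",     "addr:12 Oak St"),
    ("id:charlie", "addr:12 Oak St"),   # chained through a shared address
    ("id:dana",    "card:5500-2222"),   # isolated, looks legitimate
]
G.add_edges_from(records)

# Connected components where several identities share attributes are
# candidate fraud rings worth a closer look.
for component in nx.connected_components(G):
    identities = [n for n in component if n.startswith("id:")]
    if len(identities) > 1:
        print("possible ring:", sorted(identities))
```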

 

What do transformational data leaders have in common? Each has found a way to do three things: (1) Make data a priority, (2) develop from within and (3) free data from silos within the organization. Such leaders face challenges common to many but often develop unique approaches driving transformational actions.

 

A new analytics suite from Broadbean called BDAS looks to use big data to help HR and recruiters hire the best talent for companies.

 

by Shawn Rogers Innovative organizations are reviewing their data management strategies to identify where and how cloud solutions should play a role. An array of offerings and technology advancements are enabling companies to disrupt traditional data management paradigms in favor of new ways to create value. A great example is cloud-based analytics. Research by Enterprise Management Associates (EMA) identified…

 

RebelLabs published their Developer Productivity Report, the result of a survey started in March 2015, where they polled the Java development community on Java performance and performance testing methods. To see how these numbers line up with a real world experience, InfoQ spoke with Kirk Pepperdine, CTO at JClarity and well-known performance expert. By Matt Raible

 

Dan Tousignant, Agile Executive Coach and Trainer at Cape Project Management, proposed a matrix to help organizations choose their Agile approach.

 

Technical debt quantification tools attempt to quantify the existing technical debt in a software product. However, the present set of quantification tools suffers from various limitations such as limited or no support for quantification of all technical debt dimensions, generalized absolutization, and missing interest component. Hence, quantified cost and effort must be interpreted with caution.
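
To make the missing interest component concrete: a tool that reports only remediation cost understates debt that actively slows development. A toy sketch of the distinction, with invented numbers:

```python
# Toy sketch of technical-debt quantification with an interest component.
# Numbers are invented; real tools derive these from static analysis.
def total_debt(principal_hours, interest_hours_per_sprint, sprints_until_fix):
    """Principal = effort to remediate the debt item now.
    Interest = recurring extra effort the debt imposes until it is fixed."""
    return principal_hours + interest_hours_per_sprint * sprints_until_fix

# An item costing 40 hours to fix, dragging 3 extra hours every sprint,
# left unfixed for 10 sprints, really costs 70 hours -- not 40.
print(total_debt(40, 3, 10))
```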

 

There are twice as many "things" as people connected to the Internet today, with 9 billion connected devices expected by 2018. The Internet of Things (IoT) refers to the notion of extending the communication revolution to objects: a new reality where objects are interconnected and tech-enabled. When objects connect to the Internet, they become smart and can take actions based on their environment and track surroundings to help another device or "thing" make a decision. The question is not whether the "Internet of Things"…

 

A market basket analysis or recommendation engine is what is behind all those recommendations we get when we shop online or receive targeted advertising. The underlying engine collects information about people's habits and knows that if people buy pasta and wine, they are usually also interested in pasta sauces. So, the next time you go to the supermarket and buy pasta and wine…
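
The pasta-and-wine example corresponds to a simple association rule. Here is a minimal sketch (with fabricated baskets) of how such a rule's support and confidence would be computed:

```python
# Minimal association-rule sketch: support and confidence for
# {pasta, wine} -> {pasta sauce}, computed over fabricated baskets.
baskets = [
    {"pasta", "wine", "pasta sauce"},
    {"pasta", "wine", "pasta sauce", "bread"},
    {"pasta", "wine"},
    {"bread", "milk"},
    {"pasta", "pasta sauce"},
]

antecedent, consequent = {"pasta", "wine"}, {"pasta sauce"}

both = sum(1 for b in baskets if antecedent | consequent <= b)
ante = sum(1 for b in baskets if antecedent <= b)

support = both / len(baskets)  # how often the full pattern appears
confidence = both / ante       # how often the rule holds when it applies
print(f"support={support:.2f} confidence={confidence:.2f}")  # 0.40 and 0.67
```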

 

When it comes to big data visualization tools, there’s no shortage of players. Tableau, Qlik, Spotfire, and MicroStrategy are established incumbents with big followings. But there’s a fresh crop of visualization tools making waves, including one from Zoomdata that’s helping to change how we think about big data analysis. Here are three ways that Zoomdata is helping to change the field of big data analytics and visualization: 1. Micro Queries and Data Sharpening. Zoomdata’s flagship analytics and visualization tool embraces the concept of “micro queries.” Instead of composing a query, hitting the “go” button, …
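
Zoomdata's implementation is proprietary, but the idea behind micro queries and data sharpening can be sketched: answer an aggregate over progressively larger samples, streaming estimates that sharpen toward the exact result. A toy version, with generated data:

```python
# Toy "data sharpening" sketch: estimate a mean over growing random samples,
# emitting progressively sharper answers instead of one slow exact query.
import random

random.seed(7)
data = [random.gauss(100, 15) for _ in range(200_000)]  # stand-in dataset

def sharpened_mean(rows, batch=20_000):
    shuffled = random.sample(rows, len(rows))  # sample without replacement
    seen, total = 0, 0.0
    for i in range(0, len(shuffled), batch):   # each pass is a "micro query"
        chunk = shuffled[i:i + batch]
        total += sum(chunk)
        seen += len(chunk)
        yield seen, total / seen               # estimate sharpens as seen grows

for seen, estimate in sharpened_mean(data):
    print(f"after {seen:>7,} rows: mean estimate {estimate:.3f}")
```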

 

Hewlett-Packard fleshed out its big data strategy this week with the rollout of a new version of its analytics platform at a company event, along with an “accelerator” program that gives developers access to APIs on its Haven big data platform at a discounted rate. The new addition to the Haven platform, called “Excavator,” targets analysis of high-speed streaming data from a variety of sources, including the Internet of Things. It is also designed to improve SQL and machine-log analytics along with Hadoop performance, HP said Tuesday (Aug. 11). Excavator has been integrated with Apache Kafka, a distributed messaging system for data streaming…
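
For context on the Kafka integration mentioned above: producing to and consuming from a Kafka topic takes only a few lines with the kafka-python client. This is a generic sketch, not HP's connector; the broker address and topic name are placeholders.

```python
# Minimal Kafka produce/consume sketch with kafka-python; the broker and
# topic names are placeholders, not HP Haven's actual configuration.
from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("sensor-readings", b'{"device": "t-17", "temp_c": 21.4}')
producer.flush()  # block until the broker has the message

consumer = KafkaConsumer("sensor-readings",
                         bootstrap_servers="localhost:9092",
                         auto_offset_reset="earliest",
                         consumer_timeout_ms=5000)  # stop iterating if idle 5s
for message in consumer:
    print(message.value)
```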

 

Learning more about your customers is valuable to any business. Since customer data has traditionally been stored as transaction records and other structured formats, there can be huge challenges…

 
