Big Data News – 18 Jul 2016

Featured Article
This is the first in a multi-part series on launching successful data products. At Juice, we've helped our clients launch dozens of data products that generate new revenue streams, differentiate their solutions in the market and build stronger customer relationships. Along the way, we've learned a lot about what works and what doesn't. In this series I'll take you through what you need to know to design, build, launch, sell and support a data product.

Top Stories

Analytic groups within an organization have a bewildering array of tools and technologies available to them, from relational databases and search to advanced statistical modelling, to name a few. The majority of these tools have been designed to allow one or more types of questions to be answered more easily than before, yet these organizations still face the same fundamental problems: Analytic Backlog – there is a list of analytics that need to be created, and only the high priority ones appear to get completed.

List: As Softbank acquires ARM, CBR looks at some of the biggest acquisitions of UK companies.

By Chris Radkowski, Director of Solution Management, SAP GRC and Security. In the consumer world, user experience and ease of use have emerged as paramount. Recent studies have shown that ease of use is driving brand loyalty across many industries, especially travel and retail. It makes sense that customers come back to applications and websites (more…)

Over the course of the past 3 months I've conducted in-depth research with senior analytics professionals in South Africa. The research group has been spread across a number of industries and a variety of company sizes. One of the key topics that has come up time and again is that of how to organise for analytics success. I suppose there are myriad ways this concept – 'organise for analytics success' – can be interpreted.

Ovum, a leading technology analyst firm, has published an in-depth report, Ovum Decision Matrix: Selecting a DevOps Release Management Solution, 2016–17. The report focuses on the automation aspects of DevOps, Release Management and compares solutions from the leading vendors.

Developed by the Apache Software Foundation, which specializes in open source software and has taken a particular fancy to big data analytical tools, Spark is an in-memory distributed processing and analytical platform. Spark originated as a class project at the University of California at Berkeley to fill in some gaps that existed in the big data technologies of the time. That was 2009. Since then, it has matured into a fully-functional platform that is utilized by many organizations across various industries. It is used to build big data analytics applications using the most popular languages, such as Java, Python, Scala, and R.

Spark Versus MapReduce

While Spark is gaining momentum, MapReduce continues to be the workhorse of the big data world within the Hadoop ecosystem.

To register, click here.
Title: 4 Steps to improve your Search Technology and Boost Sales, BI and User Experience
Date: Tuesday, July 19, 2016
Time: 09:00 AM Pacific Daylight Time
Duration: 1 hour
Summary: Visitors who search within e-commerce sites are two to three times more likely to convert compared to those who don't. Improving search relevancy and user experience can significantly boost your bottom line. Frequently, users search for a product, fail to find it, and leave because of poor search technology or algorithms. This webinar addresses these issues.

MultiView, a leader in digital publishing solutions for associations and digital marketing solutions for B2B marketers, announced the broad market release of VisitorView, a data analytics solution designed specifically for B2B companies.

Turi, formerly Dato, announced the launch of GraphLab Create 2.0, Turi Distributed and Turi Predictive Services, machine learning products that allow data scientists and software developers to add machine learning features to applications faster.

A critical component of any IoT project is what to do with all the data being generated. This data needs to be captured, processed, structured, and stored in a way to facilitate different kinds of queries. Traditional data warehouse and analytical systems are mature technologies that can be used to handle certain kinds of queries, but they are not always well suited to many problems, particularly when there is a need for real-time insights.

"We formed Formation several years ago to really address the need for bringing complete modernization and software-defined storage to the more classic private cloud marketplace," stated Mark Lewis, Chairman and CEO of Formation Data Systems, in this SYS-CON.tv interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.

Qubole, the big data-as-a-service company, announced the launch of its Cloudera Migration Program to help enterprises expand their use of big data by gaining the advantages of the cloud.

'Outlier' is a term that comes from statistics and data analytics. Math.com defines an outlier as "a value that lies outside (is much smaller or larger than) most of the other values in a set of data," and it gives a sample set of values as an example: if you have the values 25, 29, 3, 32, 85, 33, 27, and 28, both 3 and 85 are your outliers.
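As a rough illustration of that definition, here is a minimal Python sketch that flags outliers in the sample values quoted above. The 1.5 × IQR fence is a common rule of thumb chosen for this example, not something the quoted definition prescribes, and the function name `iqr_outliers` is hypothetical.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying outside the IQR fences (assumed rule of thumb)."""
    # Quartiles of the sample; statistics.quantiles sorts the data internally.
    q1, _median, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    # Keep the original ordering of the input when reporting outliers.
    return [v for v in values if v < low or v > high]


sample = [25, 29, 3, 32, 85, 33, 27, 28]
print(iqr_outliers(sample))  # -> [3, 85]
```

Other conventions (z-scores, median absolute deviation) would draw the fences differently, so the choice of rule matters on less clear-cut data.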

In an age of digital disruption, great customer experience has become do or die. Digital technologies such as Analytics, Mobile, Cloud, Gamification, Cognitive computing, Artificial…

Paxata, provider of the enterprise-grade self-service data preparation platform, announced the results of an industry study published by TDWI and sponsored by Paxata about the self-service data preparation market and its emerging role in accelerating the transformation of data to information.

Student data derived from multiple sources can feed a rich repository for cognitive systems delivering actionable insight to teachers that helps them personalize education. But the sensitivities around data privacy can challenge this objective. Discover how an understanding of student data assessment and individual students can fuel a cognitive system that supports both the teacher and the student.

Efficiently monitor trading-related activities and complex trading scenarios by using IBM Surveillance Insight for Financial Services, a solution designed to boost your ability to safeguard your company's reputation. Surveillance Insight's sophisticated analytics can help you generate proactive surveillance results by combining data from different sources, including both structured and unstructured data such as chat transcripts, email communications, voice recordings, social media and trade transactions.

For law enforcement, body-worn cameras are becoming an increasingly common tool. But with hundreds of hours of video, can agencies take full advantage of this information to help stop crime?

News this week included 5G spectrum, PC sales, TelePacific's SD-WAN, an FDIC hack, and digital transformation difficulties.

Fifty billion connected devices and still no winning protocol standards. HTTP, WebSockets, MQTT, and CoAP seem to be leading in the IoT protocol race at the moment but many more protocols are getting introduced on a regular basis. Each protocol has its pros and cons depending on the nature of the communications. Does there really need to be only one protocol to rule them all? Of course not. In his session at @ThingsExpo, Chris Matthieu, co-founder and CTO of Octoblu, walks you through how Octoblu solved this problem by building an open source, cross-protocol IoT M2M instant messaging platform utilized by thousands of users and companies to allow disparate devices to communicate seamlessly with each other and other platforms.

There's a certain class of data problem that is elegantly addressed by NoSQL databases, which is why the market for NoSQL databases is growing faster than the overall market. The market is led by the Big Four, including Couchbase, Datastax, MarkLogic, and MongoDB, but there's a long tail of other players in the NoSQL market, including some older products that are still going strong. If you read this newsletter, you're probably aware of some up-and-coming NoSQL players like Aerospike, Basho, and Redis, as well as specialized products like Neo4j's graph database, Splunk's log database, Elastic's search engine, and Sqrrl's security-focused NoSQL.

Ten years. As I sit here and type the words, I still can't believe it's already been that long. Ten years since Hadoop entered the lives of data and data management professionals like myself and ushered in an unprecedented (and still ongoing) era of change, investment, exploration, doubt, and discovery. To say the time has flown by would be the very definition of an understatement. To say the Hadoop experience has changed the lives and careers of everyone in its path would be an equally insufficient characterization. For me personally, and for many others, Hadoop Summit has been a major part of that experience — an annual inflection point enabling us to truly take stock of the technology impact.

Forces at work in data management today have led to the advent of the chief data officer job. This and more is discussed in a Q&A with Joe Caserta.

In this edition of the Knoyd-blog we will take a look at movie locations in San Francisco. Using the Google Places API and IMDb API, we selected places in "The Golden City" that every movie fan should visit while in town. The original dataset was downloaded from the SF OpenData site, which provides many datasets about San Francisco.

Microsoft will release a Windows 10 Anniversary Update later this year. Check out some of the new innovative features that will be included.

As it is with any major project, the best way to do Big Data/IoT is to break it into smaller, more manageable components.

In this monthly feature, we'll keep you up-to-date on the latest career developments for individuals in the big data community. Whether it's a promotion, new company hire, or even an accolade, we've got the details. Check in each month for an updated list and you may even come across someone you know, or better yet, yourself!

Henry Morris

The International Data Corporation (IDC) has named Dr. Henry Morris an IDC Fellow for big data, analytics, and cognitive software research.

Using drone technology can bring down aircraft inspection time from two hours to not more than 15 minutes, according to Intel.




Ian Meyers is a Solutions Architecture Senior Manager with AWS.

Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse that makes it simple and cost-effective to analyze all your data using your existing business intelligence tools. A cluster is automatically backed up to Amazon S3 by default, and three automatic snapshots of the cluster are retained for 24 hours. You can also convert these automatic snapshots to 'manual', which means they are kept forever. Snapshots are incremental, so they only store the changes made since the last snapshot was taken, and are very space efficient.

An ROC curve is a plot of a binary classifier's true positive rate against its false positive rate as the decision threshold varies, showing the trade-off between the two. The area under the curve (AUC) is useful in determining how discriminating a model is. Together, ROC and AUC are very useful diagnostics for understanding the power of one's model and how to tune it.
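To make the idea concrete, here is a minimal pure-Python sketch that sweeps the threshold over a classifier's scores to trace ROC points and then integrates them with the trapezoid rule. The function names `roc_points` and `auc`, and the toy labels and scores, are made up for illustration; the sketch assumes labels are 0 or 1 and that both classes are present.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) lists, one point per distinct score threshold."""
    pos = sum(labels)
    neg = len(labels) - pos  # assumes at least one positive and one negative
    # Lowering the threshold admits examples in order of descending score.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after all examples tied at this score are counted.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            fpr.append(fp / neg)
            tpr.append(tp / pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve via the trapezoid rule."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2
        for i in range(1, len(fpr))
    )


labels = [1, 1, 0, 1, 0, 0]              # toy ground truth
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]  # toy classifier scores
fpr, tpr = roc_points(labels, scores)
print(round(auc(fpr, tpr), 3))  # -> 0.889
```

An AUC of 1.0 means every positive outranks every negative, while 0.5 is what random scoring would give. Here 8 of the 9 positive/negative pairs are ordered correctly, hence 8/9.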

At the useR! conference last month, I was pleased to be able to give a couple of talks about the ways that Microsoft is using and integrating R. In my first talk, Hear, See, Move, I shared how data…

DMWay Analytics, leading innovator of predictive analytics automation solutions, announced today that Eric Siegel, one of the top 10 data science thought leaders, will serve as an advisory board member. Recognized as one of the world's most influential personas in predictive analytics, Siegel… The post Predictive Analytics World Founder Eric Siegel Joins DMWay Analytics Advisory Board appeared first on Predictive Analytics Times.

Gaurav Dhillon knows a thing or two about integration. In his twenties, he co-founded Informatica and helped thousands of enterprises deal with the challenges of application and data integration in the client-server world. Now, as CEO of San Mateo, California-based SnapLogic, Dhillon is tackling the integration challenges IT shops face in the new world of cloud.

For more than 40 years, statistical offices around the world have used SAS to influence economic growth, plan for needed infrastructures and improve the quality of life for citizens.

Amazon Web Services has bought Cloud9, a popular web-based developer environment that recently aligned itself with the Google Cloud Platform. Cloud9 is a browser-based IDE (integrated development environment) with a fairly rich feature set for building and deploying applications. Because it runs in a browser, developers can pick up their work from any machine, and Cloud9 has tools that let developers collaborate on projects. Along with Codenvy, it was one of the few remaining popular, independent cloud IDEs. "While the cloud IDE space is hot, as a market, IDEs are not an easy way to make money," said IDC analyst Al Hilwa. "The technology is better used as a sweetener to make broader platforms more attractive to developers."

All businesses are at the mercy of data quality challenges. From the moment you capture your first lead, you'll be fighting a battle against data decay. The bigger the database gets, the more problems the business can encounter, and it isn't easy to single out a particular cause. Often, data simply goes out of date over time — a natural ageing process that affects all business data. But there are other reasons for poor accuracy: spelling mistakes, accidental duplicates, or incorrect entries in fields, to name but three.
