Big Data News – 1 Oct 2015

Top Stories
It's easy to think most of the big, urgent questions around Hadoop are technical ones: What's so special about Spark vs. MapReduce? What are the data governance tools like? But judging from the turnout at a session held at the Strata+Hadoop World 2015 conference in New York yesterday, the most urgent questions may be the simplest: What's the best way to get started? How do you demonstrate to the rest of the company that Hadoop is worth the effort?

Apple, IBM and Microsoft have all made multiple acquisitions of analytics startups in the last year. Here's an overview of who they bought and why it matters to IT.

Guest blog post by Bernard Marr. With an ever-growing number of businesses turning to Big Data and analytics to generate insights, there is a greater need than ever for people with the technical skills to apply analytics to real-world problems. Computer programming is still at the core of the skillset needed to create algorithms that can crunch through whatever structured or unstructured data is thrown at them. Certain languages have proven themselves better at this task than others. Here's a brief overview of 10 of the most popular and widely used. [Image: Fractal landscape simulation requires a lot of computing (this one possibly produced with MATLAB).] Julia: Julia is a relative newcomer, having existed for only a few years; however, it is quickly gaining popularity, with data scientists praising both its flexibility and ease of use. Although designed as a "jack of all trades" language, able to cope with any sort of application, it is thought to be particularly efficient at utilizing the power of distributed systems such as Hadoop, frequently used in Big Data. Crowd-sourced data science website Kaggle is currently running a competition which doubles as a tutorial on getting started with Julia: it will show you how to use it to create algorithms designed to detect text characters, such as roadside graffiti, in Google Street View images. SAS: The SAS language is the programming language behind the SAS (Statistical Analysis System) analytics platform, which has been used for statistical modelling since the 1960s and is still popular today after many years of updates and refinements. Although, unlike many of the other languages mentioned here, it isn't open source and so isn't free, there is a free University Edition designed for learners.

When it comes to BI, it still largely operates like it did at a company I worked for in the '90s. A businessperson comes up with an idea, it requires some data, and she's forced to ask IT not only to provide the data but often the report as well. Usually this is because the data is in a form that makes writing a SQL query too difficult for the average Excel user.

Before the digital revolution, in the not so distant past, marketers focused on creating compelling ads for the Sunday circulars, producing television commercials, and perhaps coming up with some killer creative for a billboard or two. The job of the marketer has quickly evolved to keep up with technology, and marketers are now crunching statistics, targeting individuals versus the masses, and trying to navigate through thousands of channels to reach their target audience.

The various online reports about the end of Hadoop as a big data framework bring to mind Mark Twain's notable quote about the reports of his demise being an exaggeration. Hadoop is very much alive, and numerous organizations continue to make it a key component of their big data and analytics initiatives.

The use of analytics in healthcare is gaining momentum as the industry shifts to a value-based delivery model. Hospitals and health systems are searching for the ability to identify patient activity, reduce cost, and increase the level of engagement of both the physician and the patient. This is where predictive analytics enters into the equation…

GE customers echo CEO Jeffrey Immelt's contention at Minds + Machines conference that Predix industrial analytics cloud is "about no unplanned downtime."

Not doing any user experience testing on your enterprise apps? Do it now. Want to make sure your apps are successfully used by your employees? Consider applying usability analytics to your evaluation process. Here's how.

Via its new columnar data store, called Kudu, Cloudera wants to enable the deployment of faster types of data analytics on top of Hadoop.

With Azure, Microsoft stakes claim against Amazon and Google, touting its ability to serve as a more muscular cloud partner for the enterprise data center.

With the aim of simplifying Hadoop use, Hortonworks rolled out three new products to coincide with Strata+Hadoop World in New York City.

by Joseph Rickert. I have been a big fan of R user groups since I attended my first meeting. There is just something about the vibe of being around people excited about what they are doing that feels good. From a speaker's perspective, presenting at an R user group meeting must be the rough equivalent of doing "stand-up" at a club where you know mostly everyone and you are pretty sure people are going to like your material.

This instance of Bluemix is managed via Relay, which lets IBM push updates to Bluemix from a centrally managed service running in an IBM data center.

At Salesforce's recent Dreamforce conference, the company announced an upcoming IoT platform that will ingest real-time data and turn it into actionable tasks across its suite of cloud-based services.

By: Lorie Nelson, Senior Product Manager for Teradata's Travel and Hospitality Data Model. As a child I was sure there must be a book of life that explains what we need to know about each other, the planet and our purpose in the world. I asked every adult I knew or came in contact with if they knew of such a book, but no one had an answer. Initially I was disappointed, until the idea came to me: "I will write my own book." And so began my first data collection project, the beginning of my fascination with data.

Clipboard.js is a lightweight library to copy text to the clipboard without using Flash.

Databricks, the company founded by the creators of Apache Spark, released the findings of a survey of more than 1,400 respondents from the Spark community to identify how organizations and users are utilizing the data analytics and processing engine.

Apple / iPhone: 225,000 records exposed (malicious outsider). Malware was discovered in iPhones that have been "jailbroken". This has affected nearly 225,000 iPhones, and the login credentials have been compromised. This breach was actually announced late last month, but we are including it in the September blog post since it wasn't mentioned in our previous post. Grupo Financiero Banorte: 20,000 records exposed (malicious outsider). Grupo Financiero Banorte is a Mexican banking and financial services holding company with headquarters in Monterrey and Mexico City, Mexico, and is the third largest in the country. Not much information has been posted about this data breach; however, Mexico's National Transparency, Information Access, and Data Protection Institute (INAI) recently announced an impending fine of almost $2 million to be levied against Grupo Financiero Banorte.

Felix López and Alvaro Videla discuss RabbitMQ and messaging architectures, both from a theoretical perspective and a practical one, analyzing a product that runs integrated inside Gmail, Comcast, and other architectures.

If mobile BI solutions are going to enable organizations to drive growth and profitability, technology or technical know-how alone won't be the only ingredient for success. It starts with leadership, and our team's talent and passion will be the determining factor. There's no simple blueprint for success given the resource constraints and competing priorities…

Developer fatigue is the overwhelming frustration felt by developers who are under pressure to keep current with a flood of new languages, libraries, frameworks, platforms and programming models. JNBridge offers a way to help alleviate developer fatigue by allowing you to mix the libraries you know with code written in the language you are learning.

Google has released Android Studio 1.4 with support for vector graphics, a theme editor, templates for the Design Support Library, and easier connection to a Firebase account.

QCon San Francisco is coming up on November 16-18, and registrations are up 49% over last year. QCon's annual practitioner-driven conference is designed for top-level software developers and influencers. A full schedule and list of talks is available at

A master data initiative is meant to deliver a unified and well-integrated source of cross-organizational data that is reliable and up-to-date, while eliminating silos and redundancies across the organization. Master data management (MDM) software solutions comprise an essential piece for trustworthy analytics that lead to better decision-making.

Introducing Tamr Catalog Beta: Catalog All of Your Enterprise Metadata. A free tool to catalog enterprise metadata, regardless of type, platform or source. Tamr Catalog is a lightweight, user-friendly web application that offers organizations a better way to discover, organize and communicate about their data assets, many of which are considered "dark data," or data that's been processed and stored but is not often used. With Tamr Catalog, your organization can create a centralized repository of metadata and the knowledge that clusters around it.

We have two announcements today: 1. Upcoming Webinar. You are invited to our upcoming Pivotal webinar (hosted by Data Science Central) on using association techniques in the context of fraud detection. Developed to find low-support and high-confidence malicious domain associations, these methods aid in the detection of coordinated network intrusions, like watering hole attacks. The session will also demonstrate a scalable and operationalizable framework to detect domain associations by analyzing the web traffic of users in any organization. To register, go to 2.
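For readers unfamiliar with the terms, "support" and "confidence" in the webinar announcement above are the standard association-rule measures. The following is a minimal sketch of how they are computed, not the method presented in the webinar. The session data, the domain names, the function name `association_rules`, and the thresholds are all hypothetical and chosen only for illustration.

```python
from collections import Counter
from itertools import combinations


def association_rules(sessions, max_support, min_confidence):
    """Return rules (lhs, rhs, support, confidence) with low support and high confidence.

    support(lhs -> rhs)    = fraction of sessions that contain both items
    confidence(lhs -> rhs) = sessions containing both / sessions containing lhs
    """
    n = len(sessions)
    item_counts = Counter()
    pair_counts = Counter()
    for session in sessions:
        items = set(session)
        item_counts.update(items)
        # Count each unordered pair of domains once per session.
        pair_counts.update(combinations(sorted(items), 2))

    rules = []
    for (a, b), both in pair_counts.items():
        support = both / n
        # Each pair yields two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = both / item_counts[lhs]
            if support <= max_support and confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


# Hypothetical data: each set holds the domains one user visited in a session.
sessions = [
    {"news.example", "mail.example"},
    {"news.example", "mail.example", "shop.example"},
    {"news.example", "shop.example"},
    {"bad-a.example", "bad-b.example"},
    {"bad-a.example", "bad-b.example", "news.example"},
    {"mail.example"},
    {"news.example"},
    {"shop.example", "mail.example"},
]

# Illustrative thresholds: rare pairs (support <= 0.25) that always co-occur.
rules = association_rules(sessions, max_support=0.25, min_confidence=1.0)
for lhs, rhs, support, confidence in rules:
    print(f"{lhs} -> {rhs}: support={support:.2f}, confidence={confidence:.2f}")
```

In this toy data the two rarely visited domains appear in only a quarter of the sessions but always appear together, which is the low-support, high-confidence pattern the announcement describes. Counting every pair is quadratic in the number of domains per session, so a real deployment would need a more scalable approach.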

The NYU Stern Master of Science in Business Analytics is an advanced business degree that teaches experienced professionals how to understand the role of evidence-based data in decision-making and to leverage data as a valuable and predictive strategic asset. Those interested in the program should have a minimum of 5 years of professional experience and may come from a broad scope of sectors including financial services, communications, consulting, health and pharmaceuticals, manufacturing, energy, IT and nonprofit.

Jan Koehnlein presents the making of the XRobots game combining Lego Mindstorms with LeJOS, image recognition with OpenCV, augmented reality, Xtend, Xtext with Xbase, Eclipse, Orion, Jetty, JavaFX.

The rapid growth of mobility and the Internet of Things (IoT) continues to drive the need for real-time data analysis, intensifying demand for faster insight and action in the enterprise. In response to this demand, VoltDB announced Version 5.6 of its SQL in-memory operational database, a fast data platform that combines streaming analytics with transactions to support mission-critical, real-time applications.

Ben Straub discusses how automating communication tasks with chat robots can have a feedback effect on people and their culture, and how it can be applied to organizations.

Modern software increasingly operates on data in near real-time. There is business value in sub-second responses to changing information and stream processing is one way to help turn data into knowledge as fast as possible, Kevin Webber explains in an introduction to Reactive Streams.

Everyone uses data. At work, at home, even at the grocery store. But people don't often realize the full potential of the available data, especially since it seems so scattered and inaccessible. In reality, things like social media and daily news stories can be turned into structured, usable data for people across many industries — […] The post Unlimited Possibility Found in Online Data Sets & Data Sources appeared first on BrightPlanet.

This eMag offers readers tactical approaches to building software experiences that your users will love. Break down existing silos and create an environment for cross-collaborative teams: placing technology, business and user experience design at the core.

Dato, creator of the popular machine learning platform GraphLab Create, announced today toolkits and training for developers building Intelligent Applications.

There's no faster path to budgetary oblivion than implementing technology for technology's sake. In today's super-heated big data environment, it's easy to get all worked up over technologies like Hadoop without carefully considering the business justifications at the same time. At the Strata + Hadoop World conference today, Cloudera co-founder Mike Olson did his best to steer the conversation to real-world Hadoop solutions. Cloudera's chief strategy officer "Iron" Mike Olson kicked off today's Strata keynote extravaganza with a reminder about something he said on the same stage a year ago. "I made a prediction that Hadoop would disappear," Olson said.

Book review and interview with Steve Smith and Matthew Skelton, authors of "Build Quality In", a collection of experience reports (including their own) on Continuous Delivery and DevOps initiatives, by authors ranging from in-house technical and process leaders to external consultants implementing technical and organizational improvements.

As a strategic sponsor, IBM was represented in full force at Strata + Hadoop World 2015 in New York, New York. Day one proved to be a buzz of activity that included IBM data science experts getting a hands-on lab course on practical data science underway, IBM spokespeople discussing offerings and strategies and the IBM exposition floor pedestal hopping with activity during the opening reception.

Latest #askSAP Community Call Unveils How SAP Is Executing on its BI Strategy and Roadmap, with Plenty of Tips for Deriving Insight from Big Data. Did you miss the recent #askSAP Analytics Innovation call on Big Data Insight with dashboards and visualizations? You can now replay the session at any time to take advantage of…

Sharing-economy companies like Uber and Airbnb may come to typify the future of business. But they couldn't operate successfully without effective data analysis techniques.

By banishing "bankers' hours," mobile technology has transformed the banking industry. Learn how one New Zealand bank is using digital tools, real-time data and a reimagined strategy to strengthen customer relationships.

Organizations looking to analyze big data typically have to pull it together from various systems, making data integration a fundamental component of a big data analytics platform.

A measurement error will undermine the good efforts of your data science team and exacerbate quality problems. Learn how to eliminate or reduce these errors.

Oracle proposes a new OSS project within OpenJDK to focus on porting the JDK to popular mobile platforms such as iOS, Android, and Windows Mobile. Oracle plans on contributing the build system, Hotspot and JDK source changes required to target mobile platforms with a version of Java SE.

The latest version of MongoDB finds the NoSQL database running on a new WiredTiger storage engine. Better performance and data compression are among MongoDB 3.0's touted benefits.

A company that runs what's been called a "dating site" for underappreciated algorithms took home the award for top startup today at the Strata + Hadoop World show in New York City. The companies Sense, Timbr, and BlueTalon also won awards in the Startup Showcase. Algorithmia was founded in 2013 with the goal of helping to save algorithms from languishing on the academic vine. "Algorithms are being developed all the time, but are not getting into the hands of people and applications that could benefit from them," the company's co-founder and CEO Diego Oppenheimer writes on the company's blog.

Teradata Corp. said this week it would accelerate the open-source rollout of Presto, the SQL-on-Hadoop framework, by offering connectivity drivers for free. Teradata (TDC) said Monday (Sept. 28) it would provide at no cost ODBC (Open Database Connectivity) and JDBC (Java Database Connectivity) drivers with the goal of making the Presto SQL query engine ready for primetime in the enterprise. The drivers provide connection and implementation protocols for transferring queries and results between applications and databases. Teradata announced in June it would make a major investment in the Presto framework originally developed at Facebook to power interactive queries against its massive data warehouse.

Replacing departmentalized information silos with a centralized data lake is a great way to make a large organization's records accessible to its users in theory, but tends to prove near impossible to implement in practice. That gap between need and availability is what Peaxy Inc. hopes to fill with the help of the $15 million investors poured into its coffers this morning. The startup is one of several that emerged over recent years to tackle the lack of viable options for sharing information among the disparate parts of a globally distributed enterprise, a challenge that is only becoming more pressing as analytics grow increasingly important to decision-making. They all offer variations of the same concept…

WANdisco (LSE: WAND), the leading provider of continuous-availability software for global enterprises to meet the challenges of Big Data, today announced a new partnership with industry-leading storage systems vendor EMC Corporation (NYSE: EMC).

by Andrie de Vries. A few weeks ago I wrote about the Jupyter notebooks project and the R kernel. In the comments, I was asked how to resize the plots in a Jupyter notebook. The answer is that the…

Intel (INTC)'s Andy Grove famously wrote and spoke about the difference between ordinary change and a strategic inflection point (SIP) with significant impact on the health and survival of an organization. He defined strategic inflection points by the magnitude of impact on a business, quantifying them mathematically as a 10X change from what the business has been accustomed to. Grove also noted that SIPs aren't only technology driven, but can be precipitated by new or shifting competition, new channels of distribution, social and cultural shifts, regulatory changes, and so on. In his book, Only the Paranoid Survive, Grove wrote…

Data Science At The Rugby World Cup: How is the RWC and rugby in general using data in their sport? Published 2015-09-30 17:11:18 UTC. Tags: Analytics, Predictive Analytics, Sports Analytics, Sports Performance, Sports Technology.

Choose a solution that can deliver deep insights into data without reinventing the wheel. Standardization can help you move large quantities of data across multiple systems, allowing you to take advantage of data no matter its source.

ASP provides a software-based approach that can be used to secure hypervisors, and extended to secure a variety of emerging technologies.

The DataStax Cassandra engine is now Spark-certified. The move is one of several for the database on a possible upswing, further evidenced by a new deal with Microsoft.

In order to use Big Data effectively, a holistic approach is required. Organizations are now using data analytics at every level, and roles that previously would have had no need to concern themselves with data are now required to have some degree of understanding in order to leverage insights. Ensuring that data is presented in such a way as to be understood and utilized by all employees is, however, a challenge. Most Big Data actually yields neither meaning nor value, and the sheer volume coming into businesses can be overwhelming. Companies are therefore increasingly moving away from simple 2D Excel charts, and replacing or supplementing them with powerful data visualization tools. Sophisticated data visualization is a tool that supports analytic reasoning.

Guest blog post by Bernard Marr, first published here. The field of Big Data requires more clarity, and I am a big fan of simple explanations. This is why I have attempted to provide simple explanations for some of the most important technologies and terms you will come across if you're looking at getting into big data. However, if you are completely new to the topic then you might want to start here: What the Heck is… Big Data? …and then come back to this list later. Here are some of the key terms: Algorithm: A mathematical formula or statistical process run by software to perform an analysis of data.

Talend unveiled its data integration platform, Talend 6, that simplifies moving code developed for Hadoop to Apache Spark via graphical tools.

In this article, we are going to examine new features added to the main programming languages for iOS and OS X El Capitan: the recently open sourced Swift, which extends pattern matching syntax, adds feature availability checking and protocol extensions, and overhauls error handling; and Objective-C, with new interoperability features such as generic collections.

Guest blog post by Nilesh Jethwa. A quick chart illustrating the top Data Science keywords. The data was grabbed from this article. [Chart: Most popular Data Science keywords]

Interestingly, most site owners do not actually think about security. That is a serious problem, because websites are constantly being attacked by hackers. Every website owner needs to think about security, and as soon as you create a blog you need to know what has to be done. Remember that even low-profile sites can be attacked thousands of times per minute. You need to be protected at all times.

MTConnect is an open, HTTP- and XML-based communications standard used to enhance interoperability and information sharing between manufacturing equipment, devices and software applications. The MTConnect Student Challenge is intended to find creative uses of the standard in manufacturing, and it's open to all U.S. college students. The deadline for ideation submissions has been extended to December 15, 2015. Cash prizes total $33,000.

In advance of Strata, ClearStory Data announced new enhancements to its Apache Spark-native Intelligent Data Harmonization capabilities in a data democratization move enabling business users "to access and blend disparate data and be more self-reliant in reaching deeper insights."

Effective shopper marketing means personalizing your message. Discover how one retail beauty business underwent a total makeover of how it targeted and interacted with consumers, creating a holistic consumer experience.

The journey to data-driven business transformation can be confusing and challenging. At Hortonworks, we understand this, and are offering a number of tools that will help companies map out their journey to fully utilize the value of their Big Data. The journey begins with understanding the opportunities unique to your business, and understanding how the maturity of your organization enables or inhibits your ability to strategically pursue Big Data programs aligned to your business goals. Hortonworks Big Data Maturity Model: At the top of our maturity model are businesses that are transforming through Big Data.

The Evolution of Business Intelligence: Achieving success in the third generation of BI. Published 2015-09-30 10:56:22 UTC. Tags: Analytics, Big Data, Chief Data Officer, Data Science, Data Warehousing.

What are the differences between data science, data mining, machine learning, statistics, operations research, and so on? Here I compare several analytic disciplines that overlap, to explain the differences and common denominators. Sometimes differences exist for nothing other than historical reasons. Sometimes the differences are real and subtle. I also provide typical job titles, types of analyses, and industries traditionally attached to each discipline. Underlined domains are main sub-domains. It would be great if someone could add a historical perspective to my article. Data Science: First, let's start by describing data science, the new discipline. Job titles include data scientist, chief scientist, senior analyst, director of analytics and many more.

Businesses around the world are involved in a multitude of projects at any given time. As Data Scientists come into the business fold, it becomes more important with each passing day to have both parties — "the business" and "the Data Scientist" — begin to define successful strategies of working together.

TDWI's Fern Halper shares 10 ways to graduate from reports and dashboards and advance to big data visualization and predictive analytics.

Predictive modeling shouldn't be a solo activity. Businesses looking to get the most out of their data scientists should ensure they're working collaboratively to build analytical models.

No data science experience is required to generate big data reports, thanks to this BI middleware that sits atop Hadoop.

Data is becoming more and more important in today's business environment. It can be related and linked to products, sales, customers, significant events and many other categories to provide a rich, objective view of a business's performance. Data, however, is often maintained in individual silos and databases, yielding separate and inconsistent versions of the truth. Most companies do not even realise their data is unnecessarily replicated and duplicated. Revealing this truth and identifying these relationships has traditionally been costly, error prone and time consuming. Graph databases are trying to solve this problem.

For all the hype surrounding Hadoop, the primary way that organizations of all sizes continue to interact with data is SQL.

MapR has announced that its distribution of Hadoop can natively support JSON via MapR-DB, which is the NoSQL database that MapR runs on top of Hadoop.

Ryft today unveiled Ryft ONE, a single, open 1U platform at Strata. The company said it can slash operational costs by replacing hundreds of high-end servers and analyzing both historical and streaming data "including video and image collection, at speeds 100x faster than the conventional big data infrastructure."

Every year, common themes emerge at Strata. This year those themes are self-service/data democratization, fast data, streaming architectures and analytics at the edge. In a nutshell, big data has evolved into something far more powerful.

MapR announced at Strata today the addition of native JSON support to MapR-DB, the top-ranked NoSQL database. It's being billed as the industry's first in-Hadoop document database and it's built to leverage continuous analytics on real-time data.

Riverbed CTO Hansang Bae says Project Tiger will become the foundation through which storage and security services are provided in the branch office.

MongoDB announced its free MongoDB University app for iOS, which streams course videos and lets students complete quizzes while helping them prepare for MongoDB certification exams. The app can be used both online and offline.

For years now, many of us have touted the power of analytics beyond mere decision-making. Analytics have the power to birth innovation and disruption on a massive scale – and in many cases, have done so already. It is with interest, then, that I note Adobe Analytics's release of a "creative canvas" for data analysis.

In a new twist on consumer returns for their data and support, a coupon and cash back website/app called iConsumer plans to reward customer loyalty by issuing stock to them. iConsumer filed with the SEC in September, and once SEC-qualified, the company will begin giving customers freely tradeable stock in this new kind of public company. "Congress made crowdfunding a startup possible. The SEC's rules made it practical. We turned it inside out by making every customer a shareholder," said founder Robert Grosshandler, who is also the creator of the e-philanthropy site iGive.

Bromium unveiled Packer Attack, an open source tool that helps security researchers see what's happening inside encrypted and encoded malware.
