Big Data News – 29 Sep 2015

Top Stories
At the Strata + Hadoop World show in New York City today, Hadoop distribution specialist MapR Technologies announced the addition of native JSON support to its MapR-DB NoSQL database, packaged with its MapR Distribution including Hadoop. MapR Chief Marketing Officer Jack Norris says MapR-DB becomes the first in-Hadoop document database, allowing developers to quickly deliver scalable applications that also leverage continuous analytics on real-time data.

Big Data's autumn confab kicks off today and there's already a bunch of news.

Elastic Integration Platform update aims to trim processing overhead in big data systems making greater use of Hadoop.

Storing data in Hadoop generally means a choice between HDFS and Apache HBase. The former is great for high-speed writes and scans; the latter is ideal for random-access queries — but you can't get both behaviors at once. Hadoop vendor Cloudera is preparing its own Apache-licensed Hadoop storage engine: Kudu is said to combine the best of both HDFS and HBase in a single package and could make Hadoop into a general-purpose data store with uses far beyond analytics.
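The trade-off above can be sketched in a few lines of plain Python. These are toy in-memory stand-ins, not real HDFS, HBase, or Kudu APIs: an append-only list plays the role of HDFS (fast sequential scans), and a dict plays the role of HBase (fast point lookups).

```python
# Toy stand-ins for the two Hadoop storage access patterns: not real
# HDFS/HBase/Kudu APIs, just an illustration of the trade-off.

# HDFS-style: an append-only sequence of records, ideal for full scans.
log = []
for i in range(1000):
    log.append({"id": i, "clicks": i % 7})  # write path: append only

total_clicks = sum(rec["clicks"] for rec in log)  # fast sequential scan

# HBase-style: a key-value index over the same records, ideal for
# random-access point lookups by row key.
by_id = {rec["id"]: rec for rec in log}
one_row = by_id[42]  # fast random access

# Kudu's pitch is to serve both patterns from a single copy of the data,
# rather than maintaining the scan-friendly log and the lookup-friendly
# index as two separate systems kept in sync.
```

Keeping both structures in sync is exactly the duplication Kudu is meant to eliminate.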

An expansion of Microsoft's Azure Data Lake will include a new analytics service, and it will be ready for public preview by the end of the year.

Alpine Data announced Alpine Touchpoints, a completely new way to make advanced analytics accessible across an enterprise.

The pace at which the world creates data will never be this slow again. And much of this new data we're creating is unstructured, textual data. Emails. Word documents. News articles. Blogs. Reviews. Research reports… Understanding what's in this text — and what isn't, and what matters — is critical to an organization's ability to understand the environments in which it operates. Its competitors. Its customers. Its weaknesses and its opportunities.

Just ahead of the opening of Strata + Hadoop World in New York City tomorrow, Cloudera today unveiled a new open source project to enable real-time analytic applications in Hadoop and an open source security layer for unified access control enforcement in Hadoop. The first project, Kudu, is a storage engine for Hadoop that supports high-performance sequential and random reads and writes, enabling fast analytics on changing data.

The Internet of Things (IoT) promises to make everything more intelligent and efficient. Smart grids, smart meters, smart refrigerators and smart cars are just some examples that get mentioned in just about every article written about IoT. But while compelling applications and innovations can come from the IoT, CIOs continue to have two legitimate major areas of concern when thinking about how the mechanics of IoT will affect their organizations: storage and security. On handling the sheer quantity of data: it's a well-known fact that it's difficult for the human brain to accurately grasp really, really large numbers.

InfoWorld's annual Best of Open Source Software Awards, affectionately known as the Bossies, presents an opportunity to step back and look at the big picture — one that this year included over 90 winners in six categories. All of them were handpicked by InfoWorld writers and contributors who gave their thumbs-up based on real-world IT and/or programming experience. The six Bossie Awards 2015 categories: the best open source applications, application development tools, big data tools, data center and cloud software, desktop and mobile software, and networking and security software.

Cities are crowded, and in the years ahead, the transportation industry will have to adapt as more markets have less room for private cars.

Continuum Analytics' free Anaconda software has long been known for its Python-powered analytics capabilities, but on Monday the company unveiled a new offering designed specifically for enterprises. Launched at Strata + Hadoop World in New York, the enterprise version of Anaconda is designed to enable corporate data science teams and in-house developers to explore and solve complex data problems in a packaged, easy-to-deploy, open data science stack. The software offers easy GPU and multicore integration, Continuum said, along with scalable browser-based visualization via data shading. Also included is a framework that insulates the organization from back-end changes and allows users to write data and analytic queries once and then deploy them anywhere.


Adding JSON to MapR-DB bolsters the trend of moving Hadoop from ETL to real-time analytics. It means vendors are placing their products and services one step ahead of what their enterprise customers currently demand.

Being successful at big data analytics today typically requires a collection of business, computer, and statistical skills. It's rare to find somebody who possesses all three; that's why data scientists are sometimes called unicorns. Now a French company named Dataiku says it's democratizing analytics by helping individuals who possess pieces of the skill puzzle come together and work collaboratively. Dataiku was founded two and a half years ago by a group of four data enthusiasts who saw a need for better tooling. Where most products on the market focus on a particular piece of the big data solution, the company conceived its flagship product, Data Science Studio (DSS), as an integrated development environment (IDE) for building big data services.

Microsoft had its sights set squarely on big data when it introduced its Azure Data Lake earlier this year, and on Monday it broadened that effort with new tools designed to make big data processing and analytics simpler and more accessible. First, what Microsoft originally called Azure Data Lake has now been renamed Azure Data Lake Store, offering a single repository for data of any size and type — including unstructured, semi-structured and structured — without requiring application changes as data scales. Data can be securely shared there and made accessible for processing and analytics. It can be acquired in real-time from sensors and devices for Internet of Things (IoT) applications, for example, or from online shopping websites, all without restrictions on account or file size.

Hadoop and NoSQL databases share some similarities, but the platforms typically live within different levels of the big data spectrum. Now MapR Technologies is working to break those barriers down by adding a major piece of NoSQL functionality to its Hadoop distribution. At the Strata + Hadoop World conference today, MapR Technologies announced that it's now supporting the storage of JSON documents in MapR-DB, the enterprise NoSQL data store that ships with its Hadoop distribution. As a spruced up version of the HBase key-value store, the MapR-DB has always provided some NoSQL functionality, particularly for serving large amounts of randomly accessed data, which is not something HDFS is particularly good at doing.
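The difference between a key-value cell layout and a JSON document layout can be shown with a small hedged sketch. The shapes below are illustrative only — the real MapR-DB JSON API (OJAI) is not shown, and the column and field names are invented:

```python
import json

# Hypothetical example only; not the actual MapR-DB/OJAI API.
# A key-value/wide-column store flattens a record into opaque cells,
# addressed by (row key, column) pairs:
cells = {
    ("user123", "profile:name"): b"Ada",
    ("user123", "order:0001:total"): b"42.50",
}

# A document store keeps the nested JSON structure intact, so the
# database itself can address fields inside the document:
doc = {
    "_id": "user123",
    "name": "Ada",
    "orders": [{"id": "0001", "total": 42.50}],
}

stored = json.dumps(doc)        # what gets written, as one document
restored = json.loads(stored)   # what a query gets back
nested_total = restored["orders"][0]["total"]  # direct path into the record
```

With the cell layout, reconstructing the order requires application code to parse column names and byte values; with the document layout, the nesting survives the round trip.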

Data Scientists and business innovators of all types often want to test an idea with real data.  And often they wait on DBAs who must wait on developers to write the necessary code for data integration.  When the data set is finally ready, it often remains out of sync thanks to duplicative update processes. Today Big Data integration gets a little easier for all the non-coders. 

We all know that Data Lakes run the risk of becoming swamps. Without balanced processes for managing and tracking data, enterprises can fail to extract any real value from Hadoop. In fact, things can get downright muddy. With this challenge in mind, we are particularly pleased today to announce Attunity Visibility 7.0 software, which extends data usage analytics and monitoring to the data lake so that you can keep your data ecosystem clear and healthy.

Communicating Analytics Insight With Data Visualization: data viz is allowing people to understand analytics. (Published 2015-09-29; tagged Analytics, Big Data, Chief Data Officer, Data Science, Visualization.)

Here's a retail passion killer if ever I heard one: 'On average, it takes 150 taps to buy something on your mobile phone.'[1] Not only that, but: '50% of potential customers abandon the mobile checkout process because they find it's too difficult'[1], and: 'On Black Friday, mobile accounted for 60% of online traffic but only 24% of sales.'[2] Clearly, no matter how intense the urge to own those outrageous Jimmy Choos, this egregious wallpaper, or that ostentatious timepiece, if the purchase process is too long and complex your prospective customers are going to love you and leave you. They'll cut and run to one of your leaner, fitter, more accommodating rivals without a backward glance. Mobile commerce could be worth $1 trillion by 2017.[3] It's not just about securing the online basket, though. Mobile phone payment may be a massive retail opportunity and a force for good, but there are other meaty problems we need to get to grips with. First, businesses are becoming harder to pin down.


Even though mobile apps are nothing special anymore, there is still considerable movement in this area. In this article, Katie Stanfield highlights some of the trends we might encounter in the near future when developing mobile applications: app developers and companies will have to keep in mind topics like big data and app analytics, the Internet of Things, and enterprise app stores.

According to a study conducted by Oracle expert and blogger David Njoku, the top forty most commonly searched Oracle errors on Google have generated over half a million hits in the course of one month. There is not much of a pattern to the list; some of the errors are there because they occur frequently, but are fairly simple to manage, while others are there because of the level of difficulty involved with resolution.

Technology leaders are moving towards Continuous Delivery and Agile to accelerate business innovation. This post discusses the importance of cultural change to successful Continuous Delivery implementations in organizations.

Hortonworks, Inc. (NASDAQ: HDP), the leader in Open Enterprise Hadoop, today announced that the Hortonworks DataFlow (HDF™) support subscription is now available.

Interana, the behavioral analytics solution for event data at scale, today announced that Tinder has successfully used the company to better understand how its tens of millions of users interact with the popular social app, and to make adjustments to improve the quality of matches for users.

If you've developed a useful function in R (say, a function to make a forecast or prediction from a statistical model), you may want to call that function from an application other than R. For…

Career choices are hard to make. How do you make the call to step out of your comfort zone, move on from a team you have built, disrupt your family life… how do you know when it's the right opportunity? Find a company that is customer-first focused (this is especially important to me as I lead support organizations), has an incredible technology, and has a tremendous market opportunity. Hortonworks made my decision easy. Support alignment: At Hortonworks, Support is aligned under Development. Typically a company's maturity is determined by the alignment of support to the go-to-market team.

A visualization tool and predictive analytics package targeting law enforcement and public safety applications is said to be the first tool of its kind to combine real-time social media and Internet data feeds to help predict and reduce crime. Hitachi Data Systems said Monday (Sept. 28) its visualization tool along with its "predictive crime analytics" supports the company's smart city efforts. That initiative seeks to advance municipal public safety efforts through the application of predictive and other advanced analytics along with better access to video data. The Hitachi package also includes a video management platform. Hitachi's visualization suite is a hybrid cloud-based platform that integrates disparate data and video assets from public safety systems such as 911 computer-aided dispatch, license plate readers and gunshot sensors. These data are integrated in real time and then presented geospatially.


Vaughn Vernon, in his new book Reactive Messaging Patterns with the Actor Model, shows how this model can simplify enterprise software development. After an introduction to the basics of the actor model and tutorials on Scala and Akka, the rest of the book is a patterns catalogue describing most of the patterns in the book Enterprise Integration Patterns from an actor model perspective.

This article is the fifth in an editorial series aimed at guiding line-of-business leaders, in conjunction with enterprise technologists, through opportunities for retailers and how Dell can help them get started. The guide will also serve as a resource for retailers that are farther along the big data path and have more advanced technology requirements.

This is the final blog in my three part series, and addresses convincing the various lines of business and departments that the needs of the 'We' outweigh the needs of 'Me'. Example One A Consumer Packaged Goods company was trying to determine revenue, market share, and cost for some of their global brands. However, the…

At the inaugural HashiConf conference, held in Portland, USA, HashiCorp announced the release of a new distributed scheduler platform named 'Nomad' that is capable of scheduling containers, VMs and standalone applications; and a new application delivery tool named 'Otto' that builds upon the existing Vagrant tool by enabling the management of remote application deployments.

Accenture (NYSE: ACN) is launching the Accenture Connected Analytics Experience, an immersive and collaborative analytics capability that makes data more accessible and engaging, helping insight-driven businesses make faster, more informed decisions.

Chalk "predict death" off the list of "what computers can't do." A supercomputer at Beth Israel Deaconess Medical Center in Boston, Mass., can decipher the likelihood of a patient's imminent demise with uncanny accuracy. Patients at the hospital are linked up to the machine, which leverages all available patient data — doctors visits, lab results, medications and vital signs — to generate a rapid diagnostic assessment. This alone would be a tremendous tool for helping doctors provide better treatment, but the real magic comes from applying machine learning jujitsu to an ever-growing body of patient data.

Our guest blog today is from Don Brown, COO and founder of Rocana, a Hortonworks technology partner, who talks about our partnership, mainstream Hadoop adoption, and the importance of global IT Operations management. Our partnership with Hortonworks is another exciting step on the path to mainstream adoption of Hadoop as the critical platform for modern, global-scale IT Operations management. Hortonworks' emphasis on a platform that scales with the demands of big data applications is a great fit for the IT Operations market and for customers looking for more reliable, extensible, and limitless analytics solutions.

Although systems engineering is the method of choice for organizations that want to manage the development of smart products, ever-increasing consumer demands require a new way to deal with this complexity. Continuous engineering enterprise capabilities are now available to help companies speed delivery of increasingly sophisticated and connected products.

With cybersecurity month around the corner, cyber threats and privacy take center stage in public sector news for the week of 21 September 2015.

By: Clarke Patterson, senior director of product marketing, Cloudera. Early this summer, Teradata and Cloudera jointly announced the Teradata Appliance for Hadoop with Cloudera, an engineered, ready-to-run appliance that comes with enterprise-ready Cloudera Enterprise, in addition to our existing software integrations. Today, at Strata + Hadoop World in New York, we are excited to announce that customers can now order the Teradata Appliance for Hadoop with Cloudera. Over the last couple of years, we have certainly seen the maturation of Hadoop and the shift from using Hadoop as a proof-of-concept technology to an enterprise-ready platform. However, the time, skillsets, and resources needed are hard to come by, and not every organization has the ability to hire the best talent in the market to plan, deploy, and manage Hadoop clusters, let alone support and maintain the platform post-production.

Social media analytics can help travel companies provide better service and forge deeper relationships with customers.

When it comes to big data and analytics, you can expect the unexpected. A wide range of companies is applying insights in ways that may surprise you. Here are four examples featuring an unusual mix of companies where the only common denominator is their success with analytics.

In a world that creates 2.5 quintillion bytes of data every day, it is extremely cheap to collect, store and curate all the data you will ever care about. Data is de facto becoming the largest untapped asset. So how can organizations take advantage of unprecedented amounts of data? The answer is new innovations and new applications. We are clearly entering a new era of modern data applications. I would like to take the opportunity to share my Hadoop journey over the past 10 years, and discuss where I see the Hadoop technology going in the next decade. To celebrate ten years of Hadoop with me, please go to http://hortonworks.com/10yearsofhadoop/ for more information.

The recent proposal to add non-nullable references to C# by Microsoft's Mads Togersen sparked quite a debate in the .NET community. The reactions were diverse, ranging from praise to preferring status quo.

From the time we were little, sharing has been an important concept. And even though sharing can be a double-edged sword in the business world, open source initiatives serve to prove its value. When considering open and unified analytics platforms, make sure they embrace three elements: inclusion, community and contribution.


Today's banking environment and ever-changing digital technologies require financial institutions to embrace banking customer analytics with a new view: understanding customer behavior in a digital world has immense value.

As many organizations are rapidly learning, predictive analytics delivered on cloud-based platforms has the potential to transform the way organizations conduct business. Taking advantage of these new possibilities requires the right mix of enterprise-scale analytics solutions and cloud computing technologies to fit the needs of the organization.

At the Strange Loop 2015 conference, Pam Selle introduced streams in JavaScript, showing what they're good for and how developers can use them.

Our mobile devices continue to enhance their utility and conveniences through always-connected personal computing contexts. But what if the applications we rely on for communications, gathering information and scheduling our daily lives could do a lot more by providing information we can immediately act on? Discover how cognitive computing capabilities and operational analytics in mobile applications are taking mobile applications to the next level.

COBOL 5 on zOS promises to offer significant performance improvements over previous versions. Unfortunately, migrating shops may run into a number of issues, due in part to the changes in version 5,…

Today Microsoft announced the General Availability of Azure HDInsight, with Apache Hadoop 2.6, available on Ubuntu Linux clusters. Azure HDInsight is a managed Hadoop service in the cloud and uses the Hortonworks Data Platform (HDP). This release is a direct result of Microsoft's commitment to open source. Microsoft has worked with Hortonworks® in the community to contribute to Apache Hadoop and related projects, including Apache Ambari. HDInsight on Linux, built with the core HDP platform, utilizes Apache Ambari to provide seamless and comprehensive deployment, management and monitoring of Apache Hadoop platform components in Azure.

Every student in every school should have the opportunity to learn computer science. Code.org is a non-profit dedicated to expanding access to computer science, and increasing participation by women and underrepresented students of color in this field. They believe computer science should be part of the core curriculum, alongside other courses such as biology, chemistry, or algebra. We at Quantopian believe in Code.org's vision to bring computer science to every student. To help them achieve this goal, we have decided to donate all revenue generated by our live stream ticket sales for QuantCon 2016 to them. QuantCon 2016 will feature a stellar lineup including: Dr. Emanuel Derman, Dr. Marcos López de Prado, Dr. Ernie Chan, and more. It will be a full day of expert speakers and in-depth tutorials.

In this special guest feature, Mark Marinelli of Lavastorm Analytics poses a thesis that when many people think of 'big data' they imagine beautiful visual dashboards in Tableau or Qlik, but the big data world is much bigger than visualization technology.


When Azure Data Lake was first revealed, it was plain Microsoft was set on making Azure a welcoming environment for enterprise big data applications. Today the service gets a new name, Azure Data Lake Store, and is being outfitted with the kinds of analytical tools that Hadoop users have come to expect — and a few new ones of Microsoft's own invention. The analytics system for Data Lake Store — named, appropriately enough, Azure Data Lake Analytics — stores data in the same HDFS format as Hadoop itself, but also allows data to be pulled in from other Azure sources, such as Azure SQL Database and Azure SQL Data Warehouse.

Black swan events such as natural disasters or global recessions are nearly impossible to predict until it's too late. But now, big data and advanced analytics can help businesses detect subtle early warning signs so they can take evasive actions and be in the best position to ride out a crisis. Disrupter analytics, for example, measures an organization's capacity to handle improbable scenarios. This type of analysis can be used to periodically stress test a company to gauge its level of readiness. "[Preparation] is really all about imagining the unimaginable, understanding what's going to blow up … .
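The stress-testing idea above can be sketched as a tiny Monte Carlo simulation. Everything here is invented for illustration — the revenue figures, the shock probability, and the loss severity are hypothetical, and this is not any vendor's "disrupter analytics" product:

```python
import random

# Minimal Monte Carlo stress-test sketch; all parameters are made up.
random.seed(7)  # reproducible runs

def simulate_year(base=100.0, shock_prob=0.02, shock_loss=0.6):
    """One simulated year of revenue: normal noise plus a rare severe shock."""
    revenue = base * random.gauss(1.0, 0.05)   # ordinary year-to-year variation
    if random.random() < shock_prob:           # the improbable scenario
        revenue *= (1 - shock_loss)            # e.g. lose 60% in a crisis year
    return revenue

# Run many simulated years, then inspect the bad tail of the distribution.
outcomes = sorted(simulate_year() for _ in range(10_000))
worst_5pct = outcomes[len(outcomes) // 20]     # 5th-percentile "stressed" outcome
```

Comparing the tail outcome against cash reserves or credit lines is the readiness gauge the article describes: if the 5th-percentile year is survivable, the company passes the stress test.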

Companies utilizing the latest retail trends, including NFC technology, wearables and geolocation, are set to collect the data points necessary to provide the customer service today's customers expect.

Cloud financial management enables companies to share data throughout the organization for better collaboration, easier application upgrades and better compliance.

Cloudera is filling the gaps in its Hadoop portfolio with two new products. RecordService provides security management across multiple Hadoop data access apps, while Kudu combines fast analytics with fast data updates and slims down workloads.

Hadoop has been evolving from its batch-oriented roots into a more real-time system for some time. That evolution gained momentum today with Cloudera's announcement of Kudu, a new data store for Hadoop designed to support real-time analytics on fast-changing data. Historically, Hadoop users have had to make compromises with the data store when developing real-time analytic applications. They could power fast sequential scans with HDFS or power fast random access with HBase, which is essentially a NoSQL database built into Hadoop. But if they wanted to do both, which is often the case with real-time analytics on data from the Internet of Things (IoT), it required developing convoluted architectures.

Search engines are now an integral part of people's everyday lives. We are used to having access to information at the click of a button. However we rarely think how much work goes into this ability to search for information. Search engine software has become extremely advanced in recent years, now using complex algorithms to provide the most relevant information with predictive search and search suggestion capabilities.

Today, I'm excited to share that we have released the GA version of Hortonworks DataFlow (HDF), a new offering that directly addresses the unique big data needs of the Internet of Anything (IoAT). Hortonworks DataFlow is powered by Apache NiFi, a top-level open source project made available through the NSA Technology Transfer Program. By making this technology a commercial offering, we now provide our customers the ability to connect, collect and curate data from a broad spectrum of connected yet disparate data sources – sensors, machines, geo-location devices, social feeds, connected cars, web clicks, server logs and more.

Creating digital products is different from building traditional telco products: the uncertainty is much higher, the way of creating value for the customer is totally different, and the lifecycle is much faster, says Susana Jurado Apruzzese. Telefónica adapted Lean Startup to its processes, culture and organization to make it work.

Sure, Apache Spark looks cool, but does it live up to the hype? Is there anything you can actually do with it? Actually, there are some pretty cool use cases going on right now.

Pope Francis is most transformative of millions by touching individuals in their ordinary lives, so he was at his most […]

This is part of a three-post series on Kudu, a new data storage system from Cloudera. Part 1 is an overview of Kudu technology. Part 2 is a lengthy dive into how Kudu writes and reads data. Part 3…

As airlines and frequent flyer programs gather more intelligence on your day-to-day lifestyle, flying and financial position, they begin to build a data profile of your interests, goals, psychometric assessment, your motivation to engage with a brand at any given point throughout the day, what has driven you to purchase in the past — and most importantly — where your thresholds are.

In this article, the authors discuss modularity and projectional editing concepts used to design programming languages, using a language workbench (LWB) like Jetbrains' MPS. They discuss how they used these techniques in three different domains: embedded-software development, requirements engineering, and insurance rules.

Paul Moreno shows how to federate AWS IAM permissions, roles, and users with a directory service such as LDAP or Active Directory via an identity provider. Using the open-source IdP software Shibboleth, he describes how this setup uses the AWS Security Token Service to reduce the need for long-lived credentials for both the web console and the CLI.

Viktor Klang explores fast data streaming using Akka Streams – how to design robust transformation pipelines with built-in flow control able to take advantage of multicore and go over networks. He discusses the traditional pitfalls and how they can be overcome, as well as how we can build reusable pieces of pipeline logic that can reuse a smorgasbord of transformations.

Jon Moore goes over some strategies for surviving in a jungle of partial failures. Each survival tip is explained through a concrete example, or "adventure story", from Comcast's TV experience.

Our friends over at the Northwestern School of Professional Studies developed an insightful new infographic (see below) that covers how big data analytics is used in sports stadiums.

Holistic big data strategies offer multidimensional benefits for companies, including potentially changing how employees and processes work.

Ryan Trinkle explores functional techniques for managing complexity, examines what makes them successful in pure functional programming, and proposes ways that they can be applied in any programming context.

Ransom Richardson presents the Talko service architecture, its implementation and operation in the cloud, why they are using Erlang for it and key things learned along the way.

Lyndsey Padget introduces the basic principles of RESTful APIs: terminology, design patterns, data, pitfalls, best practices, and more.

Spark is changing the face of big data for everyone. Matt Asay explains.

It's exciting collaborating with a company that established one of the most well known industry standards in software. Microsoft Excel has transformed the way people work with basic number crunching…

NEW YORK — If there is one thing Bill James wishes more people understood about sabermetrics — the method of empirical analysis of baseball that he had a prominent role in pioneering — it's that the data is not the point. The point is to use the data like a razor to cut through false convictions to find the truth. "The reason that understanding is so difficult to build in baseball is that there's an entire industry of people selling nonsensical ideas about the data all the time," James says. "Whenever you find something that you do not know, that you could know, that's gold."
