Big Data News – 07 Sep 2016

Featured Article
Big data isn’t really about the data – it’s about the questions we ask, and how we ask them. The era of big data is evolving rapidly. Today it’s moving beyond the complexities of data analytics, and, like all things tech, it’s moving from the domain of a back-end priesthood of specialists to the front line of business users.

Top Stories
OpenStack is an interesting initiative — it has, perhaps more than any other open source initiative, polarized commentators. On one end are those who say that OpenStack is a "dead duck," that it will see no success and that it is hampered by too many conflicting commercial and governance drivers. On the other end of the continuum are those who suggest that OpenStack is, most likely, the best thing since sliced bread. OpenStack will, it would seem listening to these folks, deliver world peace and a cure for cancer. Of course the truth is somewhere in between these two extremes and OpenStack will be an important part of the technology landscape, alongside a host of other products and projects.

Workforce analytics provides business owners with a complete picture of their existing human resources, along with the critical insights necessary to make future decisions that drive business success. Let's find out some of the key benefits that workforce analytics brings to the table for businesses of all sizes.

In organizations that operate without a data warehouse or separate analytical database for reporting, the only source of the latest and up-to-date data may be in the live production database. When querying a production database, optimization is key. An inefficient query may pose a burden on the production database's resources, and cause slow performance or loss of service for other users if the query contains errors.
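One way to check a reporting query before it touches live data is to inspect its plan and cap its result size. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The orders table, its columns, the index name, and the query_plan helper are hypothetical illustrations, not part of the article, and other database engines have their own EXPLAIN syntax.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's plan descriptions for a query without fetching its rows."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # the fourth column holds the readable detail

# Hypothetical stand-in for a production table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 50, i * 1.5) for i in range(1000)],
)

# Select only the needed columns, filter with a parameter, and cap the result size.
report_sql = "SELECT id, total FROM orders WHERE customer_id = ? LIMIT 100"

plan_before = query_plan(conn, report_sql, (7,))  # no index yet: full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, report_sql, (7,))   # the index can now be used

rows = conn.execute(report_sql, (7,)).fetchall()
print(plan_before, plan_after, len(rows))
```

Comparing the two plans shows whether the filter is served by an index or by a scan of the whole table, which is the kind of check that helps keep a reporting query from burdening other users.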

SYS-CON Events announced today that Numerex Corp. (NASDAQ:NMRX), a leading provider of managed enterprise solutions enabling the Internet of Things (IoT), will exhibit at the 19th International Cloud Expo at Things Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA.

While at Hadoop Summit 2016, I had the opportunity to catch up with Justin Kestelyn, Technical Evangelism & Developer Relations at Cloudera, to discuss all the progress his company has made in the past year and what's in store for the future.

Identity Automation has moved to acquire 2FA, a provider of multi-factor authentication and single sign-on software.

Byte Night is Action for Children's biggest annual fundraiser; a national 'sleep-out' event. It started in 1998, when 30 friends slept out in London and raised £35,000. Since then Byte Night has raised over £7.3 million to tackle the root causes of youth homelessness. Byte Night is now the UK's largest sleep out event, with individuals and teams from the technology and business services sleeping out to raise vital funds to prevent youth homelessness.

Don't wait for a failure and then fix it. Predict the failure and then prevent it. This was the key message I took away from the keynote address given by Dr. Norbert Gaus from Siemens, at the Teradata Universe conference, Hamburg. He was talking about the company's Sinalytics program, which they use to provide analytics on their 'Web of Systems'. Dr. Gaus illustrated their philosophy with a great story about Renfe, the Spanish railway company. Renfe operates Siemens-built Velaro E high-speed trains, which run with punctuality as high as 99.9 percent.

Oracle will acquire LogFire, a provider of cloud-based warehouse management applications, with the aim of boosting the features of its supply chain management cloud offering. The Redwood Shores, California, software and cloud giant expects that the addition of the LogFire applications will complement the logistics functionality of its Oracle Supply Chain Management (SCM) Cloud by adding warehouse management capabilities. The financial terms of the proposed acquisition of the Atlanta, Georgia, firm were not disclosed.

In this post I will sometimes use the term "variable" for a "feature" ("predictor") or an "outcome" ("predicted value"). The question of variable dependencies for a particular data set is quite important, because it can help reduce the number of predictors used in a model. It can also tell us which feature is not helpful for model construction, although it may still be useful for engineering another predictor. For example, sometimes it is better to compute speed than to use distance values.
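A minimal sketch of both ideas, assuming Pearson correlation as the dependency measure (the post may use a different one). All column names and values below are made up for illustration: a predictor that is a linear rescaling of another adds nothing, while a derived feature such as speed can be engineered from distance and duration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical predictors: the same distances in two units, plus trip durations.
distance_km = [10.0, 20.0, 30.0, 40.0, 50.0]
distance_miles = [d * 0.621371 for d in distance_km]
duration_h = [0.5, 0.4, 1.5, 0.8, 2.5]

# A correlation of 1 means one of the two distance columns is redundant.
redundancy = pearson(distance_km, distance_miles)

# Engineered predictor: speed derived from distance and duration.
speed_kmh = [d / t for d, t in zip(distance_km, duration_h)]

print(round(redundancy, 6), speed_kmh)
```

Pearson correlation only captures linear dependence, so a low value does not prove that two variables are unrelated.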

Data industry groups are cautioning the government to take a middle course on a proposal to collect social media data about foreign travelers entering the U.S. under a visa waiver program. At issue is whether visitors from certain countries should be required to supply social media "identifiers" under the visa waiver program overseen by the U.S. Customs and Border Protection. The U.S. Department of Homeland Security (DHS) is considering whether and how to collect data requested for entering the U.S. without a visa for a stay of less than 90 days.

Huawei bands together with Alluxio to announce the release of their big data storage acceleration solution, integrating Huawei's FusionStorage with Alluxio's memory-speed virtual distributed storage system.

We live in a world awash with data. From sensor data to website data, to fitness data, nearly every aspect of our lives is quantified. And digging into the numbers helps us better understand ourselves, our neighbors, and our world. For organizations, this can yield a huge competitive advantage–if they can see and understand their data. And that's why many are adopting a culture of self-service analytics. In this culture, data plays a central role in every major conversation.

August was a slow month for tech news, but Microsoft continued to update its Azure cloud platform with a variety of new features, including a new type of instance for high-performance computing. Here's the breakdown of all the features you need to know about. A new instance type powered by Nvidia Tesla GPUs: Microsoft announced the private beta of a set of new compute instance types to power applications that need a lot of parallel processing. The new N-series virtual machines are powered by Nvidia's Tesla GPUs and built for high-performance computing.

Microsoft recently announced a new Impala Connector for the Power BI Desktop (currently a preview, with GA expected early in 2017). Cloudera is also working with Microsoft's Power BI Engineering team to certify it against Impala to ensure it meets critical enterprise requirements such as security. The following Microsoft post about the new connector, by Power BI senior program manager Miguel Llopis, is re-published below for your convenience.

The dog days of summer are over, and it's time to get back to work. While you were out enjoying the sunshine and sipping margaritas, Amazon continued to update its cloud platform with new services like Kinesis Analytics, which lets users query streaming data with SQL. 

A number of trends point to the transition of the smartphone business from North America and Western Europe.

Big data has been a big buzzword for more than a few years already, and it's got some solid numbers to back that up, including $46 billion in 2016 revenues for vendors of related products and services. But the big data era is still just beginning to dawn, with the real growth yet to come. So suggests a new report from SNS Research, which predicts that by the end of 2020, companies will spend more than $72 billion on big data hardware, software, and professional services. While revenue is currently dominated by hardware sales and professional services, that promises to change: By the end of 2020, software revenue will exceed hardware investments by more than $7 billion, the researcher predicts.

In the future, we'll be surrounded with smart devices that anticipate our wants and needs by capturing and interpreting various streams of data in real time. This vision of so-called "insight streams" flowing across the Internet of Things will be built on emerging cognitive and machine learning technologies, and will prove to be a major disrupter to business plans. Getting on the right side of this coming wave of technological disruption will take a lot of hard work, good timing, and luck.

Within two years, a majority of enterprises expect to be running their workloads in the cloud. After getting past considerable concerns about privacy and security, companies are increasingly placing their faith — and their information and services — in the cloud. The level of enterprise workloads in the cloud is expected to go from 41% today to 60% by mid-2018, according to technology research firm 451 Research, which surveyed more than 1,200 IT professionals worldwide in May and June.

We've all seen the marketing hype surrounding the data lake. Data lakes are much like Michael Corleone at the end of The Godfather. Data lakes will answer all your questions and solve all your problems. However, as with Michael's pronouncement(s) at the end of The Godfather, there is a downside to this "offer" that marketers may think we cannot refuse. There is usually a set of stakeholders out there who are unfamiliar with Hadoop or the concept of a data lake or perhaps just not interested in changing the status quo of their organizations. As a data architect, you are pitching a data lake like you do one of those mountain lakes on travel websites or George Clooney movies …

While everyone is anxious to move forward into flexible virtual and cloud environments, most critical business apps are rooted on legacy systems.

China's Qunar added a virtual distributed file system called Alluxio, which made HDFS go from slothful to speedy.

Analytics projects routinely fail, but often it's not the technology at fault, a new Harvard Business School study revealed.

I just left a sold-out Melbourne Hadoop Summit 2016 in Australia. This was the first Summit in Asia Pacific and I was excited by the tremendous response from the global and local community, and from regional organizations and businesses. The buzz was everywhere. We're proud to be the host and the organizer. We couldn't pull…

Here at Silicon Valley Data Science, we have a slight obsession with the Caltrain. Our interest stems from the fact that half of our employees rely on the Caltrain to get to work each day. We also want to give back to the community, and we love when we can do that with data. In addition to helping clients build robust data systems or use data to solve business challenges, we like to work on R&D projects to explore technologies and experiment with new algorithms, hypotheses, and ideas. We previously analyzed delays using Caltrain's real-time API to improve arrival predictions, and we have modeled the sounds of passing trains to tell them apart. In this post we'll start looking at the nuts and bolts of making our Caltrain work possible.

By Lixun Zhang, Data Scientist at Microsoft. For financial institutions currently using SAS, R is an alternative statistical software package that is free and has been widely used in academia and industry. Technical support for R is included with Microsoft R Server, which in addition to being 100% compatible with R, also has improved speed and the capability to work with large datasets. As corporations switch from SAS to R, they might need to rewrite some of their legacy SAS programs in the new language.

For SAP, 2016 will go down as the year of the partnership. The SAP/Apple announcement took all the headlines, but SAP SAPPHIRE NOW featured many other announcements affecting everything from SAP hosting to data visualization. While it's understandable that those flashy announcements will get the most attention, the implications of the new partnerships for big data might prove to be the most important revelation.

Box has unveiled, in collaboration with IBM, a Box Relay extension to the Box file synchronization and sharing service.

Here are six key definitions–and The Five Effects of Prediction–from my book, Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die (Revised and Updated, 2016). Note: A complimentary copy of this book will be provided for all attendees at two Predictive Analytics World events coming to New York in October: PAW…

By Jeff Deal, Program Chair, Predictive Analytics World Healthcare. In anticipation of his upcoming keynote co-presentation at Predictive Analytics World for Healthcare New York, October 23-27, 2016, we asked Ken Yale, JD, DDS, Vice President of Clinical Solutions at ActiveHealth Management, a few questions about incorporating predictive analytics into healthcare. Catch a glimpse of his presentation, Predictive Analytics, Genomics, and Precision Medicine.

Shippable is making it simpler for the IT administrator to automate the management of the application release cycle using a declarative language.

It has been another exciting week on Hortonworks Community Connection (HCC). We continue to see great activity and recommend the following assets from last week. Top Articles from HCC: "An introduction to Ambari Views 2.4 new feature - Remote cluster configuration" by abilgi. This article discusses this new feature. Ambari Views Server is the Standalone Ambari Server…

This April, Hortonworks launched a multi-phase initiative to streamline Apache Hadoop operations. The 1.3 release of SmartSense marks the delivery of the second phase of that initiative, which provides Consolidated Cluster Activity Reporting. Hortonworks launched SmartSense in 2015 to help customers quickly collect cluster configuration, metrics, and logs to proactively detect…

Companies that want to try simplifying the tangled mess of their internal workflows will be able to use a new tool from Box to help. Box Relay is a new product the enterprise storage company announced on Tuesday that's aimed at giving employees a way to manage and track the process of doing repetitive work, like submitting expense reports and getting agreements approved.

SYS-CON Events announced today that Pulzze Systems will exhibit at the 19th International Cloud Expo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. Pulzze Systems, Inc. provides infrastructure products for the Internet of Things to enable any connected device and system to carry out matched operations without programming. For more information, visit

Speaking recently at the IIA GRC conference, I began by asking the audience to raise their hands to indicate if they or their departments had provided opinions on internal control effectiveness, risk management effectiveness, compliance effectiveness, or loss management practices. With very few exceptions, internal controls were the sole focus. I began my presentation by…

Geospatial big data can include information from an assortment of sensors and data collection methods. Points and features with their associated attributes can be gathered using handheld or survey-grade GNSS, dedicated field computers or even smartphones. These data sets are small compared to other techniques, but they provide very high levels of precision and detail and can be updated rapidly. Mobile mapping systems combine lidar, imaging, GNSS and other sensors to capture large quantities of 3D information.

Unlock the potential in your data to create meaningful client connections with IBM Client Insight for Wealth Management with Watson. Analyze data from many sources with sophisticated, prebuilt industry-designed analytical models, personalize offers to match the changing needs of clients and leverage dynamic segmentation to provide deep client insights. Now is the time to outthink the status quo.

Every business in the world needs data to thrive. Data is what tells you who your customers are and how they operate, and it's what can guide you to new insights and new innovations. Generally, the more data you have, the more specific and accurate insights you'll be able to generate, which is why big data has become such a powerful tool (and buzzword) in recent years.

The success of next-generation data science initiatives depends heavily on teamwork from the right mix of application developers, business analysts, data engineers, statistical modelers and other specialists. Discover more about the composition of high-quality data science collaboration through the IBM Data Science Experience, the IBM DataFirst Launch Event and other opportunities for accelerating processes that put data to work in cognitive business environments.

Many organizations can capitalize on big data solutions and technologies to make use of expanded volumes of data for enhancing the critical decisions that drive successful business outcomes. And yet, a number of these enterprises can be inhibited from moving big data initiatives forward for a variety of reasons. Take a look at how creating big data centers of excellence can be the catalyst for transformational culture shifts that enable businesses to base vital business decisions on data-ingrained financial models.

Here's this week's news in Data Science and Big Data. Don't forget to subscribe if you find this useful! Interesting Data Science Articles and News: How Tech Giants Are Devising Real Ethics for Artificial Intelligence - Researchers from tech companies have been meeting to discuss the impact of artificial intelligence on jobs, transportation and even warfare. The Three Faces of Bayes -

We are all exposed to different sounds every day, such as car horns, sirens, and music. How about teaching a computer to classify such sounds into categories automatically? In this blog post, we will learn techniques to classify urban sounds into categories using machine learning.

As one heavyweight bulks up on the assets of a rival, a welterweight sheds its stock in pursuit of leanness. Consequently, a defining week for the channel is emerging, as the warring Dell and Hewlett Packard Enterprise prepare to solidify respective go-to-market strategies. In Dell's case, its upcoming multi-billion dollar acquisition of EMC will be finalised on September 7, creating the world's largest privately-controlled, integrated technology company in the process.

Enterprise IT executives expect 60 per cent of workloads to run in the cloud by 2018, as organisations reassess business models amidst accelerating adoption levels. Findings from 451 Research indicate that 41 per cent of all enterprise workloads are currently running in some type of public or private cloud, with enterprises most likely to use on-premises private cloud and software as a service (SaaS), each accounting for 14 per cent of all applications.
