Big Data News – 02 Nov 2016

Today's Infographic Link: Print vs. Web

Featured Article
O'Reilly Media's 4th annual survey of data professionals looks at tools, languages, gender, geographies, and a host of other factors that are predictors in terms of how much data workers can expect to earn. Where do you fit?

Top Stories
Kyle will be attending the Open Data Science Conference West in Santa Clara, November 4-6, 2016. We're going to have an informal meetup on Saturday at 7pm at The Bourbon Pub (4900 Marie P DeBartolo Way). There's no formal presentation; this is just a casual meetup. We'll congregate around the bar area. If you're attending the conference or simply in the area, drop by and say hello!

As I winged my way back home after attending OpenStack's Summit in Barcelona last week, I was struck by a growing suspicion that the event marked a turning point for the open source cloud initiative, and that its future is going to start looking different to its past. I've been following OpenStack since its inception six years ago. Back then the project, initially conceived by Rackspace and NASA, saw interest from a couple of different areas. Firstly, there was a high number of startups founded with the aim of commercializing OpenStack. These startups, largely founded by people who had been involved with the project either at Rackspace or NASA, secured copious amounts of early-stage venture capital. Most of these companies have folded or been acquired (and generally subsequently shuttered): Nimbula, Piston Cloud and CloudScaling are now consigned, in part or in whole, to history.

Cloud computing has helped many enterprises transform themselves over the last five years, but experts agree that the market is entering something of a second wave, both for public cloud and private cloud services built and hosted in corporate data centers. The cloud market will accelerate faster in 2017 as enterprises seek to gain efficiencies as they scale their compute resources to better serve customers, says Forrester Research in a new report. "The No. 1 trend is here come the enterprises," says Forrester analyst Dave Bartoletti, primary author of the research. "Enterprises with big budgets, data centers and complex applications are now looking at cloud as a viable place to run core business applications." Forrester says the first wave of cloud computing was created by Amazon Web Services, which launched with a few simple compute and storage services in 2006. A decade later, AWS is operating at an $11 billion run rate.

This tool makes Oozie migrations off Apache Derby (or any other supported database) easy, in addition to streamlining upgrades. The Apache Oozie server is a stateless web application by design, with all information about running and completed workflows, coordinator jobs, and bundle jobs stored in a relational database. Prior to Cloudera Manager 5.4, Oozie was configured to use the embedded Apache Derby database for this purpose by default. The post "How-to: Use the New Apache Oozie Database Migration Tool" appeared first on Cloudera Engineering Blog.

The idea behind ClearMetal came from time spent in a Stanford Business School immersion program at OOCL, a giant ocean shipping line in Hong Kong.

Many web applications have been built on an open source stack that included MySQL. Despite its limitations, MySQL managed to become the world's most widely used open source RDBMS. What limitations, you ask? Out of the box, MySQL does not scale all that well and, in particular, cannot handle a lot of simultaneous clients compared to commercial databases.

The question we ask in this post is whether it is actually possible to hack a voting machine.

New York, NY - In DMA's 2016 Analytic Challenge, data scientists from today's leading data and analytics firms went head-to-head to determine who can best calculate a customer's lifetime value, and teams from AdTheorent and DataLab USA emerged as co-winners. Sponsored by DMA's Analytics Community and EY (Ernst & Young LLP), the 2016 Analytic Challenge recognizes the top solutions to driving personalized customer interactions. Influent50 was awarded the runner-up prize. All three winning companies were highlighted earlier this month at DMA's &THEN event in Los Angeles, where attendees gained a full briefing on the competitors and their solutions. "An ever expanding sea of data is changing the way we communicate, do business and live almost every aspect of our lives," said Tom Benton, DMA CEO.

Cisco CEO Chuck Robbins pledged to apply AI and machine learning to drive analytics, automation and security deep into the data center and core networks.

Cisco Systems is finding its way into storage through its successful server business. On Tuesday, the company introduced modular systems that can be deployed with different combinations of computing and storage capacity. Though it's not Cisco's first foray into storage, the UCS S3260 Storage Server offers a density and a freedom of configuration that stand out against other systems, even competing on cost with public cloud services, the company says. The server was announced at the Cisco Partner Summit in San Francisco. It's the first entry in Cisco's S-Series, a line of systems designed to serve both enterprises and companies that provide cloud services to others.

Apple last week released the latest iteration of the MacBook Pro, and the reviews are not very encouraging to the folks in Cupertino.

It's more important now than ever that retailers have the right pieces in place to successfully capture online traffic and drive conversions.

Once a serious security breach is discovered, the whole "ignorance is bliss" thing that most firms seem to be operating on will be proven false.

Microsoft this week open sourced the design specifications of the servers and racks that make up its hyperscale Azure cloud data centers, contributing the information to the Open Compute Project (OCP). OCP was founded in 2011 and now includes member companies such as Facebook, Intel, Google, Apple, Dell, Rackspace, Cisco, Juniper Networks, Goldman Sachs, Fidelity and Bank of America, who share design specifications for hardware used in their data centers. OCP is meant to be an open-source community where member companies share how they buy and configure components used to make data center equipment. Microsoft joined OCP in 2014 and has contributed server and data center designs for its Azure cloud. This week the company announced that it will contribute Project Olympus, which is a series of hardware design specifications for "next-generation hyperscale hardware design," the company said in a blog post.

Find out why organizations are adopting predictive analytics in their quest to deliver an experience that keeps customers coming back.

Use data to drive the future of business–and your career. Follow our faculty's lead into the high-demand field of analytics. The shortage of data analysts in the U.S. represents a major career opportunity for you. Companies in all industry sectors are searching for knowledgeable, experienced analytics leaders who can help them succeed in the age of Big Data. Northeastern helps you to meet this growing market need–and expand your career prospects–by offering Graduate Programs in Analytics.

Recent years have seen a significant increase in the remote workforce as developments in tech have given employees the freedom to work anywhere, anytime.

The enterprise must implement digital transformation on its terms, not vendors', with an understanding of what must change and how success is defined.

The presidential race is as tight as it is controversial. Early voting data models by L2 Political illuminate which candidate is least popular with suburban voters in battleground and swing states.

MongoDB has made an update to its namesake database that natively supports graph analytics as well as a much faster implementation of SQL.

Here's how to turn some common open source objections into analytics advantage.

Survey finds pain points for secondary storage, average restore times, the growth of hyperconvergence and rate of cloud adoption in enterprises.

Version 2.2 of Ansible will support Cisco (ASA), Dell, F5 Networks, Nokia SR-OS, Pluribus Networks (Open Netvisor) and VyOS devices.

Discover some of the latest and greatest developments in the world of Apache Spark in this retrospective on Spark Summit Europe 2016. When you do, find out how modern developers are connecting their data sources to Spark in exciting ways, and learn how the cloud is providing access to Spark-powered sentiment analysis capabilities.

Why has IBM created its own distribution of Apache Hadoop and Apache Spark, and what makes it stand out from the competition? We asked Prasad Pandit, program director, product management, Hadoop and open analytics systems, at IBM to give us a tour of the reference architecture for IBM Open Platform with Apache Hadoop.

I recently received a reminder from a social network to reach out to one of my friends. I'd love to do that, but the friend has been dead for many years, making the task very difficult. The reminder did get me thinking about how we, as a business community, should consider treating personal information when we learn someone has passed on. The data quality principle: The Organization for Economic Cooperation and Development (OECD) established 8 basic principles for the processing of personal information. The Data Quality Principle states that "Personal data should be relevant to the purposes for which they are to be used, and, to the extent necessary for those purposes, should be accurate, complete and kept up-to-date."

Achieving balance between work and home life is an ongoing challenge for professionals across industries, but it turns out the IT world is doing pretty well in helping to make it happen. Roughly a third of the jobs listed in a new report entitled, "29 Best Jobs for Work-Life Balance (2016)" from recruiting site Glassdoor are within the IT realm, including the oft-celebrated position of data scientist at No. 3. Several HR positions earned rankings on Glassdoor's list, including corporate recruiter in the No. 1 spot. Strategy manager took No. 4; substitute teacher and library assistant were also on the list.

Investments in predictive analytics, dashboards, and KPIs are coming to CRM lead management offerings, according to Gartner's most recent Magic Quadrant report for the category. The finding is reflective of an increasing number of IT organizations aiming to deliver more and better intelligence for sales and marketing users.

Why do many organizations that invest heavily in analytics and hire data scientists to slice and dice the data end up frustrated? Here's how to make predictive analytics powerful. Keep on reading: Predictive analytics: the knowledge worker is the battleground

Establishing a digital governance plan can be a challenge, but with the right education and tools, the job can be made a lot simpler.

R 3.3.2, the latest update to the R language, was released today. Binary releases for Linux and Mac are available now from your local CRAN mirror, and the Windows builds will be available shortly. As a minor update to the R 3.3 series, this update focuses mainly on fixing bugs and doesn't make any major changes to the language. As a result, you can expect existing scripts and packages to continue to work if you're upgrading from R 3.3.1. This update includes some performance improvements (particularly in calculation of eigenvalues), better handling of date axes in graphics, and improved documentation for the methods package.

Organizations need to develop skills and technology to manage Big Data operations across multiple data centers, likely across large geographic areas.

Banks generate all kinds of reports, but not all of them can explain customer behavior or predict a customer's next move. Find out how IBM Customer Insight for Banking uses cognitive analytics to give you vital insight into customer behavior, allowing you to predict financial events, prevent churn and proactively engage your customers as individuals.

Customers' desires have changed dramatically through the years. Most notably, modern consumers expect organizations to know them and, what's more, to anticipate their needs. In such an environment, traditional approaches to customer segmentation are giving way to new methods of engaging with customers. IBM's Client Insight solutions are tapping into the power of transforming customer segmentation through cognitive insight. As they do, they are helping organizations in several ways: generating dynamic segments based on client behavior; applying industry-specific models to deliver actionable insights; predicting future lives and financial events of clients; and using insights to drive personalized client offers. In this video, recorded live on location at the Financial Services Sector (FSS) Forum, Marc Andrews, vice president of IBM Watson Financial Services, and guest speaker Jim Marous, owner of the Digital Banking Report and copublisher of the Financial Brand, trace the trends pointing toward the need for enhanced customer insight solutions for financial services. Not only is data collected on every aspect of life, but this data is also powerful enough to connect, and even interact, with customers. The trend toward financial inclusiveness is connecting financial institutions with previously disregarded or overlooked populations. The modern consumer can use a smartphone to make a down payment on a fully electric car that can park itself in a driveway.

Project CloudWAN uses network virtualization, microservices and controller technology hosted in the cloud to manage physical and virtual appliances.

We aren't doing enough to eradicate zombie apps from attacking in the first place.

A new Gartner Maverick report says that annual physical medical exams and primary care doctors are about to be disrupted by IoT medical devices and algorithms. Here's how IT must shift to accommodate the change.

Predicting the future is not a theoretical superpower. It is a skill we already rely on to make decisions, and like any other skill it can be rapidly improved with deliberate practice. Unsurprisingly, deliberate practice looks like making predictions within a domain and comparing the results to reality. This year I've built a culture of doing just that at Twitch. It's the most exciting work I've ever done.

Successful marketing automation strategies are highly dependent on big data. Brands should understand the role big data plays and have a clear strategy to collect and use it effectively. Why Marketing Automation Depends on Big Data: Big data plays a key role in marketing automation. Here are some reasons big data is so important:

For many organizations, existing data-analytics infrastructures are too complicated, and it's difficult to find qualified job candidates to helm analytics projects.

Spreadsheets are excellent tools as far as they go–but how far can they truly go? If you're pushing your spreadsheet-based solutions beyond their viable limits, then they might be doing more harm than good. Discover what considerations you shouldn't ignore when using spreadsheets for statistical analysis, and explore alternatives that are designed to help you expand your limits without putting your organization at risk.

Microsoft is much more tightly connecting the hardware and software.

Sanovi Technologies is a provider of orchestration and visualization tools optimized for managing data protection.

New funds aimed to fuel growth by accelerating development, expanding sales and marketing, and growing international operations.

In recent years, the best-performing systems in artificial-intelligence research have come courtesy of neural networks, which look for patterns in training data that yield useful predictions or classifications. A neural net might, for instance, be trained to recognize certain objects in digital images or to infer the topics of texts. But neural nets are black boxes. After training, a network may be very good at classifying data, but even its creators will have no idea why. With visual data, it's sometimes possible to automate experiments that determine which visual features a neural net is responding to.
