Big Data News – 17 May 2016

Today's Infographic Link: Diversity in Tech

Top Stories
Dave Neitz, CIO of $1.3 billion construction and engineering firm CDM Smith, thinks the time has come for his industry, traditionally a late adopter of technology, to transform itself through digital tech. In a recent interview with executive recruiter and CIO.com blogger Martha Heller, Neitz explained how digital transformation can help solve the United States' trillion-dollar infrastructure problem, and protect its cities from cyberattacks.

Welcome back to my blogging adventure. In my Cybersecurity Architecture series, we've spent some time discussing the value of an analytic approach to the incident response process. In the last article, Conceptual Cybersecurity Architecture for analytic response, we started to drill into the solution space by giving a high-level architecture to drive our discussion. Let's… The post Cybersecurity Architecture: All about sensors appeared first on Hortonworks.

DDN, a high-end storage standard bearer, this morning joined the ranks of storage vendors offering new all-flash technology with the launch of Flashscale, a family of all-flash, scale-out and scale-up products designed for read, write and mixed workloads for enterprise data analytics, web scale cloud and HPC environments.

SAP has updated its flagship Hana in-memory computing platform with a raft of new features designed to make IT simpler while giving organizations a better handle on their data. The updates, announced Tuesday at the company's annual Sapphire Now conference in Florida, include a new hybrid data management service in the cloud and a new version of the company's Hana Edge edition for SMBs.

Want a pay boost? Pick up a new skill. Which one? Go, Scala, and big data skills like Apache Spark and Hadoop are all good places to start, according to PayScale, a salary-tracking site for IT and other industries. PayScale used its pay-tracking database to determine which job skills provide the largest average boost in pay, and presented the results in its 2016 Workforce-Skills Preparedness Report, "Leveling Up: How To Win in the Skills Economy."

The basic strategy of GRC professionals is similar to the old whack-a-mole arcade game, in which players use a mallet to hit toy moles, which appear at random, back into their holes. A risk pops up? Whack it! Think a risk might pop up? Whack it with a control. Ineffective control? Whack it…

On a regular basis, I get asked for examples of companies that have moved from traditional licensing models to subscription-based and cloud-delivered ones. The number of companies to have made this transition successfully is actually pretty limited — whether the barriers are technological, cultural or market perception-based, there are far more examples of cloud vendors disrupting traditional ones than there are of traditional vendors successfully transitioning to the cloud. Adobe, however, is one example of a company that has navigated this change with aplomb.

It looks like the virtualization industry will sustain itself going forward by shoring up its installed base rather than finding new deployments.

When thin clients are employed, the entire security posture of the organization changes.

SAP and Microsoft have expanded an existing partnership to offer new products to users of the Azure and Office 365 cloud services, focused on better integrating the two companies' offerings.  Azure customers will be able to use SAP HANA in Microsoft's public cloud, expanding the reach of that popular relational database service. SAP is also integrating its services including Fieldglass, Concur and SuccessFactors with Microsoft Office 365, so users can get the benefits of Microsoft's communications, collaboration, calendar and document editing tools.

As you may have heard, Hadoop is 10 this year. In celebration, here are some posts we think you'll find interesting. Doug Cutting on Hadoop turning 10: The co-creator of Hadoop talks a bit about the tech's history, and what he sees in the future. A key theme is the importance, and inevitability, of open source technology. Know your business needs for Hadoop: Diving into the data-driven world can be exciting, but SVDS CTO John Akred stresses the need for solid business plans. The process can be overwhelming to consider, and there are pitfalls to avoid.

SnapLogic, the unified data and application integration platform as a service (iPaaS), introduced the Spring 2016 release of its SnapLogic Elastic Integration Platform.

Your team has built the data science models. You've showcased the results to the business and they are sold. Now you've got to make sure that your model scales to meet the rigorous performance targets. Should you buy bigger machines? Should you move to Hadoop? Big Data, advanced analytics and scientific computing bring exciting opportunities for businesses like yours to leverage. However, it also creates serious computational challenges. To effectively manage the increasing complexity, your Data Science team needs technologies that will easily scale and take advantage of the available processing power. In this whitepaper, you'll learn from our seasoned experts about the approaches to scaling your data science models.

By John Mount, Ph.D., Data Scientist at Win-Vector LLC. Win-Vector LLC's Dr. Nina Zumel has just started a two-part series on Principal Components Regression that we think is well worth your time. You can read her article here. Principal Components Regression (PCR) is the use of Principal Components Analysis (PCA) as a dimension reduction step prior to linear regression. It is one of the best known dimensionality reduction techniques and a staple procedure in many scientific fields. PCA is used because: it can find important latent structure and relations; it can reduce overfit.
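To make the idea of PCR concrete, here is a minimal sketch in plain Python. It is not the code from the series mentioned above: it keeps only one principal component, and it approximates that component with power iteration rather than a full PCA, both of which are simplifying assumptions. All function names (`pcr_fit`, `pcr_predict`, and the helpers) are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def center_columns(rows):
    """Subtract each column's mean; return the centered rows and the means."""
    col_means = [mean(col) for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, col_means)] for row in rows]
    return centered, col_means


def leading_direction(X, iterations=100):
    """Approximate the first principal direction of centered data X
    by power iteration on X^T X (assumes X is not all zeros)."""
    w = [1.0] * len(X[0])
    for _ in range(iterations):
        scores = [dot(row, w) for row in X]         # X w
        w = [dot(scores, col) for col in zip(*X)]   # X^T (X w)
        norm = dot(w, w) ** 0.5
        w = [v / norm for v in w]
    return w


def pcr_fit(rows, y):
    """Step 1: reduce the inputs to one component score (y is not used here).
    Step 2: ordinary least squares of y on that single score."""
    X, col_means = center_columns(rows)
    w = leading_direction(X)
    scores = [dot(row, w) for row in X]
    y_mean = mean(y)
    slope = dot(scores, [yi - y_mean for yi in y]) / dot(scores, scores)
    return {"means": col_means, "direction": w, "slope": slope, "y_mean": y_mean}


def pcr_predict(model, row):
    centered = [x - m for x, m in zip(row, model["means"])]
    score = dot(centered, model["direction"])
    return model["y_mean"] + model["slope"] * score


# Toy data: the second column is exactly twice the first, so a single
# component captures all of the variation in the inputs.
rows = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
y = [4.0, 7.0, 10.0, 13.0]
model = pcr_fit(rows, y)
prediction = pcr_predict(model, [5.0, 10.0])
```

On real data one would normally scale the columns, keep several components, and use an established PCA routine; the sketch only shows the two-step shape of the method.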

Analysis: From machine learning to IBM Watson in the kitchen – CBR explains cognitive computing.

By: Steven Ramirez, Conference Co-Chair, Text Analytics World Chicago. In anticipation of his upcoming conference presentation, "Tips and Tricks on Developing High-performance Fuzzy Name Search Engine to Prevent Terrorism Financing," at Text Analytics World Chicago, June 21-22, 2016, we asked Emrah Budur, Senior Software Engineer at Garanti Technology, a few questions about his work in text analytics. Q: What is your topic mainly about? A: The governments are enforcing the financial institution to avoid any kind of affiliation with the sanctioned entities.

Microsoft's forthcoming SQL Server 2016 adds better encryption, support for unstructured data queries including those on Apache Hadoop, and new features to enable the hybrid cloud. The updates are part of an overhaul that updates Microsoft's flagship database management platform for a new age.

Every few years the technology industry seems to be consumed with a shiny new object that gets hyped far beyond reality. At worst, the inevitable bursting of the hype bubble leads to the disappearance of the technology from relevance (remember Internet browsing on your TV?), but more often the hype subsides until a real but narrower focus for the technology is found. It's been a decade since Hadoop was first created as an Apache top-level project, and during that decade we've certainly witnessed a lot of hype about what it can do. The hype was driven in part by real needs in the market that were not being met.

In this contributed article, John Thielens, Chief Technology Officer and data scientist at Cleo, describes how, in less than two years, the International Maritime Organization (IMO) will enforce a mandate for the electronic exchange of information related to cargo and shipping worldwide.

Maintaining quality data is no simple feat, as employees manually input flawed data or issues emerge during the data migration and conversion process.

The world of automated provisioning has come a long way in a short time. From hand deploying everything from temporary VMs to complex clustered systems, we have reached the point where the entire operations stack can be provisioned with the click of a button — provided the infrastructure has been put together to do so. This has the huge benefit of offering operations more time to work on projects that add value to the organization. That new system that marketing needs can now move forward because operations has the man-hours, for example. It also offers the surety that there isn't some magical individual on staff who holds all the critical information about a system.

The growing importance of business intelligence and data analytics applications in driving business decision making has made data integration's vital role in the enterprise crystal clear. From gathering data, transforming it into useful information and delivering it to the business users or processes that need it, data integration routines provide the crucial link between a variety of source and target systems.

Google's I/O developer conference starts Wednesday and some big announcements are expected during the opening two-hour keynote, likely around virtual reality, Android and the Internet of Things. In a change of pace, the show is being held outdoors at the Shoreline Amphitheater in Mountain View, Calif., and Google has advised the press to pack sunscreen along with laptops and mobile devices. Here are five questions I want company executives to answer during the course of this year's keynote.  What's the company's plan for the Internet of Things?

It's not often easy for line-of-business managers to get a real-time view of their budgets and spending, but a new app from SAP aims to change that. Based on SAP's Hana Cloud Platform, the app pulls data from core financial reporting systems and makes it searchable, so that line managers can do ad hoc spend analyses and other on-the-fly calculations. Called SAP RealSpend, the app lets managers drill down and perform a fine-grained analysis of actual and future spending. It can also deliver related forecast and budget plans.

Do you really need to go back to school and get another degree in order to establish yourself in a career as a data scientist? Maybe not. These nine free online courses can help you explore a range of topics, including Python, R, AI, machine learning, and Hadoop, before you commit to more advanced learning.

In the lead up to the inaugural Chief Data Officer Forum Africa, I have been surveying the speaker faculty to get insights into where their focus lies, as well as their thoughts on the CDO role as it stands in South Africa right now. The results of this survey will give you some idea of what they will discuss at the event early next month. To give you some context, 13 speakers have completed the survey to date.

Twitter will relax the 140-character limit, by not counting media or links. So says Sarah's single source, speaking secretly. So that might give us a whole extra 24 characters to play with (or 23, if you don't count the separating space). Big deal — why not fix the more pressing Twitter issues, like editability and spam? In IT Blogwatch, bloggers compose tweets of 164 characters. Your humble blogwatcher curated these bloggy bits for your entertainment.

Twitter to loosen its 140-character maximum, by a tiny bit — in future, it won't count media or links. At least, so says @sarahfrier's single, secret source. [Developing story: Updated 8:21 am PT with more comment] Such a change would give us a whole extra 24 characters to play with (or 23, if you neglect the separating whitespace). Why all the fuss? How about fixing the more important Twitter problems, such as editing tweets and nuking spam? Your humble blogwatcher curated these bloggy bits for your entertainment. They're not too long, natch.
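The arithmetic behind that "extra 23 or 24 characters" can be sketched in a few lines of Python. This is an illustration only: the flat cost of 23 characters per link is an assumption inferred from the figures quoted above, media attachments are not modeled, `len()` is only an approximation of how Twitter counts characters, and the function name `counted_length` is hypothetical.

```python
import re

LIMIT = 140          # the character limit discussed above
LINK_COST = 23       # assumed flat cost per link, inferred from the quoted figures
URL_RE = re.compile(r"https?://\S+")


def counted_length(text, count_links=True):
    """Return how many characters the text uses against the limit.

    With count_links=True each link costs LINK_COST characters;
    with count_links=False (the rumored change) links cost nothing.
    """
    links = URL_RE.findall(text)
    text_without_links = URL_RE.sub("", text)
    link_cost = LINK_COST * len(links) if count_links else 0
    return len(text_without_links) + link_cost


tweet = "Big data news " + "http://example.com/a-very-long-path"
current_rule = counted_length(tweet, count_links=True)
rumored_rule = counted_length(tweet, count_links=False)
saved = current_rule - rumored_rule
```

In this sketch the space before the link still counts under both rules, so one link frees 23 characters; dropping that separating space as well is what would bring the gain to 24.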

News: Proposals from the Competition and Markets Authority (CMA) focus heavily on the impact of technology as a means to disrupt the traditional banking industry.

Riverbed baked several new features into the latest version of its SteelCentral application performance and network management suite, designed to help manage networking components and applications that live in the cloud and unified communications systems — as well as making the whole thing a little more intuitive. The new features, which are available for SteelCentral customers to download today, provide visibility into application traffic in Azure and AWS, as well as PaaS and containerized environments.

Supergiant is a container hosting platform built using Kubernetes for distributed, stateful applications. By Hrishikesh Barua

Earlier this week, we hosted a Continuous Discussion (#c9d9) on Continuous Delivery (CD) automation and orchestration, featuring expert panelists Dondee Tan, Test Architect at Alaska Air, Taco Bakker, a LEAN Six Sigma black belt focusing on CD, and our own Sam Fell and Anders Wallgren. During this episode, we discussed the differences between CD automation and orchestration, their challenges with setting up CD pipelines and some of the common chokepoints, as well as some best practices and tips for implementing CD.

Stress Testing: A testing process designed to push an application's environment to its breaking point so that QA teams can gain an understanding of the upper limits of capacity within the system. Its purpose: Stress testing exposes issues that may not appear under normal or even expected conditions. It allows testers to determine the software's robustness and ensure that the system fails and recovers in an acceptable manner.
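As a rough illustration of that definition, the Python sketch below raises a simulated load until a system fails, then checks that the system still works at the last load it handled. Everything here is hypothetical: `toy_service` stands in for a real application, its capacity of 500 is invented for the example, and a real stress test would drive actual traffic with a load-testing tool.

```python
def find_breaking_point(system, start=1, factor=2, ceiling=1_000_000):
    """Increase the load geometrically until `system(load)` raises.

    Returns (last_ok, first_failed, recovered), where `recovered` says
    whether the system handled `last_ok` again after the failure.
    """
    last_ok = None
    load = start
    while load <= ceiling:
        try:
            system(load)
        except Exception:
            recovered = None
            if last_ok is not None:
                try:
                    system(last_ok)       # does it recover at a known-good load?
                    recovered = True
                except Exception:
                    recovered = False
            return last_ok, load, recovered
        last_ok = load
        load *= factor
    return last_ok, None, None            # never failed below the ceiling


def toy_service(load, capacity=500):
    """Stand-in for a real system: fails once the load exceeds its capacity."""
    if load > capacity:
        raise RuntimeError("overloaded")
    return "ok"


result = find_breaking_point(toy_service)
```

The doubling steps only bracket the breaking point; a finer search between the last good load and the first failing one would narrow it down.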

In Part 1 of this blog series, we looked at how hyper-personalisation is fundamental to delivering "Great" Customer Experience at each and every interaction. Businesses need to "see the world as customers do" in order to appreciate the full customer context — i.e. each journey has to be managed end-to-end rather than as a series of disconnected interactions. Yet nowadays consumers pretty much dictate where and how they liaise with service providers. Increasingly, most of this is done in the digital realm and anonymously. Most organisations currently have little to no visibility of these exchanges, especially in Social Media. Hence the question beckons: 'How can a business gain this end-to-end perspective while it controls only parts of the customer's buying journey?' Contrary to common practice of simply "buying a new tool", it is the underlying data and analytics ecosystem that enables an organisation to connect-the-dots.

Eighteen months after its acquisition of Motorola Solutions Enterprise business, Zebra Technologies Corporation (NASDAQ: ZBRA), a global leader in providing solutions and services that give enterprises real-time visibility into their operations, shares its vision for the company moving forward within a growing industry category, Enterprise Asset Intelligence (EAI).

Automation is a critical component of DevOps and Continuous Delivery. This morning on #c9d9 we discussed CD Automation and how you can apply Automation to accelerate release cycles and improve quality, safety and governance. What is the difference between Automation and Orchestration? Where should you begin your journey to introduce both?

Agile teams report the lowest rate of measuring non-functional requirements. What does this mean for the evolution of quality in this era of Continuous Everything? To explore how the rise of SDLC acceleration trends such as Agile, DevOps, and Continuous Delivery is impacting software quality, Parasoft conducted a survey about measuring and monitoring non-functional requirements (NFRs). Here's a glimpse at what we discovered and what it means for the evolution of quality in this era of Continuous Everything…

It seems that a dynamic is emerging that may be a version of that old franchise wars tension between local governments and telecom players.

IT analytics company Nyansa today rolled out a public web portal, offering anonymized data pulled from instances of its flagship Voyance product in operation around the world. The idea behind Voyance Live, according to CEO Abe Ankumah, is to provide insight into common enterprise network problems and suggest possible solutions to IT departments.

In an interview, consultant Lakshmi Randall foresees changes in how data management is organized as the overall data landscape shifts.

Land O'Lakes picked Google to run the backend when it decided to launch a new application that connects a bunch of different cloud services to one another for the sake of improving farmers' decisions. It's something of a surprising choice for the decades-old company. Much of the company is built on Microsoft technology, said Teddy Bekele, the vice president of IT for Land O'Lakes's WinField division. While Microsoft's Azure cloud platform was in the running to host the new WinField Data Silo tool, Microsoft ended up losing out to Google Cloud Platform (GCP). It's a major win for Google, which has been trying to entice more large companies over to its cloud platform.

If you've been thinking about trying the big-data capabilities of Microsoft R Server but wanted to check out the documentation first, you're in luck: the complete Microsoft R Server documentation is now available on MSDN (and is accessible to anyone). There's lots to explore here, but a few highlights you might want to check out include: Getting Started with Microsoft R Server (what's included, how to launch it on Windows or Linux, and a basic R language tutorial); the RevoScaleR Getting Started Guide, with an introductory tutorial on the big-data functions of Microsoft R Server; details on distributed computing with RevoScaleR, including modeling data stored in Hadoop and SQL Server; adding your own distributed algorithms with the RevoPemaR package; and speeding up R "for" loops with parallel computation. There's even more to explore at the documentation site linked below.

For consumer product companies to leverage sales and increase overall growth, professional development is a top priority for the VP of sales. These data-driven practices can help identify the right areas of focus and ROI opportunities.

Data science takes collaborative teams of data scientists engaging in productive, open data development initiatives that can ensure strong workflow, governance, security and management. See why open environments are revolutionizing the data science landscape.

In our new monthly feature, we'll be keeping you up-to-date on the latest career developments for individuals in the big data community. Whether it's a promotion, new company hire, or even an accolade, we've got the details. Check in each month for an updated list and you may even come across someone you know, or better yet, yourself! Victor Lund has been elected President and CEO of Teradata. He is taking over for Mike Koehler, who stepped down earlier this month. Prior to his appointment, Lund had been a member of Teradata's board of directors since 2007.

Google I/O, the company's annual developer conference that takes place May 18-20 in Mountain View, Calif., can be a tough ticket to get: there's a lottery just to get the opportunity to buy one. Academic tickets are $300, and general admission is a weighty $900. So there are a lot of interested folks who might not have the time, money, or opportunity to get to Mountain View this year. Fortunately, Google's got a number of ways to watch the events of I/O unfold, most notably live video streaming, which you ought to be able to watch on their site.

Summary: Proof of Concept projects are a popular place to start but they may be the wrong solution. To ensure success, focus on Proof of Value and alignment with the company's strategy. Get the right executive sponsor and keep them involved. If you Google 'Data Science Proof of Concept' you will find dozens if not hundreds of articles extolling the virtues of starting with a POC. And yet it is a common experience that while a POC may lead to a larger implementation of what the POC was designed to demonstrate, this method of getting started with predictive analytics and Big Data is frequently not in the interest of the company as a whole. What Exactly is a POC? Here's what we mean by a POC.

LinkedIn developed Ambry because it couldn't find a storage solution that addressed horizontal scalability, availability, and active-active data center configuration.

This blog focuses on moving streaming analytics outside the confines of the traditional data center. Moving streaming analytics closer to where data originates can be accomplished by leveraging an enterprise-grade data movement application, married with an extremely lightweight streaming engine. This combination is being used by forward-looking organizations to solve use cases in a… The post Moving Streaming Analytics Out of the Data Center appeared first on Hortonworks.

Enabling e-commerce capabilities isn't a trivial matter for companies small or large, but SAP has a new cloud tool it thinks will help SMBs in particular. Called SAP Anywhere, it's designed specifically for companies with 10 to 200 employees.
