Big Data News – 3 Sep 2015

Top Stories
You've heard it before, but we said it again, this time in our recent webinar: there's a new kid in town, the chief data officer. Why the new role? Because of an increasing awareness of the…

Editor's Note: While the story is fiction, the events are drawn from the experiences of the author and his colleagues. Michael is an analytics director. This evening we find him frazzled. As he pulls out of the TalkThree parking lot, contemplating his next move, his radio whimpers a slow rock ballad. Michael let down the […] The post Empathy and Data Science: A Fable of Near-Success appeared first on Predictive Analytics Times.

Moving data is tricky! …

Salesforce reveals Service Wave Analytics, a role-specific application built on its Wave Platform that gathers information for service representatives and their managers.

Published Date: 2015-09-03 13:05:47 UTC
Tags: Analytics, Big Data, Data Science, Data Warehousing, Open Data
Title: What Are Data Lakes?
Subtitle: What Do They Mean For Analytics?

What does your TV say about you and your voting intentions? You are more than what you eat! You're also what you wear, what you drive, and what you watch on TV. With that in mind, we asked more than 2,000 people which television stations they watched and how likely they would be to vote for each of a number of candidates for president of the USA.

On August 28, 2015, the first CNUTCon was held in Beijing. At the conference, Kevin Huo, the founder of Geekbang & InfoQ China, announced that InfoQ had joined forces with leading Chinese IT companies to establish the CNUT Container Technology Club.

Federal authorities are marching ahead with a new framework for opening government data, a process that aims to consolidate department and agency datasets into a standardized format and make them accessible to the public. Christina Ho, deputy assistant secretary for accounting policy and financial transparency at the Treasury Department, recently provided an update on the rollout of the 2014 DATA Act, a sweeping law that for the first time mandates a holistic system for making government spending data transparent and freely available.

Data gathering and analysis are now part of an array of tools used to fight wildfires in the US. Here's what it all looks like.

Market volatility requires organizations to adapt to changing demand as quickly as possible while delivering the highest value. To implement agile, managers need to team up to remove impediments in the organization, says Ahmet Akdag. An agile transformation is about learning to try, fail, and learn.

Cameron Tongier of the US Fish and Wildlife Service Fire Management Branch spoke with InformationWeek from his temporary office near a fire line in Idaho. He's one of several front-line wildfire managers we spoke with about the long arc of data analysis that leads up to daily situation reports for wildfire managers.

PaaS Semantic Interoperability Framework (PSIF): Loutas et al. define semantic interoperability as "the ability of heterogeneous Cloud PaaS systems and their offerings to overcome the semantic incompatibilities and communicate" [Lou11]. The goal of this framework is to let developers move their application(s) and data seamlessly from one provider to another. Loutas et […]

The Performance Index analysis we performed as part of our next-generation predictive analytics benchmark research shows that only one in four organizations, those functioning at the highest Innovative level of performance, can use predictive analytics to compete effectively against others that use this technology less well. We analyze performance in detail in four dimensions (People, Process, Information and Technology), and for predictive analytics we find that organizations perform best in the Technology dimension, with 38 percent reaching the top Innovative level.

A new in-memory query engine designed to boost interactive analytics capabilities on Hadoop has been added to SAP HANA along with other new cloud platform services. SAP HANA Vora software released this week aims to leverage and extend the Apache Spark execution framework to boost the performance of Hadoop. The query engine is designed to target distributed data to provide contextual awareness while improving "business process awareness" across enterprise applications and analytics, the company said Tuesday (Sept. 1).

There is data in motion, and then there is really big data in motion. LinkedIn gave us a compelling example of the latter today when it announced that it's using the distributed messaging system Kafka to process more than 1.1 trillion messages per day. Kafka, of course, was born at LinkedIn. Around 2010, the social media company found it was struggling to move data adequately through batch-oriented messaging systems, so three LinkedIn engineers (Jay Kreps, Neha Narkhede, and Jun Rao) created a new system built on a distributed platform.
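To put LinkedIn's figure in perspective, the short calculation below converts the daily total into an average per-second rate. It is a back-of-the-envelope sketch: only the 1.1 trillion figure comes from the article, and it assumes the load is spread evenly across 24 hours, which real traffic with daily peaks will not be. The names MESSAGES_PER_DAY and average_rate are our own.

```python
# Back-of-the-envelope: what does 1.1 trillion messages/day mean per second?
# Assumption (ours, not LinkedIn's): load is spread evenly over the day.

MESSAGES_PER_DAY = 1.1e12       # daily total quoted in the article
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def average_rate(messages_per_day: float, seconds: int = SECONDS_PER_DAY) -> float:
    """Return the mean number of messages per second under uniform load."""
    return messages_per_day / seconds


if __name__ == "__main__":
    rate = average_rate(MESSAGES_PER_DAY)
    print(f"{rate:,.0f} messages/second on average")
```

Under that assumption the average works out to roughly 12.7 million messages per second; peak-hour rates would be higher.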

We all remember the day when the Big Boss at Oracle, Larry Ellison, expressed his feelings against the cloud, and guess what? Oracle is now (finally) getting serious about the cloud, or at least trying to. I mean, for seven years or more now, companies like Microsoft, IBM and […]

Revolution R Enterprise (RRE), a version of the R statistics language produced by a company recently acquired by Microsoft, is making its way to Microsoft Azure in a technical preview. Speculation has abounded regarding how Microsoft would handle Revolution and its associated products, post-acquisition. One likely scenario was to offer R as a service — a cloud-hosted resource for scientific and statistical number-crunching. Now both Microsoft and Revolution are a step closer to doing exactly that.

There seems to be an implicit promise associated with the rise of big data analytics: that by taking more measurements and performing more calculations, we can deliver deeper insights atop source data, and do so at shorter intervals than before. But this premise, that today's analytics can get us closer to a single version of the truth, may be harder to achieve than first thought.

Dear DSC Member,

Time and again, we hear how data scientists spend 80% of their time cleaning and preparing data, leaving only 20% for the juicy stuff like predictive analysis or pattern mining. Most conferences focus on that 20%; we wanted to focus on the 80% instead. The Rich Data Summit is a one-of-a-kind event focused on wrangling, cleaning, blending, and enriching data.




The Big Data Innovation Summit is coming up soon, and in celebration of Labor Day you can get a $400 discount on all two-day pass prices with the code LABOR400. The summit returns to Boston next week on September 9 & 10, and the remaining places are filling fast. In addition to keynotes from Stephen Wolfram, the United Nations, eBay, Airbnb, and UPS on the Big Data Innovation track, we also have 60+ presentations across the following tracks:
Big Data Analytics: Boston Red Sox, Berg Health, OrderUp
Apache Innovation: StubHub, GoDaddy.com, Bank of America
Data Science: Home Depot, Pinterest, Wayfair
Analytics & Infrastructure: US Department of Energy, Cigna Healthcare, WordPress
Cloud & Data Architecture: MapQuest, McGraw-Hill Education, IBM
Check out this presentation from a previous summit by the Manager, Data Science & Analytics Engineering at Facebook: 'Data Science & Analytics for the Cloud Infrastructure'. To secure your discounted pass, please contact Hayley Law at hlaw@theiegroup.com (+1 415 692 5378) or secure your discounted pass here. Happy Holidays! Innovation Enterprise

Updates to Google Apps offer a way to track recent document changes, plus content creation assistance driven by machine learning.

Vendors, the trade press and even popular media are talking about the world's rapidly expanding body of electronic data. Lots of data equals lots of value, right? The not-so-subtle message surrounding all this is that data is the new gold rush, so you'd better get your stake in the game now. Often, the data is […] The post Text Analytics: ROI Recipe Secret Ingredient appeared first on Predictive Analytics Times.

Published Date: 2015-09-02 16:50:50 UTC
Tags: Analytics, Chief Analytics Officer
Title: Analytics At Legendary Entertainment
Subtitle: Matt Morolda, Chief Analytics Officer, discusses

Sometimes there's so much hype around analytics, big data and The Next Big Data Thing that it buries news about useful data projects offering real-world value. That's why I find the Computerworld Data+ Editors' Choice awards so interesting: they're examples of data use in the wild that are actually helping real businesses, government agencies and other entities do everything from fighting credit-card fraud and finding the best ways to treat hospitalized patients to identifying potentially endangered species. Interested in finding a project that might be specific to what you do? I've collected and categorized them all in a searchable table below; click through any item in the first column to see a project profile. Or you can browse the projects by starting with "Finding ROI in a swirl of data analytics" and then clicking through them in the navigation strip at the bottom of all the project profiles.

From the phenomenal increase in the number of calls that analytics professionals are getting from recruiters, to the fact that nearly every quantitative team is planning to hire this year, it is overwhelmingly clear that there has never been a better time to be a Quant.

Published Date: 2015-09-02 16:05:53 UTC
Tags: Analytics, Banking, Big Data, Cyber security, Finance
Title: How Can Finance Companies Use Analytics For Cyber Security?
Subtitle: Could Big Data save firms hundreds of millions?

Guest blog post by Tony Agresta

Technology to store, manage, search and analyze Big Data leaps to the top of the agenda for Financial Institutions as enterprise NoSQL databases come of age. Financial Institutions are focused on initiatives to survive in a world where regulatory pressure, risk mitigation and increasing volumes of data continue to pressure legacy infrastructures. Improved operational efficiency and revenue generation are at the forefront of the agenda. Specific areas of concentration vary across regions of the world.

Thanks to a deluge of sensor data, not only is our ability to forecast the weather more accurate than ever, but this information also can be used in exciting new ways. In this Internet of Things podcast, see how The Weather Company is helping businesses use weather data for competitive advantage.

At the Golang UK Conference, Peter Bourgon introduced 'Go kit', an open source microservice toolkit that can be used to facilitate and standardise the creation of Go-based services within the modern enterprise application stack.

One of the key points I raised was how many folks were just slapping Big Data badges on the same old same old; another was that MapReduce doesn't really work the way traditional IT estates behave, which was a significant barrier to entry for Hadoop as a new technology. Mark Little took this idea and ran with it on InfoQ in "Big Data Evolution or Revolution?" Well, at the Hadoop Summit in Amsterdam this week the message was clear: SQL is back, SQL is key, SQL is in fact the King of Hadoop. Part of me is disappointed in this. I've never really liked SQL and quite liked the LISPiness of MapReduce, but the reason behind this is simple. When it comes to technology adoption, it's people who are key, and large-scale adoption means small-scale change. Think about Java: a derivative of C (a 70s concept) running on a virtual machine (60s), using some OO principles (60s), with a kickass set of libraries (90s).
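The contrast the author draws can be made concrete with a toy word count. The sketch below, in plain Python rather than Hadoop, spells out the map, shuffle and reduce phases by hand, then gets the same answer from a single GROUP BY against an in-memory SQLite table. SQLite only stands in for the kind of SQL layer discussed at the summit; it is not a Hadoop component, and the sample lines and function names are hypothetical.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical sample input for the illustration.
LINES = [
    "SQL is back",
    "SQL is key",
    "SQL is the King of Hadoop",
]


def tokenize(lines: Iterable[str]) -> Iterator[str]:
    """Split each line into lowercase words."""
    for line in lines:
        yield from line.lower().split()


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Mapper: emit (word, 1) for every word."""
    for word in tokenize(lines):
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group mapper output by key (in Hadoop the framework does this step)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    return groups


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reducer: sum the emitted counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def mapreduce_word_count(lines: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle_phase(map_phase(lines)))


def sql_word_count(lines: Iterable[str]) -> Dict[str, int]:
    """The same aggregation expressed declaratively as one GROUP BY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE words (word TEXT)")
        conn.executemany(
            "INSERT INTO words (word) VALUES (?)",
            [(word,) for word in tokenize(lines)],
        )
        rows = conn.execute("SELECT word, COUNT(*) FROM words GROUP BY word")
        return dict(rows.fetchall())
    finally:
        conn.close()


if __name__ == "__main__":
    assert mapreduce_word_count(LINES) == sql_word_count(LINES)
    print(sql_word_count(LINES)["sql"])  # prints 3
```

The SQL version states what to count rather than how to group and sum it, which is the familiarity the author argues drives large-scale adoption.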

by Andrie de Vries

Just over a year ago, I cobbled together some code to work with the (then) new version of Google Sheets. You can still find my musings and code at the blog post Reading data…

Originally posted on Data Science Central

Step-by-Step Tutorial to Deploy a Hadoop Cluster (Fully Distributed Mode)

Setting up Hadoop as a cluster requires multiple machines (nodes): one node acts as the master and the rest act as slaves. In this tutorial, I am using 3 nodes (1 master and 2 slaves) running Cloudera's Distribution for Apache Hadoop, CDH3U3 (Apache Hadoop 0.20.x also works), on Ubuntu (other operating systems such as CentOS or Red Hat also work).
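The tutorial's own configuration steps are not reproduced here. As a minimal sketch of what a 1-master, 2-slave layout typically involves on Hadoop 0.20.x / CDH3, the Python below generates the classic conf/masters and conf/slaves host lists plus the three *-site.xml files, using the 0.20-era property names fs.default.name, mapred.job.tracker and dfs.replication. The hostnames master, slave1 and slave2 and the ports 54310/54311 are hypothetical placeholders, not values from the tutorial; adjust them to your cluster and copy the output into each node's conf directory.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

MASTER = "master"              # hypothetical hostname of the master node
SLAVES = ["slave1", "slave2"]  # hypothetical hostnames of the two slave nodes
NAMENODE_PORT = 54310          # illustrative port choice, not mandated by Hadoop
JOBTRACKER_PORT = 54311        # illustrative port choice, not mandated by Hadoop


def hadoop_config_xml(properties: Dict[str, str]) -> str:
    """Render a Hadoop-style <configuration> document from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"


def cluster_files(master: str, slaves: List[str]) -> Dict[str, str]:
    """Return {filename: contents} for a 1-master, N-slave Hadoop 0.20.x layout."""
    return {
        # In 0.20.x, conf/masters names the SecondaryNameNode host (here: the master).
        "masters": master + "\n",
        # conf/slaves lists the hosts that run DataNode and TaskTracker daemons.
        "slaves": "\n".join(slaves) + "\n",
        # Every node must agree on where the NameNode lives.
        "core-site.xml": hadoop_config_xml(
            {"fs.default.name": f"hdfs://{master}:{NAMENODE_PORT}"}
        ),
        # A replication factor above the number of DataNodes cannot be satisfied.
        "hdfs-site.xml": hadoop_config_xml({"dfs.replication": str(len(slaves))}),
        # Every node must agree on where the JobTracker lives.
        "mapred-site.xml": hadoop_config_xml(
            {"mapred.job.tracker": f"{master}:{JOBTRACKER_PORT}"}
        ),
    }


if __name__ == "__main__":
    for filename, contents in cluster_files(MASTER, SLAVES).items():
        print(f"==> conf/{filename}")
        print(contents)
```

Generating all five files from one small definition keeps the three nodes consistent, which matters because each node reads the same NameNode and JobTracker addresses.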

A nice infographic created by the Technology Services Group (TSG). TSG has also produced a blog post to complement the infographic, which you may find useful. It discusses how much technology has shrunk over the years even as its power has grown.

by Chris Delker

After years of pursuing their elusive dream of flight and suffering setback after setback, Wilbur and Orville Wright considered themselves failures. In a moment of despair, Wilbur wrote: "If man ever flies, it will not be in our lifetime." But little more than a year later, Orville became the first person in history to achieve powered flight. What if they had given up? What if no one else had been seeking their vision of powered flight? Well, one thing is certain: Our world would look a lot different today. But they didn't quit. The Wrights and other innovators have demonstrated that persistence and patience, and accepting that failure is part of life, often lead to success.

In anticipation of his upcoming conference presentation, The Changing Face of Analytics at Federal Agencies: A View from the IRS at Predictive Analytics World for Government, Oct 13-16, 2015, we asked Jeff Butler, Associate Director of Data Management, IRS Research, Analysis and Statistics organization, a few questions about his work in predictive analytics. Q: How […] The post Wise Practitioner – Predictive Analytics Interview Series: Jeff Butler at IRS Research, Analysis, and Statistics organization appeared first on Predictive Analytics Times.

Smart grids are the ultimate no-brainer solution for the sensible use of electricity, the commodity that makes our hyperconnected world possible. These tools can intelligently allocate power, saving the planet's finite resources and cutting consumers' energy bills. However, the barriers to this must-have technology are considerable because the grid is a true legacy technology; it is ubiquitous, yet mostly invisible. The tale of building a smarter grid involves many characters – for example, utilities, governments and consumers – each with entrenched interests and motivations. That said, there's plenty of hope for a happy ending to the smart grids story.

There are many different team topologies that can be effective for DevOps. Each topology comes with a slightly different culture, and a team topology suitable for one organisation may not be suited to another organisation, even in a similar sector. This article explores the cultural differences between team topologies for DevOps, to help you choose a suitable DevOps topology for your organisation.

The way businesses present their products in the market usually relies on the trends showcased by the mainstream media. New inventions have been introduced either as a means of advancing humanity or, in some cases, as a possible trap that makes our lives dependent on them.

An open-source framework like Hadoop offers endless possibilities for development, and with a strong management group like the Apache Software Foundation behind it, one can expect increasing numbers of modules and technologies to integrate with Hadoop to enable your business to achieve its Big Data goals, and maybe even go significantly beyond what you can envision today.

So, your organisation just went through yet another restructure! You notice that the new structure does not look very different from the last one six months ago, nor vastly different from how it was 20+ years ago, and it is likely to remain that way for the foreseeable future, with the exception of the Heads that change!

Source: England.nhs.uk

By keeping the general organisation design this way, top-level management can maintain span of control by establishing boundaries and rules of behaviour that ensure the organisation's resources are efficiently managed to provide the best return on investment. This sort of organisation structure, generally recognised as mechanistic or bureaucratic, is commensurate with the view that strategy is formed at the top of the organisation and the rest of the organisation is seen as a means of implementing it. While generally not visible in the organisation chart, other forms of design (e.g. matrix structures) co-exist in most organisations to enable new product development, geographic integration and cross-functional coordination.

Added by Bernard Marr on August 26, 2015

Recently I participated in a webinar panel with IT and marketing leaders on building alignment between the CIO and CMO, and surprisingly every speaker agreed: most of us are there or well on our way. The next frontier is building a collaboration-centric culture across the company, and data is the place you start. Just about every business function now needs to collect, connect and visualize data in order to better measure outcomes and prove results. Now one of the defining factors in sustaining a competitive advantage, data is also the most promising area your company can focus on to build bridges, dissolve silos and see better business results.

Here is my top-7 list of daft things that some people say about Big Data. I think that Big Data does play a role in some businesses. I also think that some of the basic distributed file store and text search technologies can be usefully employed in non-traditional indexing, counting and correlation. However, there is an awful lot of nonsense said about Big Data. So, onwards and upwards.

Big Data is like currency

If Big Data is currency (and for most of us, it isn't), then it's more like the hyperinflationary money of the Weimar Republic than something you would take to the bank or try to buy the weekly groceries with. Big Data might have value, and no doubt some of it does; it can't all be dross, can it? But that doesn't make it a solid financial asset class. That's just dopey.

The updated Google Nest thermostat can now sense your presence from across the room.

Amazon has introduced a new mobile app monetization model dubbed Amazon Underground and linked with its own Amazon app store. The new model provides "actually free" apps to customers, while developers are paid based on how long their apps are used.

Submitted by ColourFast. Enjoy!

Save your office of finance from crisis by preparing your financial performance management for the future. Learn how to help your finance function reach its full potential by adopting best practices and avoiding pitfalls along the way.

A new tool from SAP will allow companies to analyze distributed Hadoop data alongside corporate data using the ERP giant's Hana in-memory computing platform. Announced on Tuesday, SAP Hana Vora is an in-memory query engine that taps the Apache Spark execution framework to deliver interactive analytics on Hadoop. By extending Hana's reach to include distributed data in the Hadoop ecosystem, the tool is designed to help data scientists and developers combine corporate and external data in their analyses. That, in turn, means incoming data from customers, partners, and smart devices can be integrated with that from internal enterprise processes, giving companies better context with which to make decisions, SAP said.

IBM's recent embrace of Apache Spark is beginning to generate dividends in the form of open source contributions for a mainframe big data link to Spark. Big data software vendor Syncsort, Woodcliff Lake, N.J., said Tuesday (Sept. 1) it is contributing an IBM z System mainframe connector for Apache Spark that would allow easier access to mainframe data using Spark's analytics and Spark SQL. The company described its latest mainframe connector as being similar to the Apache Sqoop link it released as open source software last year. That connector allows Hadoop users to import and analyze data coming from the z System mainframe environment. The new Spark connector is designed to ease specifying the location of multiple datasets and associated metadata.
