Big Data News – 16 Sep 2015

Top Stories
If your company is connected to the Internet, your corporate network has probably already been hacked and will continue to be attacked by cybercriminals. The cybersecurity threat landscape is sophisticated and constantly evolving. Financial results, customer and employee identities, and your company's reputation are all at stake. Your organization needs to protect the integrity of its operational business systems, its critical data assets and its intellectual property, whether on-premises or in the cloud. Safeguarding your computing and data resources can seem like a monumental task…

The IT culture was largely established around the high demands of enterprise reporting and corporate dashboards. These demands inspired IT to set up strict rules for governance, policies, and security. Enterprise reporting also helped drive the creation of new assets, like semantic descriptions, or universes. From one universe to fifty universes, these new assets proliferated, providing…

Cities around the world are using technology to improve the lives of their citizens. See what puts these eight communities ahead of the curve.

Ian Culling talks about the state of agile adoption, how organisations want to buy "the DevOps", and new features in the VersionOne product suite.

Like nearly every other major company with a cloud platform, Salesforce is launching an Internet of Things platform. It's called IoT Cloud, and it turns connected devices into useful customer data. In many ways, Salesforce has a leg up on the second half of that proposition, given the success of its CRM platform for companies making use of customer data. But does IoT Cloud mark a real turning point for Salesforce, or is it just another upsell for existing customers? Here are four of the most crucial things we've gleaned so far about what's new and what's not.

Whenever you hear someone complain about developer productivity, just slap them. Having slogged through hundreds of open source projects each year for the past several years, I can assure you that developers are extremely productive. Every time we put together this package (InfoWorld's annual Best of Open Source Awards, aka the Bossies), I end up wishing developers were just a little less on the ball. Of course, I had help. InfoWorld is fortunate to draw on many contributors who work as software developers, systems integrators, data scientists, security consultants, and networking gurus in real life. If there's an open source tool in their domain they don't already know, they're eager to dive in. Thanks to the hard work of these experts, our 2015 Bossies encompass some 100 winners in six categories.

An interview about recent developments in requirements definition and management: how agile teams handle requirements and the problems they face in their daily work, using interactive diagrams and prototypes to convey requirements, how interactive prototyping can be used with a lean startup approach, and what the future holds for requirements definition and management.

IBM has announced LinuxONE, a Linux-only hardware portfolio that runs SUSE, Red Hat or Ubuntu distributions and adds support for open source tools such as Docker and Chef. The offering is targeted at both large enterprises and mid-size businesses.

Infographic based on the article: Originally posted on Data Science Central

BASICS: DEFINING BIG DATA AND RELATED TECHNOLOGIES. According to one widely cited definition, Big Data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making. Are there examples of Big Data in action at major global IT giants? Big Data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques. It requires different techniques, tools, algorithms and architectures. Some Big Data tools and technologies are Apache Hadoop, Apache Spark, the R language and Apache ZooKeeper. Facebook and Google rely heavily on Big Data.
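As a rough illustration of why Big Data "requires different techniques", here is a minimal, single-process Python sketch of the map, shuffle and reduce phases that Apache Hadoop's MapReduce model is built around, applied to the classic word-count problem. The function names are hypothetical and this is not Hadoop's API; on a real cluster many mappers and reducers run in parallel on separate machines, and the shuffle moves data over the network.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(documents: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group emitted values by key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: sum the grouped counts for each word."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    return reduce_phase(shuffle(map_phase(documents)))


if __name__ == "__main__":
    docs = ["big data big insight", "data beats opinion"]
    print(word_count(docs))
    # {'big': 2, 'data': 2, 'insight': 1, 'beats': 1, 'opinion': 1}
```

Because each mapper only sees its own documents and each reducer only sees one key's values, the same three functions can in principle be spread across machines without sharing state.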

Data storage licensing specifications released this week by a consortium of technology companies would boost tape storage capacity for archival and other long-term storage to as much as 15 terabytes. The LTO (Linear Tape-Open) Program, which offers different licensing packages and specifications for its open tape format, said Monday (Sept. 14) that its latest specification would more than double tape storage capacity, to 15 terabytes per cartridge when compressed. It also specified tape drive transfer rates for large files of up to 750 MB/s, or more than 2.7 terabytes of data an hour per drive.
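The hourly throughput figure can be checked with a little arithmetic. This sketch assumes the drive rate is 750 megabytes (not megabits) per second, the only reading consistent with 2.7 terabytes an hour, and decimal units (1 TB = 1,000,000 MB). The time to fill a full 15 TB cartridge is derived here rather than quoted from the LTO Program, and assumes the capacity and the rate apply under the same (compressed) conditions.

```python
# Assumed decimal units: 1 TB = 1,000,000 MB.
SECONDS_PER_HOUR = 3600
MB_PER_TB = 1_000_000


def terabytes_per_hour(mb_per_second: float) -> float:
    """Convert a sustained transfer rate in MB/s to TB per hour."""
    return mb_per_second * SECONDS_PER_HOUR / MB_PER_TB


def hours_to_fill(capacity_tb: float, mb_per_second: float) -> float:
    """Hours needed to write capacity_tb at a constant mb_per_second."""
    return capacity_tb / terabytes_per_hour(mb_per_second)


if __name__ == "__main__":
    print(terabytes_per_hour(750))            # 2.7
    print(round(hours_to_fill(15, 750), 2))   # 5.56
```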

Hadoop gave us a taste of the types of insights big data analytics can deliver, and provided a competitive advantage to early adopters. But as data volumes grow, as the time-to-insight window shrinks, and as technology continues to improve, companies that wish to remain competitive will have no choice but to adopt real-time analytics. We're on the cusp of a period of rapid growth in real-time analytics, according to industry experts.

A new program from the Obama White House is looking to spend $160 million on so-called Smart Cities to improve communication and quality of life through technology and the Internet of Things.

Democratizing Data and Predictive Analytics While Ensuring Governance & Transparency
As organizations empower more users to fully leverage advanced and predictive analytics to "democratize" their data and bring insights to the masses through interactive visual dashboards, they also need to provide enhanced data transparency, collaboration, and governance to gain broader acceptance of, and trust in, the data.

Pinterest has rolled out an open source solution aimed at making it easier to query large data sets when working with Hadoop. Terrapin, announced at Facebook's @Scale conference, was originally devised by Pinterest to replace the scalable Hadoop data store HBase. The idea was to provide a fast way to store and run key-value queries against large immutable data sets generated by MapReduce jobs or stored in S3 or HDFS volumes.
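The core idea of serving immutable batch output (build the data set once, then answer point lookups without ever updating it) can be sketched in a few lines of Python. This is a hypothetical, in-memory illustration, not Terrapin's API or storage format: the `ImmutableKVStore` class is invented here, and a production system would keep sorted files on disk and shard them across servers.

```python
from bisect import bisect_left
from typing import Iterable, List, Optional, Tuple


class ImmutableKVStore:
    """Hypothetical read-only key-value store built once from batch job output."""

    def __init__(self, records: Iterable[Tuple[str, str]]) -> None:
        # Build step: deduplicate (the last value for a key wins) and sort once.
        # Since the data never changes afterwards, lookups need no locking.
        pairs = sorted(dict(records).items())
        self._keys: List[str] = [key for key, _ in pairs]
        self._values: List[str] = [value for _, value in pairs]

    def get(self, key: str) -> Optional[str]:
        # Binary search over the sorted keys: O(log n) per lookup.
        i = bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        return None

    def __len__(self) -> int:
        return len(self._keys)


if __name__ == "__main__":
    store = ImmutableKVStore([("pin:2", "boards"), ("pin:1", "recipes"), ("pin:2", "travel")])
    print(store.get("pin:1"))  # recipes
    print(store.get("pin:2"))  # travel
    print(store.get("pin:3"))  # None
```

In this sketch, refreshing the data means building a new store from the next batch run and swapping it in, rather than mutating the existing one.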

If big data has reached a point where it has simply become essential to everything else, why are so many companies still struggling to put their data to good use?
The big data furniture
Big data used to drive big page views and consulting fees, but now, argues Heudecker, "the various topics formerly encompassing big […]
The post "Your Big Data Strategy is a Bust" appeared first on Predictive Analytics Times.

In 2013, the IRS paid out $5.8 billion in refunds for tax filings it later realized were fraudulent, according to a 2015 report by the Government Accountability Office. This news comes as no surprise to the Kentucky Department of Revenue, which is stepping up its own war against rising fraud cases with predictive analytics.

At the Cincinnati Insurance Companies, analytics is bridging the gulf between IT and business units and leading by example to form common measures and objectives.

Eric Schmidt, the newly installed chairman of Alphabet, has written an article for the BBC about how AI technology is only just getting started and how Google sees its uses.


by John Mount and Nina Zumel
When fitting and selecting models in a data science project, how do you know that your final model is good? And how sure are you that it's better than the models that you rejected? In this Part 3 of our four-part mini-series "How do you know if your model is going to work?" we develop out-of-sample procedures. Previously we covered:
Part 1: The problem
Part 2: In-training set measures
Out of sample procedures
Let's try working "out of sample," that is, with data not seen during training or construction of our model. The attraction of these procedures is that they represent a principled attempt at simulating the arrival of new data in the future.
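A minimal Python sketch of the simplest out-of-sample procedure, a single random train/test holdout, comparing two candidate models. This is not the authors' code: the synthetic data (y = 2x plus Gaussian noise), the 120-point test set and all helper names are assumptions for illustration. The "memorizer" is a deliberately overfit 1-nearest-neighbour lookup that reproduces every training point.

```python
import math
import random
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Model = Callable[[float], float]


def make_data(n: int, seed: int = 0) -> List[Point]:
    """Synthetic data (an assumption): y = 2x + Gaussian noise with sd 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.uniform(0.0, 10.0)
        data.append((x, 2.0 * x + rng.gauss(0.0, 1.0)))
    return data


def holdout_split(data: List[Point], n_test: int, seed: int = 1) -> Tuple[List[Point], List[Point]]:
    """Shuffle, then hold back n_test points that no model sees during fitting."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


def fit_linear(train: List[Point]) -> Model:
    """Ordinary least-squares line through the training points."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def fit_memorizer(train: List[Point]) -> Model:
    """1-nearest-neighbour lookup: returns the y of the closest training x."""
    return lambda x: min(train, key=lambda p: abs(p[0] - x))[1]


def rmse(model: Model, data: List[Point]) -> float:
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in data) / len(data))


if __name__ == "__main__":
    train, test = holdout_split(make_data(400), n_test=120)
    for name, fit in [("linear", fit_linear), ("memorizer", fit_memorizer)]:
        model = fit(train)
        print(f"{name:9s} train RMSE={rmse(model, train):.3f}  test RMSE={rmse(model, test):.3f}")
```

Because the memorizer reproduces every training point, its in-sample RMSE is exactly zero, so an in-training-set measure alone would favor it; the held-out RMSE is where the two candidates can be compared on data neither has seen.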

The recent release of the 2016 Best Colleges rankings by U.S. News has Jen wondering if such rankings, and the institutions of higher learning themselves, should capitalize on analytics to understand the impact of the drop-out rate. Can analysis of real-time data help colleges and universities address student retention?

Title: Big Data Is Dead – Long Live Fast Data
Subtitle: Are we looking at a time when speed trumps size?
Published Date: 2015-09-15 14:32:09 UTC
Tags: Analytics, Big Data, Chief Data Officer, Data Science, Data Warehousing

A McKinsey report finds growing demand for connected services and autonomous technology in vehicles.

Connected vehicles are generating a lot of positive buzz around their potential safety and reliability benefits; however, there are also concerns that they're not as secure as they should be. In this Q&A, Tim Hahn, chief architect for Internet of Things security at IBM, addresses the challenges facing automakers as this technology becomes more common.

Hadoop is great for storing and processing large data volumes, but its limits become clear when integrating ever-increasing volumes of data. A new solution, described in detail at the upcoming Strata+Hadoop World conference, can help organizations overcome this limitation.

Salesforce tries to inject more analytics into customers' operations as they use CRM data and third-party applications.

Ninety-one percent of retailers are present on two or more social media channels. These outlets provide a wealth of information to retailers, but are they taking advantage of the opportunities that social media metrics offer?

Risk and strategy, much like chocolate and coconut, is a winning combination you can't argue with. There has always been a great amount of "noise" from consulting firms and experts about the need for companies to integrate risk and strategy, and over the past few months the topic seems to have gained momentum.

Rolls-Royce manufactures enormous engines which generate huge amounts of power, as they propel airplanes and ships across skies and oceans. It is an extremely high-tech industry where failures and mistakes can cost billions — and even human lives. It's no surprise then that the company — which split from its automobile manufacturer parent company following insolvency in 1971 — has wholeheartedly embraced Big Data.

Apple Music has changed my life. Already. I live in Norway, so it didn't roll out here until late on Thursday night. By Friday night, I felt like my life had substantially improved. You see, I had lost my connection with music. When I was younger, music was such an important part of my life: most of my memories from my teenage years and 20s have their own soundtrack. Lately, not so much. I don't get on with Norwegian radio. Not many of my friends are serious music people. I went from buying new music constantly to pretty much never.

Marketers need high-quality, complete, valid and accurate data to make good decisions and to avoid having to scrap a project and start over. The more data we have, the bigger the potential for waste, and the more expensive it is to put off the data quality initiatives we need.

Even after thoroughly defining your exact plans for your embedded analytics, don't forget that Business Intelligence is, to a large extent, the realm of the uncertain. The amounts and types of data we collect today would have been incomprehensible a few years ago, and there's no reason to believe they will stay the same in a few years' time.

How often do you hear about networking when it comes to analytics projects? Probably not enough, especially when it comes to manufacturing and IoT. Here's what to consider.

After a year on the market, Salesforce Wave, the BI platform, gets an upgrade. But there are continuing question marks about its value.

In case you missed them, here are some articles from August of particular interest to R users.
Creating interactive time series charts of financial data in R.
Many R books have been translated into Chinese.
A tutorial on visualizing current-events geographic data with choropleths.
Revolution R Enterprise 7.4.1 is now available on Windows and Linux servers and in the Azure Marketplace.
Zillow uses R to estimate the value of houses and rental properties.
There's a new (and free) online course on edX for R beginners, sponsored by Microsoft and presented by DataCamp.
Mini-reviews of 5 new R packages: AzureML, distcomp, rotationForest, rpca, and SwarmSVM.
The R Consortium's best practices for secure use of R.

EMEA Value Engineering, Discrete Industries
The lightning pace of technology innovation and heightened competition are forcing organizations across industries to 'anticipate and act' rather than 'respond or react'. The Internet of Things (IoT) is a key protagonist of these disruptive forces, and no CEO, CIO or CDO wants to miss the…

NoSQL database technology is built for processing speed and data flexibility. But it needs a fuller feature set and more skilled workers to make it big in enterprise applications.

by Andrie de Vries
Next week the 2nd EARL (Effective Applications of the R Language) conference takes place in London, from Monday 14th to Wednesday 16th of September. Last year's inaugural event was a huge success, with excellent speakers on a variety of topics. I was especially impressed by the number of attendees from continental Europe. I had many interesting discussions with people traveling from France and Germany in particular.

This entry was posted in News.