Big Data News – 29 Apr 2016

Top Stories
The Internet of Things hasn't quite caught on with consumers. But developers are working on changing that by rolling out innovative projects. Here's a look at some on display at the recent Samsung Developer Conference.

Amazon showed off its dominance in the public cloud market on Thursday as the capstone to a better than expected quarterly earnings report. Revenue from Amazon Web Services during the first quarter of 2016 was up 64 percent year-over-year, showing the big money that's still out there as companies invest more and more in the public cloud.  Amazon's cloud platform generated revenue of $2.56 billion, putting it on pace to make $10 billion this year, in line with a letter from CEO Jeff Bezos sent to shareholders earlier this month. That's big money to go with Amazon's massive customer base, which includes names like Netflix, Time Inc. and Intuit. 

Omni comes from the Latin word omnis, meaning all or universal. Omni-channel is about true continuity of your experience and has become the heart of digital transformation. Today, we live in an…

Arduino announced that Intel has released the real-time operating system (RTOS) for the Arduino 101 for hacking and study purposes.

Google Cloud Platform (GCP) Conference 2016: A few weeks ago, VLDB attended the Google Global Cloud Conference at Pier 48, overlooking the Giants' ballpark in San Francisco. The user conference was a platform to introduce products and services to Google's current community and to convince the rest of us that Google is the answer. Google isn't yet seen as a market leader in 'cloud'. In a recent research note, Deutsche Bank investment analysts estimated that GCP is on a $400M run rate, roughly 20 times smaller than AWS's. But it's not about your revenue…

Guest blog post by Christopher Dole and other contributors, originally posted here. Created by SoothSayerAnalytics.  Deep Learning is one of the most revolutionary and disruptive technologies ever developed in data science.  Essentially, this is a class of algorithms inspired by how the human brain works, and it has the potential to automate many of the world's jobs.  This is what enables self-driving cars to function and what allows Spotify to create highly customized playlists and recommendations.

Instagram's continuous deployment pipeline lets them push code to production faster, identify bad commits easily, and stay release-ready at all times. Built iteratively over time, it rests on a few key principles: a high-quality test suite, quick identification of bad commits, visibility at each stage to improve stakeholder buy-in, and a working rollback plan. By Hrishikesh Barua
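The principles above can be sketched as gating logic. This is a minimal, hypothetical illustration of the idea, not Instagram's actual implementation: the stage names and the `run_tests`/`canary_healthy` helpers are invented stand-ins for a real test suite and canary monitoring.

```python
# Hypothetical sketch of a continuous deployment gate: each commit must
# pass the test suite and a canary health check before it becomes the
# current release; a failure identifies the bad commit and leaves the
# last known-good release in place (the rollback plan).

def run_tests(commit):
    # Stand-in for a high-quality test suite; here, any commit id
    # containing "bad" fails.
    return "bad" not in commit

def canary_healthy(commit):
    # Stand-in for monitoring a canary deployment of the commit.
    return "crash" not in commit

def deploy(commits, current="v0"):
    """Ship each commit in order; on any failed gate, log the bad
    commit and stay on the last known-good release."""
    good = current
    log = []
    for c in commits:
        if not run_tests(c):
            log.append(f"{c}: failed tests, rolled back to {good}")
            continue
        if not canary_healthy(c):
            log.append(f"{c}: canary unhealthy, rolled back to {good}")
            continue
        good = c
        log.append(f"{c}: deployed")
    return good, log

current, log = deploy(["v1", "v2-bad", "v3"])
```

The per-stage log is what gives stakeholders visibility into why a commit did or did not ship.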

Big data and its conjoined twin analytics are the business buzzwords of the decade to be sure — and for good reason. Because of advances in technology and computing, we're generating more data than ever before.  A lot more.  And we're learning how to put it to good use. Whether you're the IT person trying to convince the boss that analytics is where the investment should go, the boss just trying to understand it all, or the analyst trying to explain to pretty much anyone what you do, these astonishing facts about the data we create, how we use it, and how much of it there is will amaze just about anyone. Less than 0.5% of all data we create is ever analysed and used.

What does big data know about you? Quite a lot. Every time we use a computer, access our phones, or open an app on a tablet, we're leaving a digital trail. Most people are vaguely aware that Google knows what they've searched for, or that Facebook knows who their friends are, but it goes much, much deeper than that. I've compiled a list of 21 things Big Data knows about almost every one of us — right now: Of course, Google knows what you've searched for. So do Bing, Yahoo!, and every other search engine. And your ISP knows every website you've ever visited. Ever (even in private browsing). Google also knows your age and gender — even if you never told them. They make a pretty comprehensive ads profile of you, including a list of your interests (which you can edit) to decide what kinds of ads to show you. Facebook knows when your relationship is going south.

So, you have big plans for big data. You've picked out a lovely infrastructure and it's time to get started. But one question remains: which language will you inflict on, we mean, insist that your developers and data scientists use? The reigning champs these days are R, Python, Scala, SAS, the Hadoop languages (Pig, Hive, etc.), and of course, Java. At last count, a scant 12 percent of developers working with big data projects chose to use Java. Almost half of all big data operations are driven by code programmed in R, while SAS commanded just over 36 percent, Python took 35 percent (down somewhat from the previous two years), and the others accounted for less than 10 percent of all big data endeavors.
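Note that these shares sum to well over 100 percent, which is consistent with a multi-select survey: each respondent can name every language they use. A tiny tally over hypothetical responses (the data below is invented for illustration) shows how such overlapping percentages arise:

```python
# Hypothetical multi-select survey: each respondent lists every
# language they use on big data projects, so per-language shares
# can legitimately sum to more than 100 percent.
responses = [
    {"R", "Python"},
    {"R", "SAS"},
    {"Python", "Java"},
    {"R"},
]

counts = {}
for langs in responses:
    for lang in langs:
        counts[lang] = counts.get(lang, 0) + 1

shares = {lang: 100 * n / len(responses) for lang, n in counts.items()}
# R appears in 3 of 4 responses -> a 75% share; all shares total 175%.
```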

AI technologies such as machine learning will play a key role in shaping the future, Google CEO Sundar Pichai said in the company's annual Founders' Letter to stockholders on Thursday. "It's what has allowed us to build products that get better over time, making them increasingly useful and helpful," wrote Pichai, who cited examples such as voice search, translation tools, image recognition and spam filters.

Let's face it: the world is a dangerous place. From terrorist attacks and earthquakes to outbreaks of hemorrhagic fever and pickpockets, there are a thousand ways for an American to stumble into trouble when travelling abroad. Now a new app from Prescient aims to help travelers stay safe by tapping into real-time data feeds, analyzing threats, and sending alerts when users wander too close to danger. Prescient Traveler is a mobile app that functions as a digital concierge for businesspeople, tourists, and students as they travel around the U.S. and the world. Conceptually similar to an application Prescient developed for the Defense Intelligence Agency, Prescient Traveler helps people stay aware of emerging and persistent threats as they venture to strange locales, both foreign and domestic.

Nobody likes to be delayed or have their travel plans disrupted. By using data analytics, airlines can detect potential operational problems before an issue actually occurs and reduce disruptions.

Governments are increasingly moving away from a one-size-fits-all model toward more personalized governing, where public services are tailored to meet the needs of residents. This type of governance is built on a firm understanding of citizens' needs, information that is best gathered directly.

Spark's momentum is building, and it is rapidly emerging as the central technology in analytics ecosystems within organizations. See why Spark's technical advancements around iterative processing combined with its easy overall environment and tool set for developers make it a true operating system for big data analytics.

Social media has taken the retail industry on an extraordinary, tangential path for gaining a deeper understanding of the consumer. See what several big data experts have to say about the impact social media data and analysis of customer conversations have on the consumer experience.

With the threat of security breaches increasing, VDI continues to grow in relevance and importance as a way to deliver security-by-design.

While introducing wearables into enterprise functions has clear benefits, organizations must address issues concerning privacy and security.

Dark fiber is seen as a potent tool for mobile backhaul, bypassing chokepoints in the network and other important tasks.

Ransomware has certainly captured the attention of the media and hospitals across the country.  The poster child of this trend is Hollywood Presbyterian Medical Center (HPMC) in Los Angeles. Earlier this year, HPMC was the victim of a ransomware attack and paid $17,000 to get the key and access its files again. More recently, the 10-hospital MedStar system in the Washington, DC area was attacked, with the attackers demanding 45 bitcoins (about $18,500), although the hospital claims to have restored its data without paying the ransom. In between these attacks, three other hospitals were also victims. Methodist Hospital in Henderson, Kentucky reportedly paid a $17,000 ransom in early March, and Prime Healthcare Management hospitals in Chino and Victorville, California were targeted by the same ransomware as HPMC but restored their systems without payment.

Section 179 lets business owners deduct purchases of depreciable business equipment instead of capitalizing and depreciating the asset.
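The cash-flow effect is easiest to see with numbers. As a rough illustration with made-up figures (a $50,000 purchase, an assumed 25% marginal tax rate, and an assumed five-year straight-line recovery period; actual Section 179 limits and depreciation schedules vary by tax year and asset class):

```python
# Hypothetical comparison: expensing equipment under Section 179
# versus straight-line depreciation. All figures are illustrative.
cost = 50_000          # equipment purchase price
tax_rate = 0.25        # assumed marginal tax rate
years = 5              # assumed straight-line recovery period

# Section 179: deduct the full cost in year one.
sec179_year1_deduction = cost

# Straight-line depreciation: deduct an equal slice each year.
depreciation_year1_deduction = cost / years

extra_year1_tax_savings = tax_rate * (sec179_year1_deduction - depreciation_year1_deduction)
# 0.25 * (50,000 - 10,000) = 10,000 more in year-one tax savings.
```

The total deduction over the asset's life is the same either way; Section 179 simply front-loads it.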

Will we actually see improved security for wearables and the IoT? That's the real question.

Watch this on-demand webinar series to take deep dives into predictive extensions, big data algorithms and cognitive capabilities. Each webinar features detailed demonstrations of techniques that can help you achieve more insightful data analysis.

* Expanding the possibilities: How to enhance IBM SPSS Statistics functionality with predictive extensions. Get more from IBM SPSS Statistics 24 by learning how to install and build extensions using R, Python or SPSS Syntax to expand on its features and functionality. Watch now.
* No coding required: Expand your data mining toolset with predictive extensions. See how predictive extensions for IBM SPSS Modeler can open the door for professional analysts and business users to apply new functionality without writing code. Watch now.
* Turning big data into big insight: New algorithms for big data analytics. The latest release of IBM SPSS Modeler is specifically designed to handle more types and sources of data.

The SAP HANA Center of Excellence is focused on helping our customers create systems of innovation, and typically that requires using historical data to produce a solution based on advanced analytics or machine learning. Whenever we start working on a project, I inevitably ask the customer the following question: "How important is it that you…

Who is Pitney Bowes? For some, the 96-year-old company is synonymous with 20th century technology, like mailing machines and postage meters. But after Tuesday's launch of its new Commerce Cloud, it's clear the company sees its future in helping small and midsize businesses better navigate global commerce, with data and analytics playing key roles.

Russell Nash is a Solutions Architect with AWS. Amo Abeyaratne, a Big Data consultant with AWS, also contributed to this post. One of the most powerful features of Amazon EMR is the close integration with Amazon S3 through EMRFS. This allows you to take advantage of many S3 features, including support for S3 client-side and server-side encryption. In a recent release, EMR supported S3 server-side encryption with AWS KMS keys (SSE-KMS), alongside the already supported SSE-S3 (S3 managed keys) and S3 client-side encryption with KMS keys or custom key providers.
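As a sketch of what enabling SSE-KMS through EMRFS looks like, the snippet below builds the `emrfs-site` configuration classification that can be passed when creating a cluster (e.g. via `aws emr create-cluster --configurations`). The KMS key ARN is a placeholder, and the exact property names should be checked against the EMR release documentation:

```python
import json

# Sketch (assumed configuration shape): EMRFS settings that turn on
# S3 server-side encryption with an AWS KMS key (SSE-KMS) for an EMR
# cluster. The key ARN below is a placeholder, not a real key.
kms_key_arn = "arn:aws:kms:us-east-1:111122223333:key/EXAMPLE"

emrfs_config = [
    {
        "Classification": "emrfs-site",
        "Properties": {
            "fs.s3.enableServerSideEncryption": "true",
            "fs.s3.serverSideEncryption.kms.keyId": kms_key_arn,
        },
    }
]

print(json.dumps(emrfs_config, indent=2))
```

With this in place, objects EMR writes to S3 through EMRFS are encrypted server-side under the specified KMS key rather than S3-managed keys.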

Confluent hosted the first Kafka Summit focused on use-cases for the open source real-time streaming and messaging technology for big data — featuring support and integration announcements from partners. Here's why this technology is so hot right now.

I just released a new screencast course for O'Reilly Media: Jupyter Notebook for Data Science Teams! First, some background: the Jupyter Notebook (evolved from the IPython Notebook) has been a favorite tool of people who use Python, R, Julia, and many of the other languages that it supports. Data scientists and researchers, in particular, have taken up notebooks. There are many reasons it's so popular, but to pick one: inline plotting, which shows output in the same document as the code, allows rapid data visualization and quick iteration on code ideas.

Saswat Panigrahi, Google, says it can cost millions of dollars to upgrade to Windows 10. Chrome OS provides a less expensive alternative.

Oracle is spending $663 million to buy Textura, a company that offers cloud services for the engineering and construction industry. Textura's products will be combined with Oracle's existing Primavera project-management suite — the result of a 2008 acquisition by the database giant — in the Oracle Engineering and Construction Global Business Unit, Oracle announced on Thursday. The focus of that unit will be offering a comprehensive cloud-based project control and execution platform that manages all phases of engineering and construction projects.

Contributed by Chuck Currin of Mather Economics: There's tremendous value in corporate data, and some companies can maximize their data value through the use of a data lake. This assumes that the adopting company has high-volume, unstructured data to contend with. The following article describes ways that a data lake can help companies maximize the value of their data. The term "data lake" has been credited to James Dixon, the CTO of Pentaho. He offered the following analogy: "If you think of a datamart as a store of bottled water — cleansed and packaged and structured for easy consumption — the data lake is a large body of water in a more natural state.

In this podcast, Gabor Samu from IBM describes the newly available IBM Platform LSF Suites for Workgroups and HPC. Designed to make it much easier to "kick the tires" on LSF, the new suites can help you configure, install, maintain, and manage jobs on HPC clusters with a single download. "The new IBM Platform LSF Suites are packages that include more than IBM Platform LSF; they provide additional functionality designed to simplify HPC for users, administrators and the IT organization." Watch the video presentation.

In the past few years, we've seen an explosion in the number and variety of organizations that are adopting big data technologies such as Hadoop and Spark and the recent trend to leverage data services in the cloud. How are enterprises coping?

Mark Zuckerberg expects artificial intelligence to progress to the point where computers are better than humans at basic sensory perception within the next 10 years, and that Facebook will end up knowing a lot more about you than it does now. The prediction is the latest from a top tech CEO to indicate the fast improvement being made in machine learning systems that just a few years ago would have struggled to tell a dog from a cat.
