Big Data News – 07 Jun 2016

Today's Infographic Link: Flight videos deconstructed

Featured Article
Co-authored by Saeed Aghabozorgi and Polong Lin. Data Scientists and Data Engineers may be new job titles, but the core job roles have been around for a while. Traditionally, anyone who analyzed data would be called a "data analyst" and anyone who created backend platforms to support data analysis would be a "Business Intelligence (BI) Developer". With the emergence of big data, new roles began popping up in corporations and research centers — namely, Data Scientists and Data Engineers.

Top Stories
More companies are contributing to open source projects, but the management of open source software is still chaotic.

More and more businesses are waking up to the threat of poor data quality. We're gradually seeing the risk being taken more seriously as the shockwaves of poor management are felt.

A new version of ransomware has recently been detected that not only holds the victim's data and machine hostage until a ransom is paid but also exploits the machine as part of a DDoS attack. The victim cannot access the endpoint, and the same endpoint is used to deny service to a second victim, so one infection results in two attacks.

IBM rolls out a new integrated development environment in the cloud to build on its investment in Apache Spark and enable data scientists to collaborate with each other, regardless of their preferred language or data source.

Egnyte, an enterprise-focused file sync and sharing startup, is expanding beyond its roots in holding onto companies' data for them, and now aims to protect any data a company has, no matter where it's stored. Egnyte Protect is a service that aims to provide a single tool for controlling and securing company data that's stored in private data centers and in the public cloud.

IBM's cloud-based integrated development environment for building analytics applications is aimed at data scientists of almost any skill level.

"If someone asks me what cloud computing is, I try not to get bogged down with definitions. I tell them that, simply put, cloud computing is a better way to run your business." – Marc Benioff, CEO of Salesforce. Benioff was one of the earliest proponents of the SaaS ideology, so it's not surprising that he believes so completely in the ability of the cloud to transform businesses. What is making news right now, though, is just how many businesses have come around to the Benioff way of thinking!

The Nest cofounder and former CEO plans to advise Alphabet and Larry Page, while Nest looks to improve revenue.

In its attempts to market The Machine, HPE showcases the kind of questionable decisions public companies continue to make.

MapR Technologies, Inc., provider of the industry's converged data platform, announced a new enterprise-grade Apache Spark distribution. This new distribution includes the complete Spark stack engineered to support advanced analytic applications, along with patented innovations in the MapR Platform, plus key open source projects that complement Spark.

In this contributed article, Nicholas Lee, Head of Global Digital Programs for Fujitsu, takes a close look at the virtues of artificial intelligence from the lens of security in terms of how it will help us mitigate risk and identify threats.

With the third baby born in the US with microcephaly related to Zika virus, the disease is again making headlines. Researchers are working diligently to find prevention and treatment methods, and as with any research, collaboration and data sharing are critical to this process. I learned recently at a Zika Hackathon, hosted by Cloudera Cares, that public data sets related to Zika are hard to find. Hackers have resorted to developing code to scrape CDC, WHO and other websites to collect information for research when public data is not available.

Indian business intelligence (BI) software revenue is predicted to reach $214 million in 2016, an 18.6% increase over last year's $180 million, market research firm Gartner reported.

The majority of surveyed business execs said big data is a driver of revenue and is becoming as valuable to their businesses as existing products and services.

News: The move is part of the company's plans to use open-source technologies, a larger trend under CEO Satya Nadella.

Making sense of data can involve a wide variety of tools, and IBM is hoping to make data scientists' lives easier by putting them all in one place. The company on Tuesday released what it calls Data Science Experience, a new development environment in the cloud for real-time, high-performance analytics. Based on data-processing framework Apache Spark, Data Science Experience is designed to speed and simplify the process of embedding data and machine learning into cloud applications. Included in the new offering are tools such as RStudio and Jupyter Notebooks.

There's a lot of sci-fi-level buzz lately about smart machines and software bots that will use big data and the Internet of Things to become autonomous actors: scheduling your personal tasks, driving your car or a delivery truck, managing your finances, ensuring compliance with and adjusting your medical activities, building and perhaps even designing cars and smartphones, and of course connecting you to the products and services that they decide you should use.

C-level briefing: The Telegraph's CTO talks to CBR about vendor lock-in and his changing role in the midst of digital disruption.

BI isn't new but confusion remains about what it is.

MLeap interview: Productionizing Data Science. In this episode, we have an interview with Hollin Wilkins and Mikhail Semeniuk, the driving forces behind the MLeap project. If you are working with Spark, are deep into machine learning and are struggling to put those beautifully trained models into production, you definitely do not want to miss this episode!

In 2013, Ron Howard directed and released the movie Rush, a film that captured the rivalry between James Hunt and Niki Lauda during the 1976 Formula One racing season. It's a vivid portrait of the…

Reimagine the data science experience as an open experience with this IDE, which aims to facilitate a full range of development tasks, from data acquisition and data mining to prototyping and programming. When you do, discover how you can use Apache Spark and R to pursue open analytics by building your own data science applications.

Navistar Director of Data Science Gyasi Dapaa discusses the coexistence of open source and enterprise software in today's data scientist world.

When Susan Thomas graduated with a technical degree in the 80s, everyone assumed she'd teach. Instead, she became a coder and programmer.

Schrödinger's Cat is a thought experiment developed by Erwin Schrödinger (1887-1961) to illustrate that micro-scale quantum effects can be made to produce real (and quite bizarre) effects in the real world. (Figure 1: Schrödinger's cat, thankfully alive.) Being Slightly Dead: In this case, Schrödinger uses superposition on the atomic scale to affect the life span of a cat. The cat is placed in a sealed box with a radioactive material and a Geiger counter. The half-life of the radioactive material is known. The Geiger counter ensures that if the material has decayed, poison is released and the cat dies (don't blame me, I am just the messenger).

We now work in a world in which data flows to and from the cloud across platforms, geographies and applications. In his session at 18th Cloud Expo, Jeff Greenwald, Senior Director of Market Development at HGST, will discuss the millennial approach to managing how data is stored, protected, accessed, shared, and governed. Attendees will learn that employing the right infrastructure will set your business free, even within the most demanding use cases — from business continuity to backup and archive.

Deep coordination is one of the many challenges that smart cities face, along with costs and standards.

According to Gartner Research, by 2020 the total number of connected cars will be nine times more than in 2015. Additionally, 80% of all new vehicles will have data connectivity, 30% of connected vehicles will have built-in, over-the-air software capabilities, and over one billion connected automotive subsystems will be shipped. With the exponential growth of… The post Accelerating Connected Car Data Science & Machine Learning appeared first on Hortonworks.

It's time for Bermuda shorts and ridiculous tourist-style Hawaiian tops, flip flops, and sunburns. But where should the tech geeks among us head this year? If you aren't into beach bumming and overpriced amusement parks, there are some quite nifty tourist attractions specifically designed for the nerds and geeks among us. Here are a few of your options. 1. The Computer History Museum in Mountain View, California: Today's mainframes are nothing like those of yesteryear.

The bad guys haven't forgotten about sites like MySpace, and chances are good they'll find that treasure and use it to their advantage.

Ross Ihaka, one of the co-creators of R (along with Robert Gentleman), recently gave an interview to the University of Auckland's alumni magazine, Ingenio. In the article, he shares the story of the…

The latest edition of the Gartner BI Summit rolls out this week in Mumbai, India.

The enterprise has been scaling up resources for quite a while now, but as the old saying goes, "You ain't seen nothin' yet."

The MIT Forum for Supply Chain Innovation and Infosys Global Risk Advisory Group announced that they have released preliminary findings from their global risk survey, led by MIT Professor David Simchi-Levi and conducted with Infosys President Ravi Kumar, co-leader of the Global Risk Group. The group will issue a full report on the results, titled "The Risk of Complexity in a Digital Economy."

IBM is joining the R Consortium with a "significant investment," the company is scheduled to announce at today's Apache Spark Summit, becoming a top-tier Platinum supporter of the open-source R programming language. R, designed specifically for statistical computing and other data analysis tasks, has become increasingly popular in recent years as both data volumes and interest in data science have exploded. IBM says that R is among the languages it used to develop its Watson natural language/machine learning platform.

In this age, all systems need to be up all the time and run efficiently. That is not simple, and most of us need help and advice from experts who have done this before. The great news is that there is a set of experts ready to help and share their experiences at Hadoop Summit San… The post Expert Advice on Cloud and Operations appeared first on Hortonworks.

Microsoft, with its Hortonworks-based cloud Hadoop distro, and MapR with its own Hadoop-powered wares, each pivot toward Apache Spark.

It's no secret that Lenovo has had designs on being a major player in the data center ever since it moved to acquire IBM's x86 server business.

By Steven Ramirez, Conference Co-Chair, Text Analytics World Chicago. In anticipation of their upcoming conference co-presentation, "Understanding our Customers' Customers' Customers' Needs – Text Analytics for B-to-B Businesses," at Text Analytics World Chicago, June 21-22, 2016, we asked Michael Dessauer, Data Scientist at The Dow Chemical Company, and Justin Kauhl, Computational Linguistics Expert at The Dow Chemical Company, a few questions about their work in text analytics. Q: In your work with text analytics, what behavior or outcome do your models predict?

What is big data? Do you really understand what it is? Try to recall the perfect definition of big data. Not coming to you, right? In simple terms, big data is a large volume of data, structured or unstructured, generated by businesses. But don't equate it with volume alone; it's really about how organizations manage that data. Audiences are still confused about big data. Here are five signs that you don't understand big data: Big Data! Ignore it…

People who know each other and trust each other will collaborate more and get more done.

For some, the big data revolution represents a clear progression toward a new (and inevitable) way of seeing the world. Like the driverless cars that pilot themselves down I-280 during morning rush hour in San Mateo County, big data seems to offer a tantalizing glimpse of a world free from nearly all human error. The organizational theorist Geoffrey Moore goes so far as to argue that "Without big data, you are blind and deaf and in the middle of a freeway." But consider the recent, real-life instance of John Gass, a middle-aged driver from Massachusetts with a nearly flawless driving record who received an automated notice informing him that his license was being revoked without further explanation, effective immediately.

Apple's Worldwide Developer Conference kicks off next week. While the company will likely introduce new products, what it really needs is a new attitude.

Collective search for people and information has tremendously benefited from emerging communication technologies that leverage the wisdom of the crowds, and has been increasingly influential in solving time-cr…

The best way to design memorable experiences is by listening to your customers and understanding what they want and how they feel. Here's how to do that. Keep on reading: Mining Your Data to Craft Better Customer Experiences
