Big Data News – 01 Jul 2016

Today's Infographic Link: Cheetah: Nature’s Speed Machine

Featured Article
"Fail fast" is one of the credos that has come to permeate our big data culture. The quicker we can get the bad ideas out of the way, the quicker we can find the ones that work. But as MythBuster Adam Savage said at the MongoDB World conference in New York City this week, there are different degrees of failure, and different paths to success. "There's failure, and then there's failure," Savage told an audience of about 1,500 people Tuesday at the Hilton Midtown, where MongoDB held its conference. "When we're talking about failure, I'm talking about 'small f' failures. 'Big F' failure is 'I missed my son's bar mitzvah because I was drunk.'"

Top Stories
Instead of having fragmented copies of data everywhere, Cohesity uses APIs to access data that is indexed in an object-based storage system.

Ideally, the ultimate output of big-data analysis can provide a company with a valuable competitive advantage. But those results aren't getting much additional security, according to an IDG Enterprise study of big-data initiatives.

If you'd like to make a good prediction, your best bet is to invent a time machine, visit the future, observe the value, and return to the past. For those without access to time travel technology, we need to avoid including information about the future in our training data when building machine learning models. More generally, any feature whose value would not actually be available in practice at the time you'd want the model to make a prediction can introduce leakage into your model.
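As a minimal sketch of the two leakage safeguards described above, with a hypothetical churn dataset whose column names are all invented for illustration: split on a time cutoff rather than at random, and drop any feature computed from information that only arrives after the prediction date.

```python
import pandas as pd

# Hypothetical churn dataset; every column name is invented for illustration.
df = pd.DataFrame({
    "event_time": pd.to_datetime(["2016-01-05", "2016-02-10", "2016-03-15",
                                  "2016-04-20", "2016-05-25", "2016-06-30"]),
    "logins_last_30d": [12, 3, 7, 1, 9, 2],                # known at prediction time
    "total_lifetime_spend": [800, 120, 450, 60, 700, 90],  # includes FUTURE spend -> leaky
    "churned": [0, 1, 0, 1, 0, 1],
})

# Split on time, not at random: a random split lets training rows "see the
# future" relative to test rows, which inflates offline metrics.
cutoff = pd.Timestamp("2016-04-01")
train = df[df["event_time"] < cutoff]
test = df[df["event_time"] >= cutoff]

# Drop features whose values would not actually be available when predicting.
leaky = ["total_lifetime_spend"]
X_train = train.drop(columns=leaky + ["churned", "event_time"])
X_test = test.drop(columns=leaky + ["churned", "event_time"])
```

Any model trained on `X_train` and scored on `X_test` then sees only information that would genuinely have existed before the cutoff.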

Industry works to get project release cycles, Hadoop core under control. But new complexity sneaks through anyway.

The update is designed to capture data about how Uber's drivers operate their vehicles — measuring braking, acceleration, and speed.

The Cisco Spark and WebEx collaborative workspace platforms will be integrated with IBM's leading cloud collaboration solutions.

The insurance sector has been grappling with data for decades and is perhaps one of the oldest industries to practice what we would now consider data science and analytics. The inherently risk-averse nature of insurance has historically demanded that actuaries implement rigorous statistical and mathematical processes to quantify risk into a packaged, saleable product. Although insurance organisations have long been aware of the applications of data for business gain, there has been renewed interest in Big Data given the diversity of new sources now available through the exponential growth in technology.

How moving away from binary could speed up processing.

The company bringing AI to business today announced the availability of Sparkling Water 2.0, which builds on the popularity of Sparkling Water, its API for Apache Spark, with additional features and functionality.

xMatters for the Enterprise DevOps toolchain stitches together the disparate operational tools to help orchestrate hand-offs between the tools and team members. Customers like us because we understand the need for DevOps or NoOps automation at enterprise scale and recognize that humans are still involved when something goes wrong.

Guest blog post by Bill Vorhies. Summary: What happens after you make those critical discoveries in the Data Lake and need to make that new data and its insights operational? Data Lakes are a new paradigm in data storage and retrieval and are clearly here to stay. As a concept, they are an inexpensive way to rapidly store and retrieve very large quantities of data that we think we want to save but aren't yet sure what we want to do with. As a bonus, they can hold unstructured or semi-structured data, streaming data, or very large quantities of data, covering all three "Vs" of Big Data.

Step-by-step guide on implementing a customer segmentation solution – without paying millions of dollars for an enterprise platform. Customer micro-segmentation is without a doubt one of the most powerful methods for extracting value from your database, and a must for every frequent flyer loyalty program. Segmentation underpins almost every aspect of a successful program, from loyalty marketing, revenue management, and member engagement to measuring performance against your OKRs and KPIs.
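One common micro-segmentation approach that needs nothing beyond pandas is RFM scoring (recency, frequency, monetary value). This is a sketch, not the guide's actual method, and the member data and column names are invented for illustration:

```python
import pandas as pd

# Hypothetical frequent-flyer activity table: days since last flight,
# flights taken in the last year, and revenue contributed.
members = pd.DataFrame({
    "member_id": [1, 2, 3, 4, 5, 6, 7, 8],
    "recency_days": [10, 300, 45, 5, 200, 90, 15, 365],
    "frequency": [24, 1, 8, 30, 2, 6, 18, 1],
    "revenue": [9000, 200, 2500, 12000, 400, 1800, 7000, 150],
})

# Score each dimension into quartiles (1 = worst, 4 = best). Recency is
# negated because fewer days since last activity is better; frequency is
# ranked first so tied values don't produce duplicate bin edges.
members["R"] = pd.qcut(-members["recency_days"], 4, labels=[1, 2, 3, 4]).astype(int)
members["F"] = pd.qcut(members["frequency"].rank(method="first"), 4,
                       labels=[1, 2, 3, 4]).astype(int)
members["M"] = pd.qcut(members["revenue"], 4, labels=[1, 2, 3, 4]).astype(int)

# Combine into a micro-segment label, e.g. "4-4-4" for the best customers.
members["segment"] = (members["R"].astype(str) + "-"
                      + members["F"].astype(str) + "-"
                      + members["M"].astype(str))
```

Each distinct label is a micro-segment that can then get its own marketing treatment; with three quartile-scored dimensions you get up to 64 segments, which is usually plenty for a loyalty program.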

Designing for performance is absolutely essential; but runtime is such an unpredictable variable that we can reasonably blame premature optimization for a non-negligible share of lousy UX and unmaintainable code. The latest Guide to Performance and Monitoring covers both the static and the dynamic, the verifiable and the unknowable sides of building and maintaining performant applications.

To keep pace with customer demand, businesses must adopt CRM technologies that reap the maximum benefit from customer data and drive stronger sales numbers.

In this special guest feature, Tom Goodmanson and Michael Bragg of the Calabrio Executive Team, discuss why contact center data is so important to every member of the C-suite and how, when viewed comprehensively, the organization as a whole will benefit.

Professor Hawking joined Larry King to discuss the greatest issues facing the planet, including where artificial intelligence is headed (and what he makes of Kurzweil's singularity theory).

CollabNet has a long history of helping the federal market build quality software at speed, including the Department of Defense (DoD), which is, believe it or not, one of the most active software developers in the world. Federal software needs range widely — from military and defense systems to communications, command and control, and operations — so you can imagine how complex the development and delivery processes have become. With so many layers of DoD software development, corralling the systems into a manageable workflow is an enormous task, and that's where CollabNet comes in.

Guest blog post by SupStat, contributed by Sricharan Maddineni, a neuroscientist with a strong passion and talent for data science. He attended the NYC Data Science Academy's 12-week boot camp from January 11 to April 1, 2016. This post is based on his second project, posted on February 16 (due in the fourth week of the program). He gathered publicly available transportation data, consulted social media, and visualized the economic and business insights he found. Why Are Airports Important?

ANALYSIS REVEALED! Join Dell Statistica's webcast to learn about 2016 trends and how Statistica 13.1 factors in. Date: Thursday, July 21, 2016. Time: 10:00 a.m. PT / 1:00 p.m. ET. Join Howard Dresner, thought leader in the Business Intelligence community and founder of Dresner Advisory Services, as he discusses key findings from his "2016 Business Intelligence Market Study." In this webcast, Howard will describe market forces currently impacting the BI and performance management landscape and will also discuss Dell Statistica's role within that market. John Thompson, Dell Statistica General Manager, will join Howard to talk about how Statistica's business model and solution offering earned top rankings in both the "customer experience" and "vendor credibility" models in the first year it was included in the report. Statistica also scored best-in-class for technical support product knowledge and consulting product knowledge.

"We work in the area of Big Data analytics and Big Data analytics is a very crowded space – you have Hadoop, ETL, warehousing, visualization and there's a lot of effort trying to get these tools to talk to each other," explained Mukund Deshpande, head of the Analytics practice at Accelerite, in this interview at 18th Cloud Expo, held June 7-9, 2016, at the Javits Center in New York City, NY.

The solution will help Airtel improve consumer experience and leverage customer data to execute omni-channel campaigns.

Today enterprises accumulate data at ever-rising volumes and ever-rising velocities. Whether you're talking about data from corporate systems, the Internet of Things, or social media, the flow never stops. While it brings its share of challenges, this constant stream is actually a good problem to have. There is intrinsic value in all of those bits and bytes. I like to think of data in financial terms: as information capital. It's the key to becoming competitive and driving long-term value for shareholders. Businesses can leverage this capital to launch new initiatives, to connect to customers in ways that strengthen relationships and drive higher sales, and to work proactively to avoid costs. Those institutions that do so are poised to realize targeted returns.

A U.S. patent awarded this month to ClearStory Data, the big data preparation tool specialist, covers its automated data harmonization tool designed to work across disparate data sources and a variety of data types. Such data prep tools are gaining favor as the amount and types of unstructured data from sources like social media and sensors continue to skyrocket. ClearStory, Menlo Park, Calif., said U.S. Patent 9,372,913, "Apparatus and method for harmonizing data along inferred hierarchical dimensions," covers its in-memory, Spark-based data harmonization platform.

Would you spend millions of dollars building a data lake if you knew you wouldn't get your money back? Of course not. But all too often, organizations embarking on big data projects don't do what it takes to achieve a return on investment. Here are eight ways you can help your own cause and earn a return on your data lake investment. 1. Start Small, Be Targeted. You may have grandiose ambitions to use big data analytics to transform your organization into a digital powerhouse, and that's great. Big things are possible with big data. But the truth is that most successful big data teams start small, with a single project firmly grounded in solving an actual problem that affects your organization.

Data analytics has probably been discussed and debated as much as any of the disruptive technologies that have rocked our times. Finally, a clear set of tools, strategies, and technologies have emerged to make data analytics practical for all sizes of organizations, and the latest tools make it possible to do so without a huge team of expensive data scientists. For example, the data lake makes it possible to collect and store data before you even determine what you'll do with it analytically.

Your business relies on your applications and your employees to stay in business. Whether you develop apps or manage business-critical apps that help fuel your business, what happens when users experience sluggish performance? You and all technical teams across the organization — application, network, operations, among others, as well as those outside the organization, like ISPs and third-party providers — are called in to solve the problem.

In today's digital landscape, technology offers compelling motivation to capitalize on data-driven evolution opportunities for adding value on both the advertiser and consumer sides of the targeted digital marketing equation. Discover why understanding consumers and what they really want is programmatic advertising's holy grail.

Telecommunications organizations cannot ignore the need to engage in the moment by quickly and accurately predicting or detecting network and device problems that may affect the customer experience. Engaging in the moment requires proactive approaches to customer care that can greatly minimize customer churn. Learn more by viewing an on-point discussion with an industry strategist at last year's IBM Insight 2015.

We don't pay enough attention to data-centric concerns, like where to find the data and how many endpoints have access to it.

BMC, a leader in IT software solutions for the digital enterprise, announced an expanded Big Data strategy to automate, accelerate and secure enterprise-class Hadoop® environments, enabling operational excellence and a competitive edge in the digital age.

Very interesting data compiled and analyzed by O'Reilly, using statistical models such as Lasso regression to predict salary based on different factors. It reminds me of our own analysis based on simulated (but realistic) data, assessing whether having Python or R (or both) commands a bigger salary, and what extra boost these skills provide, individually.


Guest blog post by Harry Powell, Head of Advanced Data Analytics at Barclays. I was at a meetup in Oxford recently and one of the speakers, the CEO of a tech start-up, brought up the subject of Data Scientists' pay.

Salary mostly depends on experience, education, location, industry, and, unfortunately, factors such as gender. Also, most data scientists have all three skills and more (R + Python + SQL), so it is hard to assess which one is the most valuable.
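The kind of Lasso-on-simulated-data analysis described above can be sketched in a few lines. This is not O'Reilly's model or ours; every coefficient and number below is invented for illustration:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n = 500

# Simulated (but realistic) survey: binary skill indicators plus experience.
python_skill = rng.integers(0, 2, n)
r_skill = rng.integers(0, 2, n)
sql_skill = rng.integers(0, 2, n)
experience = rng.uniform(0, 20, n)

# Assumed ground truth: base $60k, +$8k for Python, +$5k for R, +$3k for SQL,
# +$2k per year of experience, plus noise.
salary = (60_000 + 8_000 * python_skill + 5_000 * r_skill
          + 3_000 * sql_skill + 2_000 * experience
          + rng.normal(0, 5_000, n))

# Lasso shrinks unimportant coefficients toward zero; the surviving
# coefficients estimate the salary boost attributable to each factor.
X = np.column_stack([python_skill, r_skill, sql_skill, experience])
model = Lasso(alpha=10.0, max_iter=10_000)
model.fit(X, salary)
estimates = dict(zip(["python", "r", "sql", "experience"], model.coef_))
```

On simulated data like this the recovered coefficients sit close to the assumed ground truth; on real survey data, the correlation between the skills (most respondents have all three) is exactly what makes the individual boosts hard to disentangle.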

Virtual assistants are about to transform the way IT services are delivered in a way most IT organizations never imagined.

These books were added in the last few months. Some do not cost anything (those marked as eBook in the listing below).
Easy R Programming for Beginners
Introducing Data Science: Big Data, Machine Learning using Python
Deep Learning With Python
Mastering Python for Data Science
Ten Signs of Data Science Maturity – eBook by Kirk Borne and Peter Guerra
New book on data mining and statistics
Exploring Data Science
Big Data Science & Analytics
Statistics for Non-Statisticians
14 Timeless Reference Books
Probabilistic and Statistical Modeling in Computer Science – eBook
10 Machine Learning books
10 great books about R
For a much bigger list, click here. For books added in 2016, click here. And if you are interested to see what kind of books were published just 5 years ago, click here: you will see that hot topics change over time, with less data mining and more deep learning and Python, just to give an example.

Watson might schedule your meetings someday if a partnership between IBM and Cisco Systems bears the fruit they're hoping for. In the meantime, the companies hope to save employees from some of the meaningless tasks they have to carry out just to work with their colleagues. IBM's Verse email platform and Connections collaboration suite are a good match for Cisco products like the Spark messaging app and WebEx conferencing service, so the two vendors have found ways to integrate them, company officials say. All this will happen in the cloud.

This week we made a huge step forward in accelerating genomics-based precision medicine in research and clinical care, starting a consortium of experts and organizations who will help to define the next generation of genomics research. We've already been joined by Arizona State University, Baylor College of Medicine, Booz Allen Hamilton, Mayo Clinic, OneOme and… The post 3 Reasons to Join The Consortium Defining the Next Generation of Genomics Research appeared first on Hortonworks.

Readings in Database Systems Also known as the 'Red Book', Readings in Database Systems has been published since 1988. The 5th edition was published in 2015 after a 10 year hiatus. Quite ironic really that yours truly started his graduate IT career in 1988 on, you guessed it, database systems! The Red Book contributors are Peter Bailis (Stanford Future Data Systems group), Joe Hellerstein (Professor of Computer Science at the University of California, Berkeley) and Michael Stonebraker (Professor at MIT).

IBM and Cisco will demonstrate the first examples of these integrations next month at the Cisco Live conference. The collaboration could have particular value for enterprise Apple users.

This week, Amazon announced that it is moving its Elastic File System out of preview and offering it for general use. The new file system offering is available in three AWS regions.

The following post by Vik Paruchuri, founder of data science learning platform Dataquest, offers some detailed and instructive insight about data science workflow (regardless of the tech stack involved, but in this case, using Python). We re-publish it here for your convenience. Data science companies are increasingly looking at portfolios when making hiring decisions.

Big data is changing the way enterprises interact with and consume data. Modern data platforms, such as Hortonworks Data Platform (HDP) and Hortonworks Data Flow (HDF), are driving a data revolution by powering new workloads and analytic applications. This week, there are thousands of attendees in San Jose at Hadoop Summit 2016 learning about the… The post Quickly Launch Hortonworks Data Platform in Amazon Web Services appeared first on Hortonworks.
