Big Data News – 02 May 2016

Top Stories
On March 30, 2016, Microsoft announced the release of BizTalk Server 2016 Community Technical Preview 1 (CTP). This release is one of the milestones Microsoft highlighted in its recent Integration Roadmap. In addition to the BizTalk Server CTP, Microsoft has also released an initial CTP for its Host Integration Server offering.

Real-time open source data projects gain momentum, FICO updates its predictive analytics suite, Samsung introduces new devices for IoT, and a new contest for creating tools to analyze satellite data is launched in this Big Data Roundup for the week ending May 1, 2016.

Our friends over at Webtrends just released an infographic that explains some of the differences between the analytics solutions of yesterday and today and provides details on what's coming with their new solution, Infinity Analytics.

Public IaaS or PaaS offerings may not satisfy the regulatory, security, or performance demands of every workload. This article explores how CoreOS have studied the emerging state-of-the-art application design and deployment patterns, and created and integrated a number of open source projects in pursuit of a modular platform that satisfies the needs of modern container cluster infrastructure.

Big data is often associated with batch processing: analytics that takes a lot of time. But increasingly, new data processing and analytical tools are empowering real-time analysis. There are many potential uses for real-time analytics that simply could never be done with traditional batch processing. Here are the potential uses for real-time analytics, along with the tools you need to get it done. The Uses for Real-Time Data Analytics: Processing online transactional data is just one use for real-time analytics.

Manuel Fahndrich describes how they tackled one particular resource allocation aspect of Google Cloud Dataflow pipelines, namely, horizontal scaling of worker pools as a function of pipeline input rate. Managing the redistribution of key ranges across new pool sizes and the associated persistent data storage was particularly challenging.

Simon Ritter looks at the fundamentals of how modularity in Java works. He explains the impact Project Jigsaw has on developers in terms of building their applications, as well as helping them to understand how things like encapsulation will change in JDK 9.

Ben Stopford looks at the implications of mixing toolsets from the stream processing world into real-time business applications: how to effectively handle infinite streams, how to leverage a high-throughput, persistent log, and how to deploy dynamic, fault-tolerant streaming services.

Tony Grout and Chris Matts talk about the emerging areas of Business Mapping and Skills Liquidity and how combining business strategy with the abilities and aspirations of people can create a rich landscape of options for achieving genuine business agility, and lead to rich collaboration between business and technical stakeholders.

Jan Neumann presents how Comcast uses machine learning and big data processing to facilitate search for users, capacity planning, and predictive caching.

Microsoft is carrying out experiments with synthetic DNA for digital data storage and has recently agreed to purchase ten million strands of DNA from genetics startup Twist Bioscience.

Platfora, the Big Data Discovery platform built natively on Apache Hadoop and Spark, announced the general availability of Platfora 5.2. The new release democratizes big data across an organization, moving it beyond IT and early adopters by enabling business users to explore big data and discover new insights through their favorite business intelligence (BI) tool.

Following on from the announcement at its March 2016 "Let us loop you in" event, Apple has open sourced its new CareKit framework, which aims to make it easier for developers to create apps that help people manage their medical conditions. Along with the new framework, Apple has also made available four iOS apps that use it.

Jeremy Edberg discusses remote teams and the pitfalls they've run into, the parts that are working well, and a summary of their research talking to other fully or partially remote teams.

Aaratee Rao discusses practical, real-world examples of how some well-known hyper-growth companies accumulated and managed technical debt. Rao covers tools for identifying and tracking different forms of technical debt, along with factors to consider when deciding to repay it, so that the entire team can reap the benefits.

Nikhil Garg talks about the mental frameworks, processes and tools that allow Quora to strike a good balance and move fast sustainably, both in the short-term and in the long-term.

Roy Rapoport demonstrates the power of alignment (or lack thereof) using real-world examples from his previous and current employers, with specific emphasis on his experience introducing Python to production use within Netflix, the organizational structures he interacted with through that process, and the way they tie into Netflix's formal culture.

In this contributed article, DriveSavers Director of Engineering Mike Cobb shares three common myths about RAIDs. DriveSavers has performed tens of thousands of RAID data recoveries in the company's history.

Infosys (NYSE: INFY), a global leader in consulting, technology, and next-generation services, announced the launch of Infosys Mana, a platform that brings machine learning together with the deep knowledge of an organization, to drive automation and innovation — enabling businesses to continuously reinvent their system landscapes.

Last week saw the first DevOps Days conference catering specifically to the enterprise world, in London. Talks ranged from re-thinking (traditional) management processes in a technology-disrupted world to facts and drivers of DevOps adoption by early adopters. The idea of bi-modal IT was also discussed throughout the conference, as well as the need for better security and opinionated platforms.

Ahmad Fahmy provides an authentic retrospective of a large-scale agile transformation at a large bank, looking at what worked, what didn't, and lessons that can be applied at other organizations facing similar challenges.

Find out how cognitive capabilities supplied by IBM Analytics solutions helped the Tennessee Highway Patrol promote traffic safety across the state, enhancing the driving experience for Tennesseans everywhere.

With more than 50 distributors, each supplying sales data in different formats, medical diagnostic equipment maker Alere had limited visibility into its own customer base until it adopted a single-source data strategy built on master data management.

News this week included the end of a 2G network, a coalition for safer autonomous driving, scary ransomware, and a Mac Skype for Business option.

The top five areas exposing your sensitive data to risk and tips for minimizing security vulnerabilities.

"Data scientist" has already been declared this year's hottest job, and now a new report offers several more reasons to consider it as a career. For the past three years executive recruiter Burtch Works has been surveying data-science professionals about salaries and other related topics.

Machine learning-powered artificial intelligence will match and exceed human capabilities in the areas of computer vision and speech recognition within five to 10 years, Facebook CEO Mark Zuckerberg predicted this week. Like most of the Web giants, Facebook (NASDAQ: FB) uses machine learning technology to leverage its massive data set and deliver better services to its customers. Algorithms work behind the scenes at Facebook to do things like recommend new connections to Facebook users, to present content that matches a user's interest, and to block spam.

Accenture (NYSE: ACN) and Splunk (NASDAQ: SPLK) entered into an alliance relationship that integrates Splunk products and cloud services into Accenture's application services, security and digital offerings. Accenture is helping clients use Splunk solutions to improve business outcomes by mining vast amounts of application and operational data to identify trends and improvement opportunities that were previously difficult to detect.

Surging big data is changing data modeling techniques, including schema creation. The word from Enterprise Data World 2016: Data pros must adjust.

Business professionals need answers to critical questions that bolster the accuracy of predictive forecasting. And forecasting is a vital approach for a number of business measures including product demand, revenue, sales and more. Discover why predictive forecasting is essential for any line of business.

Organizations with big data environments are turning to SQL-on-Hadoop software to speed up analytical queries and data integration jobs — and eliminate the need to program in MapReduce.

Cloud providers hosting blockchain secure transactions technology should take additional steps to protect their records, IBM says. IBM's new framework for securely operating blockchain networks, released Friday, recommends that network operators make it easy to audit their operating environments and use optimized accelerators for hashing — the generation of numbers from strings of text — and the creation of digital signatures to pump up CPU performance.  Along with the security guidelines, IBM announced new cloud-based blockchain services designed to meet existing regulatory and security requirements. The company has worked with security experts to create cloud services for "tamper-resistant" blockchain networks, it said.

Flash storage is maturing, and the latest products are taking on a more general-purpose storage function. To meet those needs, IBM introduced three such arrays this week.

Spark just seems to be getting big play everywhere in the technology arena. What is Spark? And do you need it? Get a good glimpse into its in-memory execution capabilities, some of its key components, its integrations and its availability as a service.

Open source data engineering has become a way of life at e-commerce leader eBay, says the company's Debashis Saha. Kylin is one of the tools that has resulted.

Containers won't emerge as a true enterprise solution until they prove themselves in production environments.

Airflow recently joined the Apache Incubator program. Airflow is a workflow and scheduling system designed to manage data pipelines. Developed by Airbnb for internal use, it was open sourced last September, as previously reported by InfoQ. By Alex Giamas

I've been following OpenStack since its inception half a dozen years ago. Back then the fledgling open source cloud operating system was a little project created jointly by Rackspace and NASA. A lot of time, massive amounts of funding and a few different value propositions have gone under the bridge and now OpenStack has its own foundation, huge vendor buy-in and a growing number of production deployments at scale.

Some BI developers will get by fine with features such as the Fiori Launchpad and Overview pages. Here's what's built into Fiori now, and what's on the horizon.

Why NoSQL? Your database options in the new non-relational world. The database you pick for your next application matters now more than ever. It can be difficult, and oftentimes impossible, to quickly join today's data into the relational model. While NoSQL databases share some common qualities, such as being non-relational and generally easy to scale, there are many unique differentiators to consider when deciding upon the correct NoSQL solution for your unique data needs. The right NoSQL database can act as a viable alternative to relational databases or can be utilized in a complementary fashion along with existing systems.


Guest blog post by Vincent Granville. You have gathered gigabytes or terabytes of unstructured text, for instance by scraping the Internet, or from pieces of email from your employees or users, or tweets, or millions of products that you want to categorize (only the product description and product name are available, sometimes with typos). Now you want to make sense of it and extract value, possibly by designing a nice search engine so that your customers can easily find your products. The core algorithm that you need is an automated cataloguer, also called an indexer. I am going to explain in layman's terms how it works.

Deepjazz is a project from Ji-Sung Kim, a computer science student at Princeton University. It is built using Theano, Keras, music21, and Evan Chow's project jazzml. Deepjazz is a computational music project that creates original jazz compositions using recurrent neural networks trained on Pat Metheny's "And Then I Knew". You can hear some of deepjazz's original compositions on soundcloud.

It's happening, fast and furious: Disruptive consumer and enterprise technologies continue to rapidly transform the way we work and live — across verticals and throughout the consumer experience, the business-to-business world and the entire supply chain. From cloud, mobile, and Big Data and Analytics to IoT, 3D printing, biotech and cybersecurity, entire industries and business models are quickly shifting as emerging technologies gain momentum, moving from experimental options to mainstream acceptance and maturity. Top-performing organizations know they have to keep up with the latest tech trends in order to transform into the digital business of the future.

A peculiar thing happens in northern Florida every year in the springtime. That's harvest season on the many fern farms scattered across the region, and it's also the time when demand for rattlesnake antivenom skyrockets there. That's no coincidence. Rattlesnakes like to form dens under fern crops. That means trouble for those who harvest the plants, and it puts urgent pressure on local hospitals and healthcare providers, which must come up with the highly perishable antivenom on demand. "A lot of times you never really know how much you're going to need," said Kyle Pudenz, senior director of purchasing for pharmaceutical wholesaler H. D. Smith. "But you also can't stock up and leave it on the shelf."




In order to implement DevOps, individuals and organizations must prepare for the culture shift, new tools, and automation. This consensus has evolved during years of debate concerning what exactly DevOps means and how to use it. There are many voices in the discussion, and even with some areas of consensus, many points are far from agreement.

Most people are familiar with the graphics processing unit (GPU), which is an important part of making your PC or smartphone capable of graphics and video. In fact, the GPU has a long history. It started out primarily as a peripheral supplementing the central processing unit (CPU), first in high-end workstations and then in the PC. It was optimized to support graphical operations (something that the CPUs of the early days did not do very well).

It's not an overstatement to say that, at least for me personally, Edward Tufte's book The Visual Display of Quantitative Information was transformative. Reading this book got me and, I feel…

This is Jamie Martin writing once again about a new data visualization I've been working on. If you didn't see my basketball ranking data visualization a few weeks ago, you should check it out as well. My latest visualization helps answer the question posed by a higher education professional — Are there specific parts of the country… The post Data Visualization: Student Loan Default Rates By Institution appeared first on BrightPlanet.

Everyone knows that a picture is worth 1,000 words, or, in a business context, that complex data is much easier to convey and understand when presented visually. Data visualization can be an incredibly powerful tool, from customer and team presentations to internal performance reporting. But who has the time, the expertise, or the attention span for that?…

Originally posted on Data Science Central. Guest blog post by Martijn Theuwissen, co-founder at DataCamp. Learning R can be tricky, especially if you have no programming experience or are more familiar with point-and-click statistical software than with a real programming language. This learning path is mainly for novice R users who are just getting started, but it will also cover some of the latest changes in the language that might appeal to more advanced R users. Creating this learning path was a continuous trade-off between being pragmatic and exhaustive.

In recent blogs, I wrote about using codified narrative as a form of data. I also discussed using attribution models to systematically evaluate codified narrative for ontological constructs: e.g. "child abuse," "physical confinement," "cannibalism." I provide a brief overview of these topics a bit later in the blog. The third important piece in making use of narrative data involves "attribution profiling" in a process that I call "catching scent." Following the odour of data involves establishing a scent and then searching for it. After attribution models create profiles from the codified narrative, I have the search engine hunt for similar profiles. I

News: Goal is to increase appeal to enterprise businesses and highly regulated industries.

In anticipation of his upcoming Predictive Analytics World for Manufacturing conference presentation, Using Vehicle Digital Bus Data for Predicting Failure of Line Haul Trucks, we interviewed Jeffrey Banks, Department Head, Complex Systems Engineering & Monitoring at The Applied Research Laboratory at The Pennsylvania State University. View the Q-and-A below and glimpse what's in store for the PAW Manufacturing conference…. The post Wise Practitioner – Manufacturing Predictive Analytics Interview Series: Jeffrey Banks at The Applied Research Laboratory at The Pennsylvania State University appeared first on Predictive Analytics Times.

In this special guest feature, Rob Consoli, Senior Vice President of Sales and Marketing for Liaison Technologies, makes a case for the benefits of a data-centric approach to integration called dPaaS (Data Platform as a Service), an integration and data management platform that is fundamentally different from traditional approaches and offers a number of advantages.

Adaptive Insights, a leader in cloud corporate performance management (CPM), released its global CFO Indicator Q1 2016 report, revealing that while chief financial officers (CFOs) remain worried about growing economic volatility, the vast majority of the 377 CFOs surveyed remain confident in their forecasts, and believe that the combination of big data, analytics, and scenario planning will likely be the key to navigating their organizations through the current financial uncertainty.

MapR Technologies, Inc., provider of the Converged Data Platform, announced at Kafka Summit 2016 it is now offering stream processing training via MapR Academy's free On-Demand Training program. The new training enables Apache Kafka developers to extend their real-time analytics and Internet of Things (IoT) applications.

