Big Data News – 24 Jun 2016

Featured Article
Enterprise workloads are moving to the cloud at a steady pace. Does this mean the local data center is doomed?

Top Stories
Another Hadoop Summit is upon us, this time in San Jose, and of course we will be attending. We will be in Booth #703 and have several sessions, including a keynote, that we hope you will be able to attend. Here is a quick snapshot of the sessions: Navigating the World of User Data Management and Data Discovery, on June 29th at 11:30 AM.

DataStax Enterprise 5.0 couples a Cassandra store with a version of the open source Titan graph DB. The goal is fast analytics closely tied to fast transactions.

For analysts, blending multiple sources of data is one of the most time-consuming steps in the analytical process. They understand that getting the data into the right format for visual interaction and analysis helps deliver the right results. Combining the power of Alteryx and Microsoft Power BI can help analysts get all the data they need from multiple sources into one place; perform easy data blending to create one seamless dataset; rapidly share and transform data without manual formulas or coding; and output the dataset directly to Power BI to start creating rich visualizations. Download now and learn how Alteryx and Microsoft Power BI help speed up data preparation tasks so that you can spend more time creating richer visualizations.

In my last post, we discussed the importance of data and insights, specifically insights that help drive board-level decisions. There is a plethora of new data sources that organizations could use to drive new and meaningful insights. Many organizations start their big data journey by bringing new and untapped enterprise data sources into Apache Hadoop. According to Forrester, 73% of organizations aspire to be data driven. However, only 29% of these organizations are actually taking action on their data.

Alation makes data more actionable via such innovative means as combining human experts and technology systems. The next BriefingsDirect Voice of the Customer big-data case study discussion focuses on the Tower of Babel problem for disparate data, and explores how Alation manages multiple data types by employing machine learning and crowdsourcing.

A real-time notifications system was a champ behind-the-scenes at The Championships, Wimbledon 2015 by enabling its digital and content team to break the news of a key tournament statistics milestone that scooped media organizations worldwide. See what value an extension to that system is adding to the 2016 event to engage fans through predictability and real-time insight.

Monday newsletter published by Data Science Central. Previous editions can be found here. The contribution flagged with a + is our selection for the picture of the week. Featured Resources and Technical Contributions: Bookdown: Authoring Books with R Markdown; 17 Data Visualization Tools & Resources You Should Bookmark; New Machine Learning Cheat Sheet by Emily Barry; Predicting the Higgs-Boson Signal; The challenges of word embeddings; 12 Algorithms Every Data Scientist Should Know – Infographics; Dozens of Great Interactive Data Visualizations from Nesta.

I'm a regular user of Amazon Video: as someone who spends a fair bit of time on planes, it's great to be able to download some of my favourite shows (hello, Orphan Black and Vikings) and catch up on…

I've written about Bitnami many times in the past. Its CEO, Erica Brescia, is one of a (sadly) small number of tech startup founders who happen to be women. And, while that makes a great headline or discussion point, it's Bitnami's success, outside of any gender-specific focus, that really interests me. Bitnami builds marketplaces that allow cloud vendors to offer end-user applications on top of their clouds. Bitnami is an application store for open source applications. For end users, this means that on the cloud platforms Bitnami is integrated with, they can deploy the open source application or development environment they want quickly and easily, fully configured and ready to run.

Don't leave a weak link in the chain of your body-worn camera program. Instead, look to video analytics to help you take full advantage of your available personnel without losing precious time.

Manjeet Chayel is a Solutions Architect with AWS. There is streaming data everywhere. This includes clickstream data, data from sensors, data emitted from billions of IoT devices, and more. Not surprisingly, data scientists want to analyze and explore these data streams in real time. This post shows you how you can use Spark Streaming to process data coming from Amazon Kinesis streams, build some graphs using Zeppelin, and then store the Zeppelin notebook in Amazon S3. Zeppelin overview: Apache Zeppelin is an open source GUI which creates interactive and collaborative notebooks for data exploration using Spark. You can use Scala, Python, SQL (using Spark SQL), or HiveQL to manipulate data and quickly visualize results.

Kristian Lum (@KLdivergence) joins me this week to discuss her work at @hrdag on predictive policing. We also discuss Multiple Systems Estimation, a technique for inferring statistical information about a population from separate sources of observation. If you enjoy this discussion, check out the panel Tyranny of the Algorithm? Predictive Analytics & Human Rights which was mentioned in the episode.
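
To give a feel for the idea behind Multiple Systems Estimation, here is a minimal sketch of its simplest textbook case: the two-list Lincoln-Petersen capture-recapture estimator. This is an illustration only, not the method discussed in the episode; practical work typically uses three or more lists and more careful models. The record IDs and the function name `lincoln_petersen` are made up for the example, and the estimate assumes the two sources are independent and that records can be matched reliably.

```python
def lincoln_petersen(list_a, list_b):
    """Estimate total population size from two overlapping lists.

    Assumes (for illustration) that the lists are independent samples
    of the same population and that matching records is error-free.
    """
    a, b = set(list_a), set(list_b)
    overlap = len(a & b)
    if overlap == 0:
        raise ValueError("No overlap between lists; estimate is undefined.")
    # N is estimated as (size of A * size of B) / (records found in both).
    return len(a) * len(b) / overlap


# Hypothetical record IDs observed by two separate sources.
source_a = {"p01", "p02", "p03", "p04", "p05", "p06"}
source_b = {"p04", "p05", "p06", "p07", "p08"}

estimate = lincoln_petersen(source_a, source_b)
print(estimate)  # 6 * 5 / 3 = 10.0
```

The eight distinct people actually observed are fewer than the estimated ten; the gap is the estimator's guess at how many were missed by both sources.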

Britain votes to leave the EU, but what does this mean for the technology industry? Let's take a look at the situation in the United Kingdom of Great Britain and Northern Ireland. But let's keep this a politics-free zone, eh? In IT Blogwatch, British bloggers panic (or not). Your humble blogwatcher curated these bloggy bits for your entertainment. Not to mention: don't panic…

With so many different attackers with methods as varied as their motives, there is no such thing as an "impenetrable barrier." So your only real option is to go on the offensive. IBM i2 Enterprise Insight Analysis integrates visual analysis capabilities with advanced analytics. It enables organizations to take an intelligence-driven approach to cyber defense, uncovering hidden connections and patterns across large, disparate data sets in near real time.

SYS-CON Events announced today that Bsquare has been named "Silver Sponsor" of SYS-CON's @ThingsExpo, which will take place on November 1–3, 2016, at the Santa Clara Convention Center in Santa Clara, CA. For more than two decades, Bsquare has helped its customers extract business value from a broad array of physical assets by making them intelligent, connecting them, and using the data they generate to optimize business processes.

DevOps at Cloud Expo, being held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, announces that its Call for Papers is open. Born out of proven success in agile development, cloud computing, and process automation, DevOps is a macro trend you cannot afford to miss. From showcase success stories among early adopters and web-scale businesses, DevOps is expanding to organizations of all sizes, including the world's largest enterprises, and delivering real results. Among the proven benefits, DevOps is correlated with 20% faster time-to-market, 22% improvement in quality, and 18% reduction in dev and ops costs, according to research firm Vanson-Bourne.

Big Data is so in vogue, that it's at risk of becoming the next "synergy". Ah yes, all those wonderful hip words used by executives trying to show just how young and hip they are. While executives wear ball-caps to cover up their bald spot, Big Data is bandied around the Board Room. But what's the point of Big Data if it doesn't actually drive decision-making in an effective direction?

The UK government has taken the first step in providing a solid grounding for the future of data science ethics. Recently, they published a "beta" version of the Data Science Ethical Framework. The framework is based around six clear principles: (1) start with clear user need and public benefit; (2) use data and tools which have the minimum intrusion necessary; (3) create robust data science models; (4) be alert to public perceptions; (5) be as open and accountable as possible; (6) keep data secure. See the above link for further details. The framework is somewhat specific to the UK, but it would be nice to see other countries and organizations adopt a similar framework. Even DJ Patil, U.S. Chief Data Scientist, has stated the importance of ethics in all data science curricula.

DataStax, a leading provider of database software for cloud applications, announced the release of DataStax Enterprise (DSE) 5.0 and DataStax OpsCenter 6.0, to provide enterprises a comprehensive and operationally simple data management layer.

Bart van Leeuwen, a senior firefighter from Amsterdam, was the featured speaker at the 2016 National Fire Protection Association (NFPA) Conference & Expo in Las Vegas. He spoke about his quest to leverage big data so that firefighters and community members are as safe as possible from fire.

In this special guest feature, Hank Weghorst, Chief Technology Officer at Avention, believes the emergence of "data exhaust" is a trend that will continue to pick up momentum.

With the haircut that the sterling-euro exchange rate has taken in the wake of the U.K.'s vote to leave the European Union, the U.K. has suddenly become a low-cost country for companies wishing to host or process the personal information of EU citizens. EU businesses will need to weigh that price cut against the regulatory uncertainty Thursday's vote introduced — but it turns out that's surprisingly small, at least in the short to medium term. As for U.K. businesses hoping for more relaxed data protection rules in the wake of the referendum vote, they will have to wait — perhaps for a very long while. That's because many of the rules that the 51.9 percent who voted to leave the EU hoped to escape are, in fact, firmly part of U.K. law, and will only go away if the U.K. parliament votes to repeal them.

The multilevel model of meme diffusion conceptualizes how mediated messages diffuse over time and space. As a pilot application of implementing the meme diffusion, we developed the social media analytics and research testbed to monitor Twitter messages and track the diffusion of information in and across different cities and geographic regions. Social media analytics and research testbed is an online geo-targeted search and analytics tool, including an automatic data processing procedure at the backend and an interactive frontend user interface. Social media analytics and research testbed is initially designed to facilitate …

This article investigates the claims and complexities involved in the platform-based economics of health and fitness apps. We examine a double-edged logic inscribed in these platforms, promising to offer personal solutions to medical problems while also contributing to the public good. On the one hand, online platforms serve as personalized data-driven services to their customers. On the other hand, they allegedly serve public interests, such as medical research or health education.

The 19th International Cloud Expo has announced that its Call for Papers is open. Cloud Expo, to be held November 1-3, 2016, at the Santa Clara Convention Center in Santa Clara, CA, brings together Cloud Computing, Big Data, Internet of Things, DevOps, Digital Transformation, Microservices and WebRTC to one location. With cloud computing driving a higher percentage of enterprise IT budgets every year, it becomes increasingly important to plant your flag in this fast-expanding business opportunity. Submit your speaking proposal today!

Many people spend their day sifting through data, combining multiple data sources, and finally getting data ready for the moment of truth: seeing it in a data visualization. Data visualizations are the culmination of all data crunching work: they're supposed to take long numeric lists and complicated KPIs and present them in an intuitive, easy-to-understand way. That is, if you choose the right visualization for your data. The problem is, it's often challenging to choose the right visualization for the data you want to show. Do you want to compare values or analyze a trend?

Data has an increasingly important role in businesses.

Summer is here and temperatures are rising. While some of us take vacations or cool off at the beach, prospective data scientists are heating up their job prospects by participating in one of a growing number of data science bootcamps. Bootcamps of all types are growing quickly. According to Course Report, a website that tracks bootcamps, the number of graduates from the country's 91 full-time coding bootcamps will grow by 60 percent this year, increasing to 17,966 graduates accounting for $199 million in revenues. While Course Report didn't break bootcamp growth down by topic, data science is clearly one of the most popular bootcamp topics.

Seven steps that will empower IT teams in their efforts towards a sustainable DevOps transformation.

Enterprise storage is trending away from traditional, enterprise managed network-attached storage (NAS) and storage area networks (SAN) towards a more complex environment that includes software-defined and cloud-based solutions. Spinning disks are also being replaced with flash arrays and solid state devices. These transformations are driven by challenges associated with the parallel processing of unstructured data within a near-real-time business operational tempo.

"Data governance applies to everything that we do," shared Janice Haith, Department of Navy's Deputy CIO. And, being responsible for complex, mission-critical initiatives such as enterprise architecture, software licensing, information assurance, data and help desk consolidation, and compliance, to name a few — means there is a lot of data to be dealt with. But, just like with any organization, IT must tell impactful stories that convey why an initiative must be undertaken and how it affects the end-user. Those who work in the armed forces know all too well the avalanche of acronyms used every day, but acronyms don't impart emotion, cause and effect, or urgency. So how do those in charge humanize the importance of projects in a way that garners buy-in from all necessary parties? The answer may surprise you — through effective storytelling.

Consider the following integration scenarios: Moving medical records between EMR systems; financial information between banking systems; HR information between ERP systems; and software development information between SDLC tools. At first glance the approaches required for these integrations may seem the same. But if you look slightly deeper you will realize that this can't be the case because of impedance mismatch. I'm defining impedance mismatch as the friction that occurs when trying to align two things or concepts that don't naturally/actually match. Because many of the hardest impedance mismatches are domain specific, to overcome them you have to have a layer of domain understanding "baked into" your integration software to address business problems.
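
As a rough illustration of what "baking in" domain understanding might look like, here is a minimal sketch of translating an HR record between two systems whose models don't line up. Everything in it is hypothetical: the two record shapes, the status codes, and the `translate` function are invented for the example and do not come from any real ERP product.

```python
from dataclasses import dataclass


@dataclass
class SourceEmployee:
    """Record shape in hypothetical system A."""
    full_name: str    # stored as "Last, First"
    status_code: str  # "A" = active, "L" = on leave, "T" = terminated


@dataclass
class TargetEmployee:
    """Record shape in hypothetical system B."""
    first_name: str
    last_name: str
    employed: bool
    on_leave: bool


# Domain knowledge: one status code in A maps to two flags in B.
STATUS_RULES = {
    "A": (True, False),
    "L": (True, True),
    "T": (False, False),
}


def translate(src: SourceEmployee) -> TargetEmployee:
    """Convert a system-A record into a system-B record."""
    last, _, first = src.full_name.partition(",")
    if src.status_code not in STATUS_RULES:
        # Refuse to guess: an unknown code is a domain question, not a default.
        raise ValueError(f"Unknown status code: {src.status_code!r}")
    employed, on_leave = STATUS_RULES[src.status_code]
    return TargetEmployee(first.strip(), last.strip(), employed, on_leave)


record = translate(SourceEmployee("Lovelace, Ada", "L"))
print(record)
```

The mapping table is where the domain understanding lives: a generic field-copying tool would have no way to know that "on leave" still counts as employed.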

Oracle's namesake database may have been born on-premises, but the next big update to the software will make its debut in the cloud. Oracle Database 12c Release 2, also known as Oracle Database 12.2, is slated for release in the second half of this year. It will first be made available in the cloud, with an on-premises version arriving at some undefined point in the future. "We are committed to giving customers more options to move to the cloud because it helps them reduce costs and become more efficient and agile," Oracle said in a statement sent by email. "Oracle Database 12.2 will be available in the cloud first, but we will also make it accessible to all of our customers."

A couple of recent studies show how vital DNS security is and how much a DNS-related security incident can cost you.

Microsoft has significantly upped the tally of U.S. government gag orders slapped on demands for customer information, according to court documents filed last week. In a revised complaint submitted to a Seattle federal court last Friday, Microsoft said that more than half of all government data demands were bound by a secrecy order that prevented the company from telling customers of its cloud-based services that authorities had asked it to hand over their information. The original complaint — the first round in a lawsuit Microsoft filed in April against the U.S. Department of Justice (DOJ) and Attorney General Loretta Lynch — had pegged the number of data demands during the past 18 months at 5,624. Of those, 2,576, or 46%, were tagged with secrecy orders that prevented Microsoft from telling customers it had been compelled to give up their information.

Data scientists and others often encapsulate big data by its dimensions known as the four Vs: volume, variety, velocity and veracity. But when considering big data as a source for insight to enhance decision making, it may be best characterized by its three Cs–confidence, context and choice–with cognitive as a fourth C bonus.

Original article is published at Forbes. Ask most people outside academia or Silicon Valley what comes to mind when they hear the term "machine learning" and you're likely to get a response that involves a movie like "The Matrix" or "Ex Machina." You're less likely to hear how it's a great tool for fraud detection or supply chain optimization, and that's too bad. Machine learning has a tremendous range of business applications, from optimizing data centers to predicting fine wine price changes to retail market basket analysis. With that in mind, I hope to cut through the science fiction clutter and misconceptions so you can consider how machine learning relates to your business.

BIWA Summit 2017: THE Big Data + Analytics + Spatial + Cloud + IoT + Everything "Cool" Oracle User Conference 2017. January 31 – February 2, 2017, Oracle Conference Center at Oracle Headquarters Campus, Redwood Shores, CA. What Oracle Big Data + Analytics + Spatial + Cloud + IoT + Everything "Cool" Successes Can You Share? We want to hear your story. Submit your proposal today for Oracle BIWA Summit 2017, January 31 – February 2, 2017, and share your successes with Oracle technology. Speaker proposals are being accepted through October 1, 2016.

Although we often write about and discuss digital transformation, we often fail to identify the end goal we are really trying to achieve. We talk at great length about data, analytics, speed, information logistics systems and personalized user experiences, but none of these are the end goal. Ultimately we must digitally transform so we can remove the "fog of war," and have clear visibility and insights into our businesses and the needs of our customers. The end goal of digital transformation, however, is the ability to rapidly act and react to changing data, competitive conditions and strategies fast enough to succeed.

What if lost limbs could be regrown? Cancers detected early with blood or urine tests, instead of invasive biopsies? Drugs delivered via nanoparticles to specific tissues or even cells, minimizing unwanted side effects? While such breakthroughs may sound futuristic, scientists are already exploring these and other promising techniques. But the realization of these transformative advances is not guaranteed. The key to bringing them to fruition, a landmark new report argues, will be strategic and sustained support for "convergence": the merging of approaches and insights from historically distinct disciplines such as engineering, physics, computer science, chemistry, mathematics, and the life sciences.

SaaS companies can greatly expand revenue potential by pushing beyond their own borders. The challenge is how to do this without degrading service quality. In his session at 18th Cloud Expo, Adam Rogers, Managing Director at Anexia, discussed how IaaS providers with a global presence and both virtual and dedicated infrastructure can help companies expand their service footprint with low "go-to-market" costs.

Colleges are graduating more people with analytics skills every year, but businesses are still adapting to the shortage by finding other ways to hire the skills they need.

Cognitive Computing is becoming the foundation for a new generation of solutions that have the potential to transform business. Unlike traditional approaches to building solutions, a cognitive computing approach allows the data to help determine the way applications are designed. This contrasts with conventional software development that begins with defining logic based on the current way a business operates. In her session at 18th Cloud Expo, Judith S. Hurwitz, President and CEO of Hurwitz & Associates, Inc., put cognitive computing into perspective with its value to the business. The session detailed what it takes to build a cognitive application and the types of solutions that are the best fit for this data-driven approach.

The public, media and others have a right to see video footage collected from body-worn cameras on law enforcement officers, but compliance issues with governing agencies add a lot of complexity to these requests. Consider four key questions that agencies owning, and responsible for, all that video data need to tackle when satisfying key legal requirements.

In early March, I spoke at the Hadoop with the Best online conference. I had fun sharing one of my total passions: data pipelines! In particular, I talked about some techniques for catching raw user events, acting on those events, and understanding user activity from the sessionization of such events. Here, I'll give just a taste of what was covered. For more detail, please check out the video above (courtesy of With the Best).
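
For readers new to the term, sessionization can be sketched in a few lines: group each user's events in time order and start a new session whenever the gap between consecutive events exceeds an inactivity timeout. The sketch below is a generic illustration, not the pipeline from the talk; the 30-minute timeout, the event tuple layout, and the `sessionize` function are assumptions made for the example.

```python
from collections import defaultdict

SESSION_GAP_SECONDS = 30 * 60  # assumed inactivity timeout


def sessionize(events, gap=SESSION_GAP_SECONDS):
    """Split raw events into per-user sessions.

    events: iterable of (user_id, timestamp_seconds, action) tuples.
    Returns {user_id: [session, ...]}, where each session is a list of
    (timestamp_seconds, action) pairs in time order.
    """
    by_user = defaultdict(list)
    for user, ts, action in events:
        by_user[user].append((ts, action))

    sessions = {}
    for user, items in by_user.items():
        items.sort()  # order by timestamp
        user_sessions = [[items[0]]]
        for prev, cur in zip(items, items[1:]):
            if cur[0] - prev[0] > gap:
                user_sessions.append([cur])    # gap too long: new session
            else:
                user_sessions[-1].append(cur)  # continue current session
        sessions[user] = user_sessions
    return sessions


# Hypothetical raw events, deliberately out of order.
events = [
    ("alice", 0, "view"),
    ("alice", 600, "click"),
    ("bob", 100, "view"),
    ("alice", 4000, "view"),
]
result = sessionize(events)
print(len(result["alice"]), len(result["bob"]))  # 2 1
```

In a real pipeline the same grouping would run continuously over a stream rather than over a list in memory, but the gap rule is the core of it.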

This framework based on Apache Flume, Apache Spark Streaming, and Apache Impala (incubating) can detect and report on abnormal bad HTTP requests within seconds.  Website performance and availability are mission-critical for companies of all types and sizes, not just those with a revenue stream directly tied to the web. Web pages can become unavailable for many reasons, including overburdened backing data stores or content-management systems or a delay in load times of third-party content such as advertisements. The post How-to: Detect and Report Web-Traffic Anomalies in Near Real-Time appeared first on Cloudera Engineering Blog.
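
The general idea of flagging a spike in bad requests can be sketched without any of those systems. The plain-Python example below is not the Cloudera framework: it simply counts 4xx/5xx responses per one-minute window and flags a window whose count is well above the recent average. The window size, history length, threshold, and function names are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev


def bad_request_counts(requests, window_seconds=60):
    """Count HTTP responses with status >= 400 per time window.

    requests: iterable of (timestamp_seconds, status_code) pairs.
    Windows with no requests at all are simply absent from the result.
    """
    counts = {}
    for ts, status in requests:
        window = ts // window_seconds
        counts.setdefault(window, 0)
        if status >= 400:
            counts[window] += 1
    return counts


def flag_anomalies(counts, history=5, threshold=3.0):
    """Flag windows whose bad-request count exceeds the mean of the
    previous `history` windows by `threshold` standard deviations
    (with a floor of 1.0 on the deviation to avoid a zero-width band).
    """
    windows = sorted(counts)
    flagged = []
    for i in range(history, len(windows)):
        past = [counts[w] for w in windows[i - history:i]]
        limit = mean(past) + threshold * max(pstdev(past), 1.0)
        if counts[windows[i]] > limit:
            flagged.append(windows[i])
    return flagged


# Synthetic traffic: 2 errors per minute, then a burst of 20 in minute 5.
requests = []
for w in range(6):
    bad = 20 if w == 5 else 2
    requests += [(w * 60 + i, 500) for i in range(bad)]
    requests += [(w * 60 + 40 + i, 200) for i in range(10)]

flagged = flag_anomalies(bad_request_counts(requests))
print(flagged)  # [5]
```

A production system would also have to handle late-arriving events and empty windows, which this sketch ignores.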

Red Hat wants to create a spectrum of hybrid services that IT organizations will be able to use to integrate both external and internal APIs.

We just added a new case study to our website. This case study focuses on how Deep Web data harvesting is helping combat online pharmaceutical fraud. Keep in mind that this same process can be applied to help protect any intellectual property that may be vulnerable online. The case study takes you through the entire BrightPlanet process… The post Case Study: Combating Pharmaceutical Fraud with Deep Web Data Harvesting appeared first on BrightPlanet.

Three vendors, including Microsoft and Amazon Web Services, have won a key U.S. government authorization that allows federal agencies to put highly sensitive data on their cloud-computing services. The AWS GovCloud, Microsoft's Azure GovCloud, and CSRA's ARC-P IaaS have received provisional authority to offer services under the high baseline of the government's Federal Risk and Authorization Management Program (FedRAMP), a set of security standards for cloud services. The FedRAMP high baseline, including more than 400 security controls, allows federal agencies to use AWS for highly sensitive workloads, including personal information, AWS said Thursday.

Organizations are experiencing a new emphasis when it comes to cybersecurity. They are moving from securing the perimeter to securing the data within it, which is the result of the proliferation of connected devices in organizations today: smartphones, tablets and the IoT. Organizations used to focus their efforts on keeping attackers outside the perimeter, because just a few years ago, the network perimeter was much more static and limited. Today, the perimeter is everywhere — and constantly moving.

by Joseph Rickert Just about two and a half years ago I wrote about some resources for doing Bayesian statistics in R. Motivated by the tutorial Modern Bayesian Tools for Time Series Analysis by…

On June 28–29, 2016, EMC will be at the MongoDB conference, on the expo floor in Booth #3. This year we will be focusing on "Modernize and Innovate" and will have a great session that we hope you can attend. We will have a 45-minute Lunch and Learn on June 29 @ 1240–1410. Title: Modernize to Innovate, Operating Infrastructure at Scale.

List: Does business come first for vendors or is support a kindness?

Join us for another practical Statistica webcast: "What's in the Dresner Business Intelligence Market Study? — Analysis revealed!" Join Howard Dresner, thought leader in the Business Intelligence community and founder of Dresner Advisory Services, as he discusses key findings from his "2016 Business Intelligence Market Study." In this webcast, Howard will describe market forces currently impacting the BI and performance management landscape and will also discuss Dell Statistica's role within that market.

I love the Fourth of July. It's like Christmas for Patriots. In fact if Santa Claus also wore blue I might petition for his U.S. citizenship. But he doesn't, so he can live beyond the wall until his worker's visa kicks in on Christmas Eve. In December I wrote about how Santa Claus could create… The post Give Me Data, or Give Me Death! appeared first on Hortonworks.

By 2025, companies will be spending $60B on Big Data & Analytics solutions. How much will your organization be investing? The Predictive Analytics Innovation Summit returns to Chicago on November 16 & 17, focusing on how your business can make the most of Predictive Analytics and drive your success. Unite with 300+ industry peers and join 40+ leading experts to confront the key challenges you are facing, and to gain new practical skills to bring back to your organization. Our experts will be there to present the way they use predictive analytics in their organizations, including…

Hello DSC Member, Strata + Hadoop World in New York is happening September 26-29 and the Best Price expires midnight Friday, June 24. There's no event quite like it. Just a few days among the best minds in data can entirely change the way you think about data and provide tips and techniques that save time. As one attendee put it: "I learned more about big data in just three days attending Strata + Hadoop World than in one year doing research on my own." If you haven't seen the program yet, check it out now: you'll get in-depth training and sessions from the best minds in data and business, make invaluable connections through the legendary networking opportunities, and see the latest in the thriving vendor ecosystem. Strata + Hadoop World sold out last year with over 6,300 attending.
