Big Data News – 20 Aug 2015

Top Stories

Listen to much of the well-peddled advice in the enterprise tech world today, and you'd be excused for coming away with the belief that "big data" holds all the answers your company is looking for. Too bad it often can't live up to that promise — at least, not in its traditional form. Turns out, what's commonly referred to as big data — all those vast "lakes" of numerical measures captured by the enterprise resource planning (ERP), customer relationship management (CRM) and other business systems so enthusiastically mined by today's analytics tools — actually amounts to only 10 percent of the data an average company has at its fingertips, according to IDC.

The Hadoop security project Ranger was supposedly named in tribute to Chuck Norris's role in "Walker, Texas Ranger." The project has its roots in XA Secure, which was acquired by Hortonworks, then renamed Argus before settling in at the Apache Software Foundation as Ranger. When Hadoop started, it was a set of loosely coupled parts used primarily in the back ends of big Internet companies like Yahoo. These parts were wrapped into distributions and marketed as Hadoop by the likes of MapR, Cloudera, and Hortonworks.

At the recent Agile 2015 conference, the Agile Alliance hosted the fifth annual industry analyst panel briefing, in which a group of commentators answered questions on the theme of agile trends and future directions.

Apache jclouds is a framework provided by the Apache Software Foundation. Written in Java, it provides a vendor-independent library for typical cloud operations. At present (November 2014), Apache jclouds provides two kinds of services: a compute service and a blob service [Apa14b]. Apache jclouds can be […]
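To give a feel for the blob-service half of that API, here is a minimal sketch of creating a container and uploading an object with jclouds. The provider name, container name, and credentials are illustrative placeholders, not details from the article:

```java
import org.jclouds.ContextBuilder;
import org.jclouds.blobstore.BlobStore;
import org.jclouds.blobstore.BlobStoreContext;
import org.jclouds.blobstore.domain.Blob;

public class JcloudsBlobDemo {
    public static void main(String[] args) {
        // Any supported provider can be plugged in here ("aws-s3" is just one
        // example); the identity and credential strings are placeholders.
        BlobStoreContext context = ContextBuilder.newBuilder("aws-s3")
                .credentials("IDENTITY", "CREDENTIAL")
                .buildView(BlobStoreContext.class);
        try {
            BlobStore store = context.getBlobStore();
            // Create a container, then upload a small text blob into it.
            store.createContainerInLocation(null, "my-container");
            Blob blob = store.blobBuilder("hello.txt")
                    .payload("Hello from jclouds")
                    .build();
            store.putBlob("my-container", blob);
        } finally {
            context.close();
        }
    }
}
```

Because the same BlobStore interface fronts every supported provider, swapping clouds is largely a matter of changing the provider string and credentials.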

Our benchmark research on next-generation business planning finds that a large majority of companies rely on spreadsheets to manage planning processes. For example, four out of five use them for supply chain planning, and about two-thirds for budgeting and sales forecasting. Spreadsheets are the default choice for modeling and planning because they are flexible. They adapt to the needs of different parts of any type of business.

Market research conferences pay a lot of attention to balancing brain-heavy learning with brain-light entertainment. For example, ESOMAR had a SoapBox to allow anyone to rant on any topic for a couple of minutes. MRIA had ping pong tables freely available during breaks. MRA planned a bar-hopping social event. I've seen caricaturists, photo booths, prize drawings, and more.

Retailers clearly understand that as consumers shop across multiple channels and expect seamless interactions, analytics will play a vital role in customer acquisition and retention strategies. Here is how analytics can be used to address the top 3 challenges.

This post covers scrum myths described by Ilan Goldstein, Certified Scrum Trainer.

A team of researchers this week announced that they can use DNA to store information for at least 2,000 years, and they're now working on a filing system to make it easier to navigate. The researchers, from ETH Zürich, an engineering, science and technology university, said DNA storage could preserve troves of historical texts, government documents or entire archives of private companies, all in a droplet…

With OnHub, Google hopes to simplify home networking and encourage interest in IoT devices.

Big Data is everywhere, and as it becomes more of a priority, organizations are embarking on data projects that are far more ambitious than before. However, for your Big Data strategy to be a success, there are four potential risks that you need to consider. In just over three weeks' time, the Big Data Innovation Summit will return to Boston on September 9 & 10, uniting today's most innovative data professionals to explore both the new opportunities of Big Data and the associated risks. You can use code DSC300 to save $300 on all two-day passes. Check out the schedule here. 1. Security and Privacy: Five of the six most damaging data thefts of all time have happened in the last two years.

Make Big Data and Analytics Your Cybersecurity Allies. Did you know that the number of cyberattacks averages 117,339 every single day? In fact, your organization may already have been a victim and not even know it. Since security solutions designed to detect or minimize a specific type of attack are no longer enough protection, a better approach is to bolster existing methods with big data analytics. The result is an improved ability to predict and identify attacks, and a shorter time to remediation if one does get through.

Our partner O'Reilly Media is followed by thought leaders, business analysts, and journalists for recognizing game-changing technologies and providing insightful editorial perspectives. You may already have some of O'Reilly's highly rated reports, but have you seen the current collection of their top reports on data science, big data, and business topics?

As markets become more competitive and fragmented, CPG brands need mechanisms to connect with consumers faster. That's why CPG leaders are building strategies around real-time data to optimize micromoments in stores. Here are a few example initiatives that are bringing brands closer to their consumer.

Space is limited, so reserve your webinar seat now. Please join us on September 15, 2015, at 9am PT for our latest Data Science Central Webinar Event: 5 Things Your Organization Needs to Succeed in Data Science, sponsored by Teradata. What does it take to succeed in the world of data science and analytics? It takes the right culture, people, process and governance, the ability to operationalize analytics, and special weapons and tactics. Join John Thuma in this latest DSC Webinar as he discusses his strategy for conquering the five challenges to succeeding in data science.

Advanced Analytics with Tableau: deep statistical analysis meets an intuitive, drag-and-drop visual analytics environment. Advanced analytics is an integral part of Tableau's mission to help people see and understand their data. Read this whitepaper to learn how Tableau can help with all stages of an analytics project, including specific steps for advanced capabilities such as segmentation and cohort analysis and what-if scenario analysis…

Gartner's annual Hype Cycle is out, and IoT and autonomous cars are in this year. Big data, however, is losing some of its luster.

The FCC estimates that telecommunications providers lose $150 million each year from subscriber fraud through cloned cell phones, data fraud on SIM cards and cell phones purchased with fake names. RCR Wireless News reports that estimated global fraud loss in the telecommunications industry increased 15 percent from 2011 to 2013 to a total estimated loss of $46.3 billion, primarily through identity theft, device theft and unauthorized network access.

Using Data to Drive Online Marketing: Making Data Driven Decisions (tagged Analytics; published 2015-08-19 17:18:17 UTC)

Project Calico has released Calico v1.0, a virtualised layer 3 networking solution for VM and container workloads, which enables flexible, scalable and secure IP-based communication without the need for an overlay network. The release includes integration with the OpenStack 'Neutron' networking stack, and proof-of-concept integrations with Docker, Kubernetes and other related technology.

Using Public Data For Better Public Services: How can data be used in the public sector? (tagged Analytics, Chief Analytics Officer, Data Science, Open Data, Predictive Analytics; published 2015-08-19 16:17:29 UTC)

Medical experts agree that personalized care is integral to the fight against cancer, but providing such care is easier said than done when more than 14 million people are diagnosed with cancer each year. In this podcast, hear how one of the world's leading cancer researchers plans to use big data and IBM Watson to stem the global tide of cancer.

Originally posted on AnalyticBridge When most technical professionals think of Big Data analytics today, they think of Hadoop. But there are many cutting-edge applications that Hadoop isn't well suited for, especially real-time analytics and contexts requiring the use of iterative machine learning algorithms. Fortunately, several powerful new technologies have been developed specifically for use cases such as these.

When environmental group Conservation International (CI) needed to measure how much of the world's rainforests have been chopped down, it turned to HP. CI wanted to use satellite imagery to compare the amount of rainforest there today with images of the same areas three decades ago. HP engineers built a tool using the company's DistributedR program that automatically scanned the images and categorized them, pixel by pixel, to determine which areas were forested and which no longer are. For an organization like CI, which collected more than $140 million in contributions last year, it is a classic example of an enterprise working with a big-name technology vendor.

Although predictions can be life-or-death propositions for businesses, inspiring confidence in business plans can be highly challenging if they can't be justified with high-quality predictive models. Discover why tools, platforms and best practices for building, refining and putting predictive analytics into operation are at the very core of any well-run, modern organization.

Did you know that Facebook can look at its users' posting patterns and moods in order to predict their future romantic relationships? Use code PATIMES15 for 15% off a two-day pass or combo pass (excludes workshops & All Access). As a posting on the blog "Fierce Big Data" put it on February 19, 2014: […]

Lucasfilm streamlined managerial processes across the board with the help of data that enabled intelligent decisions. Learn more about how to leave the spreadsheet shuffle behind.

James Tamm, author of the book Radical Collaboration, gave the closing keynote at the recent Agile 2015 conference. His talk, titled "Want Better Collaboration? Don't Be So Defensive," provided advice on how to be more collaborative by understanding the factors that cause our own defensiveness.

Originally posted on Data Science Central. What is Hadoop: Hadoop is a framework written in Java for running applications on large clusters of commodity hardware, incorporating features similar to those of the Google File System and of MapReduce. HDFS is a highly fault-tolerant distributed file system that, like Hadoop itself, is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications with large data sets (in the range of terabytes to zettabytes). Who uses Hadoop: Hadoop is mainly used by companies that deal with large amounts of data and need to process it, perform analysis, or generate reports. Leading organizations including Facebook, Yahoo, Amazon, IBM, Joost, PowerSet, the New York Times, and Veoh are currently using Hadoop.
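The post describes MapReduce only in the abstract, so a worked example may help make the map/reduce split concrete. The word-count job below is the canonical introductory Hadoop program, not code from the post; the input and output paths are placeholders passed on the command line:

```java
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every whitespace-separated token.
    public static class TokenMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context ctx)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
    }

    // Reduce phase: sum the per-word counts produced by the mappers.
    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            ctx.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenMapper.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // e.g. /input
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // e.g. /output
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

HDFS splits the input across the cluster, runs a mapper per split, shuffles the intermediate (word, count) pairs by key, and hands each word's counts to a single reducer, which is what lets the same program scale from megabytes to terabytes.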

QCon London 2016 will take place at The Queen Elizabeth II Conference Centre on March 7-11, 2016, and registration is now open.

Netflix has open sourced Falcor, a JavaScript library offering a model and asynchronous mechanism for fetching JSON data from multiple sources.

Originally posted on Data Science Central. This infographic was originally published by Intel and can be found here. It dates back to 2014 but still provides a very comprehensive view of this fast-expanding field.

A number of macro changes are altering the way energy and utilities companies do business and make technology investments. The following challenges are all expected to have a significant impact on the industry over the next few years:

Everyone needs to be vigilant about security on the Web today. One particular threat — the man-in-the-middle attack — is a risk anytime you are communicating over the Internet, and an attacker has…

In anticipation of his upcoming keynote presentation at Predictive Analytics World for Healthcare in Boston, Sept 27-Oct 1, 2015, we asked Dr. Michael Dulin, Chief Clinical Officer for Analytics and Outcomes Research at Carolinas Healthcare System, a few questions about incorporating predictive analytics into healthcare. Catch a glimpse of his keynote presentation, Turning Big Data into […]

I met an ex-colleague on a flight recently, and as we were catching up, the subject of data management for 4D seismic data came up. My ex-colleague said that although the Oil & Gas industry has been talking about 4D seismic (where repeat images of the subsurface are combined to see how reservoir fluids have moved) for years, things aren't different enough to warrant a new approach. Not that different? Really? That reminded me of a quote I'd read: "Different stuff is different." Dave Dellanave, a trainer, coined that phrase (although he used a different "s" word).

Microsoft has disclosed a critical flaw in all versions of Internet Explorer that allows remote code execution. The flaw affects all current Windows systems and should be patched as soon as possible.

Tiny computers, real-time depth sensing, and breakthrough memory technology are among the innovations featured at this year's Intel Developer Forum. CEO Brian Krzanich detailed how connected devices will change the way we all do business.

In major league baseball, batters take less time to decide whether to swing at a pitch than they do to blink. They observe, interpret, evaluate and decide, all in fractions of a second: reading the seams of the ball eight hundredths of a second after it leaves the pitcher's hand; determining the type of pitch (knuckleball, fastball, curveball, slider); deciding whether it can be hit .22 seconds later; starting to swing when the ball is only halfway between the mound and home plate. Total elapsed time: 0.3 second. Physicians have longer to form an opinion, but they have less room for error. For them, the count never goes to 3 and 2. They get one shot at a correct diagnosis.

Agile software development is in a rut. The most popular agile methods are consistently misapplied, misunderstood, misused, and all too often abandoned by the companies that need them the most. Worse than that, our popular agile methods are not actually agile themselves! This article proposes a new approach that recognizes and works around limitations in human cognition and decision making…

Data is growing at a faster pace than ever. The amount of global data will reach 40 zettabytes in just five years, equal to about 8.5 trillion DVDs.

To the layperson anxious for answers to complicated questions, the very idea of bringing together sets of disparate data and turning them into precious insights may seem like magic, a modern-day alchemy, a goal placed well beyond the grasp of mere mortals. Fortunately, this is no longer the case: thanks in part to massive advances in Big Data and Big Data analytics, and no small amount of imagination, we are able to look into the past, the present and the future with growing confidence.

Since phishing typically impacts users on a personal financial level, it's not something that is often given much thought. However, with a little careful training and proactive measures you can ensure that your users and your business are protected from phishing scams.

Originally posted on Data Science Central. This is of course the wrong question. I use R because I'm familiar with it, more than SAS or Python, and I use R mostly for graphics and visualization. Though things have changed, I consider R mostly a tool for ad-hoc analysis or EDA (exploratory data analysis) rather than a component of enterprise analytic applications or production code running in batch mode or accessed via APIs. Is there an enterprise version of R? R also used to be limited by the amount of available RAM; I'm not sure how easy it is to get around this limitation. RHadoop is R for Hadoop, so I suppose that's a possible solution for big data, though I'm not familiar with the product. I used SAS a while back, and I know it has significantly improved over the last 10 years, including offering a better sort, hash tables, and a very fast version of SAS for really big data.

eTix, the largest ticketing service provider in North America, decided to implement real-time analytics in the cloud rather than on-premises. And they chose Attunity CloudBeam and Amazon Redshift to…

There are a lot of ways companies are trying to store and analyze data today, but one of the most compelling involves graph analytics. MarkLogic, which supports a variant of the graph database known as a semantic triple store, hopes that a recent update puts this type of big data analysis on the map for more customers. MarkLogic develops a so-called multi-model database that can shape-shift depending on the problem at hand. Its document store and search engine capabilities shined when MarkLogic was brought in to replace an Oracle database for the big federal healthcare marketplace created by the Affordable Care Act.

Originally posted on Data Science Central. 1. The Tragedy: As most of you know, the country of Nepal has been hit with a powerful earthquake that has killed over 1,800 people (as of this writing). Thousands are injured. Lives are disrupted. Historic temples, monuments and buildings are leveled. I plead with you to help however you can: money, time, good wishes, thoughts, and prayers. Scroll to the end of this post for where you can help. Thank you, and enjoy the post. 2. Mapping & Visualizing the Nepal Earthquake: For this analysis, I scrape the data from this website, which has the latest earthquake data worldwide.

Originally posted on Data Science Central. In this article we will discuss using S3 as a replacement for HDFS (Hadoop Distributed File System) on AWS (Amazon Web Services), and also why S3 is needed in the first place. Before getting to the use case and the performance of S3 with Hadoop, let's understand the exact problems and why HDFS is awkward to use in the cloud. When new instances are launched in the cloud to build a Hadoop cluster, they do not have any data associated with them. One approach is to copy the entire huge dataset onto them, which is not feasible for various reasons including bandwidth, copy time and the associated cost. Second, after the jobs complete you would need to copy the results back before terminating the cluster machines; otherwise the results are lost when the instances are terminated. And because of the associated cost, keeping the entire cluster running just for data collection is not feasible either. To read the complete article please visit: http://www.technology-mania.com/2012/05/s3-instead-of-hdfs-with-hadoop_05.html
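For a concrete sense of what "S3 instead of HDFS" means, the sketch below opens an S3 bucket through Hadoop's own FileSystem API using the s3n connector that shipped with Hadoop in that era. This is a hedged illustration, not code from the article; the bucket name, paths and credentials are placeholders:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class S3AsStorage {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Credentials for the s3n connector (placeholder values).
        conf.set("fs.s3n.awsAccessKeyId", "YOUR_ACCESS_KEY");
        conf.set("fs.s3n.awsSecretAccessKey", "YOUR_SECRET_KEY");

        // Open the bucket through the same FileSystem abstraction HDFS uses,
        // so MapReduce jobs can read input from and write output to S3 directly.
        FileSystem s3 = FileSystem.get(URI.create("s3n://my-bucket/"), conf);
        for (FileStatus f : s3.listStatus(new Path("s3n://my-bucket/output/"))) {
            System.out.println(f.getPath() + "  " + f.getLen() + " bytes");
        }
    }
}
```

Because job input and output live in the bucket rather than on the cluster's local disks, the cluster can be terminated after a run without losing results, which is exactly the problem the article describes.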

Astronomers developing the Event Horizon Telescope (EHT), a synchronized network of radio antennas with a viewing field as large as the Earth, hope to soon take the first ever photo of Sagittarius A*, a massive black hole at the center of the Milky Way. The amount of data being collected by 10 radio antennas around the globe, often located atop mountains, has forced the astronomers to turn to helium-filled hard drives for their data storage. For example, one of the radio telescopes that make up the EHT array is located 15,000 feet above sea level at the top of the Sierra Negra, an extinct volcano in Mexico. When researchers attempted to collect data using 32 conventional hard drives, 28 of them failed.

Data is driving all industries, from corporations to the government to healthcare, but many organizations find that they have a hard time managing and analyzing the mass amounts of data they collect. While data has become a precious commodity for growing a business, it's also become unwieldy and unmanageable. Couple that with a lack of educational programs focused on data science, and the need for more professionals in the field becomes clear. The University of Wisconsin (UW) has stepped up to address this need with its graduate program in data science, which students can complete online. The program will be offered through the Continuing Education, Outreach and E-Learning (CEOEL) division of the University of Wisconsin Extension, which includes 26 UW System campuses around the country, but students can apply from all around the world.

An estimated 1.1 petabytes of data related to a handful of databases, search engines and other caching technologies is exposed online, a data security startup reports. A key security flaw centers on default settings that lack basic configuration for authentication, encryption, authorization or other security controls. BinaryEdge, a security services startup based in Zurich, Switzerland, said its Internet data exposure survey looked at four representative technologies the company regularly uses: MongoDB; the key-value cache and store technology Redis; Memcached, the distributed memory cache system; and ElasticSearch, a "full-text" search engine. Given its ability to scale, the MongoDB database was found to have the highest data exposure: nearly 620 terabytes.

When you sell a product your customers can find elsewhere, is there any way to ensure they'll turn to you, instead of your competition, when the time comes to make a purchase? Ultimately, that's the challenge every commodity-based business faces: What can you do to make sure your brand is the clear first choice when other options are available? For The CSS Group, a leading insurance provider headquartered in Lucerne, Switzerland, standing out from the crowd is key to their success. "If you are in a business like we are, where the products are commodity, you need other differentiators than the product…

By the end of the summer, Netflix will close its last data center and move its entire streaming service to the cloud with help from AWS. It's a lesson for companies large and small.

The amount of data we're generating is doubling roughly every 18 months, which is eerily similar to Moore's Law scale for the growth of processing power. But much of that new data will remain invisible to those who would use it. The situation around this dark data threatens to derail big data initiatives before they can get off the ground. Gartner defines dark data as "the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes." By some estimates, including one by Tamr, up to 90 percent of the data stored in a typical organization is dark data. The dark data problem is largely one of organizational structure, and has parallels in the master data management (MDM) issues that are threatening data lakes, the ongoing situation with maintaining multiple data silos, and Hadoop's Wild West approach to data governance.

by Gregory Vandenbrouck, Software Engineer, Microsoft. This post is the fourth in a series that covers pulling data from Microsoft SQL Server or MySQL/MariaDB on Azure to an R client on Windows or…

User-generated data can be rich with insights on consumer perception, priorities and response. This is often an untapped or underutilized source for organizations in any industry, despite having big implications for customer experience management. For utility companies in particular, user-generated data can significantly inform business decision-making.

Ember 2.0 has been released — with zero new features. The decision has been met warmly by the JavaScript community, who have widely praised the framework for remaining backwards compatible with 1.13. Ember 2.0 only removes the features that were deprecated in Ember 1.13, meaning that apps that run on Ember 1.13 without deprecation warnings should also run on Ember 2.0.

I wrote a series of blog posts on Bayesian modeling with R and Stan: (1) Overview; (2) Installation and an easy example; (3) Simple hierarchical Bayesian model; (4) Time series with a nonlinear trend; (5) Time series with seasonality. Stan is a growing platform for MCMC computing implemented in C++. Compared to WinBUGS or OpenBUGS, it is very fast and can be programmed intuitively. This series of posts shows how to install Stan with R, how to run it, and how to apply it to actual datasets. I hope you'll find it makes practicing Bayesian modeling easier than ever.

Big Data Hits the Runway: How Big Data is Changing the Fashion Industry. Who knew the cloud could be so fashionable? (tagged Analytics, Big Data, Global Business, Innovation, Product Innovation; published 2015-08-17 18:04:33 UTC)
