Big Data News – 27 Apr 2016

Today's Infographic Link: What Happens in an Internet Minute?

Top Stories
Prescient, a risk management firm co-founded by former naval intelligence officers and best known for federal counterintelligence and national security programs, launched its commercial spinoff. Prescient Traveler, a sophisticated traveler risk management system, analyzes petabytes of data from over 38,000 sources in real-time. Further, prescriptive analytics advise travelers on how to mitigate those risks or "react smartly" to the dangers.

Streaming and batch big data analytics technology Apache Apex has been elevated to a Top-Level Project by the Apache Software Foundation. Used by organizations including Capital One and GE, the technology can help developers more quickly create apps that leverage real-time data.

In anticipation of his upcoming conference presentation, Leveraging Hands on Approaches to Identify Actionable Topics in Property Insurance at Text Analytics World Chicago, June 21-22, 2016, we asked Frederick Guillot, Manager, Research and Innovation at Co-operators General Insurance Company, a few questions about his work in text analytics. Q: In your work with text analytics, what… The post Wise Practitioner – Text Analytics Interview Series: Frederick Guillot at Co-operators General Insurance Company appeared first on Predictive Analytics Times.

In this special guest feature, Ulrik Pedersen, Chief Operations Officer at TARGIT, highlights the constant battle between IT and finance on Total Cost of Ownership (TCO) when it comes to implementing a new BI solution.

The insideBIGDATA Guide to Streaming Analytics is a useful new resource directed toward enterprise thought leaders who wish to gain strategic insights into this exciting new area of technology. Many enterprises find themselves at a key inflection point in the big data timeline with respect to streaming analytics technology.

With all of the media coverage surrounding the 2016 U.S. Presidential election, you can't help but be intrigued.  Republican, Democrat or Independent – no matter what your personal views may be, one can assume that most of the population would be exercising their right to vote.  But did you know that less than 50% of…

For half a decade, the federal government has operated under a policy to prioritize cloud computing as agency CIOs embark on new technology initiatives, but in such a vast and varied IT environment, it's not been a quick transition. [ Related: Government cloud adoption efforts lag as security concerns persist ] Five years into the so-called cloud-first policy, federal CIOs say they continue to struggle with procurement and management challenges, while security concerns about the safeguards around sensitive data still linger.

Professor Norm Matloff from the University of California, Davis has published From Algorithms to Z-Scores: Probabilistic and Statistical Modeling in Computer Science which is an open textbook. It approaches statistics from a computer science perspective. Dr. Matloff has been both a professor of statistics and computer science so he is well suited to write such a textbook.

The Node.js Foundation this week announced the release of Node.js 6, a version that is considerably faster and more secure than its predecessor.

A look at how Randstad Technologies got creative in regard to bolstering gender diversity in the workplace.

How easy is it to build a learning machine? Shouldn't one just hire some Machine Learning PhDs and have them run their algorithms? Well, this is most probably a good idea, but it won't be enough. I'll try to explain why in this blog entry. Before answering our questions, let's define what we are dealing with. A Learning Machine is a machine (a piece of software, a web site, a mobile app, a robot, pick your favorite) that performs a task, and that gets better and better as it performs it. In recent years, some learning machines made headlines. For instance, IBM Watson defeated the best humans at Jeopardy a few years ago.

Google, Uber, Ford, Volvo and Lyft form self-driving car 'advocacy' group. State and U.S. regulators seem poised to crack down on feet- and hands-free driving, so Google's gang wants to head them off. Autonomous vehicles are likely to be a moneyspinner for the industry, so you can see their motivation. But it's easy to be cynical, given the fact that it's led by a former NHTSA honcho. Yes, the revolving door is alive and well in Washington, DC. In IT Blogwatch, bloggers revolve indoors. Your humble blogwatcher curated these bloggy bits for your entertainment.

Pair programming is one of the core techniques of eXtreme Programming and has been shown to be effective for knowledge sharing as well as code quality, but it is a practice that is often not used, even in the most agile of organizations. Linda Cook explores why that is and provides some advice on how to encourage teams to try the practice. By Linda M Cook

Alex Blewitt speaks to Martin Thompson at QCon London 2016 on his open-source high-performance networking stack Aeron, and how it avoids garbage collection delays for consistently low latency. Martin explains the use of the xadd processor instruction to avoid spinning wait loops and looks ahead to where CPU technology is heading in the future. By Martin Thompson

IBM is expanding its flash storage lineup to power cloud data centers that carry out so-called cognitive computing. The company's newest FlashSystem arrays, introduced Wednesday, combine its fast and relatively affordable FlashCore technology with a scale-out architecture designed to be easy to expand. Cognitive computing, which IBM defines as real-time data analysis for immediate, automated decision-making, is at the heart of much of IBM's current technology push for enterprises and service providers.

Node.js 6.0 has been released, becoming the new current version. It comes with performance improvements, better test and documentation coverage, better security and wide support for ES2015.

The rise of big data systems is primarily driven by web based application paradigms for the B2C market. The growth of B2B solutions delivered through web based application models is driving a few shifts in enterprise architecture. It is as much about the convergence of two different approaches, as it is about the conflict in basic conceptual models.

Transforming US community colleges starts with analyzing the data that you have. Learn how leading schools are building an analytics ecosystem.

Here's this week's news in Data Science and Big Data. Don't forget to subscribe if you find this useful! Interesting Data Science Articles and News IBM Watson Has a Few Predictions for 'Game of Thrones' Season Six — IBM Watson gives TV forecasting a go. Using personality insights, Watson predicts the fate of the characters in Game of Thrones. Data Visualization Drives the Era of Information Activism — This generation of passionate and tech savvy individuals is using data visualization tools for self expression.

What do you do during the calm before the storm? Discover how insurers can harness weather data in their operations to offer weather alerts for policyholders, helping prevent claims while boosting customer retention.

Seeking a "generational change" in the economics of in-memory computing, database specialist Redis Labs Inc. has collaborated with South Korean memory chip powerhouse Samsung Electronics to accelerate the processing and analysis of bulging datasets using next-generation memory technology designed to significantly cut memory costs. Redis Labs, Mountain View, Calif., said this week its flash-based platform running on standard x86 servers is available now as part of the company's enterprise cluster.

AI has already proved its prowess in chess, Jeopardy and the ancient game of Go, but it's now come out victorious in yet another arena: the classic game of Foosball. A group of computer engineering students at Brigham Young University have spent the past semester creating a robotic, computer-controlled Foosball table with the goal of beating human players. The table is constructed so that a camera mounted above can track the movement of the ball, while an algorithm controls the rods on which the plastic players are attached.

John Oliver takes a look at both G1 and Shenandoah, explaining how they work and what their limitations are, and providing tuning advice. He also looks at recent and future changes to garbage collection.

ODPi, a nonprofit organization accelerating the open ecosystem of big data solutions, announced the first release of the ODPi Runtime Specification and test suite to ensure applications will work across multiple Apache Hadoop® distributions.

Command Query Responsibility Segregation (CQRS) was never meant to be the end goal of what we are trying to achieve; it is a stepping stone towards the ideas of Event Sourcing, Greg Young stated in his presentation at the Domain-Driven Design Europe conference earlier this year. He noted, though, that just applying CQRS is still a valuable pattern.
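
To make the relationship between the two ideas concrete, here is a minimal sketch, not taken from the presentation: the write side only appends events to a log, and the read side is a projection folded from that log. The account domain and every name in it (`Deposited`, `Withdrawn`, `EventStore`, `handle_withdraw`, `project_balances`) are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Deposited:
    account_id: str
    amount: int


@dataclass(frozen=True)
class Withdrawn:
    account_id: str
    amount: int


@dataclass
class EventStore:
    """Hypothetical append-only log: the source of truth on the write side."""
    events: list = field(default_factory=list)

    def append(self, event):
        self.events.append(event)


def project_balances(events):
    """Query side: fold the event log into a read model (balance per account)."""
    balances = {}
    for event in events:
        sign = 1 if isinstance(event, Deposited) else -1
        balances[event.account_id] = balances.get(event.account_id, 0) + sign * event.amount
    return balances


def handle_withdraw(store, account_id, amount):
    """Command side: rebuild state by replaying events, validate, then record."""
    balance = project_balances(store.events).get(account_id, 0)
    if amount > balance:
        raise ValueError("insufficient funds")
    store.append(Withdrawn(account_id, amount))


store = EventStore()
store.append(Deposited("acc-1", 100))
handle_withdraw(store, "acc-1", 30)
balances = project_balances(store.events)
```

In this sketch no balance is ever stored or updated in place. Commands only add events, and any number of read models can be rebuilt from the same log, which is one way to read the "stepping stone" remark.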

You've probably heard terms like search, index, mine, extract, and even harvest referring to data collection. We use the term harvest and are frequently asked why we use that term. Watch our second educational video where in less than two minutes we explain why we use the term harvest and how we harvest data from… The post Video: Why We Use the Term Harvest When Talking About Web Data Collection appeared first on BrightPlanet.

By remotely monitoring patients, doctors can reduce costly visits while still providing top-quality care for their patients. Real-time analytics allows doctors to immediately spot problems and intervene, even before patients know something is wrong.

A graph database with a quadrillion nodes? Such a monstrous entity is beyond the scope of what technologists are trying to do now. But with the latest release of the Neo4j database from Neo Technology, such a graph is theoretically possible. There is effectively no limit to the sizes of graphs that people can run with Neo4j 3.0, which was announced today, says Neo Vice President of Products Philip Rathle.

We see time and time again that users don't always have the same level of appreciation for or understanding of why we need device management tools.

The challenge of the IoT is finding efficient ways to reach networks. Low-power wide-area networks are an option.

Five simple ways to reinvigorate employees — solutions that not only take up little time, but will lead to a higher-performing workforce.

The European Commission plans to invest a billion euros in quantum computing as part of a larger initiative to strengthen Europe's competitiveness in the digital economy. The investment, about $1.1 billion, will be made through an effort called Quantum Flagship, akin to existing "flagship" projects in the European Union focused on graphene and on the human brain. It is expected to be partly funded by EU research and innovation programs. The aim is "to place Europe at the forefront of the second quantum revolution, bringing transformative advances to science, industry and society," said Nathalie Vandystadt, an EC spokesperson.

When assessing the reality behind today's AI technology, businesses need to think about how it can perform in specific tasks rather than hoping for a do-it-all tool.

Over the last couple of years, Stitch Fix has amassed one of the more impressive data science teams around. The team has grown to 65 people, collaborates with all areas of the business, and has a well-respected data science blog plus several open source contributions. As a member of this team since late 2014, and someone… The post Data Science at Stitch Fix appeared first on Predictive Analytics Times.

Paxata, provider of the Adaptive Data Preparation™ platform for the enterprise, announced the availability of its Spring '16 product release. Paxata's latest release bridges the gap between analysts and IT with new intuitive capabilities, providing connected information to every person in the enterprise without compromising on security, scale, and cost efficiency.

Riverbed SteelConnect can be deployed as a virtual appliance on existing infrastructure or deployed as a physical appliance acquired from Riverbed.

See how Hershey's Brazil implemented IBM Cognos TM1 as the basis of an enterprise planning solution enabling enhanced analysis of sales and customer data to identify key value drivers and increase profit margins. And, get the full, inside story from The Hershey Company at IBM Vision 2016.

Most Virtual Reality content is expensively and professionally produced by technology companies. For the technology to proliferate and satiate consumers' wants, we need more ways to produce and share content.

Jon Jagger takes a look at pair programming, a technique focused on the team rather than the individual, wondering why it is not used more if it is as effective as some of the evidence shows. By Jon Jagger

When CPL Training Group started providing Web-based training several years ago, the UK company expected strong but steady growth.  But when unexpected demand threatened to freeze the relational database serving the application, the company turned to big data tech for solutions. If you work in certain fields in the UK, you're required by law to receive a certification, or a "personal license," that shows you have received a minimum level of training.

Datawire have released their open source Datawire Connect framework, which allows developers to 'resiliently connect microservices' using automatically generated RPC-style client libraries for Java, Python or NodeJS services. The client libraries generated provide service registration and discovery, dynamic load balancing and routing, automated timeouts and circuit breakers.
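
For readers unfamiliar with the circuit breaker pattern mentioned above, the following is a minimal, generic sketch of the idea. It is not Datawire Connect's API or generated code; the `CircuitBreaker` class, its parameters, and `CircuitOpenError` are hypothetical names used only for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical breaker: fail fast after repeated failures, retry after a cool-down."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after  # cool-down in seconds
        self.clock = clock              # injectable so tests can control time
        self.failures = 0
        self.opened_at = None           # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("circuit open; failing fast")
            # Cool-down elapsed: allow one trial call (half-open state).
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # A success closes the circuit and clears the failure count.
        self.failures = 0
        return result
```

A caller would wrap each remote invocation in `breaker.call(...)`. Once the failure threshold is reached, further calls are rejected immediately instead of waiting on a service that is already struggling, and a single trial call is let through after the cool-down.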

Today at GraphConnect Europe 2016, Neo Technology announced the release of Neo4j 3.0, which includes a new binary protocol for transmitting data between server and client, and a new set of standardised drivers for interacting with the database, along with stored procedure support and higher performance and capacity. InfoQ spoke to Neo Technology to find out more.

Teradata on AWS: The availability of Teradata's eponymous database on both Amazon's AWS and Microsoft's Azure was announced in late 2015 for availability in Q1 2016. Lo and behold, the Teradata DBMS was duly available via the Amazon Marketplace a few weeks ago, just squeaking into the promise of being available in Q1 2016. Many thanks to Teradata's own Mike Whelan for supplying the URL which yours truly was seemingly unable to find without help. Doh! At the time of writing there is no sign of Teradata on Azure, but hey-ho, at least Teradata on AWS is available. EC2 Configurations: OK, so Teradata is available on AWS, so what configurations are available? Well, at least initially, Teradata is only available on AWS as a single node SMP server. Multi-node MPP availability is slated for Q4 2016. Teradata MPP via the public cloud is a bigger challenge than Teradata SMP.

I am both a very frequent (400k+ mileage last year) and demanding (lots of emails to airline CEOs) traveler. As I sit inside an aluminum can at 30,000 feet, I spend time thinking about the customer experience I have as an air traveler and contrasting that experience with what I find in the rest of my life. I think about the way airlines interact with me and see where that pales when compared to the way that other organizations (from e-tailers to technology companies, from traditional retail to insurance organizations) go about their communications. I'm often left unimpressed by the communication of the travel industry generally, and airlines specifically. Given that the bulk of my travel is with my national airline, Air New Zealand, that airline and its CEO, Christopher Luxon, have come in for an arguably unfair amount of criticism.

The transition to hyperconvergence will be gradual and measured, rather than frantic and disruptive.

The best-known graph database has a brand new version, with lots of the maturity associated with a third release.

The number of people who have lost their jobs over the failed mobile effort at Microsoft and Intel is massive.

Last month, SVDS CEO Sanjay Mathur spoke at O'Reilly's conference on engineering leadership, Cultivate, about How to Eat Change for Breakfast. While on site at the event, he spoke with Radar's managing editor, Jenn Webb, about creating an Experimental Enterprise. You can watch their full conversation above. Here are some notable highlights: At 0:21, Sanjay says that the reason we call it an Experimental Enterprise is the importance of adapting based on the information that you have. Data needs to change how you operate, he says, not just get stuck in a report.

Microsoft has recently announced the intention to integrate Linux Bash with Windows 10, making it possible to run native Linux ELF64 binaries on their operating system. To avoid incorrect speculation on how this is possible, Deepu Thomas, the Leader of the Windows Subsystem for Linux team, has provided details on how Linux runs on Windows.

Apple has announced that new watchOS apps submitted after June 1, 2016 must be native apps built with the watchOS 2 SDK or later. Furthermore, Apple has refreshed its review guidelines for the App Store.

Narrative Science, the leader in advanced natural language generation (Advanced NLG) for the enterprise, today announced availability of Narratives for Power BI, a first-of-its-kind extension for the Microsoft Power BI community. The extension, now available for download, allows users to access important insights from their data in the most intuitive, consumable way possible — dynamic, natural language narratives.

Tableau is a powerful visualization tool. However, if you are like many data analysts, you spend most of your time preparing data, leaving you less time to visualize it. Download our new whitepaper and see how Alteryx enables fast blending of multiple, disparate data sources; cleansing and shaping of data sources to perfect them for output; and analysis of data with predictive and spatial analytics, no coding required. All your work is done in an intuitive drag-and-drop workflow that is reusable, modifiable and outputs to a Tableau Data Extract, saving you time and letting you visualize your data faster. Learn more about how you can benefit from Alteryx data blending for Tableau.

Atlassian, makers of development tools such as JIRA and Confluence, have just released version 5.11 of their continuous delivery tool Bamboo with a host of new features to help teams scale and collaborate. The key feature in this new release is the ability to scale from 100 to 250 elastic build agents. By Craig Smith

By Lixun Zhang, Data Scientist at Microsoft. As a data scientist, I have experience with R. Naturally, when I was first exposed to Microsoft R Open (MRO, formerly Revolution R Open) and Microsoft R Server (MRS, formerly Revolution R Enterprise), I wanted to know the answers to 3 questions: What do R, MRO, and MRS have in common? What's new in MRO and MRS compared with R? Why should I use MRO or MRS instead of R? The publicly available information on MRS either describes it at a high level or explains the specific functions and the underlying algorithms. When they compare R, MRO, and MRS, the materials tend to be high level without many details at the functions and packages level, with which data scientists are most familiar.

Wayne Beaton overviews the current state of Eclipse, discussing how to improve the user experience, support channels, and how to tap into the funding available to work on Eclipse IDE improvements. By Wayne Beaton

The latest version of IBM BigInsights offers several value-add services that can be used with its core distribution of open source Hadoop for managing big data.

A MongoDB database filled with the personal information of 93 million Mexican voters was found configured improperly on the Amazon cloud. The incident raises issues of how information is protected in the cloud.

Zscaler's alternative to the VPN takes the IT organization out of the business of remotely configuring and managing remote access infrastructure.

The MapR Hadoop distribution replaces HDFS with its proprietary file system, MapR-FS, which is designed to provide more efficient management of data, reliability and ease of use.

Aerospike, the high-performance NoSQL database company recognized for "speed at scale" leadership and as the NoSQL leader in the digital media and ad tech industries, announced that Manage.com Group Inc. (Manage) has selected Aerospike to power its innovative demand-side platform for mobile advertising. Manage chose Aerospike to support its goals for technology evolution and business growth.

This year, ODSC is growing to bring together 2,500+ of the best and brightest at ODSC East in Boston! There will be 20+ workshops, 10+ training sessions, & 100+ speakers! Not to mention a career fair with top-notch companies ready to hire!

Big data use is only growing, with businesses realizing that their options are to jump on the craze or find themselves in an uphill battle against those utilizing the faster and more accurate information. However, with the next generation of entrepreneurs stepping onto the playing field, they're doing more than just joining the craze — they're innovating it. Rather than simply using the advantages already present, driving big data to grow, they're pushing forward and transforming the way data is accessed and used.

Introduction: At Hortonworks we are pleased to announce the inaugural Kafka Summit 2016 to be held in San Francisco on April 26. The inaugural Kafka Summit is a full day conference that brings together the Apache Kafka community. At Hortonworks, since we are committed to delivering data-in-motion and data-at-rest completely in the open, we continue… The post Inaugural Apache Kafka Summit 2016 Kickoff appeared first on Hortonworks.

As the star of NoSQL technology rises over a diverse database landscape, many data scientists are turning to NoSQL in its various implementations to help them accomplish a host of data management tasks. Find the data storage and management solution that's right for you when you begin your NoSQL journey today.

Combining big data with the cloud seems like the perfect technology unity. Despite the many advantages that the cloud offers, many corporations have been slow to make this transition. In fact, according to a Gartner survey from 2014, less than half of all major corporations with big data programs confirm that they're using the cloud.

Visual Studio 2015 Update 2 has brought several new capabilities and improvements to VS2015. One area that has seen improvement is compiling code for .NET Native, yielding better support for generics and an improved backend compiler.

Over the past year our education programme has focussed on nurturing new talent, from sponsoring students to facilitating industry placements and scaling this with Data Talent Scotland. Increasing the pool of talented graduates is great; however, as the use of data to drive insight and decision making becomes increasingly important to many organisations, there is also a need and an opportunity to develop existing talent.  

The accelerating pace of global business means that enterprises need more agile data-related systems and practices. Becoming more agile — and succeeding at it — isn't always easy given existing technology investments, constant technological evolution, and lingering cultural obstacles. No matter how agile your company is or isn't now, consider these important points.

Impala 2.5, now shipping in CDH 5.7, brings significant performance improvements and some highly requested features. Impala has proven to be a high-performance analytics query engine since the beginning. Even as an initial production release in 2013, it demonstrated performance 2x faster than a traditional DBMS, and each subsequent release has continued to demonstrate the wide performance gap between Impala's analytic-database architecture and SQL-on-Apache Hadoop alternatives. The post Apache Impala (incubating) in CDH 5.7: 4x Faster for BI Workloads on Apache Hadoop appeared first on Cloudera Engineering Blog.

A simple guide to understanding SQL-on-Hadoop choices: When it comes to SQL-on-Hadoop, it is easy to feel overwhelmed with the number of choices available in the ecosystem. However, in reality, choosing the right tool to best help you tackle the job at hand can be simple if you know what you are planning to achieve, which in most cases is apparent from your job title or team function. There are three main classes of SQL-on-Hadoop tools available – ETL and Data Preparation Tools, Analytic Databases, and Data Engineering Tools.

AngularJS has become the world's most popular JavaScript framework for creating web applications. And now Angular 2 and TypeScript have brought true object oriented web development to the mainstream, using a syntax that is strikingly close to Java 8. In this article we provide a high-level overview of the Angular 2 framework. By Yakov Fain

News reports report news of Spotify hack: there seem to have been many users' account details leaked on Pastebin. But the Swedish streaming service says all is safe, rather implying its users are the ones at fault. A Spotify spokesperson claims it proactively resets hacked passwords, but it doesn't appear to have done so this time. Users are also reporting problems working with Spotify to get back into their accounts. What's going on? In IT Blogwatch, bloggers try to get to the bottom of it. Not to mention: David Gilmour's tribute to Prince… Your humble blogwatcher curated these bloggy bits for your entertainment. Are you not entertained?

Dropbox has a futuristic vision for how its users will be able to share massive files and have quick access to them on their computers, without their hard drives overflowing. The cloud storage company announced a new initiative at its Open conference in London on Tuesday called Project Infinite. It's a push to create a new Dropbox interface that allows users to see all of the files they've stored in the cloud in their computer's file explorer without requiring them to keep local copies of each document, image, spreadsheet or other file.  With Project Infinite, users will be able to manage their files in the cloud by moving them around inside the Mac OS X Finder or Windows File Explorer, just like they would any local files that are taking up space on their hard drives.

When launching a new governance, risk and compliance (GRC) program or deciding to select a software solution to support it, one is usually asked to provide the ROI of the project. In short, the return on investment (ROI) is defined as the outcome of an investment – be it positive (gain) or negative (loss). For…

Analysis: Telcos were well represented during the keynotes in Austin.

Hadoop Summit – Retrospective: After the last two special edition episodes where we quickly covered each Summit day in a "same-day" episode, we go over the full event in this episode, highlighting the sessions we enjoyed the most and sharing our general feelings about the 2016 Hadoop Summit in Dublin.

The Google Analytics suite can be overwhelming to new or inexperienced users. This jump-start video guide to GA details how to use one of its most important features: dashboards.

Why is it so Hard? Do you work for a data-driven organization, or one that claims to be a data-driven organization, or one that wants to be a data-driven organization? You probably do, whether you work for a big retailer or a small service provider. Every organization wants to believe that they use information to make decisions in an unbiased manner, although not every organization actually does that. It's definitely not easy getting to be a real data-driven organization. At a minimum, an organization has to address five issues: Funding. Being data-driven is a top-down decision because it must be supported by adequate funding.

I've been working a fair bit in the Healthcare space lately, which got me thinking about how Big Data and Advanced Analytics can be used to improve health outcomes and reduce spending. Before I moved into Data Science, a friend and former colleague of mine had already made the jump into bioinformatics working on cancer genomics, so right from the start it was on my radar. More and more data is being collected each year on medical treatments, drug prescriptions, hospital readmissions and so on. How can we use this ever-increasing deluge of data to improve people's lives?

Myths about the Internet of Things affect the perception of the IoT as well as its development and support. Here's why busting those myths should be a priority. Keep on reading: What Most People Get Wrong about the Internet of Things

In this omni channel world, consumers leave clues about their purchasing decisions at every touch point. What data analytics can you leverage to optimize your marketing message and merchandising? Well, it turns out, a lot.

Mob Programming is a software development approach where the whole team works together on the same thing, at the same time, in the same space, and at the same computer. This is a relatively new approach and one which is generating a lot of discussion. The first Mob Programming conference is coming up on 1-2 May. InfoQ spoke to the organizers to understand why the event matters. By Stephane Wojewoda

IBM (NYSE: IBM) is making it easier and faster for organizations to access and analyze data in-place on the IBM z Systems mainframe with the new z/OS Platform for Apache Spark. This is creating new opportunities for data scientists and developers to apply advanced analytics to the system's rich data sets for real-time insights.
