Big Data News – 10 May 2016

Today's Infographic Link: 30 Shots

Top Stories
By Katherine Zhao, Hong Lu, and Zhongmou Li, Data Scientists at Microsoft. Bicycle rental has become popular as a convenient and environmentally friendly transportation option. Accurate estimation of bike demand at different locations and different times would help bicycle-sharing systems better meet rental demand and allocate bikes to locations. In this blog post, we walk through how to use Microsoft R Server (MRS) to build a regression model to predict bike rental demand.

Designing Great Visualizations: What every data scientist should know. This paper traces the history of visual representation, from early cave drawings through the computer revolution. It also examines the different styles of data visuals, discusses some of the barriers to making effective visuals and the methods used to overcome those barriers. Read this paper to learn about the importance of context in visualizations; human perception's capabilities and limitations (and how to exploit them with effective visuals); and the power of using data to tell stories. About the author: Jock D. Mackinlay is an information visualization expert and VP of Research and Design at Tableau Software.

BI company Tableau and big data Hadoop company Hortonworks reported their first-quarter results, providing an inside look into the emerging analytics and data software industry. Tableau said it sees slower spending in 2016, while Hortonworks reported nearly double the number of $1 million-plus deals.

In today's hyper-connected network economy, cyber security is a top-of-mind boardroom discussion topic. Information is the new sinews of war. Your customer information, of course, but also your own financial and strategic plans, your employees' and contractors' personal data, and so on. An attack on this data (either for leakage, manipulation, ransom or other malicious…

Enterprise messaging is the technology backbone of communications for applications and systems within and between organizations. Both its importance and its complexity are growing as organizations increasingly have to provide real-time responses to business customers and consumers as well as their own business professionals who support them and their internal supply chains.

Financial analysts, banks, and institutions are jumping on the big data bandwagon because it means more money.

VANCOUVER, BC – Successful big data projects have five key requirements, says Amy Gaskins, a data scientist with more than a decade of experience designing and implementing data and intelligence projects for the private sector, government agencies and the U.S. military. In her keynote presentation at the Apache: Big Data North America conference in Vancouver on Monday, Gaskins stressed that five factors can make or break big data projects…

The information management industry has a tidal wave of changes to deal with as Millennials push new ways of managing and sharing data through cloud and new collaboration tools.

insideBIGDATA was pleased to be on hand for the recent Qlik Qonnection 2016 conference in Orlando, Florida on May 1-4. We had the opportunity to sit down with Drew Clarke, Vice President of Products, Qlik Cloud to get a vibe for Qlik and how its solutions have evolved to the cloud.

Docker Inc. has announced general availability of Docker Security Scanning, which was previously known as Project Nautilus. The release comes alongside an update to the CIS Docker Security Benchmark to bring it in line with Docker 1.11.0, and an updated Docker Bench tool for checking that host and daemon configuration match security benchmark recommendations. By Chris Swan

A hefty funding round in difficult times for Dyn, the company that is focused on ensuring that the Internet works as fast as possible. Well, not quite the entire Internet, but they're focused on making sure that their customers get the best internet performance. Dyn is a vendor in the Internet performance management (IPM) space.

Over the past few years, Microsoft has made a significant effort to re-position Visual Studio as the premier developer tool for Windows, regardless of what platform the user is targeting. With this increase in scope comes an increase in both disk space and installation time, both areas Microsoft plans to address with Visual Studio "15", the successor to Visual Studio 2015.

As homeowners and realtors track the dynamic U.S. housing market, platforms like the online real estate database Zillow are seeing surges in traffic as buyers and sellers keep tabs on which properties are moving and when a seller might be ready to drop the asking price. To keep up with demand for its services and gauge customer preferences, Zillow Group Inc. (NASDAQ: Z) said this week it is standardizing on Splunk Inc.'s real-time "operational intelligence" platform. Seattle-based Zillow said the ongoing shift to Splunk (NASDAQ: SPLK) includes its mobile as well as web-based real-estate services.

At Hortonworks, we work with hundreds of enterprises to ensure they get the most out of Apache Hadoop and the Hortonworks Data Platform. A critical part of making that possible is ensuring operators can quickly identify the root cause if something goes wrong. A few weeks ago, we presented our vision for Streamlining Apache… The post Advanced Metrics Visualization Dashboarding with Apache Ambari appeared first on Hortonworks.

A new part of Angular 2, the Angular Mobile Toolkit, brings together tools and techniques to help developers make their web apps feel more native. In a session at ng-conf 2016, Jeff Cross and Alex Rickabaugh showed how to use three of these techniques to build a "progressive web app".

The hardest thing for a new manager/leader to adjust to is being the pace setter. Once you assume the role of a leader, your job is to be on offense, not defense. I see even the greatest individual…

News: Analytics programme turns cyber security tool as IBM launches research project.

The 10th Annual QCon San Francisco, a practitioner-driven conference designed for software architects/tech leads/leaders who influence innovation in their teams, has opened registrations. QCon SF will be held at the Hyatt Regency San Francisco and has tickets on sale for $1695 through May 14th. There will be a full 3-day conference from Nov 7-9 and two days of workshops from Nov 10-11. By Wesley Reisz

NativeScript 2.0 has been released, integrating with AngularJS 2.0 to allow developers to write native mobile applications for iOS and Android. The release brings developers "an unprecedented code reuse story between [their] web and native mobile app," Valio Stoychev says.

The book Putting Stories to Work by Shawn Callahan provides a process with a practical approach to mastering business storytelling, a leadership skill that helps to achieve results. It contains many stories that can help you to use storytelling for business communication and culture change.

You've probably heard the admonition: Correlation Does Not Imply Causation. Everyone agrees that correlation is not the same as causation. However, those two words, correlation and causation, have generated quite a bit of discussion. Why causality matters: no one gets perturbed if you say two conditions or events are correlated, but even suggest that causation is possible and you'll get the clichéd admonition, perhaps with even harsher criticism. It's not easy to prove causality, though, so there must be a reason for putting in the effort.

It's no secret that much of the wisdom of the world lies in unstructured data, or the kind that's not necessarily quantifiable and tidy. So it is in cybersecurity, and now IBM is putting Watson to work to make that knowledge more accessible. Towards that end, IBM Security on Tuesday announced a new year-long research project through which it will collaborate with eight universities to help train its Watson artificial-intelligence system to tackle cybercrime.

One of the storylines in this year's presidential election is how both major political parties are using big data analytics to inform their decisions and try to get ahead. But what you may not realize is how pervasive big data has become up and down the ticket. Here's an inside look at how two of the leading analytic firms are helping their parties win with analytics. In the last two general elections, the capability of Barack Obama's campaign team to effectively wield big data analytics was seen as a factor in his victory over his rivals.

In the early days of computing, developers were often jacks of all trades, handling virtually any task needed for software to get made. As the field matured, jobs grew more specialized. Now we're seeing a similar pattern in a brand-new domain: big data. That's according to P.K. Agarwal, regional dean and CEO of Northeastern University's recently formed Silicon Valley campus, who says big data professionals so far have commonly handled everything from data cleaning to analytics, and from Hadoop to Apache Spark.

Racket, a multi-paradigm programming language belonging to the Lisp/Scheme family, has reached version 6.5, writes Ryan Culpepper on the Racket blog. The new version adds several new features, including improvements to typed/untyped code interaction, faster iteration on hash tables and sets, and more.

Tony Grout and Chris Matts spoke about the emerging areas of business mapping and skills liquidity at QCon London 2016 and how they apply them at Lloyds Bank. They showed how they deploy these techniques and explained how they combine business strategy with the abilities and aspirations of people to improve collaboration between business and technical stakeholders. InfoQ interviewed them.

News: OPS Rules specialises in the application of data science to create supply chain and operations analytics solutions.

Interview with Sumeet Singh – Senior Director, Cloud and Big Data Platforms @ Yahoo! Having met Sumeet at the Hadoop Summit, we thought he'd make a great guest for the podcast, so here he is for your listening pleasure!

Mike Cohn explains how to prevent estimate inflation.

An international group of mathematicians at MIT and other institutions has released a new online resource that provides detailed maps of previously uncharted mathematical terrain. The "L-functions and Modular Forms Database," or LMFDB, is a detailed atlas of mathematical objects that maps out the connections between them. The LMFDB exposes deep relationships and provides a guide to previously uncharted territory that underlies current research in several branches of physics, computer science, and mathematics. This coordinated effort is part of a massive collaboration of researchers around the globe. The scale of the computational effort that went into creating the LMFDB is staggering…

NoSQL database adoption in a large organization takes significant effort and time for the transition from using relational database models to NoSQL databases. Mike Bowers, Enterprise Data Architect at LDS Church, spoke at the recent Enterprise Data World Conference about lessons learned from eight years of using NoSQL databases. By Srini Penchikala

HPE/Aruba confirmed today that the company has signed a definitive agreement to acquire Rasa Networks, a network performance management and analytics startup, for an undisclosed amount. As Network World reported last month, HPE/Aruba had been planning the move for several weeks. An internal communique to employees stated that Rasa's technology would become a part of the company's Clarity wireless management software, and that Rasa workers would be integrated into Aruba's R&D team, reporting to CTO and co-founder Keerti Melkote.

It's a personal comparison that I give when asked about the difference between Hadoop and Teradata Aster. On the surface they are machines you can drive, but are they the same? Both Hadoop and Teradata Aster can store and analyse data, but are they therefore the same? Not quite. The tractor is designed to plough the fields; the Ferrari is designed to RACE.

Last week, Sean Parker (a founder of Facebook and, notoriously, Napster) announced the single largest donation to support immunotherapy cancer research. Totaling $250 million, the donation will support research to be conducted across six academic institutions, with the possibility of incorporating additional researchers if more funding is secured down the line. I think it goes without saying that all donations to support medical research, particularly programs like immunotherapy that have a more difficult time receiving traditional funding, are fantastic.

Over the past several years, Forrester's research has written extensively about the age of the customer. Forrester believes that only the enterprises that are obsessed with winning, serving, and…

Become a fully certified (and highly employable) Amazon Web Services professional. With this AWS Engineer Certification Bundle, currently discounted 87%, you'll receive top-notch instruction.

Big data has passed buzzword status and progressed to must-have business solution. Your competitors are getting into big data, which means you either need to start making big data plans for your business or get ready to lose market share. Big data is useful for everything from marketing to R&D to risk analysis and more. But what do you need? Where do you start? Here's your guide to all things Big Data for BI.

After years of experience with the entire Hadoop stack, Hortonworks solutions engineer Paul Hargis became interested in the math and statistics behind machine learning. He has since morphed into a Big Data Architect Extraordinaire and Spark Subject Matter Expert. In a recent interview, Paul shared his background and unique insights around Hadoop, Spark and Machine… The post The Winning Composition of a Big Data Architect appeared first on Hortonworks.

Tools are emerging to make the switch to SDN and NFV easier and progress is, if anything, faster than anticipated.

The SAP governance, risk, and compliance (GRC) portfolio of solutions continued to grow with new additions this past year, not to mention all the interest and focus on our cybersecurity risk and governance offerings. I know many of you are curious to learn more, so here's an overview of some of the GRC sessions available…

In case you missed them, here are some articles from April of particular interest to R users. Lukasz Piwek recreates classic graphs from Tufte's 'The Visual Display of Quantitative Information' in R. A preview of upcoming R conferences in Europe. Andrie de Vries updates the data on R package growth on CRAN, and finds a segmented regression model with break-points in 2007 and 2011 fits the data well. A Microsoft data scientist compares R, Microsoft R Open and Microsoft R Server. A webinar on data visualization with Microsoft R Open, presented by Naomi and Joyce Robbins.

Cloud ERP vendor Intacct last week announced that it has secured debt funding by way of a $40 million facility from Silicon Valley Bank. This comes at the same time as Intacct announced year-on-year new bookings increasing by some 34 percent. Intacct has an interesting job in front of it — it is a mid-market vendor and therefore fills the space between tools designed for small and mid-sized businesses (QuickBooks and Xero, for example) and more enterprise-focused tools such as NetSuite, SAP, and Oracle.

Security analytics firm Niara analyzed email traffic and found malicious email campaigns that sophisticated attackers are using to circumvent traditional defenses.

At the recent Bio-IT World Conference in Boston, I had the privilege of speaking to an audience made up primarily of life sciences and medical researchers. My main message concerned the cloud, specifically the trend of cloudbursting. This audience is extremely important to me personally, as my daughter was diagnosed with autism at the age of two. She is now seven years old, stands 3'9" and is in the first grade — the sweetest kid you could ever meet — and though she is reading at grade level, her life is not without serious challenges.

"If (wealth management advisors) continue to work the way you have been, you may not be in business in five years" — Industry leader Joe Duran, 2015 TD Ameritrade Wealth Advisor Conference. The wealth management segment is a potential high growth business for any financial institution. It is the highest customer touch segment of banking and is fostered on long term and extremely lucrative advisory relationships…

Ransomware has topped targeted attacks as the "main theme of the quarter."

If you are designing a data warehouse, you need to map out all the areas where there is a potential for your project to fail, before you begin.

Empower your employees to deliver a stronger customer experience than ever and improve customer centricity. Learn how IBM case and capture solutions enable you to reduce risk and optimize outcomes.

C-level briefing: Digital Catapult's newly appointed head of IoT talks to CBR about how the government-backed organisation will be searching for holes in the UK's IoT market.

Taking a thoughtful approach to data serialization can achieve significant performance improvements for HBase deployments. The choice between tall and wide tables in Apache HBase is a commonly discussed design question. However, there are more considerations here than making that simple choice. Because HBase stores each column of a table as an independent row in the underlying HFiles, significant storage overhead can occur when storing small pieces of information.
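The overhead described above can be illustrated with a rough back-of-the-envelope sketch. It assumes the classic HBase KeyValue layout, in which every cell repeats the row key, column family, qualifier, timestamp and a few length fields alongside the value. The function names, the one-byte family "d", the qualifier "p" and the packed encoding are hypothetical choices for illustration only, and the model ignores block encoding and compression, which can shrink repeated keys on disk.

```python
# Rough size model for HBase cells, assuming the classic KeyValue layout:
# key length (4) + value length (4) + row length (2) + family length (1)
# + timestamp (8) + key type (1) = 20 fixed bytes per cell.
FIXED_OVERHEAD = 4 + 4 + 2 + 1 + 8 + 1


def cell_size(row_key: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate bytes for one cell: fixed fields plus the repeated key parts."""
    return FIXED_OVERHEAD + len(row_key) + len(family) + len(qualifier) + len(value)


def wide_layout_size(row_key: bytes, family: bytes, columns: dict) -> int:
    """One cell per column: the row key and family are repeated for every column."""
    return sum(cell_size(row_key, family, q, v) for q, v in columns.items())


def packed_layout_size(row_key: bytes, family: bytes, columns: dict,
                       qualifier: bytes = b"p") -> int:
    """All values serialized into one cell (hypothetical positional encoding:
    a 1-byte length prefix per value, so each value must be under 256 bytes)."""
    blob = b"".join(bytes([len(v)]) + v for v in columns.values())
    return cell_size(row_key, family, qualifier, blob)


if __name__ == "__main__":
    row_key = b"k" * 16                                   # 16-byte row key
    columns = {f"c{i}".encode(): b"\x00" * 4 for i in range(10)}  # ten 4-byte values
    print("wide  :", wide_layout_size(row_key, b"d", columns), "bytes")
    print("packed:", packed_layout_size(row_key, b"d", columns), "bytes")
```

In this toy case the ten 4-byte values carry only 40 bytes of payload, so most of the wide layout's footprint is repeated key material. Packing trades that overhead for the loss of per-column reads and updates.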

It's easier to integrate a variety of device types into the core VeepWorks platform, creating opportunities to apply VeepWorks in other verticals.

Neo Technology, creator of Neo4j, a leading graph database, announced the immediate general availability of Neo4j 3.0 — a landmark release propelling graph databases into the mainstream thanks to its massive scalability, new language drivers and a raft of other developer-friendly properties.

Apache Spark has quickly emerged as a powerful data processing framework for Apache Hadoop, well-poised to succeed MapReduce in the ecosystem. Cloudera's One Platform Initiative is hastening this transition with focused development on the scale, security, management, and streaming aspects necessary for Spark to support a wide range of enterprise applications. Spark's power and popularity stem from its flexible and extensible APIs for a wide spectrum of workloads, easy development, and better performance for batch processing.

C-level briefing: How Microsoft's broad portfolio is feeding data back into its cyber security offerings.

Python has emerged as one of the most popular languages in data science due to its open source nature, easy-to-learn syntax and active developer community. However, the data science industry is moving at warp speed and deadlines are becoming shorter, while data sizes are increasing. Many data scientists struggle to achieve necessary performance using Python with their existing infrastructure.

In a little under two weeks, on 17th May, the doors will open to 2016 SAPPHIRENOW in Orlando, Florida. I have little doubt that it will be yet another tremendous event put on by the many SAP teams and business partners. For my part, I've had the pleasure to work once again with the SAP…

Blazegraph announced version 2.1.0 of its graph database which features geospatial and improved semantic searching and optimized queries against the National Center for Biotechnology Information's (NCBI) PubChem database. New tools make semantic search usable on even the largest data published in the Linked Open Data structure.

Is the bloom already falling off the Dev/Ops rose in the enterprise?

Real-world training is essential to mastering big data projects on the job. Too often, university training is far behind what the work world actually needs from employees. By taking this approach, data science students are much more likely to be able to hit the ground running in their new jobs – and society benefits from these efforts too. It's a win all around.

The world of data has faced incredible changes over the past few years, and organizations are struggling to keep up. In this shifting landscape, we've found that flexible deployment options, and cloud-based data warehouse solutions, might offer answers to IT professionals.

MetroLab Network, launched as part of the White House's Smart Cities Initiative in September 2015, added 13 new city-university partnerships, which now brings the total members to 35 city-university partnerships, all of which are focused on incorporating data, analytics, and innovation into local government programs.

The White House's latest report on big data focuses on the intersection of algorithms, opportunity, and civil rights. Specifically, it includes several case studies on credit and lending, hiring and employment, higher education, and criminal justice to uncover both opportunities and dangers. It includes recommendations for government policies to both improve the use of big data and avoid discrimination in the process.
