Big Data News – 12 Aug 2016

Featured Article
Data scientists have a lot of tools at their disposal, but not all of them are equally accessible. Aiming to put IBM's Watson AI within closer reach, analytics firm Columbus Collaboratory on Thursday released a new open-source R extension called CognizeR. R is an open-source language that's widely used by data scientists for statistical and analytics applications. Previously, data scientists would have had to exit R to tap Watson's capabilities, coding the calls to Watson's application programming interfaces (APIs) in another language, such as Java or Python.

Top Stories
With so much going on in this space you could be forgiven for thinking you were always working with yesterday's technologies. So much change, so quickly. What do you do if you have to build a solution from the ground up that is expected to live in the field for at least 5-10 years? This is the challenge we faced when we looked to refresh our existing 10-year-old custom hardware stack to measure the fullness of trash cans and compactors.

In this special guest feature, Sinan Baskan, Solutions Director, CTO Solutions at MarkLogic, discusses why your database must be operational and transactional, but probably isn't.

Weide Zhang, a Senior Architect at Baidu, talks about his team's work in using Spark to drive deep learning training and prediction using Paddle, the deep learning library developed by Baidu IDL.

After Internet of Things (IoT), Connected Cars is the most used buzzword in the industry. From automobile manufacturers to software vendors to telecom operators to consumer electronic companies, everyone is excited about the connected vehicle phenomenon. With the power of 20 modern PCs, contemporary cars pack more punch than any other computing device. They contain more…

Microsoft made a divisive announcement last month when it revealed that Azure Stack will be delayed until the middle of next year and that the private cloud software will only run on a set of integrated hardware systems rather than a wide variety of hardware.  Now, the company is trying to explain that change to customers. On Thursday, in a video interview, Microsoft Principal Group Program Manager Vijay Tewari made the case for shipping Azure Stack on a small variety of hardware. His main point was that constraining the software to a small set of hardware leads to a better product that's more useful right out of the gate.

News: Portfolio-boosting acquisition for HPE to bolster HPC offering.

Strata + Hadoop World: The one event you can't miss. Early Price ends Friday, August 12. Strata + Hadoop World is September 26-29 in New York. It sold out last year with over 6,300 attendees, and many call the biggest data gathering in the world the "one event you cannot miss." Here's why: the all-reaching program includes 11 tracks with 180+ sessions in Data-driven Business, Data Innovations, IoT & Real-time, and Enterprise Adoption, plus sessions in verticals such as ecommerce, finance, and energy.

Visualization is the best way to explore and communicate insights about data. Whether you're dealing with geospatial, time series or tabular data, interactive graphics allow everyone on your team, from analysts to executives, to understand the patterns in your data. But as data grows to include millions and billions of points, traditional visualization techniques break down.

Apache Zeppelin, a web-based notebook, enables interactive data analytics including Data Ingestion, Data Discovery, and Data Visualization all in one place. Zeppelin's interpreter concept allows any language or data-processing backend to be plugged into Zeppelin. Currently, Zeppelin supports many interpreters such as Spark (Scala, Python, R, SparkSQL), Hive, JDBC, and others. Zeppelin can be configured with an existing Spark ecosystem and share SparkContext across Scala, Python, and R. Check out the Use Case that demonstrates some of Zeppelin's capabilities… Crime Analysis Zeppelin Dashboard:


Hewlett Packard Enterprise is buying SGI in a $275 million deal that it hopes will give it a major boost in big-data analytics and high-performance computing. It's the latest surprise development at HPE, which has continued to make big changes since it was formed in the break-up of the old Hewlett-Packard last year. The deal to buy SGI, announced Thursday, fits with HPE's goal to expand its data analytics business. It will also make HPE a bigger player in high performance computing, a growing part of the server market.  SGI has roughly 1,100 employees worldwide. On Thursday, it reported a net loss for its last fiscal year of $11 million, on revenue of $533 million.

Akana has announced the availability of version 8 of its API Management solution. The Akana Platform provides an end-to-end API Management solution for designing, implementing, securing, managing, monitoring, and publishing APIs. It is available as a SaaS platform, on-premises, and as a hybrid deployment. Version 8 introduces a lot of new functionality, all aimed at offering customers the richest API Management capabilities in a way that is easier than ever for API and app developers to use.

In today's digital economy, companies are faced with a fast data challenge as well as a Big Data one. As a result, they are under pressure to adapt their analytics processes and data flows at pace to move beyond traditional data warehouse silos. Big Data projects are either too big or too complex to handle the traditional way. That's why most projects by companies at the start of their Big Data initiative have no process at all. Waterfall approaches are notably inefficient, as you probably won't have access to a proper staging environment and will have only limited time and scale for qualification.

The enterprise is enjoying a period of rapid technological innovation driven by real-time collaboration on a grand scale. New tools have brought the ability to work from anywhere, on any device, and at any time. Centralized offices and individually assigned desks are giving way to a distributed, mobile workforce. Digital workspaces offer enormous benefits for the enterprise and for workers, but there are also challenges ahead.

Columbus Collaboratory, an advanced analytics and cybersecurity company, today announced the release of CognizeR, an open-source R extension that can enhance and simplify how more than two million data scientists using R can access and build with IBM Watson.




Amazon launched a new tool on Thursday aimed at helping developers build applications that offer insights from a firehose of data in real time. Kinesis Analytics will let users set up SQL queries that run on data that's constantly updating, expanding the reach of the popular data analysis language beyond traditional database applications.  Once a user has set up a Kinesis Analytics stream, the results can then be routed to up to four different services, including Amazon S3, Redshift, and Elasticsearch Service.
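
To make the idea of a query that runs over constantly updating data more concrete, here is a minimal Python sketch of a tumbling-window count, the kind of aggregation a streaming SQL query typically expresses. It does not use the Kinesis Analytics API or its SQL dialect; the function name, the event tuples, and the 10-second window are all assumptions made for illustration, and the sketch assumes events arrive in timestamp order.

```python
def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, counts) each time a fixed-size window closes.

    `events` is an iterable of (timestamp_seconds, key) pairs that is
    assumed to arrive in timestamp order, as on an ordered stream.
    """
    current_start = None
    counts = {}
    for timestamp, key in events:
        # Align the event to the start of the window it falls into.
        start = (timestamp // window_seconds) * window_seconds
        if current_start is not None and start != current_start:
            # The previous window is complete: emit it and start a new one.
            yield current_start, counts
            counts = {}
        current_start = start
        counts[key] = counts.get(key, 0) + 1
    if current_start is not None:
        # Flush the last, still-open window when the input ends.
        yield current_start, counts


# Hypothetical click events: (seconds since start, page name).
events = [(0, "home"), (3, "cart"), (9, "home"),
          (10, "home"), (14, "cart"), (21, "home")]

windows = list(tumbling_window_counts(events, window_seconds=10))
for window_start, counts in windows:
    print(window_start, counts)
```

Because results are emitted as each window closes rather than after the whole dataset is read, the same loop would work on an unbounded stream, which is the main difference from a query against a traditional database table.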


The municipal broadband issue is important; limitations imposed by states make gaps in broadband coverage likely to be larger and more numerous.

Recently, I had the opportunity to participate in the Strategy Meets Action (SMA) insurance innovation community webinar on the business opportunities with "Finding the Business Value of Advanced Analytics". Many companies are not sure where to begin the journey, when and how to use new data sources, traditional data, and 'data in motion.' During this… The post Finding the Business Value of Advanced Analytics Fueling Business Outcomes appeared first on Hortonworks.

It's time to move to a model where employees use the phone they have on them more aggressively and stop supporting one more out-of-date technology.


Predictive analytics uses past data to forecast outcomes and target the right prospects — a move that's redefining the use of data in marketing.

Few fields change as fast as digital. New channels, new methods, new business models — and all of it demands new methods of measurement and analytics. As new technologies and practices disrupt the field, digital analytics practitioners adapt. In any given year, a few themes dominate, and right now, the topics dominating discussion at the enterprise digital analytics table are four P's: prioritization, personalization, people and perspective.

Ransomware is still evolving, and cybercriminals continue to come up with new tactics that play off the fear or naivete of users.

Thanks to new optimizations for running Impala on Amazon S3, doubling cluster size on AWS doubles multi-user performance while keeping total workload cost roughly the same. With public-cloud deployments becoming increasingly popular, Cloudera is continuing to build out the capabilities of its platform to best take advantage of the cost-effective and flexible nature of the cloud. The current release of Cloudera's platform (5.8) includes a major step forward in that area with Impala 2.6 able to store and query data directly from the Amazon S3 object store. The post Analytics and BI on Amazon S3 with Apache Impala (Incubating) appeared first on Cloudera Engineering Blog.
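
The claim that doubling the cluster keeps total cost roughly flat follows from simple arithmetic if throughput scales linearly: twice the nodes finish the same workload in half the time. The sketch below works through that with made-up numbers; the node counts, hours, and hourly price are hypothetical and are not taken from Cloudera's benchmarks.

```python
def workload_cost(nodes, hours, price_per_node_hour):
    """Total cost of running a workload: node-hours times the hourly price."""
    return nodes * hours * price_per_node_hour


# Hypothetical baseline: 10 nodes need 4 hours at $1.50 per node-hour.
base_cost = workload_cost(nodes=10, hours=4.0, price_per_node_hour=1.5)

# Assuming linear scaling, 20 nodes finish the same work in 2 hours.
doubled_cost = workload_cost(nodes=20, hours=2.0, price_per_node_hour=1.5)

print(base_cost, doubled_cost)
```

Any deviation from linear scaling would show up as a difference between the two figures, which is why the article says the cost stays only "roughly" the same.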

AWS' faster load balancing capability gives IT organizations more granular control over where and how application workloads run on the AWS cloud.

It's rare these days that a customer conversation occurs without the topic of cloud being broached. Our customers like the idea of being able to provision capacity and grow and shrink clusters on-demand, as well as the convenience of not having to rack and maintain hardware. And when it comes to analytics, the cloud affords unprecedented elasticity and infrastructure availability. So when a retailer needs to look over holiday sales results or a well operator needs to determine why a drill is slowing, they can quickly provision compute resources and dig into their data.

On July 13th we welcomed the Open Data Science Conference meetup series to our HQ for the second time. ODSC believes that open source software (OSS) principles can accelerate data science knowledge, and we think pretty highly of OSS here at SVDS. We'll be at ODSC's next conference this November in Santa Clara. How much data do you have? This meetup featured a talk by Dr. Brian Spiering, of Galvanize.

With the release of CognizeR, an open source extension for the statistical computing-focused R programming language, Columbus Collaboratory is aiming to simplify data science with IBM Watson. "Our goal was to connect data scientists everywhere with cognitive computing in a software environment they already know and love: R," Ty Henkaline, chief analytics innovator at Columbus Collaboratory, said in a statement yesterday. "CognizeR now shortens the journey toward building real cognitive solutions by providing quick and easy access to Watson services. Releasing this code to the open source community advances our mission of delivering accelerated business value to our member companies and beyond."

Wheels turning and forklifts filled–that's one measure of success in any warehouse. The more product you can pick up and put away, the more productive and cost-efficient you are. For Pittsburgh-based retailer Giant Eagle, the key to making that happen is to operate vision-guided, autonomous vehicles–robots–in its distribution centers.

You have probably heard I recently decided to leave Hortonworks. Rob shared some kind words earlier this week and I would like to take this opportunity to shout THANK YOU to Rob and the entire Hortonworks team for the fantastic thrill ride and unbelievable journey at Hortonworks over the past five years. Rob and I… The post The Opportunity Ahead appeared first on Hortonworks.

GigaSpaces, a provider of in-memory computing (IMC) technologies, announced the launch of XAP 12, the company's first open source initiative for its high-performance data grid.

Chapter 2 focuses on answering questions faced by individuals interested in using storage or database technologies to solve their Big Data problems.

Ryan Nienhuis is a Senior Product Manager for Amazon Kinesis. This is the first of two AWS Big Data blog posts on Writing SQL on Streaming Data with Amazon Kinesis Analytics. In this post, I provide an overview of streaming data and key concepts like the basics of streaming SQL, and complete a walkthrough using a simple example.




Rob High, CTO for Watson at IBM, says CognizeR is an SDK that eliminates the need for a developer to write as much code to invoke Watson services.

Data Science is all about getting access to interesting data, and it is really nice when some kind soul not only points out an interesting data set but also makes it easy for you to…

Join IBM data science evangelist James Kobielus and Dave Saranchak, a data scientist with Elder Research, to discover how Dave develops and applies statistical data modeling techniques for national security clients.

Researchers wringing out new quantum computing architectures are increasingly looking at the nascent processing technology as a way to advance machine-learning algorithms for new AI applications. As quantum computing efforts scale up at corporate, university and government laboratories, the promising yet largely unproven technology could help unlock facets of artificial intelligence, leading to more powerful cognitive computing technologies like IBM's (NYSE: IBM) Watson platform. Applications range from developing new materials to faster searches of big data, researchers said. Among the early leaders along with IBM is D-Wave Systems Inc., which recently doubled the capacity of its D-Wave Two system to 1,098 quantum bits, or qubits.

In the previous post of our Understanding machine learning series, we presented how machines learn through multiple experiences. We also explained how, in some cases, human…

Nearly 20 years ago, I recall streaming data off of oil and gas wells in order to measure capacity, reserves, and shrink. For all intents and purposes, data was used by humans to manually repair a well. What we call big data today we used to call very large databases or VLDBs. Big Data 2.0 is really a rebranding into the Internet of Things (IoT) or the Analytics of Things (AoT).

Microsoft and Boeing are each leaders in their field; they're also neighbors on the east side of Seattle. Here's how these two peas in a pod are using Azure data technologies to make commercial flights more efficient and even (gasp!) more pleasant.

In this blog I'll show you how to guard routes in Angular 2 Router (currently at 3.0.0-beta.2). Let's consider some scenarios that require a certain validation to be performed to decide if the user (or a program) is allowed to navigate to or leave the route: (1) the user is allowed to open the route only if authenticated and authorized to do so; (2) in a multi-part form that consists of several components, the user is allowed to navigate to the next form section only if the data entered in the current one is valid.

Sun Tzu once said, "every battle is won before it is ever fought." He was referring to the importance of information, which is a crucial asset both on the battlefield and in the business world. In the 21st century, big data is the commodity that brands rely on to compete. Big data is incredibly valuable, but many brands don't know how to manage it properly. If you are making any of these big data blunders, you may be relying on inaccurate or incomplete information, which can cost your business dearly.

Manu: Rick, can you tell us a bit about yourself?  I saw in your TED talk that you used to be a photo journalist, so how did you get started on this journey? Rick Smolan: Yes, I was always very curious as a person so it's interesting that I'd end up in a job where I get paid to be curious. As you saw in the TED talk, I went from being a journalist where I work for other people who set the agenda, to the fortunate position of being able to steer my own ship. And now, when I get curious about something I'm able to invite my heroes, my peers and some young journalists along.

In this special guest feature, Pablo Stern, CTO at Radius, shares some tips for new computer science graduates entering the workforce, including why they should consider working in the data science field.

In this contributed article, Brian Giese, CEO and Founder of True Influence, gives his views on the future of Business Intelligence (BI) in B2B companies.
