Big Data News – 03 May 2016

Today's Infographic Link: The evolution of Photoshop

Top Stories
Eight phrases you should avoid using in your next job interview and what you should say instead.

How are the top IT leaders at the top companies grappling with digital transformation, the third platform, the rise of analytics and big data, and other industry shifts? We asked them during the opening session of the InformationWeek Elite 100 conference. Here's what they said.

A brief account of technologies, practices, and ongoing research that aim to employ "explicit user feedback" in creating enhanced personalized online experiences for users.

We have a lot of worthy competitors in the markets we serve. They provide really good IT services. Some of them are even Managed Services companies, and those companies are perfectly fine. But what does it take to stand out in a sea of competition with Managed Services companies? How can one company take "good" IT to the next level? I sat in a prospect meeting recently and discussed this with a company. They were signed up with what I believe to be one of our most worthy competitors, but what I heard surprised me.

At the 2016 Collision Conference in New Orleans, LA, networking is king. Beyond the massive room filled with start-ups is the opportunity for start-ups to check out what others are doing and get a chance to meet with investors.

At the OpenStack Summit, held in Austin, USA, CoreOS released 'Stackanetes', a framework that deploys standard OpenStack services into containers and uses Kubernetes' application lifecycle management capabilities to allow organisations to run OpenStack Infrastructure as a Service (IaaS) and containers side-by-side.

In this continuing regular feature, we give all our valued readers a monthly heads-up for the top 10 most viewed articles appearing on insideBIGDATA. We understand that busy big data professionals can't check the site every day.

How do businesses that were started before the Internet need to adapt and change in order to thrive in the digital age?

Bill Gates: "A breakthrough in Machine Learning would be worth ten Microsofts." Machine Learning has been defined by Arthur Samuel as "A Field of study that gives computers the ability to learn without being explicitly programmed." In essence, the approach draws upon the tremendous computing power at the disposal of today's "machines" to compare vast amounts of data and iteratively improve decision making from instance to instance as more and more data becomes available and is analysed. Clearly data is not in short supply today; there are more than enough scarily large numbers floating around to drive home that point adequately.

List: From Spain and the UAE to Singapore and Australia, governments and companies across the world are investing heavily in smart city research and projects.

We asked the IT leaders in the 2016 InformationWeek Elite 100 about how they're addressing important IT issues, such as tech spending, organizational priorities, and strategy. Find out what we learned, and see how your IT organization stacks up.

In this, the second installment on diagnosing performance issues, performance engineer Andreas Grabner focuses on spotting patterns that cause performance and scalability issues in distributed Micro Service Oriented Architectures. By Andreas Grabner

There are dozens of Agile methods nowadays and more and more often we hear about Hybrid Agile, but what does that actually mean? This article provides a view on why it is important to have clarity around the term Hybrid Agile and what it has to mean to make sense. It provides guidance on circumstances when you could use the different kinds of methods. By Mirco Hering

"Continuous Delivery with Windows and .Net" is a short book by Matthew Skelton and Chris O'Dell that should be seen as a very useful complement to Jez Humble and Dave Farley's "Continuous Delivery" book for those that work in a Windows and .Net environment. InfoQ talked with the authors to learn more about the state of Continuous Delivery on Windows and .Net.

The California-based health care network adopted a centralized enterprise data management platform to achieve data integration, flexibility and agility.

Company bolsters Qlik data preparation strategy with connectivity to web-based data sources

Sivasailam Thiagarajan opened the Agile Games 2016 conference with a keynote talk titled Faster, Cheaper, Better: Designing Agile Training That Delivers Results. He explained that learning requires both content and activities which together create engagement and learning that sticks. He discussed how to design more effective training classes. By Stephane Wojewoda

Hadoop and other big data technologies revolutionized the way organizations run data analytics, but organizations still face challenges with the operating costs of using these technologies for on-premise data processing. Ashish Thusoo recently spoke at the Enterprise Data World Conference about a Hadoop-as-a-service offering that helps organizations bridge the gaps with these capabilities. By Srini Penchikala

Data Scientists seem to be the rock stars of the modern organisation. Feted for the insights they generate and held in high regard for turning simple data into game changing metrics, they are the new face of the analytics team within every organisation. But with an extreme shortage of experienced data scientists in the market, and finding the good ones harder than unearthing a pink diamond in your backyard, building a data science team for any organisation is the stuff of dreams. However, consider for a moment that you've been given the task of building a data science team for your organisation. What do you want your team to be?

It wasn't so long ago that Oracle dismissed cloud computing as "gibberish." Today, it's singing a different tune. Through a string of acquisitions, the database giant has been buying a presence in the cloud in much the same way it built up its on-premises portfolio decades ago. What remains to be seen is whether that strategy can work as well in this new setting. Acquisitions of companies such as PeopleSoft and Siebel played a key role in fleshing out Oracle's traditional applications portfolio back in the mid-2000s, helping the company become a major player in enterprise software.

The creation of Capital One Wallet is an example of how a large financial services IT organization can move like a startup and think like a design firm, transforming business expectations in the process. Their work earned them the No. 1 spot in the 2016 InformationWeek Elite 100.

Mammoth Data, a leader in Big Data consulting, today announced the findings of its comprehensive cloud solution benchmark study, which compares Google Cloud Dataflow and Apache Spark.

The Weather Company estimates that weather is perhaps the single largest external factor affecting business performance, to the tune of nearly $1 trillion lost annually in the US alone. Combining weather data with business data can improve decision-making for a wide range of companies. The company's work earned it the No. 2 spot on the 2016 InformationWeek Elite 100.

Horizon Blue Cross Blue Shield Of New Jersey implemented a fee-for-value healthcare delivery model that uses new technologies to gather data and improve member experience. More than 80 business processes were created or modified, transforming systems for enrollment, claims processing, billing, customer service, provider portals, sales, and benefit monitoring. These efforts earned the company the No. 3 spot in the 2016 InformationWeek Elite 100.

Penn Signals is a system that uses existing data from electronic health records to perform real time predictive analysis of heart failure patients. The goal? Penn Medicine wanted to place patients in proper risk groups and assign them to cardiology resources in order to get them the best care and improve their outcomes. This work earned the company the No. 4 spot in the 2016 InformationWeek Elite 100.

Disparate internal systems and a complex customs environment were slowing down the import/export process for business customers. So FedEx Services launched the Clearance Customer Profile app to help businesses overcome customs clearance hurdles. The company's efforts earned it the No. 5 spot in the 2016 InformationWeek Elite 100.

To talk about technology transforming business only tells part of the story, though. At the end of the day, it's the people behind the technology that are truly the agents of change. Join us as we celebrate their work in the 2016 InformationWeek Elite 100.

As more users shift their analytics operations to cloud platforms, larger analytics vendors are eyeing startups that give them quick access to emerging capabilities that can help plug gaps in their technology portfolios. The latest example comes from visual analytics vendor Qlik, which announced Monday (May 2) it is acquiring technology partner Industrial CodeBox, developer of a tool used to feed data from cloud sources such as social media into their analytics applications.

Analyst Rob Enderle sees the handoff as elegant and well received, and the structure of the new firm well-articulated.

Google Cloud Dataflow crunched data two to five times faster than Apache Spark in a benchmark test of batch analytics performed by Mammoth Data. While Dataflow's raw power is impressive, don't throw in the towel on Spark just yet. If you're looking to choose a framework to analyze your big data, good luck. With so many options out there, you've got your work cut out for you. This embarrassment of big data riches keeps the tech experts at North Carolina consulting firm Mammoth Data busy. When Google (NASDAQ: GOOG) asked Mammoth Data to test its Google Cloud Dataflow service in a real-world setting, the company jumped at the chance.

Onboard laptop security technology can be a key element of smart gun technology.

Hard on the heels of a similar purchase last week, Oracle has announced it will pay $532 million to buy Opower, a provider of cloud services to the utilities industry. Once a die-hard cloud holdout, Oracle has been making up for lost time by buying a foothold in specific industries through acquisitions such as this one. Last week's Textura buy gave it a leg up in engineering and construction. "It's a good move on Oracle's part, and it definitely strengthens Oracle's cloud story," said Frank Scavo, president of Computer Economics.

451 Research recently published an analysis of the upcoming Pivotal HDB 2.0 release and Pivotal's expanded partnership with Hortonworks. We thought you may find the report helpful as you explore technology options for your Big Data journey. From the report: Continuing its bold strategy of driving its software to open source, Pivotal has announced Pivotal HDB 2.0, along with a partner agreement with Hortonworks to resell it. HDB is Pivotal's MPP Hadoop-native SQL database, which the company open sourced in 2015 as Apache HAWQ (incubating).

In Part One of this blog, I shared how McKinsey research proved that the current hype around the Internet of Things is probably understated, since the full economic impact of IoT by 2025 is valued at $3.9 to $11.1 trillion per year. To understand where the total value potential of the IoT lies, I introduced the…

Net Present Value (NPV) offers a compelling look at whether the present value of cash inflows expected from your proposed open source big data project exceeds what's potentially going out in terms of cash.
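For readers who want to see the arithmetic, here is a minimal sketch of the standard NPV calculation. The discount rate and the cash-flow figures are made-up assumptions for illustration only, and the function name net_present_value is hypothetical; none of them come from the linked article.

```python
def net_present_value(rate, cash_flows):
    """Discount each period's cash flow back to period 0 and sum the results.

    cash_flows[0] is the amount at period 0 (typically the upfront cost,
    entered as a negative number); later entries are one period apart.
    """
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


# Assumed figures: a 100,000 upfront outlay followed by four yearly
# inflows of 40,000, discounted at an assumed 10% per year.
project_flows = [-100_000, 40_000, 40_000, 40_000, 40_000]
npv = net_present_value(0.10, project_flows)

# A positive NPV means the discounted inflows exceed the cash going out.
print(f"NPV: {npv:,.2f}")
```

Under this sketch, a project would be worth a closer look when the result is positive and a harder sell when it is negative, though the answer is only as good as the assumed rate and cash flows.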

Nexsan is making a bet that storage management in the era of the private cloud is about to fundamentally change.

In the rush to get on the cloud, deploy Big Data and remake the IT department, it's worth it to stop and think where we want to be in a few years.

Compliance has never been easy. Organizations have to meet a myriad of external regulations, frameworks, and internal mandates such as PCI, HIPAA, FISMA, NERC, ISO and the EU Data Directive, many of which have a long list of required technical controls. Many large organizations have five or more regulations or mandates they must comply with. Meeting these requirements poses many challenges, in particular complying with the technical controls the regulations and mandates require. Two stand out: requirements around logging, monitoring and analyzing security events for incident detection and investigations, especially when logs need to be retained for months or years; and measuring and demonstrating compliance with all the various technical controls.

by Joseph Rickert When I first went to grad school, the mathematicians advised me to cultivate the habit of reading with a pencil. This turned into a lifelong habit and useful skill for reading all…

SQL Server 2016, Microsoft's newest database software, is set to become available on June 1 along with a no-cost, developers-only version. With its new features and revised product editions, Microsoft is determined to expand SQL Server appeal to the largest possible number of customers running in a range of environments. But there's still no word on the promised SQL Server for Linux, a version of the popular database that Microsoft is hoping will open SQL Server to an entirely new audience.

EMC is going back to basics — but for a new generation of users — on the first day of EMC World on Monday. This year's annual user conference will be the last for EMC as an independent company, assuming Dell's pending US$67 billion acquisition goes through later this year as planned. Michael Dell will join EMC's Chairman and CEO on stage during the Monday keynote session. But EMC's core storage business is likely to stay much the same in the short term, because it complements Dell, said Enterprise Strategy Group analyst Mark Peters.

In this guest post, members of the Barclays Advanced Data Analytics Team describe the results of an offsite hackathon to develop a recommendation system using Apache Spark. In the depths of the cold, wet British winter, the Advanced Data Analytics team from Barclays escaped to a villa on Lanzarote, Canary Islands, for a week to collaboratively solve a key business problem: how to design a better customer experience.

While big data is not a crystal ball, and even predictive analytics are only right if current events stay on course, CFOs are turning to analytics anyway to plan for economic uncertainties ahead. Essentially, they're running a lot of "what-if" scenarios and planning actions they can take for each so they're ready when one of those scenarios pops into reality.

Until recently emails stored in the cloud were not as protected as emails stored on-premises. Finally, after three years of batting the issue about, Congress appears to be on track to change that and restore Fourth Amendment protections for email no matter where it is stored.

As light-emitting diode (LED) technology grows ever more capable and efficient, new business models are relying on LED lighting to help families save money while enhancing the quality of their lighting–and their life. Discover how exponential growth in LED capabilities is creating daily opportunities for businesses and individuals across the globe.

While FierceBigData routinely reports on the changing skillsets you need to land a big data job, or grow your pay in the job you have, and we routinely report on resources for developing your skills too, it's time now to look at which companies are paying the most.

Among my favourite findings was that while computer vision with deep neural networks has made leaps and bounds, it's still constrained by the need for a large, human-annotated training set of images. This means a lot of undergrad volunteers or a lot of Mechanical Turk to get you there. Alexei Alyosha (UC Berkeley) gave a very entertaining account of the power and limitations of computer vision. I highly recommend following the link and watching his talk. Did you know that most of the world's visual data will never be seen by a human?

Database fans, start your clocks: Microsoft announced Monday that its new version of SQL Server will be out of beta and ready for commercial release on June 1. The news means that companies waiting to pick up SQL Server 2016 until its general availability can start planning their adoption. SQL Server 2016 comes with a suite of new features over its predecessor, including a new Stretch Database function that allows users to store some of their data in a database on-premises and send infrequently used data to Microsoft's Azure cloud. An application connected to a database using that feature can still see all the data from different sources, though.

Vint Cerf, "father of the internet" and Google vice president, warns that the world faces the risk of a "Digital Dark Age" when too much data is lost to obsolete hardware and software. Certainly companies need to take steps to ensure this doesn't happen, but considering so much consumer data is now piped in, urging consumers to preserve their digital data is a vital measure too.

While we have plenty of data threats to worry about now, the National Institute of Standards and Technology says we should be looking at future threats now too. One they've spotted is a quantum computer threat to encrypted data. NIST is already taking steps to address it.

This weekend I finished production of my new course, Shapefiles for R Programmers. To celebrate the launch, tomorrow I will be announcing a special limited-time discount. Today, however, I want to announce two live webinars that I will be doing about the course: Wednesday at 1pm PT and Thursday at 10am PT. In these webinars I will be giving away, for free, as much of the course as I can during a 1-hour webinar. Additionally, I will be giving a special bonus to everyone who stays to the end of the webinar. It should be fun! There will also be time for live Q&A at the end of the webinars.

Terracotta has released version 3 of its distributed caching technology Ehcache, sporting a number of important new features. First, its API has been refactored and now leverages Java generics. Performance has generally been enhanced, and support for the javax.cache API (JSR-107) and off-heap storage capabilities have been added. By Matt Raible

Bob Hayes, chief research officer at AnalyticsWeek, recently did a study to discover the top 10 skills in data science as it is actually being practiced today.

The results of the experiment shed light on the origin of learning and adaptability in non-neural organisms not related to evolution. Perhaps they also have some implications for the evolution of machine learning.

Amazon turned in a strong first quarter, with sales growing 28% worldwide on an expanding Prime customer base. The company's AWS division also posted solid earnings.

In this special guest feature, Brian Irwin, VP of Strategy at SHYFT Analytics, takes a look at the three market dynamics driving life sciences organizations to evaluate new data analytics strategies and technologies as they transform into value-based care delivery models.

Above the Trend Line: machine learning industry rumor central, is a recurring feature of insideBIGDATA. In this column, we present a variety of short time-critical news items such as people movements, funding news, financial results, industry alignments, rumors and general scuttlebutt floating around the big data, data science and machine learning industries including behind-the-scenes anecdotes and curious buzz.

Cloudtenna's Intelligent Search uses machine learning algorithms and file attributes to enable end users to search files on a more granular level.

Out of the 630 startups at the 2016 Collision Conference in New Orleans, InfoQ had a chance to speak with six of them to find out about their products and why they came to Collision.

We are doing a very good job at making the necessary adjustments for potential ransomware attacks.

Apache Flink is an open source platform for distributed stream and batch data processing. Flink's core is a streaming dataflow engine that provides data distribution, communication, and fault tolerance for distributed computations over data streams. Flink also builds batch processing on top of the streaming engine, overlaying native iteration support, managed memory, and program optimization.
