Big Data News – 18 Aug 2015

Top Stories

Datameer, the big data analytics company built on Hadoop, announced a $40 million Series E round today led by ST Telemedia, an investment firm based in Singapore.


A Massey University PhD student is delving into the decision-making processes of companies to understand whether judgement calls based on big data produce better outcomes. Simone Gressel says she chose her thesis topic because of all of the hype around “big data”. She was interested to understand how managers were using data, whether it was reducing the role of human judgement and whether managers made better decisions as a result of it.


Mobile in all its forms is the place to be if you are seeking a job and career in an industry sector which is growing rapidly and needs employees with the appropriate tech skills to drive that growth. Data scientists – described as having “The sexiest job of the 21st century” – are reportedly in high demand, particularly those who freelance, in Australia and around the world.


Global satellite solutions provider SES Government Solutions (SES GS) has agreed to a one-year contract with the National Oceanic and Atmospheric Administration (NOAA) to provide O3b Networks’ services and ground equipment to the National Weather Service Office (WSO) in Pago Pago, American Samoa.


As the amount of digital information generated by businesses and organizations continues to grow exponentially, a challenge – or as some have put it, a crisis – has developed. There just aren’t enough people with the required skills to analyze and interpret this information – transforming it from raw numerical (or other) data into actionable insights – the ultimate aim of any Big Data-driven initiative.


The Linux Foundation and IBM today announced initiatives to advance Linux on mainframe computers, including a new collaborative project from the open source steward and new servers from Big Blue, which is contributing code to the open source community.


Analysts at Zacks have given a short-term rating of hold on Splunk Inc. (NASDAQ:SPLK) with a rank of 3. The shares have received an average rating of 1.39 from 27 brokerage firms. 20 analysts have rated the company as a strong buy. The shares have been rated as hold by 4 Wall Street analysts. 3 analysts have suggested buy for the shares.


Hortonworks, Inc. (NASDAQ:HDP) has received a hold rating for the short term, according to the latest rank of 3 from research firm Zacks. The shares have received an average rating of 1.64 from 11 analysts. 7 market experts have marked it as a strong buy. 1 analyst recommended buying the shares. 3 analysts have rated the company at hold.


In light of Apache Hadoop’s wide ranging flexibility and practicality, and as data scientists can now leverage its power to solve some of today’s most pressing problems, Cloudera has announced that it is hosting and organizing Wrangle, a single-day, single-track community event that will dive into the principles, practice, and application of data science from the startup to the enterprise. The day includes two general sessions, 12 talks, and networking opportunities.


LinkedIn Corp. is joining the parade of major companies donating their homegrown Big Data tools to the open source community, offering up a Hadoop plugin for working with the Gradle build system.


C-level briefing: Rick Farnell, SVP & Co-founder of Think Big, a Teradata company, on staying on top of big data. Farnell had plenty to say about the importance of getting data landing right and how providers are failing to fully understand it when he spoke to CBR.


HP has unveiled a number of new products, services, and programs designed to help organizations leverage data and analytics to build new products and experiences. These include a new release of HP Vertica, which will feature data streaming and advanced log file text search to power high-speed analytics on Internet of Things (IoT) data. Also announced…


A lot goes into creating a company and a lot more goes into creating a company that endures. There are long days and even longer nights, hiccups along the way, and a sense of trust that can last a…


This article is about understanding Git – both its benefits and limits – and deciding if it’s right for your enterprise. It highlights some of the key advantages and disadvantages typically experienced by enterprises and presents the key questions to weigh in determining whether Git is right for you and what you need to consider in moving to Git. By Bob Jenkins


Google has revealed that when it comes to Android, M stands for Marshmallow. The company has also introduced the Android 6.0 SDK, which comes prepackaged with Android Studio or as a separate download that can be used with a different IDE.


Many companies will say that there is still a lot of work to be done, but the four organizations listed below are actually taking action and leading the way when it comes to diversity initiatives and opportunities for talented engineers regardless of gender, race, or nationality.



Identifying and preventing fraud is a daunting task for anyone. But what happens when the criminals seem to be getting smarter every day? Learn what Annie Analytics recommends for those in search of some peace of mind.


Spectra Logic today unveiled a new NAS archive appliance built with spinning disks featuring Shingled Magnetic Recording (SMR) technology. The increased storage density enabled by SMR lets the new Spectra Verde DPE (Digital Preservation Enterprise) appliance store petabytes’ worth of unstructured data at very low cost. Spectra is one of a number of storage device manufacturers (including Seagate) using SMR technologies to overcome the storage density limits of existing disk drives. The heads of traditional disk drives use perpendicular magnetic recording (PMR)…


For more applications of predictive analytics from the New York Times, attend the keynote presentation from Chief Data Scientist Christopher Wiggins at Predictive Analytics World for Business, Sept. 27 – Oct. 1, 2015 in Boston. Use code PATIMES15 for 15% off your conference pass! The bot, named Blossom, helps predict how stories will do on […] The post The New York Times Built a Slack Bot to Help Decide Which Stories to Post to Social Media appeared first on Predictive Analytics Times.


Teradata Portfolio for Hadoop Gets You Up and Running Faster Teradata™ Portfolio for Hadoop is a powerful, ready-to-run platform that is pre-configured and optimized for enterprise-class big data storage and management. The solution conveniently provides one-stop shopping with hardware, software, support and services all from a single vendor. “Performance, ease-of-use, manageability and reliability are greatly improved compared to other systems,” says Cesar Rojas…


Some of the biggest breakthroughs come as a result of challenging assumptions; especially those that are commonly accepted. But first you’ll need to put processes, systems and people in place to continually overcome preconceived notions.


By Carl Nan, DeployR PM. A new version of DeployR, the server-based framework that provides simple and secure R integration for application developers, is now available. (If you’re new to DeployR,…


Despite enterprises’ best intentions in enforcing top-down standardization of data sets, non-compliant data can easily seep in and, through aggregations, transformations, and standardizations, spread throughout the organization. In a typical enterprise, inventory data from multiple regions and divisions across product lines could easily result in dozens of data sources being used for one analysis.


In the past three decades, management information systems, data integration, data warehouses (DWs), BI, and other relevant technologies and processes only scratched the surface of turning data into…


The Internet of Things (IoT) is quickly becoming more integrated into our networked society. However, securing devices linked through the IoT poses challenges for organizations. How can they ensure their IoT development frameworks don’t create security risks?


The Internet and social feeds are full of articles, blogs, news and more on the strides that the public sector is making with big data and analytics. This Public Sector News series skims the newswires and culls highly interesting items that provide fodder for thought, discussion and debate.


By Richard Kittler, Revolution R Enterprise PM, Microsoft Advanced Analytics. In its latest release Revolution has added to the platform support of Revolution R Enterprise (RRE) version 7.4. Released…


This blog explains what DMARC (Domain-based Message Authentication, Reporting & Conformance) is and how it differs from the existing authentication methods, SPF (Sender Policy Framework) and DKIM (DomainKeys Identified Mail). If you work in the email marketing industry, DMARC is an acronym you may have heard in passing or in detail, especially recently.


At the recent Agile 2015 conference two organizations had video booths set up to record interviews with conference attendees, organizers and speakers. These videos are now published and available. By Shane Hastie


While much is made of data analytics and the troves of data organizations retain, collecting and indiscriminately storing data can be detrimental to the business.


In anticipation of his upcoming conference presentation, Mining the Voter File, at Predictive Analytics World for Government, Oct 13-16, 2015, we asked Benjamin Uminsky, Executive Assistant, Data Scientist, Los Angeles County Registrar Recorder/County Clerk’s Office, a few questions about his work in predictive analytics. Q: How would you characterize your agency’s current and/or planned use […] The post Wise Practitioner – Predictive Analytics Interview Series: Benjamin Uminsky, Los Angeles County appeared first on Predictive Analytics Times.


Shadow properties are fields that don’t exist in the class itself, but are treated as if they did by Entity Framework. They can participate in queries, create/update operations, and database migrations. By Jonathan Allen


Experienced Scrum Masters explain how they define and measure their own personal success as Scrum Masters, and share their lessons learned about how to achieve success. Topics range from dealing with stakeholders to improving coaching skills and helping the team achieve a sustainable pace. The lessons come from many years of experience and will help you improve your performance as a Scrum Master. By Vasco Duarte


With big data now being used in so many different industries, it should probably come as no surprise to hear commercial airlines have been adopting big data analytics to improve their own businesses. Big data can be a highly versatile tool, something that helps companies be more responsive to change and become more prepared for the future. Airlines have long seen the inherent value in this and have slowly evolved to take advantage of big data’s strengths.


Regardless of company size, Bring-Your-Own-Device (BYOD) has become quite popular. According to Gartner, half of employers surveyed say they’re going to require workers to supply their own devices at work by 2017. Spiceworks did a similar study, finding about 61% of small to medium-sized businesses were using a BYOD policy for employee devices. Businesses of all sizes are taking BYOD seriously, but are there differences in how large and small companies handle their policies?


Optimization is the application of algorithms to sets of data to guide executives and managers in making the best decisions. It’s a trending topic because using optimization technologies and techniques to better manage a variety of day-to-day business issues is becoming easier. I expect that optimization, once the preserve of data scientists and operations research specialists, will become mainstream in general-purpose business analytics over the next five years.


As much as you’d like to dive directly into your data, the best data analysts know that “garbage in, garbage out” will be the phrase of the day if time isn’t spent cleaning the data first. With that in mind, here are a few important steps you’ll need to take to clean your survey data.


The rapid development of information technologies in the past decade has provided forecasters with huge amounts of data, as well as massive computing capabilities. However, “sufficient” data and strong computing power do not necessarily translate into good forecasts. Different industries and products all have their unique demand patterns. There is no one-size-fits-all forecasting model or technique.


The technological world is moving fast, and many IT departments are scrambling to keep up with it. At the heart of most IT operations is the data warehouse, and one of the chief responsibilities of managing it is to ensure the data warehouse is kept up to date, ready to handle any new demands that arise. This task can be difficult at times, especially considering how revolutionary some advances are.


Despite an overall healthy industry outlook, there are still challenges facing auto industry players. Competition for market share, rising costs to support product differentiation, and a flourishing frontier of digital advertising have left vehicle manufacturers and dealers alike searching for ways to gain competitive advantage in a crowded atmosphere. With this shift, a considerable amount of focus is being directed towards reaching today’s must-have demographic: Millennials.


It’s a classic scenario. Two people meet at a party. They chat and then exchange information. However, they never speak or meet again. It is as though the contact information was never exchanged. So, what happened? Was there never intent to follow up? Or, did the information get lost, forgotten, or placed in a pile that never got acted upon?


To deliver on your company’s future demands for data and insights, you will need to maintain your existing data warehouse – and add the great new capabilities available with big data management and in-memory analytics. The real opportunity is in making those technologies work together smoothly with minimum effort and risk.


The term “cloud computing” may be commonly uttered in businesses all over the world, but its complex history is less well known. Our interactive timeline explores the history of Cloud Computing and its future.


For a long time I’ve been a big proponent of weighting data as little and as rarely as possible. You see, I would rather fill cells to achieve the desired sample size than fake cells based on data I’ve already gathered – not on data that I failed to gather. Obviously, I failed to gather that data for a specific reason, perhaps one that is completely unknown and unguessable to me.


Big data and financial services are a natural match – after all, with nothing to manufacture, and no physical product to sell, data is the bedrock the industry is built on. On top of that, most of its business is quantitative – it mainly deals in good old-fashioned numbers – the simplest data to record and analyze.


As the Internet has become an integral part of our economic and personal lives, hackers have found new ways to exploit the technologies that promote and support it. Google, Heroku, Cloud Foundry, CloudBees and other hosting providers make it possible to host applications with nothing more than an email address. This scenario gives hackers the option to auto-generate thousands of email addresses, and create free accounts for these services. The servers can then be used for a number of cyber attacks.


Companies of all sizes, from early-phase start-ups to Big Pharma, must continuously collaborate to ensure quality, comply with regulations, and mitigate risks in each stage of the product development life cycle. Here are six key requirements for a future-proof integration platform that can help meet the demands of life science organizations.


In a prior blog post on challenges beyond the 3V’s of working with data, I discussed some issues that hinder the efficiency of data analysts and drastically raise the bar on their motivation to begin working with new data. In this piece I will examine the time investment it takes to understand and find data today, questions to ask when searching and, finally, tips to draw upon in today’s data-heavy environments.


Can you visualize how “big” big data is? Just imagine these scenarios: people can move things around using their smartphones, pet-owners can monitor a dog’s health with software built into the collar, and casinos can monitor gamblers using a sensor. These are just a few examples of how big data works. It is generated in a number of ways we can barely imagine.

