Big Data News – 26 Aug 2016

Featured Article
Free catered lunch and a dog-friendly office are two of the perks offered by an educational technology company in Palo Alto, Calif., that's looking to hire a machine learning engineer. The position, posted on Dice, will pay between $140,000 and $160,000 to the right candidate who's skilled in machine learning platforms as well as data mining, statistical modeling, and natural language processing.

Top Stories
The pace of doing business in today's ultraconnected world has changed everything. From the way advertisements are bought, sold and displayed, to the way businesses market to their buyers, we've entered an entirely new era. Although some get a bit weepy and nostalgic wishing for "the good old days," these are exciting times for today's leading companies. They're even more thrilling for today's disruptors. In today's marketing organizations, there's an ongoing war.

HP Sure View's privacy mode keeps the screen from being seen by anyone other than the primary user of the device.

There's been a lot of press recently about the problems of IoT security: easily hackable smart locks, as many as 100M Volkswagens at risk, vulnerable light bulbs, and even sex toys.

In anticipation of his upcoming conference presentation, Importance of Model Risk Management in Financial Institutions at Predictive Analytics World Financial in New York City, October 23-27, 2016, we asked Hevel Jean-Baptiste, Global Senior Program Manager, Model Risk Management Systems at GE Capital, a few questions about his work in predictive analytics. Q: In your work with…

It's part of my job to cover the ecosystem of Hadoop, the open source big data technology, but sometimes it makes my head spin. If this is not your primary job, how can you possibly keep up? I hope that a discussion of what I've found to be most important will help those who don't have the time and energy to devote to this wide-ranging topic.

Traffic stats for Comcast and some lawsuits in the telco industry are in the news this week.

I recently caught up with Tim Trefren, Co-founder and VP of Customer Success at Mixpanel to discuss the company's new Autotrack feature that automatically collects every action on a customer's website and allows customers to easily access that data.

In this special guest feature, Cecilia Pizzurro, Senior Director, Strategic Data Projects at LOGICnow, discusses the convergence of data/machine learning and cybersecurity, and the idea that these two are playing off of each other in a more meaningful way than ever before.

Dropbox is asking users who signed up before mid-2012 to change their passwords if they haven't done so since then. The cloud storage service said it was asking users to change their passwords as a preventive measure, and not because there is any indication that their accounts were improperly accessed. Dropbox said its security teams learned about an old set of Dropbox user credentials, consisting of email addresses and hashed and salted passwords, which it believes were obtained in 2012 and could be linked to an incident the company reported around that time.
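
The notice says only that the passwords were "hashed and salted"; it does not say which algorithm was used. As a rough illustration of what salting buys, here is a sketch using PBKDF2-HMAC-SHA256 from Python's standard library. The algorithm and the iteration count are assumptions chosen for the example, not Dropbox's actual scheme. Because every user gets a fresh random salt, two identical passwords yield different stored digests, so an attacker holding a leaked set cannot crack them all with one precomputed table.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

# Illustrative work factor (an assumption, not a value from the article).
ITERATIONS = 100_000


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for a password using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each stored password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)


salt_a, digest_a = hash_password("hunter2")
salt_b, digest_b = hash_password("hunter2")
# Same password, different random salts, so the stored digests differ.
print(digest_a != digest_b)
```

Only the salt and digest are stored; a password change simply generates a new salt and digest, which is why resetting old credentials is an effective precaution.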

The creation of artificial general intelligence would be one of the biggest breakthroughs in the field of AI for years. Has it already happened or are the claims of Kimera Systems just hot air?

Machine Learning curates targeted offers for today's connected consumers

Remember a few years ago, when Hadoop took knocks left and right for lacking usability, security, and other key features and functionality? Well, no more. A couple of weeks ago, Hortonworks revealed the latest version of Hortonworks DataFlow (HDF), its integrated system that allows for dataflow management and streaming analytics. What's Up at Hortonworks?

The developers of a "cognitive media platform" designed to render audio and video content and make the results searchable have closed a $50 million funding round. Veritone Inc. announced this week that patent investor Acacia Research Corp. (NASDAQ: ACTG) led the investment round. The startup and investor are both based in Newport Beach, Calif. An earlier round completed in 2014 raised $15 million in venture capital. The latest funding round includes an initial $20 million investment along with a $30 million "contingent" investment based on Veritone's achieving a series of development milestones.

Originally posted on Data Science Central. This article, written by Linda Terlouw, introduces Apache Mahout, a library for scalable machine learning, and studies potential applications through two Mahout projects. Linda is a computer scientist who works on data science (data analysis, data visualization, process mining).

Big data has been a big topic for a few years now, and it's only going to grow bigger as we get our hands on more sophisticated forms of technology and new applications in which to use them. The problem now is beginning to shift; originally, tech developers and researchers were all about gathering greater quantities of data.

"US, UK, China, Japan, Italy, Germany, South Korea and Norway win gold. Tunisia, Greece & Portugal win sport focus medals. Australia ranks 3rd for overall most medals." As the Olympic events continue to unfold over the weekend, people from all over the world have their eyes locked on the medals table to determine which countries emerge as the overall Olympic champions.

Functionality unto itself can be impressive, but it's a hard sell. Embed the technology, and adoption ticks up.

Five challenges preventing IT from shifting to open source and tips for tackling them to keep the future of open source heading in the right direction.

Developers like biometrics, but creative hacks have many concerned that the solution may not be as secure as we hoped.

You've probably heard about Bitcoin, the virtual cryptocurrency that has often been associated with illegal activities and the "dark web." Indeed, it has been a favorite payment method for drug dealers, various other criminals, and hackers who take over machines with ransomware and then demand a ransom to release them. Because Bitcoin transactions are secure and pseudonymous, and therefore hard to trace, the currency makes covert transactions much easier. Of course, there are many legitimate uses for Bitcoin as well, and many mainstream institutions have begun accepting it as payment.

The big data revolution is transforming how business gets done across multi-trillion-dollar industries like financial services, healthcare, manufacturing, and retail. Heck, even the Federal Government, with its nearly $4-trillion budget, is getting in on the act. But one industry that's often overlooked in the rush to use computers to crunch data to optimize our world is the information technology (IT) sector itself. It's really hard to quantify the impact that big data analytics will have in the future.

Companies today are awash in data, but current tools and processes are not enabling them to keep it secure. That's according to Informatica CEO Anil Chakravarthy, who says his company, which has traditionally focused on data management and integration, is embarking on a major push to go further into data security. "You hear about breaches all the time; just imagine all the ones you're not hearing about," Chakravarthy said in a recent interview.

Here are a few best practices for striking an appropriate balance to ensure that speed and quality are not an either/or choice.

London Data Festival, November 16th & 17th: The program for the London Data Festival is filling out, with new sessions being added every week. The depth and breadth of content covered is unrivaled, with three separate summits all under one roof. With more than 400 senior-level attendees, exclusive presentations and interactive workshops, this is the best place to improve your data literacy and connect with the best.

This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach. The first public cloud services went live in the late 1990s using a legacy construct called a multi-tenant architecture, and while features and capabilities have evolved, many cloud services are still based on this 20th century architecture.

EMC made a commitment today to simplifying the management of data flows across all those environments.

Despite major market inroads being made by Apache Spark, a new forecast estimates the global market for the Hadoop big data framework will continue to grow at a healthy clip through 2021, fueled in part by growing enterprise demand for Hadoop services.

Coho Data, a leading innovator and provider of true scale-out all flash storage architecture and infrastructure solutions for private clouds, announced DataStream 2.8, helping to make the Software-Defined Data Center (SDDC) a reality.

This Fall, #SAPTechEd will take place in Las Vegas (September), Bangalore (October), and Barcelona (November). The best go-to-market and product management experts on SAP BusinessObjects Predictive Analytics will deliver many different sessions (lectures, hands-on sessions, and yes, even a CodeJAM) during these three major SAP events. Here is a run-down of everything predictive that you won't…

Cloudera is eight; Apache Hadoop is ten. Big data has gone from zero to how-did-that-happen huge. The bestiary is bigger than ever, too: new projects like Apache Kudu, Apache Impala (incubating), Apache Kafka and Apache Spark define the future of big data and analytics, extending the core Hadoop platform to handle streaming, real-time and advanced analytics.

In this Q&A, a Forrester Research analyst explains how using analytics can uncover connections between people management and business outcomes and improve areas such as hiring, retention and training.

If you're fresh from a machine learning course, chances are most of the datasets you used were fairly easy. Among other things, when you built classifiers, the example classes were balanced, meaning there were approximately the same number of examples of each class.
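
The item is cut off before it says which remedy it recommends, so the following is just one common approach: reweighting classes by inverse frequency. The formula n_samples / (n_classes * class_count) is the same heuristic scikit-learn uses for `class_weight="balanced"`; the labels below are made-up toy data. With 95 "ok" and 5 "fraud" examples, a classifier that always answers "ok" is already 95% accurate, which is why the rare class needs a larger weight in the loss.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count).

    Rare classes get weights above 1 and common classes below 1, so the
    total weight of every class is the same (n_samples / n_classes).
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical, heavily imbalanced toy labels.
labels = ["ok"] * 95 + ["fraud"] * 5
weights = balanced_class_weights(labels)
print(weights)  # fraud -> 10.0, ok -> about 0.526
```

Each class now contributes a total weight of 50 (95 × 0.526 and 5 × 10), so errors on the rare class are no longer drowned out. Resampling (oversampling the minority class or undersampling the majority) is the other usual option.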

Joins: When dealing with large data sets, it's important to ensure that the data can be accessed correctly. Failure to address this issue early on in database development can lead to problems when later attempting to extract information from the data itself. This was highlighted recently during a count for London's Mayoral election on 5 May, when staff at the Electoral Commission had to 'manually query a bug-stricken database' which delayed the result by 'several hours'.
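
The item doesn't describe the faulty database, so the sketch below is not a reconstruction of that incident. It uses a hypothetical customers/orders schema in an in-memory SQLite database to show one classic way joins mislead: an INNER JOIN silently drops rows whose foreign key has no match, while a LEFT JOIN keeps them and exposes the gap.

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    -- order 3 points at customer 9, which does not exist
    INSERT INTO orders VALUES (1, 1), (2, 2), (3, 9);
    """
)

# INNER JOIN keeps only rows matched on both sides, so order 3 vanishes.
inner_count = conn.execute(
    "SELECT COUNT(*) FROM orders o JOIN customers c ON o.customer_id = c.customer_id"
).fetchone()[0]

# LEFT JOIN keeps every order; unmatched customer columns come back as NULL.
left_count = conn.execute(
    "SELECT COUNT(*) FROM orders o LEFT JOIN customers c ON o.customer_id = c.customer_id"
).fetchone()[0]

# Anti-join: list the orders whose customer_id has no matching customer.
orphans = conn.execute(
    "SELECT o.order_id FROM orders o "
    "LEFT JOIN customers c ON o.customer_id = c.customer_id "
    "WHERE c.customer_id IS NULL"
).fetchall()

print(inner_count, left_count, orphans)  # 2 3 [(3,)]
```

Catching this "early on" means enforcing it in the schema: declaring `customer_id INTEGER REFERENCES customers(customer_id)` and running `PRAGMA foreign_keys = ON` would make SQLite reject order 3 at insert time instead of letting it surface later in a query.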

Power BI, Microsoft's data visualization and reporting platform, has made great strides in the past year integrating the R language. This Computerworld article describes the recent advances with…

Data frames are tables for storing data. If you recall the vectors from the first set of R notes, a data frame can be imagined as a collection of vectors of the same length. We have already created vectors, named them, and plotted them as histograms.
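
The notes themselves use R. To make the "collection of named, equal-length vectors" idea concrete without depending on any library, here is a small Python analogue. The function names are hypothetical, and in practice Python users would reach for `pandas.DataFrame`. This sketch simply rejects columns of unequal length, matching the description above; R's own `data.frame()` can also recycle shorter vectors in some cases.

```python
def make_data_frame(**columns):
    """Build a column-oriented table from named, equal-length vectors (lists)."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns must have the same length, got {lengths}")
    return columns


def rows(frame):
    """Return the table row by row, as tuples in column order."""
    return list(zip(*frame.values()))


df = make_data_frame(name=["Ann", "Bob", "Cy"], age=[34, 28, 45])
print(rows(df))  # [('Ann', 34), ('Bob', 28), ('Cy', 45)]
```

Each column is one named vector, and a row is just the i-th element taken from every column, which is why the lengths have to agree.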

Songzhi Liu is a Professional Services Consultant with AWS. The data lake concept has become more and more popular among enterprise customers because it collects data from different sources and stores it where it can be easily combined, governed, and accessed. On the AWS cloud, Amazon S3 is a good candidate for a data lake implementation, thanks to its large-scale data storage.
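
The post doesn't prescribe an object layout, but a widely used convention for data lakes on S3 is a zone prefix followed by Hive-style `key=value` partitions, which lets query engines such as Amazon Athena skip irrelevant prefixes. The sketch below only builds such a key; the zone name `raw`, the source name, and the partition scheme are assumptions for illustration.

```python
from datetime import datetime, timezone


def lake_object_key(zone: str, source: str, event_time: datetime, filename: str) -> str:
    """Build a Hive-style partitioned S3 object key such as
    raw/source=weblogs/year=2016/month=08/day=26/part-0000.json
    """
    return (
        f"{zone}/source={source}/"
        f"year={event_time:%Y}/month={event_time:%m}/day={event_time:%d}/"
        f"{filename}"
    )


key = lake_object_key(
    "raw", "weblogs", datetime(2016, 8, 26, tzinfo=timezone.utc), "part-0000.json"
)
print(key)  # raw/source=weblogs/year=2016/month=08/day=26/part-0000.json
```

Uploading is then a single call such as boto3's `s3.put_object(Bucket=..., Key=key, Body=...)`; keeping source and date in the key is what makes the combined data easy to govern and query later.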

IT organizations can upgrade the OneBlox 5210 any time they want by plugging in faster SSDs as they become available.

Here's a great visual overview of what you need to start an optimised customer loyalty programme: the questions to ask before getting started and all the possible data sources to consider.

If we want the opportunities and the conveniences the internet has to offer, we have to anticipate costs of cybercrime in the value of doing business.

Choosing the right consultant can make all the difference in whether a project succeeds or fails spectacularly.

Joe Caserta is founder and president of Caserta Concepts, a New York–based innovation technology and consulting firm that specializes in big data analytics, data warehousing, ETL and business intelligence. Don't miss this enlightening discussion between Joe Caserta and IBM data science evangelist James Kobielus.

Four years ago, the New York Public Library began to move its web properties to the cloud. Today, the library system has all of its approximately 80 web sites in the cloud. The library has shrunk the number of on-premise servers by 40% and is running those web properties 95% more cheaply than if it had bought the hardware and software to do it all by itself.

Key features: put machine learning principles into practice to solve real-world problems; get to grips with Python's impressive range of machine learning libraries and frameworks; and, from retrieving data from APIs to cleaning and visualization, become more confident at tackling every stage of the data pipeline. Book description: Machine learning is transforming the way we understand and interact with the world around us. But how much do you really understand it?

This article was originally posted elsewhere. It was written by Steven Scott, a Bayesian statistician interested in data augmentation methods and Markov chain Monte Carlo. Steven has applied these methods to problems in educational testing, network security, biometrics, web browsing, e-commerce, and medical applications. "I'm happy to announce a new 'Statistics' add-on for Google Sheets (the spreadsheet component of Google Docs)."

Go from messy, unstructured artifacts stored in SQL and NoSQL databases to a neat, well-organized dataset with this quick reference for the busy data scientist. Understand text mining, machine learning, and network analysis; process numeric data with the NumPy and Pandas modules; describe and analyze data using statistical and network-theoretical methods; and see actual examples of data analysis at work.

Summary: Sensors that know how you feel? Sensors that want to change the way you feel? When did that happen, and better yet, how? We're getting used to sensors finding out what we're doing. Apparently they are now sufficiently sophisticated that they can even tell if I'm sitting up straight (yes, Mom; and BTW, using a camera is almost cheating: you should be able to do this with just an accelerometer and a gyro). But what if I told you that those same IoT sensors can tell how you feel? And now they're even being programmed to change the way you feel! A little creepy? Feeling manipulated?
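
To back up the aside about posture, here is a minimal sketch of the accelerometer half. When the wearer is still, the accelerometer measures only gravity, so the angle between the sensor's "up" axis and the measured vector gives the torso's tilt. The mounting (sensor +y pointing up the spine) and the 20-degree slouch threshold are hypothetical. The gyro is left out; a real wearable would typically fuse gyro readings (for example with a complementary filter) so that movement doesn't masquerade as tilt.

```python
import math

SLOUCH_THRESHOLD_DEG = 20.0  # hypothetical threshold


def tilt_from_vertical_deg(ax: float, ay: float, az: float) -> float:
    """Angle in degrees between the sensor's +y axis and the gravity vector.

    Assumes the device is (nearly) stationary, so the reading is pure gravity,
    and that it is mounted with +y pointing up the spine.
    """
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm == 0:
        raise ValueError("zero acceleration vector")
    cos_theta = max(-1.0, min(1.0, ay / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos_theta))


def is_slouching(ax: float, ay: float, az: float) -> bool:
    """True when the torso leans further from vertical than the threshold."""
    return tilt_from_vertical_deg(ax, ay, az) > SLOUCH_THRESHOLD_DEG


g = 9.81
lean = math.radians(30)
print(tilt_from_vertical_deg(0.0, g, 0.0))  # 0.0 (sitting upright)
print(is_slouching(0.0, g * math.cos(lean), g * math.sin(lean)))  # True (30-degree lean)
```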

Aimed at practitioners. The presentation is as non-mathematical as possible. Includes many examples of the use of statistical functions in spreadsheets. Employs a realistic sample survey as an exemplar throughout the book. Fills a gap in the existing literature on statistics. About this textbook: This book was written for those who need to know how to collect, analyze and present data.

This entry was posted in News.