Big Data News – 24 Feb 2017

Today's Infographic Link: 50 Years of Space Exploration

Featured Article
Filling in endless forms to buy a new policy may become a thing of the past.

Top Stories
Network access control software that Aruba sells under the ClearPass Security brand will be informed by machine learning and Big Data analytics from Niara.

Yes, Big Iron can do Big Data and Machine Learning, even while it keeps chugging away at its appointed transactional tasks. In fact, putting the two together makes all kinds of sense.

Recruiting firm Randstad's annual salary survey shows data pros are in the upper stratosphere, and DevOps experience is starting to justify a premium.

It's essential to adopt a broader scope about data and analytics, create a very flexible and agile IT framework, and build a strong foundation for data science.

President Donald Trump said this week that the federal budget is a "mess" and is promising to make it leaner. This means that federal IT spending, now at $81.6 billion, is likely to see cuts, analysts said. The Trump administration is still filling top technology policy positions, including replacing former federal CIO Tony Scott, who left last month. Scott, a former CIO of Microsoft and The Walt Disney Co., was appointed by President Barack Obama in February 2015. For now, all eyes are on former U.S. Rep. Mick Mulvaney (R-S.C.), Trump's just-confirmed budget director. Elected in 2010, Mulvaney was part of the Tea Party wave and a member of the conservative House voting bloc, the Freedom Caucus.

Predictive analytics are used in the connected building industry to find trends and alert Trane customers and service technicians on what needs action. This impacts the bottom line of a company.

Make no mistake, this is not a random academic pursuit. It is of utmost importance in an era where automation is the next big thing.

The advisor, called AISHA, will provide a chat function through which customers can get advice on their fashion and apparel requirements.

Data is the single biggest raw asset that BASF has, according to Frithjof Netzer, chief digital officer for BASF. The company has a process to farm, refine and market its data. BASF also uses an in-house process called Innorate to create disruptive business ideas.

The rationale for shared infrastructure is simple: Service providers and carriers increasingly seek to offer both wireless and wired services.

The internet is a tough place to have a conversation. Abuse has driven celebrities and ordinary folks from social media platforms that are ill-equipped to deal with it, and some publishers have switched off comment sections. That's why Google and Jigsaw (an early stage incubator at Google parent company Alphabet) are working on a project called Perspective. It uses artificial intelligence to try to identify toxic comments, with an aim of reducing them. The Perspective API released Thursday will provide developers with a score of how likely users are to perceive a comment as toxic. In turn, that score could be used to develop features like automatic post filtering or to provide users with feedback about what they're writing before they submit it for publication. Starting on Thursday, developers can request access to Perspective's API for use in projects they're working on, and Jigsaw will approve them on a rolling basis.
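As a rough sketch of how such a score might drive post filtering or pre-submission feedback, the Python below maps a toxicity score to an action. It does not call the Perspective API. It assumes the score is a probability between 0 and 1, and the function name, thresholds, and sample scores are all hypothetical, chosen only for illustration.

```python
def moderation_action(toxicity_score, warn_at=0.5, hold_at=0.9):
    """Map a toxicity score to an action (thresholds are illustrative assumptions)."""
    if not 0.0 <= toxicity_score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if toxicity_score >= hold_at:
        return "hold_for_review"   # automatic post filtering
    if toxicity_score >= warn_at:
        return "warn_author"       # feedback before the comment is submitted
    return "publish"

# Hypothetical scores, not real API output.
sample_scores = {"comment_a": 0.12, "comment_b": 0.63, "comment_c": 0.95}
actions = {name: moderation_action(score) for name, score in sample_scores.items()}
```

Where the thresholds sit is a product decision: a lower `warn_at` nudges more authors, while a higher `hold_at` keeps more borderline comments visible.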

How would you feel about an artificial intelligence system handling your taxes this year? H&R Block, the tax services company, is betting that customers will be willing to have A.I. assist their human tax preparers in getting them the biggest refunds possible or at least reducing how much they owe. The company, which has about 12,000 offices in the U.S. and prepares 24.2 million tax returns worldwide, is using IBM's A.I.-based Watson to do it. "We are introducing something this tax season that is totally new, and is in fact, a first in the tax preparation category," said Bill Cobb, H&R Block's president and CEO, in a statement. "By combining the human expertise, knowledge and judgment of our tax professionals with the cutting-edge cognitive computing power of Watson, we are creating a future where our clients will benefit from an enhanced experience and our tax pros will have the latest technology to help them ensure every deduction and credit is found."

One person can't call an industry dead and make it happen unless the vendors agree. HP Inc. is a great example of a firm rejecting that pronouncement.

After more than a year in preview, R Tools for Visual Studio, the open-source extension to the Visual Studio IDE for R programming, is nearing its official release. RTVS Release Candidate 1 is now available for download, giving you the opportunity to try out the new features ahead of the official announcement. We'll cover the features in detail with the general availability release of RTVS 1.0, but in summary the new features include: Remote Execution: type R code in your local RTVS instance, but have the computations performed on a remote R server. You can also switch between local and remote workspaces at will. SQL Server Integration: work with database connections and SQL queries, and create stored procedures with embedded R code. Enhanced R Graphics Support: multiple floating and dockable plot windows, each with plot history. RTVS works with all flavours of R on Windows: CRAN R, Microsoft R Open, and Microsoft R Client & Server. It requires Visual Studio 2015 (including the free Community edition). The RTVS team welcomes your feedback: you can report issues or offer suggestions via the RTVS GitHub repository. To get started with RTVS, follow the link below. R Tools for Visual Studio: Welcome to R Tools for Visual Studio Preview!

Listen to Preetam Kumar as he speaks about how you can solve real-world optimization problems with IBM Decision Optimization on Cloud.

I decided to take a break from my Cybersecurity Architecture series and CISO's View series to give my thoughts on this year's RSA conference while things are still fresh. First off, I enjoyed meeting with old colleagues and many security people that I respect which justified the trip as far as I'm concerned. I'm really amazed…

The risk is rising, but the medical industry is better recognizing the risk to patient data and is stepping up its cybersecurity efforts.

Whatever the details, Robotic Process Automation, RPA, as a category is expanding rapidly, and possibly freeing humans for more sophisticated jobs.

By providing features for large team collaboration such as native multi-user security, Spark pipelines, enriched dashboards, 3rd party integrations with Slack and Hipchat for activity notifications, interactive hierarchical clustering, and a plethora of new features, Dataiku DSS 4.0 improves the ability for organizations to develop and manage enterprise data science projects. New York, NY – February 23 – Dataiku, the maker of the enterprise-grade platform for data teams, Dataiku Data Science Studio (DSS), has today announced the release of Dataiku DSS 4.0, which introduces new functionalities that improve the production, development, and management of enterprise data science projects.

Tata's MOVE platform lets IT use an existing programmable network that comes complete with a pre-defined suite of application programming interfaces.

With the introduction of the Hortonworks Data Cloud (HDCloud), deploying clusters and starting to process data has become an order of magnitude faster. When Apache Hadoop evolved from being an on premise solution to a cloud based solution, the time it took to make a cluster went from weeks to days.

Go into your search well-informed with respect to what IT pros in your area are making, with detailed info from the Randstad 2017 Salary Guide.

Like agile software development, data science works best when models can be tested and iterated rapidly. The latest release from data science collaboration tool Dataiku adds integrations with GitHub and HipChat that will not only bring developers and data scientists together, but hopefully, some of their DevOps discipline as well.

The online user content capture system, Evernote, has moved 2.9 PBs of data into the Google Cloud and shut down its primary data center.

By: Eric Siegel, Founder, Predictive Analytics World
In anticipation of his upcoming conference presentation, Listening Down the Value Chain: Using Text-based Predictive Models to Find New Opportunities for B-to-B Business, at Predictive Analytics World San Francisco, May 14-18, 2017, we asked Michael Dessauer, Data Scientist at The Dow Chemical Company, a few questions about his work in predictive analytics. Q: In your work with predictive analytics, what behavior or outcome do your models predict?

Between the weekly reports on plant performance, supplier KPIs and inventory levels, more data may be the last thing supply chain managers want to crunch.

Embracing the Multi-Cloud Approach

If self-service business intelligence initiatives are on your agenda, follow these 10 best practices for ensuring proper governance.

InsideSales is one of an increasing number of players that aim to offer sales teams tools to make them more efficient and effective. In its case, InsideSales came about through the post-graduate research of co-founder Dave Elkington. As Elkington studied artificial intelligence, he soon came to realize that A.I. has existed for decades: the math behind A.I. was used back in the mid-20th century by researchers at companies such as IBM. What is different today, and what gives disruptive companies the power to undermine their more conservative competitors, is access to data. As Elkington sees it, Netflix's ability to put Blockbuster out of business was a direct result of Netflix's intentional strategy to amass information about its customers and, in doing so, to give its own predictive algorithms the best possible source data to tune its suggestions.

The big data pipeline is getting more crowded. Learn how to improve your company's big data throughput when going over the public internet.

With its Ryzen launch, AMD is avoiding making a strategic mistake it has made several times in the past, says analyst Rob Enderle.

If you're an Excel user (or a user of any other spreadsheet, really), learning R can be hard. As this blog post by Gordon Shotwell explains, one of the reasons is that simple things can be harder to do in R than in Excel. But it's worth persevering, because complex things can be easier. While Excel (ahem) excels at things like arithmetic and tabulations (and some complex things too, like presentation), R's programmatic focus introduces concepts like data structures, iteration, and functions. Once you've made the investment in learning R, these abstractions make it possible to reduce complex tasks to discrete steps, and make automating repeated similar tasks much easier. For the full, compelling argument, follow the link to the Yhat blog, below. Yhat blog: R for Excel Users

Apple this week took administrative control of the domain, the last notable web address it did not govern that users could have linked with its online sync and storage service. According to WHOIS searches today, Apple acquired control of the domain on Tuesday. Apple already ruled the primary top-level domains for iCloud, the cross-device, cross-OS service that stores files generated by iOS and macOS, and more importantly, synchronizes everything from Safari browser bookmarks to photographs between iPhones, iPads and Macs. Apple is on record as the owner of several of those domains, for example.

IoT networks are unique: They will be worldwide, required to function where no established network exists, and have stringent power requirements.

Amazon Web Services is the consensus leader of the IaaS public cloud computing market according to industry watchers, but they credit Microsoft for closing the gap with Azure and say Google with its Cloud Platform has made considerable strides as well.

Prescriptive analytics (optimization) is a sophisticated analytics technology. It can deliver great business value by helping decision makers handle the tough trade-offs that arise when limited resources force choices among options. Optimization was traditionally applied by Operations Research professionals to solve operational problems, such as route optimization and logistics planning. With the advent of new technologies that make it possible to model larger, enterprise-wide problems and provide broad support for what-if analyses, prescriptive analytics now enables a new class of business analytics applications.
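To make the idea of a resource-constrained trade-off concrete, here is a minimal Python sketch of a toy production-planning problem. The products, profits, and resource limits are invented for illustration, and an exhaustive search stands in for the solvers a real optimization product would use.

```python
from itertools import product

# Hypothetical data for illustration: two products competing for two limited resources.
PROFIT = {"chairs": 30, "tables": 50}   # profit per unit
HOURS = {"chairs": 2, "tables": 4}      # labor hours per unit
WOOD = {"chairs": 3, "tables": 4}       # wood units per unit

def best_plan(max_hours=40, max_wood=48, max_units=25):
    """Return ((chairs, tables), profit) for the most profitable feasible plan."""
    best, best_profit = (0, 0), 0
    for chairs, tables in product(range(max_units + 1), repeat=2):
        hours = HOURS["chairs"] * chairs + HOURS["tables"] * tables
        wood = WOOD["chairs"] * chairs + WOOD["tables"] * tables
        if hours > max_hours or wood > max_wood:
            continue  # plan exceeds a resource limit
        profit = PROFIT["chairs"] * chairs + PROFIT["tables"] * tables
        if profit > best_profit:
            best, best_profit = (chairs, tables), profit
    return best, best_profit

baseline = best_plan()
what_if = best_plan(max_hours=44)  # what-if: four extra labor hours
```

Comparing `baseline` with `what_if` is a small-scale version of the what-if analysis mentioned above: the same model is re-solved after one resource limit changes, and the recommended mix of products shifts accordingly.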

We will see shifts in ransomware, including how attackers target victims, who they'll target, and the role IoT will play in ransomware.

Using IBM Counter Fraud Management (CFM), an insurer can improve the operational effectiveness of its fraud prevention program and drive impressive fraud savings. IBM Watson helps insurers detect, respond to, and stop fraud with the ability to tap unstructured data.

In today's digital transformation, achieving the desired user outcome is now the driver of technology decisions, rather than the other way around.

Did political analytics firm Cambridge Analytica exaggerate the role analytics played in helping Donald Trump win? Crowdskout CEO Zack Christenson separates election tech facts from fiction.

In my spare time, I teach a fitness class that is offered in gyms around the world, and there are thousands of instructors worldwide. We all receive the same training, the same music, the same choreography. We are expected to deliver the same consistent experience to participants the world over. The expectation is that it… The post Stacking the Open Source Odds For Success appeared first on Hortonworks.

Governing bodies are pushing for an open standard to make it simpler for e-signatures to move through a process spanning apps from multiple vendors.

What are the limits of AI? And how do you go from managing data points to injecting AI in the enterprise?

Alex Sakaguchi, director, solutions marketing, Veritas: Company is committed to extend reach of software across as many relevant clouds as possible.

Before CDH 5.10, every CDH cluster had to have its own Apache Hive Metastore (HMS) backend database. This model is ideal for clusters where each cluster contains the data locally along with the metadata. In the cloud, however, many CDH clusters run directly on a shared object store (like Amazon S3), making it possible for the data to live across multiple clusters and beyond any cluster's lifespan. In this scenario clusters need to regenerate and coordinate metadata for the underlying shared data individually.

About two years ago, Hortonworks donated the entire code base of about 440,000 lines from its XA Secure acquisition to the Apache Software Foundation (ASF) in order to help jump start Apache Ranger as an Apache Incubator project. Hortonworks made this decision because our enterprise customers need an extensible and robust open source security framework…

Mandy Chessell, Distinguished Engineer & Master Inventor, discusses four key perspectives on Data Lakes and introduces a new video series.

The Informatica Axon offering is the first time Informatica will provide apps specifically for IT pros asked to be the stewards of data governance.

Google recently added support for the NVIDIA Tesla K80 GPU in the Google Compute Engine and Cloud Machine Learning to improve processing power for deep learning tasks.

The Department of Homeland Security is funding new research to stop distributed denial of service (DDoS) attacks.

Audiences are still very interested in hearing real-life stories about organizations that have embarked on digital transformation. But the feedback is less "how can I make that work for me?" and more "wow, that's interesting! I wonder who's going to do that in my company?!". In other words, they think it's important — for somebody else.

NoSQL uses procedural, implementation-specific structures expressed in a JSON format to represent its data model. The Ecma International standards body standardized JavaScript, a language designed to handle tasks in the browser. JavaScript Object Notation (JSON), a lightweight format for interchanging data over the Internet, is derived from JavaScript's object syntax. The downside of JSON is that it lacks the capabilities to provide referential integrity. These data models are neither interoperable nor standardized, which means no data portability. JSON also provides no way to resolve ambiguity in the namespace in which your data is defined, or in its structure and data types.
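The referential-integrity point can be illustrated with a short Python sketch. The customer and order documents and the `customer_id` field are made up for illustration. The JSON parser accepts an order that points at a nonexistent customer, so any such check has to be written by the application.

```python
import json

# Hypothetical documents; the field names are assumptions for illustration.
customers_json = '[{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin"}]'
orders_json = '[{"order": "A-100", "customer_id": 1}, {"order": "A-101", "customer_id": 7}]'

customers = json.loads(customers_json)
orders = json.loads(orders_json)  # parses fine: JSON has no notion of foreign keys

def dangling_orders(orders, customers):
    """Return orders whose customer_id matches no customer (an application-level check)."""
    known_ids = {c["id"] for c in customers}
    return [o for o in orders if o["customer_id"] not in known_ids]

broken = dangling_orders(orders, customers)
```

In a relational database a foreign-key constraint would reject the second order at insert time; here the inconsistency is only found if the application looks for it.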

Chief Data Officers, in particular, will want to take note of Generation Z as they begin to grow up, because many of their attitudes and behavior toward data are shifting from those of previous generations. Those who prepare for Gen Z early and build a relationship with them based on good data practices may find themselves in an optimal position as this new group's influence and purchasing power increase.

It's hard to find a company that does not have some form of a hybrid (cloud and on-premise) ERP system. For most, that happened by accident. Someone in the organization bypassed IT and bought a cloud service to fill a need more quickly than they could with an on-premise solution., for example, has often been the start of a company's march to a hybrid environment. Cloud applications can be relatively easy, low-cost solutions, but they do introduce new complexities when they need to be integrated with on-premise ERP systems and databases, or with each other. Ensuring that cloud and on-premise systems play nice together is just one part of the hybrid challenge. Making the right decisions about what will be in the cloud and what stays in-house is the other.

In anticipation of her upcoming conference presentation, Redefining Analytics for Marketing, at Predictive Analytics World San Francisco, May 14-18, 2017, we asked Jennifer Bertero, VP, Business Analytics at CA Technologies, a few questions about her work in predictive analytics. Q: In your work with predictive analytics, what behavior or outcome do your models predict?

It seems those interested in a career in cybersecurity should have options. But problems in education, job listings and realistic expectations abound.

Bloomberg has created an open source way to use machine-learning models to weight searches, then add their values to the Solr search engine.

Machine learning can automate the handling of huge troves of data to help companies make and save money. However, it's not without pitfalls, as the real estate tech company Redfin learned. As Redfin began building its own machine-learning capabilities, it ran into a problem: Employees weren't using them. Bridget Frey, the firm's CTO, said in an interview that there was a key reason for that: At first, Redfin didn't leave room in these systems for the real estate agents who were supposed to use them to make modifications. For example, a Listings Matchmaker feature generated a list of personalized recommendations for home buyers, based on their interests. In its initial iteration, agents weren't able to add recommendations they thought would be useful.

I'm a big fan of action movies. In particular I like the Mafia-style genre which often has a big, aggressive antagonist playing tough over a weaker opponent. You know the storyline: Unsuspecting individual gets trapped into an ever-escalating situation where the odds just keep getting worse. Antagonist takes advantage of said individual to keep tightening the screws and making more and more difficult demands. I was thinking about this sort of storyline the other day when I heard about some new licensing policies that Oracle — the technology industry's best analog for Al Capone — had announced. Specifically, the licensing applies to those Oracle customers running databases on Amazon Web Services (AWS) or Microsoft Azure.

Google's Go language was recently chosen as Tiobe's programming language of 2016, based on its rapid growth in popularity over the year, more than twice that of runners-up Dart and Perl. Tiobe's language index is based on the "number of skilled engineers worldwide, courses, and third-party vendors," using the results of multiple search engines.

Microsoft has launched Project Sangam, a cloud service integrated with LinkedIn that will help train and generate employment for middle- and low-skilled workers. The professional network, which was acquired by Microsoft in December, has generally been associated with educated urban professionals, but the company is now planning to extend its reach to semi-skilled people in India. Having connected white-collar professionals around the world with the right job opportunities and training through LinkedIn Learning, the platform is now developing a new set of products that extends this service to low- and semi-skilled workers, said Microsoft CEO Satya Nadella at an event on digital transformation in Mumbai on Wednesday.

Cybersecurity Ventures recently announced their Q3 2016 Cybersecurity 500, a directory of the hottest cybersecurity companies to watch this year.

Radiohead is known for having some fairly maudlin songs, but of all of their tracks, which is the most depressing? Data scientist and R enthusiast Charlie Thompson ranked all of their tracks according to a "gloom index", and created the following chart of gloominess for each of the band's nine studio albums. (Click for the interactive version, created with the highcharter package for R, which allows you to explore individual tracks.) If you're familiar with the albums, this looks pretty reasonable. Radiohead's debut, "Pablo Honey", was fairly poppy musically, but contained some pretty dark lyrics (especially in the break-out hit, Creep).

Drones and balloons can provide backup services in case of disaster; supplement services at high use times; offer service economically to rural areas.

The interesting challenge that product managers and innovators grapple with every day is balancing what customers are asking for with what customers actually need. As an example, in the late 19th century, purely focusing on customer requirements might have led to trying to breed a faster, healthier and longer-living horse. Then came Henry Ford… The post A Faster Horse? appeared first on Hortonworks.

Worldwide spending on public cloud services and infrastructure will reach $122.5 billion in 2017, representing an increase of 24.4% over 2016. As enterprises across the world increase investment, overall public cloud spending is expected to grow at a compound annual rate of 21.5% over the 2015-2020 forecast period, nearly seven times the rate of overall IT spending growth. By 2020, IDC research forecasts public cloud spending will reach $203.4 billion worldwide. Software as a service (SaaS) will remain the dominant cloud computing type, capturing nearly two-thirds of all public cloud spending in 2017 and roughly 60% in 2020. According to IDC, SaaS spending, which comprises applications and system infrastructure software (SIS), will, in turn, be dominated by applications purchases, which will make up more than half of all public cloud spending throughout the forecast period.
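The figures quoted above can be cross-checked with a little arithmetic. The Python sketch below derives the 2016 spending implied by the 2017 figure and its growth rate, and the compound annual growth rate implied between the 2017 and 2020 forecasts. These derived values are back-of-the-envelope calculations, not numbers published by IDC.

```python
# Figures quoted in the text (billions of US dollars).
spend_2017 = 122.5
growth_2017 = 0.244   # 24.4% increase over 2016
spend_2020 = 203.4

# 2016 spending implied by the 2017 figure and its year-over-year growth.
implied_2016 = spend_2017 / (1 + growth_2017)

# Compound annual growth rate implied between the 2017 and 2020 forecasts.
years = 2020 - 2017
implied_cagr = (spend_2020 / spend_2017) ** (1 / years) - 1

print(f"Implied 2016 spending: ${implied_2016:.1f} billion")
print(f"Implied 2017-2020 CAGR: {implied_cagr:.1%}")
```

The implied 2017 to 2020 rate works out to roughly 18% a year, lower than the 21.5% quoted in the text, as would be expected if that figure covers a longer period that includes faster-growing earlier years.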

Development of data infrastructure on cloudy, hyperconverged lines produces an environment that is scalable with reduced operational responsibilities.

The 5G standard, which promises to wipe the gap between wireless and wired connections, will be a huge step forward. Carriers weigh in on their plans.

US companies that don't have a presence in Europe still have to be sure that they comply with the EU's privacy laws regarding personally identifiable data.

Intel aims to accelerate IoT adoption by making it affordable to deploy a turnkey endpoint with built-in sensor and networking technologies.

Additions to the Cisco Digital Network Architecture (DNA) are the latest in a Cisco effort to unify management of networks.

As consumers and employees, we tend to be haughty about our security IQ. Nearly two-thirds say they don't believe they were a victim of a cyberattack.

CEO explains the advantages and potential pitfalls of DevOps and how the company used big data and AI to create the Apocalypse Index, a real-time chart forecasting the end of the world.

A number of vendors are trying to position themselves as the low-code app development platform of choice. Appian is one leader.

The big data pioneer on how to use machine learning to mine gold from your company's data stores.

Talent Analytics, Corp. has a unique approach to workforce predictive analytics. At our firm, we measure success by how our projects quantifiably benefit the Line of Business. We watch it, track it, and report success. Our algorithms get better and smarter using the best Data Science methods available. I've been involved in the predictive workforce…

Manuel Martin Marquez, lead data scientist at CERN, on how the research lab is using machine learning to mine value from the huge amounts of data it generates.

While we are very excited about DataWorks Summit/Hadoop Summit Munich, which is just around the corner, we are equally thrilled to announce DataWorks Summit/Hadoop Summit San Jose, which will take place June 13-15 this year at the San Jose Convention Center. DataWorks Summit/Hadoop Summit brings together your peers and industry experts alike, to help create the…
