Big Data News – 01 Sep 2016

Today's Infographic Link: Save The Animals

Featured Article
Data is exponentially increasing while habits remain steeped in legacy systems and the past when it comes to safety and pharmacovigilance (PV) in the pharmaceutical ecosystem. Old processes and a historic approach cannot meet future requirements. Growing concerns, such as unstructured data and the emergence of patients as consumers, reveal themselves on a daily basis as new products are brought to market and new regulations come forth.

Top Stories
Companies have been bringing chief data officers (CDOs) on board at a rapid rate. Interestingly, most of those tapped to fill the high-profile position did not rise through the ranks of IT. If you are eyeing a career path toward a CDO post, here are the traits you need to succeed.

One of the key driving factors behind the various web/mobile performance initiatives is the fact that end-users' tolerance for latency has nose-dived. Several studies have demonstrated that poor performance routinely impacts the bottom line, viz., the number of users, the number of transactions, etc. More than five years ago we wrote a blog post on the anatomy of HTTP. In the Internet era, five years is a long time. There has been a sea change across the board, and hence we thought it was time to revisit the subject for the benefit of the community as a whole. The original post includes a figure showing the key components of an HTTP request.

20th Century Fox enlisted IBM Watson to create the trailer for its new science fiction movie Morgan, set for release on September 2.

A cybersecurity transformation discussion on how cloud security is rapidly advancing, and how enterprises can begin to prevail over digital disruption by increasingly using cloud-defined security. We'll examine how a secure content collaboration services provider removes the notion of organizational boundaries so that businesses can better extend processes. And we'll hear how fewer boundaries and cloud-based security together support transformative business benefits.

Some technologies are perfect targets for natural language processing (NLP) and virtual assistants.

Big data has been a game changer for almost every industry. While it has created many new opportunities for businesses to improve efficiency and provide higher quality services, it also raises some other concerns.

In this special guest feature, Victor Amin, Data Scientist at SendGrid, advises that businesses implementing machine learning systems focus on data quality first and worry about algorithms later in order to ensure accuracy and reliability in production.

I recently caught up with Ravi Mayuram, SVP Products & Engineering at Couchbase, to discuss recent developments in the NoSQL database industry such as the relationship with Hadoop and Spark, container technology, security, and much more.

From sensors that measure water leaks to smart parking meters and self-driving buses, internet of things (IoT) technologies are a major contributor to the rise of smart cities. Here is a look at 6 innovative uses of IoT in municipalities around the world.

Yesterday, SAP announced a HANA-only version of its Business Warehouse. Today, results of a survey say HANA is saving customers money. The crazy part is, it all makes sense.

From big oil to big data: inside Mukesh Ambani's $20 billion start-up

Starting salaries for big data pros will continue to rise in 2017 as companies jockey to hire skilled data professionals. Recruiting and staffing specialist Robert Half Technology studied more than 75 tech positions for its annual guide to U.S. tech salaries, including 13 jobs in the data/data administration field. In the big picture, starting salaries for newly hired IT workers are forecast to climb 3.8% next year. (See also: 14 hot network jobs/skills for 2017)

A company's collection of online systems is like a delicate ecosystem — all components must integrate with and complement each other, and one single malfunction in any of them can bring the entire system to a screeching halt. That's why, when monitoring and analyzing the health of your online systems, you need a broad arsenal of different tools for your different needs. In addition to a wide-angle lens that provides a snapshot of the overall health of your system, you must also have precise, scalpel-like tools that can isolate and analyze all of those different components (DNS, CDNs, internal and external servers, third-party tags, etc.).

The growing popularity of IoT has sparked the debate on privacy once again. Last year, Samsung stoked controversy by warning customers that its Smart TV Voice Recognition system was capable of "listening" to personal and sensitive information spoken by customers. Moreover, all of this captured information is transmitted over an unencrypted connection and stored on a third-party server.

Network World's Brandon Butler and IDC's Matt Eastwood discuss major highlights from the VMworld show in Las Vegas.

Bottlenecks are a fact of life in IT. No matter how fast you build something, somebody will find a way to max it out. While the performance headroom has been elevated dramatically since Hadoop introduced distributed computing to the commodity masses, the bottleneck has shifted, but it hasn't disappeared. So where did it go? Depending on who you ask, you'll get different answers. But one thing seems abundantly clear: it's no longer the local network. Ever since Google (NASDAQ: GOOG) developed its MapReduce framework — which Doug Cutting would go on to pair with his distributed Nutch file system to create Hadoop — the speed of the local area network (LAN) has been less and less a factor.

Hold this question in your head as you read: Is your company ready to be customer obsessed? Forrester recently published a report estimating that insight-driven businesses will grow 27% annually. These companies are growing significantly faster than their competitors because they are able to optimize every aspect of their business. To get there, these companies made major changes in how they staff and look at data across the entire organization, and made it a priority at the executive level. In our previous post, we admitted that big data can have its challenges, but recognized that the best way to move forward is to break it into manageable pieces: People, Process, and Technology.

Check out this analysis of the implications of the new Dodd-Frank Act rule for incentive compensation, and how institutions can mitigate the dangers of excessive compensation and financial risk.

Teradata Corp. (NYSE: TDC), the big data analytics company, announced four powerful software-and-service solutions that speed up the transformation of Internet of Things (IoT) data to actionable insight.

2020 seems to be an important milestone for the Internet of Things. That's the year that Cisco says there will be 50 billion connected devices and also the year Gartner notes that over 50% of major new business processes and systems will incorporate some element of the Internet of Things. That's the good news.

Is dealing with transactional and analytical workloads in a single database a pipe dream? Not according to MarkLogic, which has its own way of taking on both.

A cybersecurity manager's responsibilities will vary tremendously based on the size of the team and the industry, but here are some common responsibilities.

The shipping industry is the lifeblood of the world economy, moving 90% of merchandise valued at over $19 trillion in 2014, according to the World Trade Organization. However, the forecasts relied on by many shipping companies leave something to be desired. Now, with big data analytics settling into the wheelhouse, the industry is set to make a sizable leap in efficiency. It's easy to forget how quickly the field of big data analytics has arisen, and what steps a given organization must take to be in a position to take advantage of it. For industries like financial services that digitized their processes long ago, the move up to using predictive analytics was not a traumatic one.

At VMworld, Network World chatted with Tom White, IT Director at Environmental Science Corp., about how he virtualized disaster recovery using cloud technologies.

What do you do when you once held a dominant position in a lucrative market like IT but the technology revolution you started has passed you by?

Physicists and other academic researchers from hard sciences are increasingly leading big data analytics projects, and in some cases, they are a perfect fit.

The insider threat is not really a cybersecurity problem or a data analytics issue; it's a human risk problem.

Technology consultants and vendor executives share their predictions on likely developments in BI, analytics and big data technologies, and user deployments over the next 12 months.

A recent CIO editorial by Bernard Golden regarding the future of private cloud spurred some interesting commentary in my network. The pushback seemed to focus on the viability of the term "private cloud". These individuals are well-respected thought leaders in cloud with significant experience guiding senior IT executives through the transition to modern architectures, so I decided to engage them in a discussion regarding the future of self-managed infrastructure as a whole.

You download the data and complete your analysis with ample time to spare. Then, just before deadline, your collaborator lets you know that they've "fixed a data error". Now, you have to do your analysis all over again. This is the reproducibility horror story, told in a video (not reproduced here) by Ecoinformatica – AEET. R provides many tools to make reproducibility easy, and the video's creators provide a useful list of tutorials and guides.

Learn how replication of Apache Hive metadata and the consistency provided by automated HDFS snapshots benefit production environments. A robust backup solution that is both correct and efficient is necessary for all production data management systems. Backup and Disaster Recovery (BDR) is a Cloudera Manager feature in Cloudera Enterprise that allows for consistent, efficient replication and version management of data in CDH clusters. Cloudera Enterprise BDR can be used for creating efficient incremental backups of HDFS and Hive data from multiple clusters.

Did you have a good, relaxing break over the summer? Are you refreshed and re-energised, looking forward to a new start, a new you and brushing up on your data analysis skills? If so, I've thrown together a collection of a few excellent (and free!) statistics eBooks for your Kindle to sharpen up your stats while you're on the long commute to work. Just try not to read them while driving! These books require different levels of existing knowledge, and while some are for early-stage data scientists, others are for more hard-core physicists and mathematicians. Nonetheless, it's likely that you'll find something in here that will get your mental juices flowing with ideas about how to tackle your data. There's even a bonus book at the end about the reasons why correlation does not necessarily imply causation. All these books are free, so dive in and enjoy!

NoSQL, also known as "not only SQL" (structured query language), is an approach to the design and management of large quantities of distributed data. NoSQL databases comprise a wide range of architectures and technologies designed to overcome scalability and big data performance problems. NoSQL is especially useful when organizations have to deal with large amounts of unstructured and semi-structured data, remotely stored data, or information on virtual servers. NoSQL-related technologies on the market include Google BigTable, MapReduce, Voldemort, SimpleDB, Apache Hadoop and MemcacheDB, among others.

There's plenty of disagreement about the value of economics research. Skeptics say that economists have predicted eight of the last five recessions, or zero of them. Economists and their fans point to supposed economic achievements such as stable, low inflation and (often) steady growth. In this blog post I'll use simple data science techniques to attempt to measure progress in economics research. There are many potential ways to measure progress in research. For example, since each published academic paper ostensibly makes a contribution to a field, counting the number of published papers in a field could measure that field's progress.

Fast-forward the transformation process in data science with Apache Spark. Data curation: Curation is a critical process in data science that prepares data for feature extraction so that it can be run through machine learning algorithms. Curation generally involves extracting, organising, and integrating data from different sources. It can be a difficult and time-consuming process, depending on the complexity and volume of the data involved. Most of the time, data is not readily available for feature extraction; it may be hidden in unstructured and complex data sources and must undergo multiple transformations before feature extraction.

As data experts, it's sometimes hard for us to realize that, for the vast majority of people, issues related to big data and big data security aren't all that familiar. That's why there are so many misconceptions floating around. Unfortunately, these falsities are putting businesses around the world in grave danger. Squash these four myths. The cybersecurity industry is growing at an astounding rate, which means the threats facing businesses are also increasing. From 2017 to 2021, Cybersecurity Ventures projects $1 trillion will be spent globally on cybersecurity.

Teradata has at long last decoupled Aster analytics from the underlying database. It reflects the fact that columnar databases alone won't differentiate a product.

SAP recently announced its BW/4HANA platform that leverages logical data warehousing to provide a path to faster data analytics and application development.

Generally speaking, there are two kinds of companies in the world: data rich and data poor. The richest of the data rich are easy to name: Google, Facebook, Amazon, Apple. But you don't need to be at the top of this list to use data to create value. You need to have the tools in place to turn information (data) into action. That's what the data rich do that the data poor and the data middle class do not.

Real-Time Anomaly Detection and Analytics for Today's Digital Business (White Paper): Detecting incidents in streaming business data is a unique challenge, and data-heavy companies are often faced with static thresholds that are either meaningless or cause alert storms for seasonal data; dashboards and reports that lag behind; and delays in identifying business incidents that impact revenue. In this white paper, Jason Bloomberg of Intellyx discusses how real-time anomaly detection based on machine learning is a game changer for digital technology companies.
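
To make the static-threshold problem concrete, here is a minimal Python sketch. It is not the machine learning approach the white paper discusses; it swaps in a simple per-slot statistical baseline (mean and standard deviation for each position in a repeating cycle) purely for illustration. The function names, the sample numbers, and the four-slot "day" are all hypothetical.

```python
from collections import defaultdict
from statistics import mean, stdev


def static_alerts(series, threshold):
    """Return indices whose value exceeds one fixed threshold."""
    return [i for i, value in enumerate(series) if value > threshold]


def seasonal_alerts(history, current, period, z_limit=3.0):
    """Return indices in `current` that deviate from the per-slot baseline.

    Assumes `history` and `current` both start at slot 0 of the cycle and
    that the cycle repeats every `period` points (an illustrative assumption).
    """
    slots = defaultdict(list)
    for i, value in enumerate(history):
        slots[i % period].append(value)

    alerts = []
    for i, value in enumerate(current):
        past = slots[i % period]
        baseline = mean(past)
        # Guard against a zero spread so the division below is always defined.
        spread = max(stdev(past) if len(past) > 1 else 0.0, 1e-9)
        if abs(value - baseline) / spread > z_limit:
            alerts.append(i)
    return alerts


# Hypothetical metric with four slots per day; slot 2 is the normal daily peak.
history = [10, 50, 100, 20, 12, 52, 98, 22, 11, 49, 102, 19]
# Today the peak slot collapses to 40, which is the real incident.
current = [11, 51, 40, 21]

print(static_alerts(history, 90))                   # every normal peak fires
print(static_alerts(current, 90))                   # the real incident is missed
print(seasonal_alerts(history, current, period=4))  # only the collapsed peak fires
```

With these made-up numbers, a fixed threshold of 90 fires on every ordinary daily peak and stays silent when the peak collapses, while the per-slot baseline flags only the collapsed slot. A production system would need to handle trend, multiple overlapping seasonalities, and streaming updates, which this sketch does not attempt.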

At VMworld 2016, Network World's Brandon Butler chats with Guido Appenzeller from VMware about its move to extend NSX functionality to the public cloud.

Coupling its data warehouse more tightly with the SAP HANA in-memory database, SAP unveiled an implementation that only runs on HANA.

In a bid to enhance its enterprise cloud software offerings with expanded Hadoop services, SAP is reportedly acquiring big data analytics startup Altiscale Inc. The web site VentureBeat.com reported late last week that SAP (NYSE: SAP) is finalizing details of the acquisition, estimated to be valued at more than $125 million. Citing a source familiar with the negotiations, the site said a deal could be announced in "the next weeks." Altiscale, Palo Alto, Calif., did not respond to a request to comment.

Starting salaries for IT professionals are expected to rise an average of 3.8% in 2017 compared with 2016. Find out which IT jobs are projected to see the greatest salary increases next year.

Privileged Identity Management (PIM) is the lowest common denominator in today's most treacherous corporate and governmental security breaches. Or more accurately: Privilege Mismanagement. Sony, Target, Anthem, JP Morgan Chase, the city of San Francisco and many others succumbed to the reality that the identity of a single super-user account can be subverted for the purposes of manipulating sensitive organizational data, correspondence, commercial goods and intellectual property.

Home IoT is still reaching for mainstream use. The main backer of Z-Wave, a widely used in-home networking standard, just did something that might help take it there. On Wednesday, chip vendor Sigma Designs made the interoperability layer of Z-Wave available free to the public. This is the code that allows all Z-Wave products to work together. Now anyone can download the code, develop software with it, and give that code to others.

The debate over when and how containers will be deployed inside the enterprise is just now starting to get fired up.

Artificial intelligence chatbots aren't the norm yet, but within the next five years, there's a good chance the sales person emailing you won't be a person at all.

With an eye on the need for more scalable, real-time analytics, SAP today unveiled SAP BW/4HANA, its next-generation data warehouse product for the real-time digital enterprise. BW/4HANA will support on-premises deployments, but will also be available on Amazon Web Services (AWS) and SAP HANA Enterprise Cloud (HEC), says Neil McGovern, senior director, Product Marketing, SAP Data Warehousing. "We're trying to give our customers some options," he says. "AWS is infrastructure as a service (IaaS); you can provision very quickly and inexpensively. With HEC, it's far more of a complete, turnkey solution, more platform as a service (PaaS)." SAP's next-generation data warehouse service, SAP BW/4HANA, provides interactivity with historical and live data whether that data lives inside or outside the enterprise.
