Big Data News – 19 Oct 2016

Today's Infographic Link: Learn the History of Infographics

Featured Article
Data science may be the en vogue profession descriptor for what many who work with statistics and statistical modeling do, but being a data scientist in this era requires a unique set of skills and experiences. See why data science may not be a crystal ball capable of predicting all events, but how the data science experience enables these professionals to tackle building predictive models in the creative ways of their choosing.

Top Stories
Big Data is the term used for large, highly complex data sets that traditional devices cannot easily process. New technology is now needed to process these large data sets. Apache Ha…

Data science seems to be experiencing a renaissance when it comes to advanced open source tools. Get a glimpse into creative application development with IPython Notebooks, Jupyter Notebooks, Apache Spark, the PixieDust open source library and more at IBM Insight at World of Watson 2016.

We recently hosted a webinar on the topic of HDF 2.0 and the integration between Apache NiFi, Apache Ambari and Apache Ranger. We thought we would share the questions & answers from the webinar, and also compile relevant data into a single place to make it easy to find and reference. Highlights of integrating Apache…

IBM Insight at World of Watson 2016 offers you opportunities to explore solutions to your most challenging problems, connect with data engineers and data scientists from other organizations and find out what's new in streaming analytics. And in anticipation of the event, check out this overview to get to know several organizations that have built highly scalable streaming analytics that have had tremendous impact on their business.

It's no secret that there is a data explosion. A recent IDC analyst report from April 2014 indicated the volume of data, known as the digital universe, is doubling in size every two years. And by 2020, there will be as many digital bits as there are stars in the universe. There are many reasons…

Attend IBM Insight at World of Watson 2016 to explore the ways in which predictive analytics is helping modern businesses gain and retain the competitive advantage in their industries. To learn more, review these highlights from a CrowdChat in which industry experts discussed the central role of predictive analytics in back-end business processes as well as customer-facing initiatives.

IBM extended Big SQL, which was formerly exclusive to the IBM Open Platform (IOP), to the Hortonworks Data Platform (HDP) in September 2016. I recently spoke with Berni Schiefer, an IBM fellow in the IBM Analytics group, to learn more about the offering and the ongoing IBM focus on SQL.

Critical data studies explore the unique cultural, ethical, and critical challenges posed by Big Data. Rather than treat Big Data as only scientifically empirical and therefore largely neutral phenomena, CDS advocates the view that Big Data should be seen as always-already constituted within wider data assemblages. Assemblages is a concept that helps capture the multitude of ways that already-composed data structures inflect and interact with society, its organization and functioning, and the resulting impact on individuals’ daily lives. Critical data studies question many assumptions about Big Data that permeate contemporary literature on information and society by locating instances where Big Data may be naively taken to denote objective and transparent informational entities. In this introduction to the Big Data & Society critical data studies special theme, we briefly describe critical data studies work, its orientations, and principles.

Guest author: Jeff Kelly, Data Strategist, Pivotal. The phrase "digital transformation" gets bandied about a lot these days, but what exactly does it mean? When you strip away the hyperbole, I believe digital transformation is the process by which enterprises evolve from using traditional information technology to merely support existing business models to adopting modern…

In our data-rich society, corporations of all types and sizes recognize the importance of utilizing information to understand their past and shape their future. Many organizations, however, ultimately fail to reap the powerful benefits of analytics as a business tool because, they believe, their analysts have failed to deliver valuable, pertinent insights from the data…

Historical application of vector mathematics and the study of unstructured text data can be an important approach to understanding and actualizing the value of data. See how mathematical exploration of text data can unearth insight that translates into enhanced decision making.

The movie Deepwater Horizon that depicts the oil spill disaster of the same name serves as an example of how government agencies and corporations need to collect a lot of data and disseminate information immediately as events quickly unfold. Not only are all parties involved asked for a tremendous amount of information over long time periods, but they need to ensure a reliable system of data governance to help ensure they know the information they are providing.

How can IBM evaluate the user experience of the IBM Data Science Experience (DSX) tool? How about using DSX itself? See how a dog-food approach was used to collect and assess DSX user experience data.

Provenance, Lineage & Chain of Custody: The models of Provenance, Lineage and Chain of Custody are used in fine art to determine when a piece was created, the sequence of locations where it was held, how it was touched along the way, and who has owned it since creation, all with the purpose of authenticating the piece…

IBM Insight at World of Watson 2016, 24–27 October 2016, at Mandalay Bay in Las Vegas, Nevada, is the only place to be for people who work with data. Take a look at this list of top-ten reasons you won't want to miss out on one of the most intriguing and innovative events of the year.

Advances in tools and the capability to work with cloud-based data sets are dramatically changing the nature of data science workloads. Take a look at one data scientist's quest to learn more about performing data science analysis in the cloud.

How can we ensure that metadata about all types of data is accurate, available, ubiquitous and universally accessible? Standards are certainly necessary, but we also need a new way to think about how metadata is created, managed and maintained.

The first post in this three-part series on Digital Foundations introduces the concept of Customer 360 or Single View of Customer (SVC). We will discuss the need for & the definition of the SVC as part of the first step in any Digital Transformation endeavor. We will also discuss specific benefits from both a…

Data Science is not just about programming and math; the field also heavily relies on intellectual curiosity, creativity, and exploration. By creating and combing through variables from different types of data sets, data scientists can reveal new and interesting relationships, in addition to different avenues for future research. On Friday, November 4 at 12:00 pm EST, the MS in Data Analytics online degree program at the CUNY School of Professional Studies will host a webinar that provides insights into the important and various applications of Creative Statistics. This webinar will review how empirical evidence supporting a new theory about the relationship between organizational values and industry automation was discovered, as well as demonstrate how the public availability of diverse types of data expands opportunities for innovative ideas.

Specific peer networking opportunities in the cognitive space for the developer community take shape at IBM Insight at World of Watson 2016. Check out some anticipated highlights.

Panel Discussion: How Behavioral Analytics is Redefining Engagement.




Industry leaders like to use the term "culture" to demonstrate the uniqueness of a business and to point out how, while its products may be superficially imitated, the business can never be replicated by competitors because of this "culture" thing. What defines a culture is complex and we are not going to get into that here. But I…

People often think about cloud architecture in simplistic terms: you're either public, private, or hybrid. (In fact, there's even confusion about the meaning of the term "hybrid" itself.) In the real world, of course, virtually every implementation is hybrid–no company puts 100% of its IT environment into one single cloud…

Nancy Hensley, director of offering management for IBM Analytics, speaks with Rob Thomas, vice president of development for analytics at IBM, on the subject of business transformation, leading to a discussion of the data maturity curve.

Join our practical Statistica webcast: "What's New in Statistica 13.2?" Are more and more employees across your organization being asked to make data-driven decisions on a regular basis while retaining access to the dynamic, worldwide analytics marketplace of ideas? Join Statistica GM John Thompson at our October 18 webcast as he describes how the new tools of Statistica 13.2 respond to this trend. He will also explain how Statistica's role in the broader analytics community continues to evolve from a solution ideal for statisticians and mathematicians, to one accessible to all users, including both traditional and citizen data scientists.

The 100% open source and community-driven innovation of Apache Hive 2.0 and LLAP (Live Long and Process) truly brings agile analytics to the next level. It enables customers to perform sub-second interactive queries without the need for additional SQL-based analytical tools, enabling rapid analytical iterations and providing significant time-to-value. TRY HIVE LLAP TODAY Read about…

White Paper with 2015 survey results available now. The field of data science has evolved in the past decade at a breathtaking pace. The potential applications and sources for rich datasets have expanded as has the attention the field of data science has received in the popular press. In 2015, Rexer Analytics fielded our…

Teradata is a Leader in Big Data Hadoop Optimized Systems: Analyst Report. Pervasive technology and operational challenges in deploying Hadoop platforms slow down the speed of implementations. Hadoop optimized systems offer key benefits including faster time-to-value, reduced costs, minimized administration efforts and modular expansion in enterprise Hadoop deployments. Teradata Appliance for Hadoop comes integrated with the latest versions of Hadoop from Cloudera or Hortonworks and value-added software from Teradata, and is delivered ready to run to accelerate Hadoop deployments.

The Internet of Things (IoT) is making inroads in all areas of life, expanding the opportunities for each new generation of global citizens. What's more, cloud technology is bringing IoT capabilities to people around the world, bringing them together in a new world of data.

In a world overrun with competing local, national and global regulations touching every aspect of finance, compliance can seem elusive–even unattainable. Yet, modern financial institutions have the ability to not only survive but also thrive amid an ever-changing regulatory landscape. Discover how cognitive computing capabilities can free members of compliance teams to drive business transformation instead of forever playing catch-up with the latest rules and regulations.

Last week, we had a jam-packed webinar on Hortonworks DataFlow with over 700 registrants, so we were unable to get back to everyone to answer their questions. We've grouped the questions (and answers) below into the following categories, and if you have more questions at any time, we encourage you to check out the Data Ingestion…

It has been another exciting week on Hortonworks Community Connection (HCC). We continue to see great activity and recommend the following assets from last week. Top Articles from HCC: One Way Trust – MIT KDC to Active Directory, by emaxwell. Many security environments have strict policies on…

Predictive analytics encompasses a powerful set of methods that uses all the available data an organization can gather to answer key business questions. By enabling financial institutions to make data-driven business decisions, predictive analytics helps drive profit and increase efficiency. In the rapidly changing banking industry, customer insights can also fuel initiatives to improve customer…

The benefits and pitfalls of different cloud deployment architectures can be intense topics of conversations spanning a wide range of industries. Some discussions foster new approaches to consider, and others make strong arguments as to which approach is well suited for different, industry-specific technologies and use cases. Get an overview of three well-known cloud computing categories that all organizations need to consider.

If you work in the healthcare or life sciences fields, then check out these five key reasons you won't want to miss attending IBM Insight at World of Watson 2016.

Although formerly exclusive to the IBM Hadoop Platform, the extension of Big SQL to the Hortonworks Data Platform (HDP) meets the challenge of complex data warehousing queries on Hadoop. See what Paul Yip, worldwide product strategy for Hadoop and Spark at IBM, has to say about what this transition means for industry in an interview with Adrea Braida, data and analytics communications and product marketing at IBM.




Apache Hive(™) is the most complete SQL on Hadoop system, supporting comprehensive SQL, a sophisticated cost-based optimizer, ACID transactions and fine-grained dynamic security. Though Hive has proven itself on multi-petabyte datasets spanning thousands of nodes, many interesting use cases demand more interactive performance on smaller datasets, requiring a shift to in-memory. Hive 2 marks the…

Original post in HCC. I had a few hours in the morning before the Strata + Hadoop World conference schedule kicked in, so I decided to write a little HDF 2.0 flow to grab all the tweets about the Strata Hadoop conference. First up, I used GetTwitter to read tweets and filtered on these terms: strata,…

Today's corporations collect vast quantities of data on a daily basis. With every credit card swipe and completed survey, organizations can capture critical customer information. However, the raw data alone does not generate the insights needed to drive business decisions. It's the proper analysis of this data that unlocks its true value.  In the next decade, businesses of all sizes will need savvy analysts who can predict trends and prescribe innovative market solutions. 

Financial regulators are driving a data evolution. Traditionally, technology moves fast and regulators react slowly. When technology leaps forward, it enables financial firms to change the nature of their business, often into unregulated territory; regulators then pass regulation to catch up. This model can work in slow-moving markets, but in today's interconnected…

Use data to drive the future of business–and your career. Follow our faculty's lead into the high-demand field of analytics. The shortage of data analysts in the U.S. represents a major career opportunity for you. Companies in all industry sectors are searching for knowledgeable, experienced analytics leaders who can help them succeed in the age of Big Data. Northeastern helps you to meet this growing market need–and expand your career prospects–by offering Graduate Programs in Analytics. Advance your knowledge through a breadth of master's degrees or our graduate certificate program–all led by faculty practitioners and innovators from across the analytics industry. Gain deeper insight and in-roads to the industry, through hands-on work experience, networking opportunities, and our vast network of accomplished alumni. Emerge from the program with a portfolio of real-world projects that proves to employers–you're ready to hit the ground running. 

Artificial Intelligence (AI) continues to be the next great topic of debate. In fact, Microsoft, Amazon, IBM, Google and Facebook announced on Thursday, Sept. 29 the formation of the Partnership on Artificial Intelligence to Benefit People and Society. Within the predictive analytics discipline, though, we tend to use the term "machine learning" as our reference point for artificial…

The building blocks approach can reveal the secrets behind incentive compensation frameworks that actually work. This installment of a multipart blog series looks at defining the performance measures for outcomes that trigger an incentive payment.

Visualizing Time Data. Read the whitepaper: The Best Ways to Visualize Time Data. For time-based data, the right chart is the one that reveals the most important insights for the audience at hand. Trying something beyond the line chart may reveal hidden insights, unknown unknowns, or surface multiple truths in a single data set. Read "Visualizing Time: Beyond the Line Chart" to learn three innovative ways to analyze time series data: • The Slope Chart: From Start to Finish • Moving data to the cloud gets closer to copy/paste. • The Highlight Table: Finding patterns in color. Get this free whitepaper to explore different ways to analyze time series data. Is the line chart always the best way to visualize time?

IBM Insight at World of Watson 2016 affords a great opportunity for you to gain insight into the CDO, CIO and CTO roles. Here are some highlights from a recent CrowdChat involving industry influencers discussing the challenges and opportunities that these C-level executives face in the cognitive era.

Take a look at highlights from the IBM Chief Data Officer Strategy Summit Fall 2016 in Boston, Massachusetts, in a collection that includes a full social recap, videos, quotes and more.

Database migration is not any database administrator's idea of fun–not even close. By far, the database migration status quo can be the least interesting and most dreaded part of the job. Check out an advanced self-service, ground-to-cloud database migration offering for handling database migration quickly, easily and securely.

Unstructured information continues to grow as never before, with a preponderance of it being stored and managed by large enterprises. Much of this data is not really well understood, and its management and governance remain a formidable challenge. See how creating trusted and collaborative environments helps with effective information configuration, governance and management.

New GPU instance type offers the most processing power available in the cloud for artificial intelligence, high-performance computing and big data processing. New Amazon Machine Image (AMI) comes pre-installed with frameworks that help customers reduce model training time from weeks to hours. September 30, 2016 02:35 AM Eastern Daylight Time. SEATTLE–Amazon Web Services, Inc. (AWS), an Amazon.com company (NASDAQ:AMZN), today announced the availability of P2 instances, a new GPU instance type for Amazon Elastic Compute Cloud (Amazon EC2) designed for compute-intensive applications that require massive parallel floating point performance, including artificial intelligence, computational fluid dynamics, computational finance, seismic analysis, molecular modeling, genomics, and rendering. With up to 16 NVIDIA Tesla K80 GPUs, P2 instances are the most powerful GPU instances available in the cloud.

The future is the ultimate unknown. It's everything that hasn't happened yet. Prediction as a capability is booming. It reinvents industries and runs the world. More and more, predictive analytics drives commerce, manufacturing, healthcare, government, and law enforcement. In these spheres, organizations operate more effectively by way of predicting behavior–i.e., the outcome for each…

By: Sean Robinson, Program Chair, Predictive Analytics World for Government In anticipation of his upcoming conference keynote presentation, 21st Century Data-Driven Environmental Protection at Predictive Analytics World for Government, October 17-20, 2016, we asked Robin Thottungal, Chief Data Scientist/Director of Analytics at the U.S. Environmental Protection Agency (EPA), a few questions about his work in predictive analytics. Q: How would you characterize your agency's current and/or planned use of predictive analytics?

Data is the world's most potent, flourishing unnatural resource. Accumulated in large part as the by-product of routine tasks, it is the unsalted, flavorless residue deposited en masse as organizations churn away. Surprise! This heap of refuse is inherently predictive. Thus begins a gold rush to dig up insightful gems. Does crime increase after…

This article is excerpted from Eric Siegel's foreword to the recently released book, Mining Your Own Business: A Primer for Executives on Understanding and Employing Data Mining and Predictive Analytics, by Jeff Deal and Gerhard Pilcher. Those two authors head up the programming of two of the event series Siegel founded, Predictive Analytics World:…

It's no secret that there's a shortage of data scientists in America's workforce. Many companies look to hire overseas to help ease the domestic talent shortfall (in fact, one in three data scientists are born outside the U.S.) so understanding the ins and outs of visas is rapidly becoming a business necessity. Not all visas…

Through 2020, spending on self-service data tools will grow 2.5 times faster than spending on traditional data tools.
