Big Data News – 03 Mar 2017

Featured Article
Joseph Goebbels said “If you tell a lie big enough and keep repeating it, people will eventually come to believe it.” In the era of big data, however, numerous smaller lies, guided by machine learning, may be more effective than a few big lies.

Top Stories
DataRefuge is a public, collaborative grassroots effort around the United States in which scientists, researchers, computer scientists, librarians and other volunteers are working to download, save, and re-upload government data. The DataRefuge Project, which is led by the UPenn Program in Environmental Humanities and the Penn Libraries group at the University of Pennsylvania, aims to foster resilience in an era of anthropogenic global climate change and raise awareness of how social and political events affect transparency.

In anticipation of her upcoming conference presentation, Our Success with Agile Analytics at Predictive Analytics World for Business Chicago, June 19-22, 2017, we asked Afsheen Alam, Program Manager Marketing Analytics and Big Data at Allstate Insurance, a few questions about her work in predictive analytics. Q: In your work with predictive analytics, what behavior or outcome do your models predict?

The Data Incubator, a data science fellowship program, is currently running a Data Science in 30 minutes webinar series. Next week features a free webinar with Dr. Becky Tucker of Netflix. Dr. Tucker is a Senior Data Scientist at Netflix, where she specializes in predictive modeling for content demand (think: what do people want to watch?). The full abstract of the webinar is below.

The Machine research project from HPE may usher in a new era for IT and big data processing. This guide covers what business professionals need to know about The Machine.

Taken together, however, several changes at the FCC show how quickly and decisively the new administration is setting a new course.

Amazon Web Services said today its outage earlier this week that affected major websites and apps was caused by human error. Sites including Netflix, Reddit and the Associated Press struggled for hours on Tuesday — all because of a simple typo. "While we are proud of our long track record of availability with Amazon S3, we know how critical this service is to our customers, their applications and end users, and their businesses," the company wrote in an online message. "We will do everything we can to learn from this event and use it to improve our availability even further."

One of the reasons cybercriminals have the advantage is that the incentives of the attackers and the defenders are mismatched.

A large majority of security operations centers, SOCs, have not attained the requisite level of maturity to adequately protect against cyberattacks.

I haven't been admitted to hospital many times in my life, but every time the only thing I really cared about was: when am I going to get out? It's also a question that weighs heavily on hospital managers: by knowing ahead of time how long each patient's stay is likely to be, they can better manage facilities and staff, and know whether the hospital is likely to reach maximum capacity in the near future. To help hospital administrators better predict how long patients are likely to stay, Microsoft has published the Predicting Length of Stay in Hospitals solution on the Cortana Intelligence Gallery. Clicking on "deploy" creates an instance of the Data Science Virtual Machine with simulated patient data in SQL Server, and a model implemented with R Services to predict the length of stay. The predictions are then presented as a Power BI dashboard to a Care Line Manager or a Chief Medical Information Officer as shown below. (Click the Try It Now button on this page to interact with the dashboard.)

A few weeks ago I opined, as is my wont, about what I saw happening in the technology space with regard to Platform as a Service (PaaS). As I saw it, PaaS was pretty much a concept that had been superseded by newer approaches to application and infrastructure creation and management. You'd be forgiven for thinking that two of the best known PaaS offerings, Cloud Foundry and OpenShift, would be pretty antsy about such a bold claim. Surprisingly, that doesn't seem to be the case. In fact, Abby Kearns, the chief executive of the Cloud Foundry Foundation, asked to jump on a call with me to explain exactly why she, who runs a foundation which itself shepherds an initiative that most concur is PaaS, agrees wholeheartedly with my view.

There is a growing need for versatile, hybrid architectures that can combine the best of both data warehousing and big data analytics. The cloud is the perfect solution, because it makes it easier to build a robust data warehouse as a central "hub", and then add other environments that can be scaled up or down to meet the specific needs of different datasets. Nevertheless, it is important to think carefully about the design of the entire hybrid architecture, and avoid a number of common pitfalls. We spoke to Jim Kobielus at IBM for his top tips on strategizing and optimizing cloud data warehouses.

Why would you want to use blockchain to build a database solution? And how would you actually do that? BigchainDB has answers.

Hewlett Packard Enterprise has revamped its existing technology services unit to focus on helping customers adopt emerging technologies, including cloud computing, the internet of things, and big data. HPE's new Pointnext technology services division, announced Thursday, is designed to help businesses speed up their adoption of several technologies, also including hybrid IT services and analytics, the company said. HPE announced the rebranded services unit with an "unboxing" video.


Today's European financial markets hardly resemble the ones from 15 years ago. The high speed of electronic trading, the explosion in trading volumes, the diverse range of instrument classes and a proliferation of trading venues pose massive challenges. With all this complexity, market abuse patterns have also become egregious. Banks are now shelling out millions of euros in…

A CSPi Packet Recorder and Packet Broker allow organizations to monitor who is accessing specific types of data.

Next week (March 6 – 9) Gartner will host its annual Data and Analytics Summit in Grapevine, TX. This is where analysts from Gartner, vendors and many leaders of businesses of all sizes get together and talk about data and analytics. Personally, I have not attended the conference for the past few years, but…

Remember life before GPS? Having to purchase a paper map and pore over it to figure out how to get from A to B? Not being able to instantly see nearby service stations and coffee shops? It's hard to imagine how we ever managed to find our way around. A few years from now, businesses will be asking, "Remember business before geospatial analytics? When we had to pore over Excel charts without real-time location data, and try to glean insight from numbers on a sheet?" Because when you add location information to business information, you get an unbeatable competitive edge. First, let's take a step back and define terms. Geospatial analysis involves gathering, displaying, and manipulating geographic information system (GIS) data such as imagery, GPS, satellite photographs, historical info, and so on. It uses geographic coordinates (latitudes and longitudes), and also street addresses, postal codes, and other identifiers, to create geographical models.

Data science platforms are engines for creating machine-learning solutions. This report evaluates 16 providers of data science platforms.

While Dell sold off many software assets as part of the 2016 mega merger with EMC that created Dell Technologies, it kept Dell Boomi as a critical component for helping IT shops build and run hybrid clouds. Dell Boomi offers an integration-platform-as-a-service (iPaaS) — a set of cloud-based capabilities for connecting everything from SaaS apps to EDI and internet of things applications.

Peter Voss is CTO and Andrew Brust is senior director of market strategy at Datameer. Big data shouldn't be an area for only academics, data scientists, and other specialists. In fact, it can't be. If we want big data to benefit industry at large, it needs to be accessible by mainstream information workers. Big data technology must fit into the workflows, habits, skill sets, and requirements of business users across enterprises. Datameer is a big data analytics application doing exactly that. Combining the user interface metaphors of a file browser and a spreadsheet, Datameer runs natively on open source big data technologies like Hadoop and Spark, while hiding their complexity and facilitating their use in enterprise IT environments and business user scenarios.

While representation of women and minorities at last year's useR! conference was the highest it's ever been, there is always room for more diversity. To encourage more underrepresented individuals to attend, the useR! committee has taken several steps, including asking attendees to adhere to a supportive code of conduct and providing childcare at the conference venue in Brussels. The R Forwards taskforce is also offering diversity scholarships to underrepresented individuals (such as, but not limited to, LGBTQ people, women, ethnic minorities, or those with disabilities) who might not otherwise be able to attend. If you qualify and think a scholarship might help you get to useR!2017, the deadline for applications is April 1, at the link below. useR!2017: Diversity Scholarships

Strata + Hadoop World 2017 is happening in just a couple of weeks, March 13-16 in San Jose–so make your plans soon. The biggest gathering of data scientists in the world is the must-attend event of the year for most, as previous attendees can attest: "If you're serious about the business value of data, never miss a Strata Hadoop conference–the mecca of all things data." "The one event I look forward to each year–with the best mix of hands-on experience, case studies, and opportunities to test drive the latest products with experts & business users." Whatever you want to learn about data, you'll find it at Strata + Hadoop World.

Big Data for Cybersecurity: Modern information security encompasses broader data sets than in the past, in order to create context and generate a complete picture of network data, user behaviour patterns and business data, all combined so that a trendline of normal operations can be created. Then from that, it is possible to…
