Illuminating the Importance of Big Data

Electricity—it’s something we usually take for granted. Whether it’s keeping our food fresh or keeping us cool in the heat, we rarely think about it. Until it goes out, that is.

You may have heard about or felt the effects of the recent power outages along the East Coast, coupled with the intense heat wave. If you felt the sweltering effects, I’m sure you have a renewed appreciation for electricity. Just as most of us don’t think about electricity every day, most of us don’t spend our days thinking about Big Data. Since I do spend my days thinking about it, I can tell you that just as electricity makes the world function, so does Big Data!

As there are many great resources that describe what Big Data refers to in general, such as this article by Edd Dumbill from O’Reilly Strata, I’ll take a moment to describe what Big Data means to GE.

For GE, Big Data largely refers to Industrial Big Data: time series data from industrial equipment. This includes both monitoring high-velocity streaming time series data from globally distributed sensors sending many millions of data points per second, and storing and analyzing massive volumes (hundreds of terabytes or more) of historical time series data for knowledge discovery and pattern mining. To put it simply, Industrial Big Data is so important because it helps to keep our lights on.

GE certainly has a wide variety of Big Data as well, from medical images and electronic medical records in Healthcare to Industrial Big Data from all sorts of equipment, including gas turbines generating electricity, CT and MRI scanners in hospitals, water purification equipment, and locomotives!

GE Energy’s Thermal Remote Monitoring & Diagnostics (RM&D) Center is responsible for monitoring roughly 1,500 electricity-generating gas and steam turbines worldwide. Monitoring and analyzing sensor data from those units in real time allows GE to identify anomalies before they lead to unplanned equipment shutdowns, which could in turn lead to a power outage. Storing and mining massive volumes of historical data helps the center find patterns that led to equipment faults in the past. These patterns can then be operationalized against the streaming data to prevent similar faults in the future.
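
To give a flavor of what flagging an anomaly in streaming sensor data can look like, here is a minimal sketch in Python. It is not the method used by the RM&D Center. It is a simple rolling z-score check, and the class name, window size, threshold, and sample readings are all illustrative assumptions.

```python
from collections import deque
from math import sqrt


class RollingZScoreDetector:
    """Hypothetical detector: flags readings far from the recent average."""

    def __init__(self, window=50, threshold=3.0, min_samples=10):
        # Keep only the most recent `window` readings.
        self.values = deque(maxlen=window)
        self.threshold = threshold
        self.min_samples = min_samples

    def update(self, x):
        """Return True if x looks anomalous against the window, then record x."""
        is_anomaly = False
        n = len(self.values)
        if n >= self.min_samples:
            mean = sum(self.values) / n
            variance = sum((v - mean) ** 2 for v in self.values) / n
            std = sqrt(variance)
            # Skip the check when the window is perfectly flat (std == 0).
            if std > 0 and abs(x - mean) / std > self.threshold:
                is_anomaly = True
        self.values.append(x)
        return is_anomaly


# Made-up readings: a steady signal followed by one sudden spike.
readings = [100.0 + 0.1 * (i % 5) for i in range(30)] + [140.0]

detector = RollingZScoreDetector()
flags = [detector.update(r) for r in readings]

# Positions in the stream where a reading was flagged.
anomaly_indices = [i for i, flagged in enumerate(flags) if flagged]
print(anomaly_indices)
```

A real monitoring system would need far more than this, such as per-sensor models and patterns mined from historical faults, but the basic idea of comparing each new reading against recent behavior is the same.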

I’m currently leading a team of big data researchers at GE’s Global Research Center. We’re researching technologies and solutions to address both aspects of the Industrial Big Data challenge: how to manage and analyze high-velocity big data in real time or near real time, and how to store and analyze massive volumes of historical time series data.

Technologies we’re investigating include complex event processing engines, in-memory data systems, and scalable time series data stores. We’ve been evaluating existing commercial and open source technologies, and have been developing new solutions for challenges that haven’t been effectively addressed to date. While most Big Data solutions have focused on unstructured data, time series data is well structured, but not relational.

On Thursday, July 26, GE is hosting a FREE webinar where I’ll be presenting with a few folks from GE Intelligent Platforms and GE Energy. We’ll be presenting a brief overview of what we’ve been doing in the Industrial Big Data space and how it is benefiting GE Energy, which helps to keep everyone’s lights on… and air conditioning running! We’ll also be sharing how critical insights enabled by big data can significantly improve a company’s operational performance.

If you find this interesting and exciting, be sure to register.  Feel free to ask any questions below and we’ll try to answer them live!

Kareem


1 Comment

  1. Hari Pandalai

    Kareem,

    This is a link from yesterday's Fireside Chat at TechCrunch Disrupt in SF with Vinod Khosla http://techcrunch.com/2012/09/12/vinod-khosla-y-combinator/. At around 12 minutes and thereafter he talks about a breath analyzer built on chips and an EKG iPhone app that does 80% of what a real EKG machine does at a fraction of the cost. He goes on to say that in the next 20 to 25 years machine learning will replace 80% of doctors. Vinod says that if Google cars can be driven autonomously, then healthcare can be solved through machine learning too. He also mentions IBM Watson. I am not sure why GE is not using machine learning to solve the health care challenge. It's a fascinating, brilliant talk.

    Regards,
    Hari