Big Data and Analytics at Work

Big data and analytics is an emerging field. It offers insight into hard problems that do not lend themselves to conventional analysis. Within the field, we see several key trends emerging:

  • Data volumes are growing fast
  • Industrial data will grow faster than any other big data segment as more and more devices become connected
  • Software architecture needs to change to support the storage, computation and rendering of industrial-strength big data
  • Creating high value requires advanced, real-time analytics

At GE, we keep pace with these trends through the Industrial Internet, a highly connected ecosystem of intelligent machines, advanced analytics and people at work. As Bill Ruh, Vice President, GE Software Center, says, “A single cross-country flight in the United States has the capacity to generate terabytes of big data. That data has the capability to unleash a productivity revolution. We want to deliver software services across all our machines at Silicon Valley speed.”


The power of the Industrial Internet lies in systems designed to advance industry and improve lives. These systems and solutions put data to work for GE customers, giving businesses insights that improve reliability and efficiency, with the goal of zero unplanned downtime. Even a 1% efficiency gain over 15 years adds up:

  • Rail (increasing freight utilization): $27B in industry value from reduced system inefficiency
  • Healthcare (predictive maintenance): a $63B opportunity from reduced process inefficiency
  • Power (predictive diagnostics): $66B from efficiency improvements in gas-fired power plant fleets

At the Big Data Analytics Symposium “Big Data is Getting Bigger”, held recently at GE’s John F Welch Technology Centre, expert speakers from 24×7, Microsoft, Flipkart, eBay, SAP, and PESIT shared insights and trends in big data technology and analytics.

Reghunath from Flipkart spoke on India’s very own big data project, Aadhaar (unique identification for Indian citizens), and on how the solution has been architected, deployed and managed. One interesting point was how usage patterns are used to strengthen security: suspicious behavior by agents gathering citizens’ personal details can be identified immediately through analytics, minimizing the generation of citizen identification records with false attributes.

Prateek from Microsoft spoke on “Multi-Label Learning with Millions of Labels: Recommending Advertiser Bid Phrases for Web Pages.” Recommending phrases from web pages for advertisers to bid on against search engine queries is an important research problem with direct commercial impact. The proposed method demonstrates that it is possible to efficiently predict the relevant subset of queries from a large set of monetizable ones by posing the problem as a multi-label learning task with each query being represented by a separate label.
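To make the multi-label framing concrete, here is a minimal sketch in Python. It is not the method presented in the talk: it swaps in a toy nearest-neighbour scorer over bag-of-words vectors, purely to show what "each query is a separate label" looks like in code. The training pages, the bid phrases and the function names are all hypothetical.

```python
from collections import Counter
import math

# Hypothetical training data: (page text, set of bid-phrase labels for that page).
TRAINING = [
    ("cheap flights to london book airline tickets", {"cheap flights", "london flights"}),
    ("pizza delivery near you order online", {"pizza delivery", "order pizza online"}),
    ("compare airline tickets and hotel deals", {"cheap flights", "hotel deals"}),
]

def bag_of_words(text):
    """Represent a page as word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(page_text, training, k=2, top=3):
    """Return up to `top` labels, scored by summed similarity over the k nearest training pages."""
    page_vec = bag_of_words(page_text)
    neighbours = sorted(
        ((cosine(page_vec, bag_of_words(text)), labels) for text, labels in training),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = Counter()
    for similarity, labels in neighbours:
        for label in labels:
            votes[label] += similarity  # each label is scored independently
    return [label for label, score in votes.most_common(top) if score > 0]

print(recommend("book cheap airline tickets online", TRAINING))
```

A page can receive any subset of the label set, which is what distinguishes multi-label learning from ordinary multi-class classification. At the scale described in the talk (millions of labels), a brute-force comparison like this would be far too slow, which is why an efficient prediction method matters.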

Viral Shah, co-creator of the Julia language, spoke about how open-source contributions have been fuelling growth in the analytics space. Julia was compared and contrasted with more popular analytics tools. Its design distills the strengths of Python and a syntax that resembles Matlab and R, and above all makes it easy to call into C, Python and Matlab, which lowers the barrier for users adopting Julia.

The Data versus Domain panel discussion, with experts from industry and academia, was very thought-provoking and represented two schools of thought: data analytics versus domain/physics-based analytics. The panel concluded by recommending hybrid analytics, combining data and domain knowledge, as the way to solve some of the more challenging problems in the industrial business.