GE has been working with Pivotal to better understand this problem. (It’s worth noting that GE owns a 10 percent stake in the EMC/VMware spinoff.) The two companies are working together to build what they call a “data lake.” This is a more flexible approach to large data sets than a data warehouse, which they point out was designed a decade ago with ERP and CRM data in mind. The quantity of data generated today is so much greater that it requires a far more flexible architecture to accommodate it.
One of the first areas where GE is testing this technology is its jet engine division, where it estimates each engine can generate 1TB of data from a single flight. Multiply that by many flights per day and you are facing monumental amounts of data from just one industrial device.
Ruh claims that using the data lake cuts the time before they can begin working with the data from days to minutes. How much did it improve the process? According to GE, a data warehousing approach that took 30 days to ingest, structure, integrate and process the data now takes 20 minutes with the data lake. Yes, you read that correctly: 20 minutes.
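To make the distinction concrete, here is a minimal Python sketch of the schema-on-read idea behind a data lake: raw records land untouched, and structure is applied only when a question is asked, rather than being modeled up front the way a warehouse requires. Every name, path and field below is illustrative, not GE’s or Pivotal’s actual pipeline.

```python
import json
from pathlib import Path

# Hypothetical landing zone -- not GE's actual layout.
RAW_DIR = Path("lake/raw/engine_telemetry")

def ingest(record: dict, flight_id: str) -> None:
    """Schema-on-read ingest: land the raw record as-is, no upfront modeling."""
    RAW_DIR.mkdir(parents=True, exist_ok=True)
    with open(RAW_DIR / f"{flight_id}.jsonl", "a") as f:
        f.write(json.dumps(record) + "\n")

def max_egt(flight_id: str) -> float:
    """Apply structure only at read time: extract one field, ignore the rest."""
    with open(RAW_DIR / f"{flight_id}.jsonl") as f:
        return max(json.loads(line)["egt_celsius"] for line in f)

ingest({"egt_celsius": 612.4, "n1_rpm": 9800}, flight_id="GE90-0001")
ingest({"egt_celsius": 655.0, "n1_rpm": 10400}, flight_id="GE90-0001")
print(max_egt("GE90-0001"))  # 655.0
```

Skipping the upfront modeling step is what collapses the 30-day ingest-and-structure cycle: the cost moves to query time, which is an acceptable trade when you don’t yet know which questions you’ll ask of the data.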
To that end, they combined the data lake with technology developed with the consulting firm Accenture to build a tool called Taleris, which they claim can accurately predict part failures. The trouble was that the prediction platform needed a serious amount of data to do its job. The data lake developed with Pivotal gives them that.
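As a rough illustration of why prediction needs that much data, here is a toy trend-detection rule in Python. It is not how Taleris works; the sensor name and threshold are invented. The point is simply that a failure signal emerges only across many flights’ worth of history, never from a single reading.

```python
from statistics import mean

def flag_at_risk(egt_peaks: list[float], window: int = 5,
                 limit_celsius: float = 650.0) -> bool:
    """Flag an engine whose recent average peak EGT drifts above a limit.

    A single flight tells you almost nothing; the trend only shows up
    across many flights, which is why the platform needs so much data.
    """
    recent = egt_peaks[-window:]
    return len(recent) == window and mean(recent) > limit_celsius

# Made-up per-flight peak exhaust gas temperatures for one engine.
flights = [602.0, 610.0, 645.0, 652.0, 658.0, 661.0, 667.0]
print(flag_at_risk(flights))  # True -- the recent trend crosses the limit
```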
Read the full story here: http://techcrunch.com/2014/08/10/big-data-bound-to-get-really-really-big-with-the-internet-of-things/?ncid=tcdaily