This past week Ian spoke at the O’Reilly Strata conference in Santa Clara. His talk was titled "Big Data = Bigger Metadata." Slides are embedded below, but it was the kind of talk that works much better in person (we’ll hunt around to see if there’s an archived version). A few highlights from the conference:
- There’s incredible froth around Big Data. It isn’t entirely clear what the term means, but it is clearly early days and the field is taking a big-tent approach: industry-specific solutions and products were mostly absent, a reflection of the general data euphoria.
- There is a big difference between Big Data and what we call Important Data. It is not an either/or argument. The value of developing schemas for highly structured data cannot be overstated.
- Big Data is not new. Teradata, Wall Street, Walmart and others have been handling large transaction volumes for decades. In the rush to embrace the future, some forget the past.
- Big Data isn’t actually about data. It’s about the systems and processes used to find the info-ingots.
- One person’s metadata is another person’s data. We’re living in a derivative world: data exhaust can be the basis for a value-added product your business doesn’t yet know about.
- In the Big Data era, innumeracy, or statistical illiteracy, will be a major problem, responsible for a great deal of cost and wasted effort. It stems from a fundamental misunderstanding (or complete lack of understanding) of statistics and how to use them in making business decisions.
- Compute performance will continue to improve, driving down costs.