Big Data Definition – What is Big Data?

big data, data structures, definition

Is there one?

All the definitions I can find describe the size, complexity/variety, or velocity of the data.

Wikipedia's definition is the only one I've found with an actual number:

Big data sizes are a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data in a single data set.

However, this seemingly contradicts the MIKE2.0 definition, referenced in the next paragraph, which indicates that "big" data can be small and that 100,000 sensors on an aircraft creating only 3GB of data could be considered big.

IBM, despite saying that:

Big data is more than simply a matter of size.

have emphasised size in their definition.

O'Reilly has stressed "volume, velocity and variety" as well. Though well explained, and in more depth, the definition seems to be a rehash of the others (or vice versa, of course).

I think the title of a Computer Weekly article sums up a number of articles fairly well: "What is big data and how can it be used to gain competitive advantage".

But ZDNet wins with the following from 2012:

“Big Data” is a catch phrase that has been bubbling up from the high performance computing niche of the IT market… If one sits through the presentations from ten suppliers of technology, fifteen or so different definitions are likely to come forward. Each definition, of course, tends to support the need for that supplier’s products and services. Imagine that.

Basically "big data" is "big" in some way shape or form.

What is "big"? Is it quantifiable at the current time?

If "big" is unquantifiable is there a definition that does not rely solely on generalities?

Best Answer

There isn't one; it's a buzzword.

The delineator, though, is that your data is beyond the capabilities of traditional systems. The data is too large to store on the largest disk, queries take far too long without special optimization, the network or disk can't support the incoming traffic, a plain old data view isn't going to handle visualization for the shape/size/breadth of the data...

Basically, your data is beyond some ill-defined tipping point where "just add more hardware" isn't going to cut it.
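That tipping point is easier to reason about with a rough back-of-the-envelope check. The sketch below is purely illustrative and not part of the answer: the thresholds (NODE_DISK_TB, NODE_SCAN_GBPS, MAX_QUERY_SECONDS) and the function outgrown_single_node are assumptions invented for the example, and real capacity planning would involve far more than this.

```python
# Back-of-the-envelope check: has a dataset outgrown a single machine?
# All thresholds below are illustrative assumptions, not authoritative limits.

NODE_DISK_TB = 20.0       # assumed usable disk on one commodity node (TB)
NODE_SCAN_GBPS = 0.5      # assumed sequential scan throughput (GB/s)
MAX_QUERY_SECONDS = 300   # assumed tolerable time for a full-table scan

def outgrown_single_node(dataset_tb: float,
                         daily_growth_gb: float,
                         horizon_days: int = 365) -> bool:
    """Return True if the data (now or within the planning horizon) exceeds
    what one node can store, or a full scan exceeds the tolerable query time."""
    projected_tb = dataset_tb + daily_growth_gb * horizon_days / 1000.0
    full_scan_seconds = dataset_tb * 1000.0 / NODE_SCAN_GBPS
    return projected_tb > NODE_DISK_TB or full_scan_seconds > MAX_QUERY_SECONDS

# Example: 3 TB today, growing 50 GB/day.
# Under these assumptions the answer is True: past the "tipping point",
# even though 3 TB sounds small compared to the petabyte-scale definitions.
print(outgrown_single_node(3.0, 50.0))
```

The point of the sketch is only that "big" is relative to the system you have: the same 3 GB from the aircraft example can be "big" if it arrives faster than one machine can absorb it, while petabytes handled by a purpose-built cluster might not feel "big" at all.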
