DEV Community


How much data do I need to have Big Data?

Luke Westby
Elm core contributor. Creator and former maintainer of ellie-app.com. elm-conf organizing team alumnus.

Discussion (4)

Pierre Bouillon

I may be mistaken, but I think you can say you are doing 'Big Data' whenever you have to process more data than your computer can handle.

When I did some research on it, I was amused to see that, for example, processing a huge CSV file could be considered 'Big Data' if you are working on a veeeery old computer 😄
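
To make the "more data than your computer can handle" idea concrete, here is a minimal sketch, assuming a CSV with a numeric column. It reads one row at a time instead of loading the whole file, so memory use stays roughly constant however large the file is. The function name `column_sum` and the `amount` column are hypothetical, chosen only for illustration.

```python
import csv
import io


def column_sum(lines, column):
    """Sum a numeric column while holding only one row in memory at a time.

    `lines` is any iterable of text lines, e.g. an open file object.
    """
    reader = csv.DictReader(lines)  # yields rows lazily, one dict per row
    total = 0.0
    for row in reader:
        total += float(row[column])
    return total


# A tiny in-memory stand-in for a file that would be too big to load at once.
sample = io.StringIO("id,amount\n1,2.5\n2,4.0\n3,1.5\n")
print(column_sum(sample, "amount"))
```

With a real file you would pass `open(path, newline="")` instead of the `StringIO` object. If even this streaming approach is too slow or the data no longer fits on one disk, that is roughly the point where the "Big Data" label starts to apply under the definition above.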

Luke Westby Author • Edited

Okay interesting, so big is relative to your ability to process it. Does that refer specifically to compute instances from cloud providers, or could it be, like, a laptop?

Pierre Bouillon

Yes! I found it quite funny.

And yes, from what I understand, it depends on where you are doing the processing. So if you're using the cloud, then it's relative to the servers you're using.

Hakkı Konu

Big data can be described by the following characteristics:

Volume
The quantity of generated and stored data. The size of the data determines the value and potential insight, and whether it can be considered big data or not.

Variety
The type and nature of the data. This helps people who analyze it to effectively use the resulting insight. Big data draws from text, images, audio, video; plus it completes missing pieces through data fusion.

Velocity
In this context, the speed at which the data is generated and processed to meet the demands and challenges that lie in the path of growth and development. Big data is often available in real-time. Compared to small data, big data are produced more continually. Two kinds of velocity related to big data are the frequency of generation and the frequency of handling, recording, and publishing.

Veracity
An extension of the original definition of big data, referring to data quality and data value. The quality of captured data can vary greatly, which affects the accuracy of analysis.

Data must be processed with advanced tools (analytics and algorithms) to reveal meaningful information. For example, to manage a factory one must consider both visible and invisible issues with various components. Information generation algorithms must detect and address invisible issues such as machine degradation, component wear, etc. on the factory floor.

Source: Wikipedia