Among the buzzwords of our day in science and technology, big data is one of the most prominent and most frequently invoked as a force for transforming society. The idea is fairly simple and well rounded: machines fed with colossal amounts of data will be able to transform "stored" information into wealth and intelligence, in theory interpreting it better than humans. The basic assumption is that once the data is available, the problem is solved.
This assumption, I think, could prove to be one of the greatest misapprehensions of this decade. Data has been around for almost three decades now; the stock market, for example, has offered historic data for ages. What has changed between the past decade and this one? Three simple things: (i) computing power has progressed extraordinarily while its cost has dropped to peanuts; (ii) computer science and mathematics have converged towards a field called machine learning, where statistics, artificial intelligence, optimization and programming principles meet; (iii) access to this data, whether centralized or distributed, has become easy. These ingredients, in conjunction with training data (usually annotated, though), have shown great promise in a number of fields.
Having said the above, the question now is what the future will look like. There are two schools of thought: (i) data "fundamentalists" believe that soon there will be absolutely no need for a physical understanding of the problem or of the origin of the information, and that in the presence of massive amounts of (annotated) data one should be able to perfectly model the complex behavior of the observations (black boxes!); (ii) statistical/generative modeling, where certain assumptions introduce a physical, humanly interpretable meaning into the solution, which is then optimized using training data. It is hard to predict in which direction the cursor will move, but it is certain that the preferred solution will most likely differ depending on the problem being considered. It should also be noted that we live in an information era, which in practice means that more and more (new) data will become available, data that could be used to further enrich existing models rather than to build new ones.
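To make that contrast concrete, here is a minimal sketch, assuming Python with numpy, scipy and scikit-learn; the exponential-decay process, the choice of regressor and all names are illustrative assumptions, not a method described above. The same noisy observations are fitted once by a black-box regressor that exposes no physical quantities, and once by a generative model whose two parameters have a direct physical reading.

```python
# Toy contrast between black-box and generative/parametric modeling.
import numpy as np
from scipy.optimize import curve_fit
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 200)
y = 3.0 * np.exp(-1.2 * t) + rng.normal(scale=0.1, size=t.size)   # noisy "observations"

# (i) black box: many degrees of freedom, fits the curve, exposes no physics
black_box = GradientBoostingRegressor().fit(t.reshape(-1, 1), y)
y_black_box = black_box.predict(t.reshape(-1, 1))
print(f"black-box mean squared fit error: {np.mean((y_black_box - y) ** 2):.4f}")

# (ii) generative model: assume y = a * exp(-b * t) and estimate a (amplitude)
# and b (decay rate), two humanly interpretable quantities
def decay(t, a, b):
    return a * np.exp(-b * t)

(a_hat, b_hat), _ = curve_fit(decay, t, y, p0=(1.0, 1.0))
print(f"interpretable parameters: amplitude = {a_hat:.2f}, decay rate = {b_hat:.2f}")
```

Both fits can track the data; only the second returns quantities a human can reason about.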
From a scientific viewpoint, is the big data "bubble" that different from the "internet" bubble of the nineties, or the "artificial intelligence" one of the seventies? Do we really need to reason on colossal amounts of data to make better predictions or take more appropriate decisions in the general case? Is it that hard to predict everything from anything once we increase the degrees of freedom of the models? Does more information, even when irrelevant, add anything if it is not combined appropriately? Hard to say. Does the data need to be "big"? Not so sure; my guess is representative, but not that "big". Will the answer to all problems be "black boxes" trained on massive amounts of data, with a "complete" lack of physical interpretation? Not so sure either, even if it seems to be the trend now...
What is easy to say, though, is that live data, thanks to the internet of things and connected objects, is becoming freely and massively available. Personally, I believe that this will be the real motivation/driving force of wealth creation: simple day-to-day living problems associated with live measurements transferred over the network and solved adaptively using models built on historic training data, as sketched below.
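A minimal sketch of that idea, assuming Python with numpy and scikit-learn; the linear model, the simulated "network" stream and all names are illustrative assumptions, not a specific system. A model is first trained on historic data and then adapted incrementally as live measurements arrive.

```python
# Train on historic data, then adapt online to a stream of live measurements.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(1)
true_weights = np.array([2.0, -1.0, 0.5])   # unknown process generating the measurements

# historic (training) data, e.g. past sensor readings and their outcomes
X_hist = rng.normal(size=(1000, 3))
y_hist = X_hist @ true_weights + rng.normal(scale=0.1, size=1000)
model = SGDRegressor(max_iter=1000, tol=1e-3).fit(X_hist, y_hist)

# live measurements streaming in: predict first, then adapt incrementally
errors = []
for _ in range(100):
    x_live = rng.normal(size=(1, 3))                                # one new measurement
    y_live = x_live @ true_weights + rng.normal(scale=0.1, size=1)  # observed outcome
    errors.append(abs(model.predict(x_live)[0] - y_live[0]))
    model.partial_fit(x_live, y_live)                               # adapt to the new data

print(f"mean absolute prediction error on the live stream: {np.mean(errors):.3f}")
```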
Having said the above, it seems that little attention is paid to the "scientific" issues relevant to data interpretation, namely developing new mathematical models and their computational solutions for reasoning on this data! Very often we get the impression that all problems will be solved once we have collected this amount of data and processed it "naively" with massive computing. We also get the impression that any of us can become a "data scientist/manager" once granted access to such data. Well, this will definitely be the greatest disillusion of our decade.