Big Data can’t tell you everything

There are few fields today that would not benefit from large, carefully selected data sets. However, lots of people – including scientists, businessmen and economists – put more stock in Big Data than they should.

Understanding data means understanding its limitations, and knowing what we are looking at, before we map it onto reality.

Big Data is without a doubt the idea of the moment. Everywhere we look, the narrative is grounded in what data sets reveal about the topic at hand.

But understanding still revolves, fundamentally, around a search for laws that describe the environment. And the one thing that Big Data isn’t particularly good at is identifying laws. Big Data is brilliant at detecting correlation; the more robust the data set, the better the chances of identifying these correlations, even complex ones involving multiple variables.

All the Big Data in the world will not, by itself, tell you whether smoking causes lung cancer. To really understand the relationship between smoking and cancer, you need to run experiments and develop a mechanistic understanding of carcinogens, oncogenes, and DNA replication.

Merely tabulating a massive database of every smoker and non-smoker in the world, along with when and how many times they smoked and how they died, would not be enough to infer all the complex underlying biological machinery.

Beyond the numbers
The investment and real estate landscape is the same. In order to understand why asset prices move higher in the long term, we need to understand a combination of monetary theory, the theory of replacement value, and behavioral economics.

Even when we zoom into specifics of data sets, we need to understand that data set compilation itself has flaws; we are not really looking at “raw data”. In point of fact, there is no such thing as raw data.

When we look at transactional data, we need to account for time lags (between a sale and its registration), price differences that arise from payment plans and the “cost of money”, and other variables such as narratives that get baked into prices without a clear understanding of why this is happening. This matters especially when the data is generated by the private sector, which may have its own vested interests.
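To make the payment-plan point concrete, here is a minimal sketch of how a headline price paid in instalments might be converted to a cash-equivalent price. Everything in it is an assumption for illustration: the function name `present_value`, the 6% annual cost of money, and the instalment schedule are hypothetical and are not drawn from any actual transaction data.

```python
def present_value(payments, annual_rate):
    """Discount a list of (months_from_now, amount) payments to today's value."""
    # Convert the assumed annual cost of money to an equivalent monthly rate.
    monthly_rate = (1 + annual_rate) ** (1 / 12) - 1
    return sum(amount / (1 + monthly_rate) ** months for months, amount in payments)


# Hypothetical sale: a headline price of 1,000,000 paid over two years.
headline_price = 1_000_000
plan = [(0, 200_000), (12, 400_000), (24, 400_000)]

# Assumed cost of money of 6% a year; the real figure would vary by buyer and period.
cash_equivalent = present_value(plan, annual_rate=0.06)

print(f"Headline price:  {headline_price:,.0f}")
print(f"Cash equivalent: {cash_equivalent:,.0f}")  # about 933,000 under these assumptions
```

Under these assumed numbers, two sales recorded at the same headline price can differ by several percent in real terms, which is the kind of gap a price table alone does not show.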

Same data, different narratives
In Dubai, the data sets that are being produced by the private sector on real estate have generated contradictory narratives, from prices being at their lowest point in a decade to claims that they are selectively rising. Meanwhile, the official price index, in its first-ever data release, stated that prices had actually risen in the month of November.

Even as we scrutinize this incongruity, what the data does not tell us is why asset prices in Dubai rise over time given its entrepôt status. Nor could the data ever have predicted Dubai’s rise in importance as a city.

Passion breaks through the bounds of logic to create outcomes that can only be understood when one examines first principles: cities rise when a set of circumstances is created and capitalized on to generate economic value and sustainable growth. It is commendable that even as we are engulfed in a revolution of Big Data, Dubai has reinvented itself (yet again) to become the hub in the Middle East for emerging fields such as artificial intelligence and robotics.

But it is worth considering that the critical variable here is not the reinvention. It is the base paradigm of superior infrastructure, incentives and the purpose of the rulers that enables the city (and the country) to scale ever loftier heights.

This part never gets captured in the data sets. Consequently, we never get a true picture of the dynamics at play when we look at the data streams that are made available.

These are the limitations that data runs up against, and it is worthwhile to keep them in mind as we navigate the investment landscape.

It is somewhat unnerving when people in the business and scientific world put too much faith in Big Data. Certain corners of academia have even taken on the “if we build it they will come” attitude, presuming that economics and science will sort themselves out as soon as we have enough data. They won’t.

If we have good hypotheses, we can test them with Big Data. But Big Data should not be our first port of call; it should be where we go once we know what we are looking for.

– Sameer Lakhani is Managing Director at Global Capital Partners.