Data Generation is Growing – Start Storing Everything
The world is generating more data than ever before.
In 2013, IBM reported that 90% of the world's data had been generated in the preceding two years, and the trend has continued since. So what is causing this explosion of data?
Several major factors have contributed. Chief among them is the changing way we interact with computers. The first generation of data capture involved rigorously prepared data being wired physically into the machine by engineers. The second generation involved professionally trained computer operators who fed data into the machines at our request; software was designed to enforce constraints so the machine knew how to process the data. The third generation saw everyone entering their own data – Web 2.0, the interactive Web: people were given the freedom to capture whatever they wanted, and the amount of data captured exploded. Now we are in a fourth generation of data capture – the Internet of Things and machine-to-machine communication. Gartner has projected that there will be 26 billion connected devices by 2020, and that the number of sensors will be measured in the trillions.
The price of storing all the data we generate has fallen dramatically. The first commercial hard disk drive, the IBM 350 (shipped as part of IBM's RAMAC 305 system), was launched in 1956 at a cost of around $50,000, or $435,000 in today's dollars – and it stored just 5 megabytes. To put that in context, it would cost roughly $350 billion today to store 4 terabytes of data using that technology, and the drives would occupy a floor area 2.5 times that of Singapore. Today, of course, a 4-terabyte drive can be purchased for little more than a hundred dollars and fits in the palm of a hand.
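As a quick sanity check on that cost figure (using decimal units, so 4 TB = 4,000,000 MB):

$$ \frac{4{,}000{,}000\ \mathrm{MB}}{5\ \mathrm{MB\ per\ drive}} = 800{,}000\ \text{drives}, \qquad 800{,}000 \times \$435{,}000 \approx \$3.5 \times 10^{11} \approx \$350\ \text{billion}. $$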
With storage cost largely a solved problem, the biggest remaining challenge has been data processing and analysis. There are two issues here. Firstly, traditional database systems are designed to run on one machine; a larger database implies scaling up to a bigger machine, but the amount of data now available has outstripped our capacity to process it with traditional methods on a single computer – the computers are simply not powerful enough. Secondly, data has become more structurally complex, and traditional database designs, which rely on the structure of the data being predefined during the design phase, can no longer cope with the flexibility required when people and systems evolve to use data in unpredictable ways.
Until the rise of next-generation database systems like MongoDB, these limitations meant a lot of data was simply thrown away: what is the point of storing something if you cannot make sense of it? MongoDB has helped change all that. It is designed from the ground up to distribute data across many computers, enabling it to handle vastly larger datasets. Furthermore, the structure of the data does not have to be defined in advance – MongoDB can store almost anything, and patterns and sense can still be gleaned regardless of the structure.
For example, in a database of customers we can store information about each customer that is highly specific to them as individuals – their pets, their hobbies and special interests, places they have visited, books they have read, companies where they have worked. We may not yet know what insights this information will yield, but with MongoDB we can store it today and make sense of it in the future, as the sketch below illustrates.
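Here is a minimal sketch of that idea in Python with the PyMongo driver. The database name, collection name, and all field names are hypothetical, chosen simply to mirror the example above:

```python
# A minimal sketch using PyMongo; the "shop" database, "customers"
# collection, and every field name below are hypothetical.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["shop"]

# Two customers with completely different shapes: no schema had to be
# declared in advance, and fields can be arrays or nested documents.
db.customers.insert_many([
    {
        "name": "Alice",
        "pets": ["dog", "parrot"],
        "hobbies": ["climbing"],
        "places_visited": ["Singapore", "Oslo"],
    },
    {
        "name": "Bob",
        "books_read": [{"title": "Moby-Dick", "rating": 4}],
        "employment": [{"company": "Acme", "years": 3}],
    },
])

# We can still query across the varied structures, e.g. find everyone
# who has ever worked at Acme:
for doc in db.customers.find({"employment.company": "Acme"}):
    print(doc["name"])
```

Note that no migration was needed to give Bob fields that Alice never had; the dotted query path simply matches whichever documents contain it.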
Given that tools now exist which can handle very large and loosely structured datasets, and that storage costs have fallen so dramatically, it stands to reason to start storing everything. Businesses of the past were valued on their brand awareness; businesses of the future will increasingly be valued on how well they can use the data at their disposal to understand each customer, improve their efficiency and responsiveness, and make the best decisions.
Storing data today for use tomorrow makes a great deal of business sense. Once data has been collected, questions about how to use it will naturally follow. Without the data, people will not even think of the questions they could be asking – questions like these (the last of which is sketched as a query after the list):
- What would be the optimum price for this product?
- How soon should we follow up with a customer after they have purchased item X?
- What products act as good loss leaders?
- What are the signs that indicate a customer will churn?
- When a customer churns, who are the people most at risk of following them?
- What impact does the weather have on purchasing patterns?
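As an illustration, the weather question might be explored with MongoDB's aggregation framework. This sketch assumes a hypothetical orders collection in which each order document recorded the weather at the time of purchase; the weather and total fields are assumptions made for illustration:

```python
# Hypothetical sketch: average basket value per weather condition.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["shop"]  # hypothetical database name

pipeline = [
    {"$group": {
        "_id": "$weather",                 # e.g. "rain", "sun", "snow"
        "orders": {"$sum": 1},             # number of orders per condition
        "avg_basket": {"$avg": "$total"},  # average order value
    }},
    {"$sort": {"avg_basket": -1}},         # biggest baskets first
]
for row in db.orders.aggregate(pipeline):
    print(row)
```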
MongoDB is an ideal platform for storing this data. It has the flexibility to cope with both unstructured and structured data, can scale to many petabytes with full replication, and offers rich support for analytics through its aggregation framework. Even if there is no current interest in analysing the data, the business case for analytical tools will be much easier to make in the future if there is a large reservoir of data to draw on, rather than just an idea to grow from scratch.
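For a sense of what the replication claim looks like in practice, here is a sketch of connecting to a three-node replica set with PyMongo; the host names and the replica-set name rs0 are placeholders:

```python
# Connecting to a three-node replica set (placeholder hosts and set name).
from pymongo import MongoClient

client = MongoClient(
    "mongodb://db1.example.com,db2.example.com,db3.example.com/"
    "?replicaSet=rs0&readPreference=secondaryPreferred"
)
# Writes always go to the primary; with secondaryPreferred, reads can be
# served by secondaries, keeping analytical load off the operational node.
print(client.admin.command("ping"))
```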
The businesses that win in the future will be those that know how to harvest all the data available to them. The sooner they start storing it and practising how to glean the most from it, the sooner they can learn to anticipate their customers' needs, pre-empt the marketplace, cope with the veritable deluge of data in a highly responsive manner, and leave their less-informed competition behind.