Preparing for the Big Data Revolution

by AlanSBPerkins

It is no accident that we have recently seen a surge of interest in big data. Thanks to the convergence of a number of technologies, businesses now face unprecedented opportunities to understand their customers, achieve efficiencies and predict future trends.

Businesses need to take every opportunity to store everything they can. Lost data represents lost opportunities to understand customer behaviour and interests, drivers for efficiency and industry trends.

A perfect storm
Data storage costs have fallen dramatically. In 1956 IBM released the RAMAC 305, the first computer to use a hard disk drive. It could store five megabytes of data at a cost of $50,000, around $435,000 in today's dollars. In comparison, a four-terabyte drive today fits in your hand and costs around $180. Building a four-terabyte drive with 1956 technology would cost $350 billion and take up a floor area of 1,600 km², 2.5 times the area of Singapore. Even the 10-megabyte personal hard drives advertised circa 1981 for $3,398 (about $11,000 today) would cost $4.4 billion for four terabytes.
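The scaling arithmetic behind these comparisons can be reproduced in a few lines. This is a minimal sketch that uses the article's inflation-adjusted prices and assumes decimal units (1 TB = 1,000,000 MB); the names `DRIVES` and `cost_for_capacity` are illustrative only.

```python
# Scale the article's historical drive prices up to a four-terabyte drive.
# Assumption: decimal units, so 1 TB = 1,000,000 MB.
MB_PER_TB = 1_000_000
TARGET_MB = 4 * MB_PER_TB

# (label, capacity per unit in MB, price per unit in today's dollars),
# using the inflation-adjusted figures quoted in the article.
DRIVES = [
    ("IBM RAMAC (1956)", 5, 435_000),
    ("10 MB personal drive (c. 1981)", 10, 11_000),
]


def cost_for_capacity(unit_mb: int, unit_price: int, capacity_mb: int) -> int:
    """Price of buying enough whole units to reach capacity_mb."""
    units_needed = -(-capacity_mb // unit_mb)  # ceiling division
    return units_needed * unit_price


if __name__ == "__main__":
    for label, unit_mb, unit_price in DRIVES:
        units = -(-TARGET_MB // unit_mb)
        total = cost_for_capacity(unit_mb, unit_price, TARGET_MB)
        print(f"{label}: {units:,} units, about ${total / 1e9:,.1f} billion")
```

Running it gives 800,000 RAMAC units at about $348.0 billion (which the article rounds to $350 billion) and 400,000 personal drives at about $4.4 billion.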

Gordon Moore's observation, first made in 1965 and revised in 1975, that the number of transistors on a chip (and with it processing capacity) doubles approximately every two years has proved astoundingly accurate. Yet the amount of data we can generate has far outstripped even this exponential growth rate. Data capture has evolved from requiring specialised engineers, to requiring specialised clerical staff, to the interactive web, which allowed people to capture their own data. While this was a revolutionary step forward in the amount of data at our disposal, it pales beside the most recent step: the 'Internet of Things', which lets machines capture huge amounts of data automatically, resulting in a veritable explosion of data that far outpaces Moore's Law. The result: the data load became too much for our computers, so we simply threw a lot of it away or stopped looking for new data to store.

As the price of storage has fallen sharply, we can afford to capture far more data, and it has become increasingly important to find new ways to process it all at the petabyte scale. A number of technologies have emerged to do this.

Pets versus cattle
Traditionally, computer servers were all-important: they were treated like pets. Each server was named and carefully maintained to ensure that everything was performing as expected; after all, when a server failed, bad things happened. Under the new model, servers are more like cattle: expendable and easily replaced. Parallel processing technologies have superseded monolithic approaches, letting us take advantage of many low-cost machines rather than ever more powerful central servers.

Hadoop is one project that has emerged to handle very large data sets using the cattle approach. It takes a 'divide and conquer' approach: an extremely large workload is distributed across many computers, each computer processes its share, and the intermediate results are then brought back together and aggregated. To illustrate, imagine someone hands you a deck of cards and asks you to find the Jack of Diamonds. Under a traditional approach, you search through the cards one by one until you find it. With Hadoop, you can effectively give one card each to 52 people, or four cards each to 13 people, and ask who has the Jack of Diamonds. When a complex process can be broken into manageable steps like this, the job becomes much faster and much simpler.
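The card analogy maps onto the split, map and reduce steps of a MapReduce-style job. The sketch below is a single-process illustration only: threads stand in for the separate machines of a cluster, and the function names (`deal`, `map_hand`, `reduce_matches`, `who_has`) are hypothetical, not part of Hadoop's API.

```python
from concurrent.futures import ThreadPoolExecutor

RANKS = ["Ace"] + [str(n) for n in range(2, 11)] + ["Jack", "Queen", "King"]
SUITS = ["Clubs", "Diamonds", "Hearts", "Spades"]
DECK = [f"{rank} of {suit}" for suit in SUITS for rank in RANKS]


def deal(deck, workers):
    """Split the deck round-robin into one hand per worker (the 'divide' step)."""
    return [deck[i::workers] for i in range(workers)]


def map_hand(worker_id, hand, target):
    """Map step: each worker checks only its own hand and reports any match."""
    return [(worker_id, card) for card in hand if card == target]


def reduce_matches(partials):
    """Reduce step: merge the per-worker results into one answer."""
    return [match for partial in partials for match in partial]


def who_has(deck, target, workers):
    hands = deal(deck, workers)
    # Threads in one process stand in for the separate machines of a real cluster.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_hand, range(workers), hands, [target] * workers))
    return reduce_matches(partials)


print(who_has(DECK, "Jack of Diamonds", 13))  # [(10, 'Jack of Diamonds')]
```

No worker ever sees the whole deck; only the small per-worker answers travel back to be combined, which is what lets the approach scale out across many cheap machines.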

NoSQL, usually read as "not only SQL", is a collection of database technologies designed to handle large volumes of data, typically with less upfront structure than a relational database such as SQL Server or MySQL requires. These databases are designed to scale out across many machines, whereas traditional relational databases are better suited to scaling up on a single, bigger server. NoSQL databases can handle semi-structured data, for example when you need to capture several values of one type, or unusual attributes, for one person; in a traditional database the structure is typically more rigid. NoSQL databases are great for handling large workloads, but they are typically not designed for atomic transactions. Relational SQL databases are better suited to workloads where you must guarantee that either all changes are made to the database together or none are.
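The all-or-nothing guarantee that relational databases offer can be seen with Python's built-in `sqlite3` module. In this sketch the `accounts` table and the `transfer` helper are hypothetical; a CHECK constraint makes the second update of a transfer fail, and the transaction rolls back the first update with it.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " name TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates are applied, or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            # Credit first, so a failed debit must undo an already-applied change.
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:  # raised when the CHECK constraint fails
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


conn = make_db()
print(transfer(conn, "bob", "alice", 80), balances(conn))  # False {'alice': 100, 'bob': 50}
print(transfer(conn, "alice", "bob", 30), balances(conn))  # True {'alice': 70, 'bob': 80}
```

The failed transfer leaves both balances untouched even though Alice's credit had already been executed, which is exactly the "all changes or no changes" behaviour described above.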

Network science
Network science studies how relationships between nodes develop and behave in complex networks. Network concepts apply in many settings, including computer networks, telecommunications networks, airline route networks and social networks. In a randomly growing network, some nodes emerge as the most significant and, like gravity, continue to attract additional connections from new nodes. For example, some airports develop into major hubs while others are left behind: as an airport grows, with more connections and flights, new airlines have increasingly compelling reasons to fly there. Likewise, in social networks, some people are far more influential, either because of the number of connections they build or because of their communication skills or powers of persuasion.
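The "rich get richer" growth described above is commonly modelled as preferential attachment (the Barabási–Albert model). The sketch below is a simplified version under stated assumptions: each new node makes exactly one link, and `grow_network` and `degrees` are hypothetical helper names.

```python
import random
from collections import Counter


def grow_network(n_nodes, seed=None):
    """Add nodes one at a time; each newcomer links to one existing node,
    chosen with probability proportional to that node's current degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]  # start from two connected nodes
    # Each node appears here once per edge it touches, so a uniform pick
    # from this list is a degree-proportional pick of a node.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend([new, target])
    return edges


def degrees(edges):
    """Count the connections of every node."""
    counts = Counter()
    for a, b in edges:
        counts[a] += 1
        counts[b] += 1
    return counts


edges = grow_network(10_000, seed=42)
print(degrees(edges).most_common(5))  # a handful of heavily connected hubs
```

Most nodes end up with only one or two links, while a few, usually among the earliest arrivals, accumulate far more, much like hub airports.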

Big data can help us identify the important nodes in any network. Games console companies have identified the most popular children in the playground and given them a free console, on the basis that they will have a lot of influence over their friends. Epidemiologists can identify significant factors in the spread of a disease by examining the most significant nodes, and then take steps to prevent further contamination or plan for contagion. Similarly, marketers can use the same approaches to work out what is more likely to 'go viral'.
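One of the simplest ways to rank nodes is degree centrality: the share of other nodes each node is directly connected to. This sketch uses a hypothetical playground friendship list; real analyses often use other measures too, such as betweenness centrality.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Fraction of the other nodes that each node is directly connected to."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-links
            neighbours[a].add(b)
            neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(nbrs) / (n - 1) for node, nbrs in neighbours.items()}


def top_influencers(edges, k):
    """The k most connected nodes, ties broken alphabetically."""
    scores = degree_centrality(edges)
    return sorted(scores, key=lambda node: (-scores[node], node))[:k]


# Hypothetical friendships in a playground.
playground = [
    ("Ava", "Ben"), ("Ava", "Cal"), ("Ava", "Dee"), ("Ava", "Eli"),
    ("Ben", "Cal"), ("Dee", "Fay"), ("Eli", "Fay"),
]
print(top_influencers(playground, 1))  # ['Ava']
```

Ava is directly connected to four of the other five children, so she would be the one to receive the free console.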

Benefits
Big data helps businesses understand customers better and treat each one as an individual: the so-called marketing segment of one. Understanding what moves customers can build strong brand loyalty and evoke a very powerful emotional response. Imagine an airline that recognises that a particular passenger flies from A to B every week, staying from Monday to Thursday. If that passenger now plans to stay in B for two weeks, imagine how much loyalty could be generated by offering them a free flight to C over the weekend, a discounted flight from A to C for their spouse, and a discounted hire car and room for their weekend away together.

Digital body language and buying habits can help online retailers make astute decisions about which products to offer customers. Target was able to identify pregnant customers very early from their shopping patterns: customers buying certain combinations of cosmetics, magazines and clothes would go on to buy certain maternity products months later.
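A pattern like this can be scored with standard association-rule measures: support, confidence and lift for "bought this combination earlier, then bought that product later". The sketch below is not Target's method; the purchase histories and product names are hypothetical, and `rule_stats` is an illustrative helper.

```python
def rule_stats(histories, antecedent, consequent):
    """Support, confidence and lift of the rule
    'bought every item in antecedent earlier' -> 'bought consequent later'."""
    antecedent = set(antecedent)
    n = len(histories)
    has_a = [antecedent <= h["early"] for h in histories]
    has_c = [consequent in h["later"] for h in histories]
    n_a, n_c = sum(has_a), sum(has_c)
    n_ac = sum(a and c for a, c in zip(has_a, has_c))
    support = n_ac / n                           # how common the full pattern is
    confidence = n_ac / n_a if n_a else 0.0      # P(later purchase | combination)
    lift = confidence / (n_c / n) if n_c else 0.0  # > 1 means more likely than average
    return support, confidence, lift


# Hypothetical purchase histories for five customers.
histories = [
    {"early": {"unscented lotion", "vitamin supplements", "cotton wool"}, "later": {"maternity jeans"}},
    {"early": {"unscented lotion", "vitamin supplements"}, "later": {"maternity jeans", "nappies"}},
    {"early": {"unscented lotion"}, "later": {"sunscreen"}},
    {"early": {"magazines", "vitamin supplements"}, "later": {"sunscreen"}},
    {"early": {"cotton wool"}, "later": {"maternity jeans"}},
]
print(rule_stats(histories, {"unscented lotion", "vitamin supplements"}, "maternity jeans"))
```

Here every customer who bought both items later bought maternity jeans (confidence 1.0), and the lift of about 1.67 shows the combination is a stronger signal than the baseline rate.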

Big data can also be used to drive efficiencies in a business. The freight company UPS, for example, saved almost 32 million litres of fuel and cut 147 million km from the distance its trucks travelled in 2011 by placing sensors throughout its trucks. As a side benefit, it learned that the short battery life of its trucks was due to drivers leaving the headlights on.

By analysing customer relationships, T-Mobile was able to reduce the risk of a domino effect when one customer decided to leave its service. It identified the customers most closely connected digitally to the person churning and made those people a very attractive offer, stopping the churn from spreading. Further, by analysing customers' billing, call dropout rates and public comments, it was able to act in advance and reduce churn by 50% in a single quarter.
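"Most closely related digitally" can be approximated by ranking a churning customer's contacts by how much they communicate. This sketch assumes a hypothetical call-record format of `(caller, callee, minutes)` and a hypothetical helper `closest_contacts`; it is not a description of T-Mobile's actual model.

```python
from collections import Counter


def closest_contacts(calls, churner, k):
    """Rank the churner's contacts by total call minutes in either direction."""
    weight = Counter()
    for caller, callee, minutes in calls:
        if caller == churner and callee != churner:
            weight[callee] += minutes
        elif callee == churner and caller != churner:
            weight[caller] += minutes
    # Highest minutes first; ties broken by customer ID for a stable order.
    ranked = sorted(weight.items(), key=lambda kv: (-kv[1], kv[0]))
    return [contact for contact, _ in ranked[:k]]


# Hypothetical call records: (caller, callee, minutes).
calls = [
    ("c1", "c2", 30), ("c2", "c1", 45), ("c1", "c3", 5),
    ("c4", "c1", 20), ("c2", "c3", 60), ("c1", "c5", 20),
]
print(closest_contacts(calls, "c1", 3))  # ['c2', 'c4', 'c5']
```

The customers returned are the ones a retention offer would target first; calls that do not involve the churner (such as `c2` to `c3`) are ignored.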

At the Large Hadron Collider, CERN conducts physics experiments in which proton beams, each at 3.5 trillion electron volts, are sent in opposite directions around an underground ring; the resulting particle collisions help us understand the basic building blocks of matter. The existence of the Higgs boson was confirmed by analysing the data generated by smashing these particles together. A total of 15,000 servers are used to analyse the one petabyte of data generated per second, of which 20 gigabytes is actually stored. This is orchestrated using cloud techniques built on OpenStack, designed and supported by Rackspace.

Conclusion
We have reached a point where it makes sense to start storing everything today, so that tomorrow we have the data to support a business case for analytical tools. Once we get used to the idea that everything is available to us, we will find new ways to think about how we leverage our information. The businesses that succeed in the future will be those that constantly look for ways to mine the information they have gleaned.

[This article has been slightly modified from an article I wrote that was previously published in Technology Decisions magazine.]

March 19, 2014