A Better Way to Tackle All That Data

The biggest challenge any organisation faces in a world awash in data is the time it takes to make a decision. We can amass all the data in the world, but if it doesn’t help to save a life, allocate resources better, fund the organisation, or avoid a crisis, what good is it? Hampered by a shortage of qualified data scientists to perform the analysis, big data’s rise is outstripping our ability to conduct the research and reach conclusions fast enough. At the root of this problem is our concept of what constitutes data: the boundaries of what we can digitise and analyse are moving outward daily.

Taking Gartner’s prediction that the Internet of Things (essentially, sensors that share data over the Internet) will add 50 billion machine voices to today’s 2 billion connected users, we have to believe that the human ability to manage the process of amassing the correct data and performing the proper analysis is headed for trouble. How long it takes analytics to reach a conclusion is often called “time to decision.” If we accept that big data’s holy grail is, as Randy Bean says in InformationWeek, better, faster decisions, then we have to believe that as data grow in volume, velocity, and variety, making management more complex and potentially slowing decision-making, something has to give. The problem is crying out for a solution that has long been in development but has only recently become practical and economical enough for widespread adoption: machine learning.

As the term suggests, machine learning is a branch of computer science in which algorithms learn from and react to data just as humans do. Machine-learning software identifies hidden patterns in data and uses those patterns both to group similar data and to make predictions. Each time new data are added and analysed, the software gains a clearer view of the patterns and gets closer to making the optimal prediction or reaching a meaningful understanding. It does this by turning conventional data-mining practice on its head. Rather than scientists beginning with a (possibly biased) hypothesis that they seek to confirm or disprove in a body of data, the machine starts with a definition of an ideal outcome, which it uses to decide what data matter and how they should factor into solving problems. If we know the optimal way for something to operate, we can work out exactly what to change in a suboptimal situation.
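To make those two behaviours concrete, here is a minimal sketch (assuming Python with NumPy and scikit-learn, none of which the article specifies) of grouping similar data by clustering, and of a prediction that sharpens each time a new batch of data is analysed; all data are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(0)

# Grouping similar data: cluster synthetic 2-D points into three groups.
points = rng.normal(loc=[[0, 0], [5, 5], [0, 5]], size=(100, 3, 2)).reshape(-1, 2)
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(points)

# Predictions that sharpen as data accumulate: an online learner fed one
# batch at a time, standing in for "each time new data are analysed".
model = SGDRegressor(random_state=0)
true_weights = np.array([2.0, -1.0])
for _ in range(50):  # each pass simulates a newly arrived batch
    X = rng.normal(size=(20, 2))
    y = X @ true_weights + rng.normal(scale=0.1, size=20)
    model.partial_fit(X, y)  # the model's view of the pattern improves

print(groups[:10])  # cluster labels for the first ten points
print(model.coef_)  # learned weights drift toward [2, -1]
```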

Thus, for example, a complex system like a commuter train service has targets for the on-time, safe delivery of passengers that present a real-time optimisation problem over various fluctuating variables, ranging from the weather to load size to the availability and cost of energy. Machine-learning software onboard the trains themselves can weigh all these factors, running hundreds of calculations a second to direct an engineer to operate at the proper speed. The Nest thermostat is a well-known example of machine learning applied to local data. As people turn its dial, the Nest learns their temperature preferences and begins to manage the heating and cooling automatically, whatever the time or day of the week. The system never stops learning, allowing people to redefine the optimum continuously.
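As a purely hypothetical illustration of the onboard idea, the sketch below retrains a small regression model after every completed trip and uses it to recommend an operating speed from current conditions. The feature set, function names such as recommend_speed, and every number are invented for illustration; it again assumes Python with NumPy and scikit-learn.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor

# One online model, retrained a little after every completed trip.
model = SGDRegressor(learning_rate="constant", eta0=0.1, random_state=0)

def features(weather_severity, load_fraction, energy_cost):
    """Pack the fluctuating conditions into one feature row (all 0..1)."""
    return np.array([[weather_severity, load_fraction, energy_cost]])

def learn_from_trip(conditions, speed_that_met_targets):
    """After each trip, fold the observed outcome back into the model."""
    model.partial_fit(conditions, [speed_that_met_targets])

def recommend_speed(conditions):
    """Direct the engineer: the speed the model expects to hit targets."""
    return float(model.predict(conditions)[0])

# Simulate a history of trips against an invented ground truth, then ask
# for a recommendation under mild weather, half load, and cheap energy.
rng = np.random.default_rng(1)
for _ in range(2000):
    w, load, cost = rng.uniform(0, 1, size=3)
    ideal = 90 - 20 * w - 15 * load  # invented relationship, km/h
    learn_from_trip(features(w, load, cost), ideal + rng.normal(scale=1.0))

print(recommend_speed(features(0.3, 0.5, 0.1)))  # approaches 76.5 km/h
```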

Applying machine learning in health care is essential to achieving the goal of personalised medicine (the concept that every patient is subtly different and should be treated uniquely). Nowhere is this easier to see than in cancer treatment, where genomic medicine enables highly customised therapy based on an individual’s tumour type and myriad other factors. Here machine-learning algorithms help sort the various treatments available to oncologists, classifying them by cost, efficacy, toxicity, and so on. As patients are treated, these systems grow more intelligent, learning from outcomes and from additional evidence-based guidelines. This leaves oncologists free to optimise treatment plans and share information with their patients.
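A hypothetical Python sketch of that sorting: each treatment receives a weighted score over efficacy, toxicity, and cost, and each new patient outcome updates the efficacy estimate behind the ranking. The weights, the pseudo-count prior, and all figures are invented for illustration, not clinical guidance.

```python
from dataclasses import dataclass

@dataclass
class Treatment:
    name: str
    efficacy: float     # estimated response rate, 0..1
    toxicity: float     # severe-side-effect rate, 0..1
    cost: float         # relative cost, 0..1
    evidence: int = 10  # pseudo-count giving weight to the prior estimate

    def score(self) -> float:
        """Higher is better; the weights are illustrative assumptions."""
        return 0.6 * self.efficacy - 0.3 * self.toxicity - 0.1 * self.cost

    def record_outcome(self, responded: bool) -> None:
        """Fold a new patient outcome into the running efficacy estimate."""
        self.evidence += 1
        self.efficacy += (float(responded) - self.efficacy) / self.evidence

options = [
    Treatment("regimen A", efficacy=0.55, toxicity=0.20, cost=0.8),
    Treatment("regimen B", efficacy=0.40, toxicity=0.05, cost=0.3),
]
options[1].record_outcome(True)  # new evidence nudges the ranking inputs
for t in sorted(options, key=lambda t: t.score(), reverse=True):
    print(t.name, round(t.score(), 3))
```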

With the rise of off-the-shelf software such as LIONsolver, winner of a recent crowdsourcing contest to find better ways to recognise Parkinson’s disease, machine learning is at last entering the mainstream, available to a far wider range of businesses than the likes of Yahoo, Google, and Facebook that first made big data headlines. More and more companies may now see it as a viable alternative to tackling the rapid proliferation of data by having ever more data scientists spend ever more time on analysis. Expect to see machine learning used to train supply chain systems, predict the weather, spot fraud and, especially in customer experience management, help decide which variables and context matter for customer response to marketing.