Harnessing big data to address the world’s problems
An earthquake has struck a developing country, devastating a large city. Relief agencies and the local government have responded, but they’re struggling to get an accurate picture of the situation so they can rush food, water, and medical supplies to where they’re needed most. First-hand observations and aerial photos are not enough to capture a fluid situation in which tens of thousands of people are on the move, making it hard to deliver supplies effectively or to track the spread of disease as sanitation breaks down.
Fortunately, data saves the situation. Analysis of millions of anonymous call records from a mobile-phone provider reveals where the SIM cards (and therefore subscribers) are, allowing responding organizations to track the daily movement of the population in and out of the zone. When combined with census data from before the disaster, it becomes much easier to assess needs accurately and prepare for a possible outbreak of disease. Many lives are saved.
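To make the idea concrete, here is a minimal sketch of how anonymous call records might be turned into daily population counts by area. It is an illustration only, not the researchers' actual method: the record layout (SIM identifier, day, area), the sample values, and the rule of assigning each SIM to the area where it placed the most calls that day are all assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical anonymized call records: (sim_id, day, area of the cell tower).
# All identifiers and values below are invented for illustration.
records = [
    ("sim_a", "day1", "city"), ("sim_a", "day1", "city"), ("sim_a", "day1", "north"),
    ("sim_b", "day1", "city"),
    ("sim_c", "day1", "north"),
    ("sim_a", "day2", "north"),
    ("sim_b", "day2", "north"), ("sim_b", "day2", "north"), ("sim_b", "day2", "city"),
    ("sim_c", "day2", "north"),
]


def daily_locations(records):
    """Assign each SIM, for each day, to the area where it made the most calls."""
    calls = defaultdict(Counter)
    for sim_id, day, area in records:
        calls[(sim_id, day)][area] += 1
    # most_common(1) returns the single most frequent area for that SIM and day.
    return {key: areas.most_common(1)[0][0] for key, areas in calls.items()}


def sims_per_area(locations):
    """Count how many SIMs were assigned to each area on each day."""
    totals = defaultdict(Counter)
    for (_sim_id, day), area in locations.items():
        totals[day][area] += 1
    return {day: dict(areas) for day, areas in totals.items()}


population_by_day = sims_per_area(daily_locations(records))
```

Comparing the counts from one day to the next shows which areas are gaining or losing people. In practice the counts would still need to be scaled against census data, since not everyone carries a phone.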
This dramatic solution is now a real possibility. Around the world, social innovators are starting to use “big data” (datasets whose size is beyond the ability of typical software programs to capture, manage, and analyze) to address critical problems. As the McKinsey Global Institute (MGI) has written elsewhere, big data can create significant value in the world economy, enhancing the productivity and competitiveness of companies and the public sector and creating economic surplus for consumers.1 But big data may be equally important in the social sector, where it can be used to identify and address important societal issues. The disaster response scenario outlined above has already been tested by researchers; others are proving the utility of big data in fields like microcredit, education, and public health.2
There’s plenty of data for the sector to work with. Many organizations with a social purpose have been collecting their own big data for years, and there is significant data in the hands of institutions such as developing country governments, various branches of the UN, and public health organizations such as the Global Fund to Fight AIDS, Tuberculosis and Malaria. Some organizations such as the World Bank are moving aggressively to open this data to the public. And new data collection initiatives will produce even more data—at 4 to 8 MB of data per person, India’s Unique Identification Number database could quickly grow to as much as 20 petabytes of data, more than 80 times the size of the entire Library of Congress.3 Finally, many “big” datasets in the private sector might be adapted to help produce social impact, even if they were not designed for that purpose. The group of researchers that tested the disaster response scenario above also used location data from 138,000 cell phones to understand the magnitude and trend of population movements over the course of a cholera outbreak in Haiti.
For social innovators and other organizations interested in producing social change, the question is where to start. When is big data valuable in the social sector? How much impact could it have? And what do we need to do to produce that impact? Some of our current research in McKinsey’s Social Sector Office aims to answer these questions.
MGI’s original report argued that big data could create value in five ways in the private sector. Our research suggests that these five sources of value, slightly modified, apply in the social sector as well.
- First, making big data more readily available can increase the speed and accuracy with which we deploy social interventions. In Nairobi, for example, the Engineering Social Systems project has used geo-coded mobile-phone transaction data to model the growth of slums and informal settlements. This type of data could help city governments to optimize their allocation of scarce infrastructure funding and other municipal resources rather than waiting months for survey results that may already be out of date.
- Big data also enables assessment, allowing practitioners to expose variability in the effectiveness of a solution and potentially improve it. In the United States, the National Center for Analysis of Longitudinal Data in Education Research (CALDER) has consolidated longitudinal databases from multiple states in order to understand how teacher and governance policies affect student outcomes.
- Big data can classify target populations of beneficiaries into smaller segments, helping organizations to identify specific needs and tailor initiatives to ensure maximum impact. New York City’s Department of Education does this with its Achievement Reporting and Innovation System (ARIS), which houses all student, school, and system-level data, enabling teachers and parents to take actions for particular groups of students.
- The analysis of big data can also improve decision making, for example, by making risk assessment faster and more accurate. Half of the world’s economically active population lacks access to formal financial services. In many cases this is due to a lack of accurate credit histories, which prevents lenders from serving this segment. Microfinance has been able to overcome this, but largely with labor-intensive group lending approaches. A big data approach could allow credit scoring that is much faster yet still responsible. For example, one Asian lender has used telecom data (information on delinquencies and preferred payment plans) to develop a predictive model for small loan defaults.
- Finally, analysis of big data can enable the creation of new products, services, and business models to serve the “bottom of the pyramid” or other disadvantaged consumers. The Insurance Association of Malawi, in partnership with the World Bank and Opportunity International, is continuously analyzing large sets of weather and rainfall data to provide weather-indexed insurance to farmers that helps them to secure loans.4
The social sector has barely begun to tap the potential of big data. As these examples suggest, organizations that systematically integrate big data into their strategies can generate significant new impact. But in order to do so, they must overcome several barriers.
The biggest, by far, is simply gaining access to data. Large amounts of potentially important data are held by organizations that may not make full use of it themselves (such as governments) or that use it mostly for proprietary purposes (such as many private companies), and privacy concerns keep still more data out of reach. The “open data” movement now in evidence at organizations such as the World Bank is one step forward, but much more thought must be given to policies and incentives that would put more data into circulation in a sustainable and responsible way.
There are also questions about data compatibility and quality. Much of the data stored is not easily usable. Legacy systems and inconsistent data formats make it more difficult to integrate datasets in the ways that generate value. In addition, many existing datasets suffer from poor quality: incorrect or mismatched information, gaps, or extensive “noise.” Even if they overcome these hurdles, many interested organizations lack the computing power and software tools required to integrate, analyze, and visualize big data, although the advent of cloud computing could address some of the problem.
A major shortage of talent—deep analytical talent, data-savvy managers, and supporting technology personnel—presents an additional hurdle. This is already an issue in the private sector, where MGI estimates that there will be an annual shortage of graduates in deep analytical fields of 140,000 to 190,000 by 2018 in the United States alone. Despite the efforts of organizations such as Data Without Borders, which matches scientists and statisticians with nonprofits for pro bono data work, shortages in these roles may be even more challenging in the social sector.
Philanthropists and practitioners can take a first step toward removing these barriers by starting to think strategically about big data—where big data might help advance their goals, what datasets and analysis would be required to do so, who they could partner with to produce the analysis, and what the partner’s incentives might be. They should also consider how to build a stronger infrastructure around big data in the social sector: for example, how to move to common policies and priorities, share best practices, and develop talent. The findings from our ongoing research will highlight some of these steps, as well as those that can be taken by partner organizations in the public and private sectors.
Used well, big data has the potential to advance important social goals in areas such as disease surveillance, student curricula, and microcredit. Understanding big data’s potential for social impact, and the barriers to capturing it, is the first step toward its effective use.
1See Big data: The next frontier for innovation, competition, and productivity.
2For the disaster relief example, see Linus Bengtsson et al., “Improved response to disasters and outbreaks by tracking population movements with mobile phone network data: A post-earthquake geospatial study in Haiti,” PLoS Medicine, 2011, Volume 8, Number 8, e1001083, doi:10.1371/journal.pmed.1001083.
3Vince Beiser, “Massive biometric project gives millions of Indians an ID,” Wired, August 19, 2011.
4Joanne Linnerooth-Bayer, Reinhard Mechler, Pablo Suarez, and Marjorie Victor, “Drought insurance for subsistence farmers in Malawi,” Natural Hazards Observer, 2009, Volume 33, Number 5, pp. 1–8.