Today's Editorial

18 September 2017

When economists look to the sky

 

Source: By Sumit Mishra: Mint

 

The use of satellite data is new in Indian policymaking but a growing body of economic research has relied on mining such data over the past few years to answer questions relating to growth and poverty in regions where official data is either unavailable or unreliable. What started as a satellite programme run by the US department of defence to gauge cloud cover in the 1960s has increasingly become an important resource for economists.

Night-time lights, or night-lights, data record the energy emitted or reflected back from the surface of the earth to the sky at night. Economists figured that this data tends to correlate with economic activity. Thus, for countries with poor data, night-lights data has been particularly useful in estimating the level of economic activity and gross domestic product (GDP). A classic example is Myanmar, which stopped publishing its national accounts statistics in 1989.

Using night-lights data, economists have shown that the country’s economy is growing at a very slow pace. A 2012 research paper by the Japan-based think-tank IDE-JETRO used night-lights data to show that most economic activity is concentrated in regions surrounding the commercial capital Yangon. Also, the regions bordering China and Thailand grew at a much faster pace than those bordering Bangladesh and India, the study suggests.

There has been similar uncertainty about growth in North Korea after sanctions were imposed against it. Night-lights data have once again come to the rescue. As a celebrated image confirms, there is a stark difference between South and North Korea; the North is almost completely dark while the South seems to be well-lit. A recent study by Yong Suk Lee of Stanford University using night-lights data shows that economic activity appears to be concentrated in urban areas, and particularly so in the capital Pyongyang. Lee found that economic sanctions decreased luminosity in the hinterlands but increased luminosity in urban areas, especially the urban core, suggesting that the dictatorship may have “countered the effects of sanctions by reallocating resources to the urban areas”. Also, economic activity as measured by luminosity or night-lights seemed to have increased in regions bordering China, Lee’s study showed.

In India, night-lights data has been used to understand the effects of the reorganization of states. In 2000, three new states were created: Uttarakhand, Jharkhand and Chhattisgarh. A 2015 study by Sam Asher of the World Bank and Paul Novosad of Dartmouth College, using census and night-lights data, suggested that there has been marked improvement in economic activity in the newly created states. A recent research paper by analysts Praveen Chakravarty and Vivek Dehejia of the IDFC Institute in Mumbai used night-lights data to show that both inter-state and intra-state inequality in India have been growing. The Economic Survey this year also documented the widening regional inequality in India, and attributed it to the differing quality of governance. But Chakravarty and Dehejia argue that much of the difference in economic activity may be driven by network effects.

“The conventional channel of economic convergence is diminishing returns to capital, but we would argue that this is offset by the opposite phenomenon, of agglomeration economies in capital accumulation and network externalities which increase, rather than decrease, the marginal productivity of capital as its stock increases,” the duo write. “…That is why Apple has chosen to locate its new manufacturing unit in India in wealthy and expensive Karnataka rather than (say) poor and cheaper Bihar, and why, by extension, it has chosen to locate in wealthy and expensive Bengaluru rather than (say) poor and cheaper Shimoga. If low labour costs and a putatively higher return to capital were drivers, as conventional theory suggests, we ought to have seen Apple locate in Bihar rather than Karnataka or at least within Shimoga rather than Bengaluru.”

There are two serious limitations of night-lights data that researchers have been grappling with for some time. One, satellites that record this data do not have the capacity to detect artificial lights with precision and only capture the lights emitted from vehicular traffic, rooftops and streets. The second is more serious: night-lights data do not distinguish between poor regions and the poorest ones. Below a threshold of brightness, all is dark in the satellite images.

Research by Charlotta Mellander of the Jonkoping International Business School and co-authors, based on a study in Sweden, suggests that while the correlation between night-time lights and economic activity is strong enough to make it a relatively good proxy for population and establishment density, the correlation is weaker in relation to wages. The researchers found the link between light and economic activity, especially estimated by wages, to be “slightly overestimated in large urban areas, and underestimated in rural areas”.

To get around some of these issues, researchers have begun to use data from daytime satellite images, which are fed into a machine learning algorithm to estimate levels of poverty, capital stock, and economic activity. Multiple images, captured on several different days, are often combined to obtain a cloud-free composite image. The machine learning algorithm helps categorize the composite image data (in the form of pixels, each of which is a vector of quantities in different bands) into a discrete set of land cover categories. How far such workarounds capture the real level of economic activity remains a matter for further research.
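
The compositing and classification steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method of any study cited here: the function names, the array layout and the use of a per-pixel median are hypothetical choices, and a simple nearest-centroid rule stands in for whatever machine learning algorithm a researcher would actually train.

```python
import numpy as np

def cloud_free_composite(images, cloud_masks):
    """Combine images from several days into one composite.

    images:      array of shape (n_dates, height, width, n_bands)
    cloud_masks: boolean array of shape (n_dates, height, width),
                 True where a pixel is covered by cloud on that date
    Assumes every pixel is cloud-free on at least one date.
    """
    # Blank out cloudy observations, then take the per-pixel median
    # of whatever remains across dates.
    stack = np.where(cloud_masks[..., None], np.nan, images.astype(float))
    return np.nanmedian(stack, axis=0)

def classify_pixels(composite, class_centroids):
    """Assign each pixel (a vector of band values) to the nearest centroid.

    composite:       array of shape (height, width, n_bands)
    class_centroids: array of shape (n_classes, n_bands), one typical
                     band vector per land cover category (hypothetical,
                     would come from labelled training sites)
    Returns an integer array of shape (height, width) of class indices.
    """
    diffs = composite[:, :, None, :] - class_centroids[None, None, :, :]
    distances = np.linalg.norm(diffs, axis=-1)
    return np.argmin(distances, axis=-1)
```

In practice the centroids, or the parameters of a richer classifier, have to be learnt from sites whose land cover is already known, which is where the need for ground-level training data comes in.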

Even as satellite-based data on lights are being mined, other sources are also being harnessed to understand the dynamism of economies, especially urban economies. For instance, Google Street View offers a rich source of visual snapshots of cities across the world. Harvard University economist Edward Glaeser and others have used Google Street View images on the quality of roads and type of dwellings to determine income at a highly disaggregated level. They find that Google Street View data predict income and housing prices within New York pretty well.

Google Street View images can also help us understand gentrification in cities. In their 2014 American Journal of Sociology research paper, Harvard University sociologists Jackelyn Hwang and Robert Sampson scoured thousands of images for 23 cities in the US to show that gentrification raised inequality in American cities, with Black residents bearing the brunt of it. Even after controlling for a number of factors including crime rate, perception and access to amenities, race still explains why certain neighbourhoods tend to be poor and others tend to be rich.

As cell phones become ubiquitous in developing countries, mobile data is also being used to measure wealth and urban commuting. Using an anonymized database containing call records of billions of interactions in Rwanda, Joshua Blumenstock of the University of California, Berkeley, in a 2015 research paper published in Science, created a measure of wealth based on the length and duration of calls, to find that it closely tracked the socio-economic status of individuals, and at an aggregate level, the wealth level of regions.

Although these studies are quite innovative in their application of modern data mining techniques to get around the problem of irregular or patchy economic data, it is worth noting that they are meant to be workarounds for the most part. As with any other modelling exercise, there are implicit assumptions hidden in most economic estimations using satellite imagery. One typical assumption is that the economic activity or luminosity of each distinct geographical unit is independent of that of every other unit (spatial independence, as economists term it). But this assumption can be violated for satellite images given that the value of a variable for a particular location is affected by the values at neighbouring locations.
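
One common way to check whether neighbouring locations really are independent is Moran's I, a standard measure of spatial autocorrelation. The sketch below is a minimal illustration, assuming luminosity is stored as a rectangular grid and that cells sharing an edge count as neighbours with equal weight; the function name and these choices are assumptions made here, not taken from the studies discussed.

```python
import numpy as np

def morans_i(grid):
    """Moran's I for a 2-D grid with rook (edge-sharing) neighbours.

    Values near +1 mean similar values cluster together, values near -1
    mean neighbours tend to differ, and values near 0 are consistent
    with spatial independence. Undefined for a constant grid.
    """
    x = np.asarray(grid, dtype=float)
    z = x - x.mean()  # deviations from the overall mean

    # Products of deviations for horizontally and vertically adjacent cells.
    horizontal = (z[:, :-1] * z[:, 1:]).sum()
    vertical = (z[:-1, :] * z[1:, :]).sum()
    # Each neighbour pair appears twice in the weight matrix (i->j and j->i).
    cross = 2.0 * (horizontal + vertical)

    n_rows, n_cols = x.shape
    total_weight = 2.0 * (n_rows * (n_cols - 1) + (n_rows - 1) * n_cols)
    return (x.size / total_weight) * cross / (z ** 2).sum()
```

A grid in which bright cells sit next to bright cells, as around a large city, gives a clearly positive value, which is a sign that treating each cell as an independent observation would overstate the precision of any estimate.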

Secondly, all satellite-based data depend on the orbits that satellites take around the earth, and the quality of images captured by a satellite varies over space and time. How this affects analysis is still not entirely clear.

Thirdly, as Dave Donaldson and Adam Storeygard emphasize in a recent review paper on the use of satellite data, the use of machine learning techniques imposes additional costs in terms of resources and analysis that a researcher has to deploy on the ground to arrive at robust conclusions.

“A critical input to these (machine learning techniques) and other methods is the availability of training data on the variable of interest that assigns ground truth values to sample sites,” the duo point out. “For example, delineating imaged urban neighbourhoods as residential, or even more specifically as slums, requires first providing a set of areas pre-defined as slums by other means. Doing so well requires a training dataset that reflects the full diversity of distinct neighbourhoods within the category of slums. This is especially challenging when the object of interest is heterogeneous or imprecisely defined… One could imagine economists using remotely sensed information on buildings to estimate a region’s capital stock; in such a case, the ideal training data would concern building values instead of building types. Because these training data sets are used to define the classes underlying a classification algorithm, they must be produced outside the algorithm. Thus, they are typically a labour-intensive analog constraint on a technology that otherwise can operate with all the scale benefits of computer processing.”

Finally and perhaps most importantly, most satellite-based data can potentially identify individuals and households. Cell phone data are the most problematic in this regard, and have serious repercussions on privacy. It is also worth noting that most early proponents of the use of night-lights data advocated the use of such data as a substitute for national accounts and household survey data where such data is either not reliable or is irregular, and as a complement where such data is indeed available. The reason for the continued preference for old-fashioned data collection techniques is that they generate thick layers of information, which collectively can convey a richer sense of an economy than mere satellite images can. Traditional databases thus help us form inferences based on a wider variety of data. The flip side of course is that it may not be possible to disaggregate the traditional measures in the same manner as satellite data, which is available at a granular level.

To sum up, new and exciting data-sets are helping us understand the world better. However, it is erroneous to believe that these new data-sets can substitute for existing survey-based or national accounts data. A satellite can hardly tell us anything about intra-household allocation of resources, for instance, or the level of discrimination in a rural labour market in a country such as India. The use of satellite data in a heterogeneous country such as ours also requires intensive use of on-ground resources in several cases, as discussed above. Moreover, economists are still grappling with challenges in interpreting the information from these new and big data-sets, which means inferences must be drawn with greater caution. Undoubtedly, the understanding of such data will evolve over time. At this point, it is best to think of these new data-sets as complements to the traditional data sources collected by the regular statistical machinery.