The Bottom Line on Ground Truth Data - Climate TRACE
By Ann Marie Gardner and Daisy Simmons
Since our inception, Climate TRACE’s approach has been based on using satellite imagery and other forms of remote sensing, paired with machine learning, to develop independent, empirical, accurate estimates of greenhouse gas (GHG) emissions. A crucial part of that approach also involves leveraging ground truth data. In this article, we’ll take a closer look at what they are; when, where, and how we use them; and some common challenges we’ve had to address.
What are ground truth data?
When you measure something on site (in our case, GHG emissions), you’re collecting what are called ground truth data. Ground truth data are direct measurements from the source that can be considered reliable and accurate. For example, in the United States, smokestacks at coal-fired power plants are outfitted with sensors that measure the flow of carbon dioxide and other gases as part of a U.S. Environmental Protection Agency (EPA) continuous emissions monitoring system (CEMS) requirement.
These ground truth data are important in their own right. When available, they can feed directly into emissions inventory methodologies to help paint a fuller picture, complementing other data sources to arrive at better estimates of total emissions.
But they also play a crucial role for approaches that focus on remote sensing, including how Climate TRACE uses satellite imagery and artificial intelligence (AI) and machine learning (ML). That’s because ground truth data serve as “training data” for the AI/ML algorithms.
Using ground truth data to train AI/ML algorithms
Climate TRACE’s approach starts by analyzing satellite imagery for signs of emissions-causing activities. Returning to the power plant example above, we might look for visible water vapor plumes from the plant’s cooling towers and flue stacks as signs that the plant is running (i.e., identifying emissions-causing activity). This activity can then be linked to the corresponding GHGs released into the atmosphere.
Next, software algorithms need to translate the duration and magnitude of that activity into actual GHG emissions estimates. But how do the algorithms know how to make that calculation? That’s where ground truth data come into play.
AI/ML algorithms “train” by comparing satellite imagery and other remote sensing imagery of vapor plumes with the corresponding ground truth data, such as the aforementioned flow sensors on the smokestacks of coal-fired power plants. The more training data the algorithms are able to use, the better and faster they ascend the learning curve to develop increasingly accurate emissions estimates.
Once the AI/ML software has been sufficiently trained, it can start to estimate emissions for similar power plants anywhere, without the need for ground truth data about those emissions sources and without the need for self-reported emissions numbers from the emitters themselves.
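The train-then-estimate loop described above can be illustrated with a deliberately small sketch. It assumes one hypothetical feature per observation (`plume_score`, a number summarizing how large the vapor plume looks in an image) and made-up CEMS readings. The article does not describe Climate TRACE's actual models, features, or data, so the ordinary least-squares line below is only a stand-in for the idea: learn from plants that have sensors, then estimate for a plant that has none.

```python
# Illustrative only: the feature name, the numbers, and the linear model are
# assumptions for this sketch, not Climate TRACE's actual pipeline.

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, x):
    """Apply the fitted line; emissions cannot be negative, so clamp at zero."""
    slope, intercept = model
    return max(0.0, slope * x + intercept)

# Hypothetical training pairs: (plume score from imagery, CEMS-measured tCO2/hour)
training = [(0.0, 0.0), (0.2, 110.0), (0.5, 240.0), (0.8, 410.0), (1.0, 490.0)]
plume_scores = [score for score, _ in training]
cems_tco2_per_hour = [measured for _, measured in training]

# "Training": learn the relationship between what the satellite sees
# and what the on-site sensors measured.
model = fit_line(plume_scores, cems_tco2_per_hour)

# "Estimation": a similar plant with no on-site sensors, where only the
# satellite-derived score is available.
unmonitored_plume_score = 0.6
estimate = predict(model, unmonitored_plume_score)
print(f"Estimated emissions: {estimate:.0f} tCO2/hour")
```

Adding more labeled pairs to `training` is the toy equivalent of giving the algorithms more ground truth data to learn from.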
That’s part of what makes the Climate TRACE approach so powerful: it unlocks the unprecedented ability to independently track GHG emissions globally, at scale.
Sourcing ground truth data
Our project is uniquely able to harness ground truth data in emissions modeling thanks not only to technical expertise but also a spirit of collaboration and transparency. To date we’ve collected ground truth data from more than 11,000 sensors, representing a mix of public sources as well as proprietary ones.
Former U.S. Vice President Al Gore, a founding member of the Climate TRACE coalition, has inspired many organizations and private-sector companies to donate ground truth data toward our efforts.
We’re also proud to turn to dedicated members and partners for ground truth data in different sectors. For example:
— Power: WattTime uses ground truth data extensively from U.S. power plants, in part because the U.S. EPA has robust monitoring requirements. With WattTime’s help we can track site-specific carbon dioxide (CO2), nitrous oxide (N2O), and carbon monoxide (CO) levels, as well as other markers like heat. Ground truth data from Europe and Australia are also used.
— Shipping: OceanMind estimates GHG emissions for the global shipping industry by mining data from Automatic Identification Systems (AIS). These transmissions, which ships use to convey information to other ships and coastal authorities, give OceanMind visibility into a ship’s location, speed, engine power, size, and other factors that can help predict emissions. OceanMind also gathers and tracks actual emissions data donated anonymously by ship owners.
— Cement and steel: TransitionZero uses ground truth data from cement and steel production reports to generate emissions estimates. For plants that don’t report production, TransitionZero trains models on satellite data to detect hotspots generated by cement and steel production, information that data scientists can in turn use to estimate production levels. This approach can then help gauge production capacity even in countries that don’t report.
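To make the shipping example in the list above more concrete, here is a minimal sketch of how AIS-style speed reports could be turned into a CO2 estimate. It is not OceanMind's method. It assumes the common rule of thumb that engine load scales with the cube of speed relative to design speed, a hypothetical specific fuel consumption of 200 g/kWh, and roughly 3.1 tonnes of CO2 per tonne of fuel burned. All names and numbers are illustrative.

```python
from dataclasses import dataclass

@dataclass
class AisReport:
    speed_knots: float  # speed reported in the AIS transmission
    hours: float        # time the ship is assumed to hold that speed

def co2_tonnes(reports, installed_power_kw, design_speed_knots,
               sfoc_g_per_kwh=200.0, co2_per_tonne_fuel=3.1):
    """Estimate CO2 for a sequence of AIS reports (all defaults are assumptions)."""
    total = 0.0
    for report in reports:
        # Cubic speed-power rule of thumb, capped at full engine load.
        load = min(1.0, (report.speed_knots / design_speed_knots) ** 3)
        energy_kwh = installed_power_kw * load * report.hours
        fuel_tonnes = energy_kwh * sfoc_g_per_kwh / 1_000_000  # grams to tonnes
        total += fuel_tonnes * co2_per_tonne_fuel
    return total

# Hypothetical voyage: cruising, full speed, then stationary at anchor.
voyage = [AisReport(12.0, 10.0), AisReport(15.0, 5.0), AisReport(0.0, 3.0)]
total = co2_tonnes(voyage, installed_power_kw=10_000, design_speed_knots=15.0)
print(f"Estimated voyage emissions: {total:.1f} tCO2")
```

Measured emissions of the kind donated by ship owners could then be compared against such estimates to check or recalibrate the assumed parameters.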
Ground truth data that (wait for it) aren’t on the ground
In some regions and sectors, emissions sensors are so few and far between that the next best thing, counterintuitively, is to use satellite measurements as “ground truth.” They’re not literally ground truth, but we use them in much the same way.
Take the TROPOspheric Monitoring Instrument (TROPOMI) mounted on the Sentinel-5 Precursor (S5P) satellite, for instance. TROPOMI can detect methane emissions from sites as varied as coal mines and oil and gas fields.
This approach treats the satellite sensor’s measurements as a “ground truth” reference. Such a reference can reflect physical reality more closely than the assumptions built into a model, and it is especially useful when on-site measurements are difficult to come by.
An example of this is Climate TRACE member RMI’s approach to modeling emissions from oil and gas production and refining. RMI incorporated TROPOMI methane measurements into its model to adjust the default estimates of fugitive methane loss rates for specific oil fields around the world. For those fields, TROPOMI measured higher methane emissions than the default estimates indicated, so the adjustment better reflected what was happening in reality. As a result, the model estimates higher methane emissions for fields that would otherwise have been underestimated.
This approach of incorporating satellite methane measurements (not just from TROPOMI but also from other instruments launching soon) will be increasingly necessary to accurately account for super-emitting methane sources that have historically been underestimated.
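The adjustment idea can be sketched in a few lines. This is not RMI's model. It assumes a simplified relationship in which methane emissions equal production multiplied by a loss rate, with a hypothetical default rate that is overridden wherever a satellite-derived rate exists. The field names and numbers are invented for illustration.

```python
# Hypothetical default: fraction of produced gas assumed lost as methane.
DEFAULT_LOSS_RATE = 0.01

def methane_estimate(production_tonnes, satellite_loss_rate=None,
                     default_loss_rate=DEFAULT_LOSS_RATE):
    """Use the satellite-derived loss rate when one exists, else the default."""
    if satellite_loss_rate is None:
        rate = default_loss_rate
    else:
        rate = satellite_loss_rate
    return production_tonnes * rate

# Invented fields: one with a satellite-derived rate, one without.
fields = {
    "field_a": {"production_tonnes": 500_000, "satellite_loss_rate": 0.03},
    "field_b": {"production_tonnes": 200_000, "satellite_loss_rate": None},
}

results = {}
for name, field in fields.items():
    default = methane_estimate(field["production_tonnes"])
    adjusted = methane_estimate(field["production_tonnes"],
                                field["satellite_loss_rate"])
    results[name] = (default, adjusted)
    print(f"{name}: default {default:.0f} t, adjusted {adjusted:.0f} t")
```

In this toy case the satellite-informed estimate for `field_a` is three times the default, while `field_b`, which has no satellite measurement, keeps the default estimate.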
Good ground truth data can be hard to find
It’s easy to make the case for combining satellite and ground truth data — but sourcing good ground truth data can be difficult.
For starters, ground truth data are simply not available for certain sectors in under-resourced Global South countries.
Ground truth data are also valuable economically, and therefore might be privatized behind a commercial paywall. These data may also contain sensitive information — such as information that would reveal polluters breaking the law — and therefore are closely guarded from public scrutiny.
Also, training the algorithms requires that the satellite imagery and the ground truth data be of “like kind.” For example, you don’t want to train the software to estimate emissions from satellite imagery of steel factories while using ground truth data from coal-fired power plants. That’s a purposefully absurd example to make the point. But it’s an important reminder: getting good ground truth data for the sheer number and variety of emissions sources found around the world is no small task.
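One way to picture the “like kind” requirement is as a matching step that runs before any training happens. The sketch below is an assumption about how such a check might look, not a description of Climate TRACE's tooling. It pairs a satellite observation with a ground truth record only when the facility, the date, and the sector all agree. The record layout and field names are hypothetical.

```python
# Hypothetical satellite observations, one record per facility per day.
satellite_obs = [
    {"facility_id": "plant-1", "sector": "power", "date": "2021-06-01", "image_feature": 0.7},
    {"facility_id": "plant-1", "sector": "power", "date": "2021-06-02", "image_feature": 0.4},
    {"facility_id": "mill-9", "sector": "steel", "date": "2021-06-01", "image_feature": 0.9},
]

# Hypothetical ground truth records keyed by (facility, date).
ground_truth = {
    ("plant-1", "2021-06-01"): {"sector": "power", "tco2": 350.0},
    # Deliberately mislabeled record to show the sector check at work.
    ("mill-9", "2021-06-01"): {"sector": "power", "tco2": 420.0},
}

def build_training_pairs(observations, truth):
    """Keep only pairs where facility, date, and sector all match."""
    pairs = []
    for obs in observations:
        record = truth.get((obs["facility_id"], obs["date"]))
        if record is None:
            continue  # no ground truth for this facility on this date
        if record["sector"] != obs["sector"]:
            continue  # not "like kind": sectors disagree
        pairs.append((obs["image_feature"], record["tco2"]))
    return pairs

pairs = build_training_pairs(satellite_obs, ground_truth)
print(pairs)
```

Only the first observation survives the filter: the second has no matching ground truth record, and the third is rejected because its sector does not match.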
Conclusion
Although Climate TRACE starts with a view from above, as you can see, our approach also depends upon boots — or, at least, data — on the ground. If your organization might be interested in supplying ground truth data to support our collective efforts, please contact our team.