An analysis of U.S. CO2 emissions by sector using ClimateTrace data
CO2 is by far the most abundant human-emitted greenhouse gas. Roughly 40 billion metric tons of CO2 are generated each year by transportation, electrical generation, cement manufacturing, deforestation, agriculture, and many other practices.
ClimateTrace
I’ve become increasingly curious about climate change. Although there is a great deal of international effort to understand emissions, a lack of transparency can hinder our ability to measure them accurately. ClimateTrace is an organization attempting to leverage satellites, remote sensing, and artificial intelligence to provide more accurate estimates of global emissions.
Addressing the problem: largest contributors of CO2 emissions
A comprehensive database of CO2 emissions released by ClimateTrace can help answer a few questions I have about CO2 emissions, such as:
- How much CO2 does the U.S. emit compared to other countries?
- What parts of the economy contribute the most CO2 pollution?
- What is the current state of those sectors and how have they changed over recent years?
Investigating these questions can help me understand which sectors of the U.S. economy most need cleaner products and practices.
Exploration and Cleaning
library(tidyverse)  # read_csv(), glimpse(), dplyr verbs, and the %>% pipe
library(skimr)      # skim()
emissions <- read_csv('climatetrace.csv', show_col_types = FALSE)
glimpse(emissions)
## Rows: 58,500
## Columns: 7
## $ `Tonnes Co2e` <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, ~
## $ country_full <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "A~
## $ country <chr> "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", "ABW", ~
## $ sector <chr> "agriculture", "agriculture", "agriculture", "agricultur~
## $ subsector <chr> "cropland fires", "cropland fires", "cropland fires", "c~
## $ start <date> 2020-01-01, 2019-01-01, 2018-01-01, 2017-01-01, 2016-01~
## $ end <date> 2021-01-01, 2020-01-01, 2019-01-01, 2018-01-01, 2017-01~
At first glance, I’m surprised to see missing (NA) values for emissions in the Tonnes Co2e column. Let’s get a better look at the distribution of emissions.
skim(emissions$'Tonnes Co2e')
Data summary
Name | emissions$"Tonnes Co2e" |
Number of rows | 58500 |
Number of columns | 1 |
_______________________ | |
Column type frequency: | |
numeric | 1 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
data | 14413 | 0.75 | 6894622 | 69037274 | 0 | 0 | 9000 | 865200 | 4386245000 | ▇▁▁▁▁ |
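As a quick sanity check on the summary above, the share of missing values can be recomputed from the counts that skim() reports. This is only a sketch using the two numbers printed in the table; the variable names are illustrative.

```r
# Counts copied from the skim() output above
n_rows    <- 58500
n_missing <- 14413

missing_share <- n_missing / n_rows   # fraction of rows with NA emissions
complete_rate <- 1 - missing_share    # what skim() reports as complete_rate
n_complete    <- n_rows - n_missing   # rows that survive dropping NAs

round(missing_share, 3)
round(complete_rate, 2)
n_complete
```

On the full data frame, the same fraction could be obtained directly with mean(is.na(emissions$`Tonnes Co2e`)).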
A quick skim shows us that about 25% of the 58,500 rows are null, and that the emission distribution is skewed right. For now, I’ll save the null entries in a separate tibble before dropping them so I can have a closer look later. Also, having two country identifiers and two date columns might be superfluous, so I’ll keep the full country name and end date only.
emissions <-
  emissions %>%
  select(!c(country, start))  # keep country_full and end only
emiss_null <-  # rows with missing emissions, saved for further analysis
  emissions %>%
  filter(is.na(`Tonnes Co2e`))
emissions <-
  emissions %>%
  na.omit()  # drop every row that contains an NA
glimpse(emissions)
## Rows: 44,087
## Columns: 5
## $ `Tonnes Co2e` <dbl> 0, 0, 0, 0, 0, 0, 13300, 13300, 13300, 13700, 13000, 138~
## $ country_full <chr> "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "Aruba", "A~
## $ sector <chr> "agriculture", "agriculture", "agriculture", "agricultur~
## $ subsector <chr> "rice cultivation", "rice cultivation", "rice cultivatio~
## $ end <date> 2021-01-01, 2020-01-01, 2019-01-01, 2018-01-01, 2017-01~
Now the data is condensed and relevant to our first question: how much CO2 does the U.S. emit compared to other countries?
Top Polluters by Country, Sector, and Subsector
Let’s take a look at the top 6 emitters of CO2 by the country_full column and save it to a new tibble called top_country.
top_country <-
emissions %>%
group_by(country_full) %>%
summarize(total_emissions = sum(`Tonnes Co2e`)) %>%
arrange(desc(total_emissions))
head(top_country)
## # A tibble: 6 x 2
## country_full total_emissions
## <chr> <dbl>
## 1 China 79505597168
## 2 United States of America 38453419159
## 3 India 22026290510
## 4 Russian Federation 14687431292
## 5 Indonesia 8233503249
## 6 Japan 8208617507
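These totals indicate that, over the last 5 years, the U.S. ranked second in total CO2 emissions, behind only China. Using similar code, let’s dig into which sectors contribute the most emissions. Note that the sector and subsector totals that follow are summed across all countries, not just the U.S.; the power sector total alone exceeds the U.S. total above.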
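The code behind the sector and subsector tables is not shown in this post. A minimal sketch follows, assuming it mirrors the top_country pipeline with a different grouping column. The helper top_emitters() and the toy tibble are hypothetical and not part of the original analysis; the toy values are made up purely so the example runs on its own.

```r
library(dplyr)

# Hypothetical helper (an assumption, not the original code): total emissions
# per group, largest first. `group_col` is an unquoted column name.
top_emitters <- function(data, group_col, n = 6) {
  data %>%
    group_by({{ group_col }}) %>%
    summarize(total_emissions = sum(`Tonnes Co2e`), .groups = "drop") %>%
    arrange(desc(total_emissions)) %>%
    head(n)
}

# Made-up toy rows using the same column names as the cleaned data
toy <- tibble(
  `Tonnes Co2e` = c(100, 50, 30, 20),
  sector        = c("power", "power", "transport", "agriculture"),
  subsector     = c("electricity generation", "other energy use",
                    "roads", "rice cultivation")
)

top_sector    <- top_emitters(toy, sector)
top_subsector <- top_emitters(toy, subsector)
top_sector
```

With the real data, the equivalent calls would be top_emitters(emissions, sector) and top_emitters(emissions, subsector). Wrapping the pipeline in one function avoids copying the same four lines for each grouping column.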
## # A tibble: 6 x 2
## sector total_emissions
## <chr> <dbl>
## 1 power 82195880544
## 2 manufacturing 57042411063
## 3 transport 43085579679
## 4 agriculture 38273277305
## 5 oil and gas 33263211325
## 6 buildings 24883138209
Over the last 5 years, the power sector has contributed the most to CO2 emissions globally. This may come as no surprise; we use vast amounts of power to live our modern lives, and there are no real alternatives to fossil fuels that can provide base load power at scale.
Let’s take a look at the top contributors by subsector.
## # A tibble: 6 x 2
## subsector total_emissions
## <chr> <dbl>
## 1 electricity generation 74149861000
## 2 roads 36063812889
## 3 other manufacturing 29076461911
## 4 residential commercial onsite heating 21409922971
## 5 enteric fermentation 16732518899
## 6 oil and gas production 15672465055
No surprises here, as electricity generation is the main driver of the power industry. Even though coal, oil, and natural gas have long been known to be heavy CO2 emitters, the data shows these economic behemoths still press on as modern leaders in CO2 pollution today.
Conclusions and Further Analysis
We have reached conclusions for our first two questions:
- How much CO2 does the U.S. emit compared to other countries?
  Summed over the years in the dataset, the U.S. emitted roughly 38.5 billion metric tons of CO2, about half as much as China, the world’s largest emitter.
- What parts of the economy contribute the most CO2 pollution?
  Electricity generation is by far the world’s largest emitting subsector, followed by road transport and other manufacturing.
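The comparisons in these conclusions can be checked with a little arithmetic on the totals printed earlier. The sketch below only re-uses numbers from the top_country and subsector tables; the variable names are illustrative.

```r
# Totals copied from the top_country table (summed over the dataset's years)
china <- 79505597168
usa   <- 38453419159

usa_billions <- usa / 1e9    # roughly 38.5 billion
usa_vs_china <- usa / china  # roughly 0.48, i.e. about half of China's total

# Totals copied from the subsector table (all countries combined)
electricity <- 74149861000
roads       <- 36063812889

electricity_vs_roads <- electricity / roads  # roughly twice the roads total

round(c(usa_billions, usa_vs_china, electricity_vs_roads), 2)
```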
Initial analysis of ClimateTrace’s data has inspired questions such as:
- What countries have null values and why?
- How does a country’s population correlate with their CO2 emissions?
Further analysis will track how emissions in each subsector change over time and look deeper at the factors driving CO2 emissions worldwide.
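As a starting point for the first follow-up question, missing values could be counted per country. This is only a sketch under stated assumptions: the toy tibble and its country labels are made up, and missing_by_country is a hypothetical name. On the real data it would be run on the rows before they are dropped, or on the saved emiss_null tibble.

```r
library(dplyr)

# Made-up toy rows; "A", "B", "C" stand in for real country names
toy <- tibble(
  `Tonnes Co2e` = c(NA, NA, 10, NA, 5),
  country_full  = c("A", "A", "A", "B", "C")
)

# Count and share of missing emission values per country,
# keeping only countries that have at least one missing value
missing_by_country <- toy %>%
  group_by(country_full) %>%
  summarize(
    n_missing     = sum(is.na(`Tonnes Co2e`)),
    missing_share = mean(is.na(`Tonnes Co2e`)),
    .groups = "drop"
  ) %>%
  filter(n_missing > 0) %>%
  arrange(desc(n_missing))

missing_by_country
```

Reporting the share alongside the count helps separate countries that are missing a few subsectors from those with no usable estimates at all.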