mirror of https://github.com/fabianfrank-dev/carbon_analysis.git synced 2026-07-14 04:53:53 +00:00

This is a student project analysing countries' emissions in relation to their GDP.

Fabian Frank 4f3d26f07e Included the output of the notebooks		2026-06-04 16:51:31 +03:00
Carbon Emissions and Economic Development

Data Analytics course project analyzing whether countries can grow economically while reducing carbon intensity or weakening the link between GDP growth and CO2 emissions.

Submission Guide

The recommended grading path is the pandas version:

Open notebooks/pandas/05_summary_and_key_findings.ipynb for the consolidated story, plots, and final interpretation.
Open the four supporting pandas notebooks for the full workflow:
- notebooks/pandas/01_data_prep_and_core_relationships.ipynb
- notebooks/pandas/02_income_groups_and_carbon_efficiency.ipynb
- notebooks/pandas/03_electricity_mix_and_energy_use.ipynb
- notebooks/pandas/04_clustering_and_cluster_dynamics.ipynb
Treat the polars notebooks as an alternative implementation / extension of the same analysis.

The summary notebook is intentionally the main submission artifact because it can be reviewed quickly while still showing the main data preparation, feature engineering, visualizations, clustering, and conclusions.

Research Question

Can countries grow economically while reducing their carbon intensity or weakening the link between GDP growth and CO2 emissions?

The project answers this through exploratory analysis rather than causal inference. It identifies patterns across countries and income groups, but it does not claim that any single variable fully explains national emissions outcomes.

Analytical Workflow

The project demonstrates a full data analytics workflow:

Data loading and cleaning: CO2 and GDP data are loaded from local CSV files and filtered to country-year observations.
Reshaping and merging: World Bank GDP data is reshaped from wide to long format and merged with emissions data using ISO country codes and years.
Feature engineering: Per-capita and efficiency metrics are created, including co2_per_capita, gdp_per_capita, and co2_per_gdp.
Missing-data inspection: Missingness is visualized and summarized after the merge.
Relationship analysis: GDP per capita and CO2 per capita are compared over time using country-level Pearson correlations.
Income-group comparison: Countries are grouped by income level to compare emissions trajectories and carbon efficiency.
Energy-structure analysis: Electricity mix and energy-use measures are integrated to inspect whether cleaner energy shares align with lower carbon intensity.
Unsupervised learning: KMeans clustering, PCA, and UMAP-style dimensionality reduction are used to explore country archetypes.
Interpretation: Results are summarized with limitations and policy-relevant insights.
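
As a rough illustration of the reshaping, merging, feature-engineering, and correlation steps listed above, the following sketch uses small synthetic tables. The column names (`Country Code`, `iso_code`, `year`, `co2`, `population`) and all values are assumptions chosen to resemble the World Bank and Our World in Data layouts; the notebooks may structure these steps differently.

```python
import pandas as pd

# Synthetic wide-format GDP table (one column per year); values are illustrative only.
gdp_wide = pd.DataFrame({
    "Country Code": ["AAA", "BBB"],
    "2000": [100.0, 50.0],
    "2001": [110.0, 55.0],
    "2002": [121.0, 60.0],
})

# Synthetic long-format emissions panel with population; values are illustrative only.
co2 = pd.DataFrame({
    "iso_code": ["AAA"] * 3 + ["BBB"] * 3,
    "year": [2000, 2001, 2002] * 2,
    "co2": [10.0, 9.5, 9.0, 4.0, 4.5, 5.0],
    "population": [10.0, 10.0, 10.0, 5.0, 5.0, 5.0],
})

# Reshape GDP from wide to long so each row is one country-year observation.
gdp_long = gdp_wide.melt(id_vars="Country Code", var_name="year", value_name="gdp")
gdp_long = gdp_long.rename(columns={"Country Code": "iso_code"})
gdp_long["year"] = gdp_long["year"].astype(int)

# Merge on ISO code and year; an inner join keeps only matched country-years.
panel = co2.merge(gdp_long, on=["iso_code", "year"], how="inner")

# Per-capita and efficiency metrics.
panel["co2_per_capita"] = panel["co2"] / panel["population"]
panel["gdp_per_capita"] = panel["gdp"] / panel["population"]
panel["co2_per_gdp"] = panel["co2"] / panel["gdp"]

# Country-level Pearson correlation between GDP per capita and CO2 per capita.
corr_by_country = {
    iso: group["gdp_per_capita"].corr(group["co2_per_capita"])
    for iso, group in panel.groupby("iso_code")
}
print(corr_by_country)
```

In this toy data, country `AAA` grows while its emissions per capita fall, so its correlation is negative, whereas `BBB` shows growth and emissions rising together.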

Key Findings

High-income countries show the widest range of GDP-CO2 relationships, including more cases consistent with relative decoupling.
Lower-income countries more often show positive GDP-CO2 relationships, suggesting that growth and emissions remain more tightly linked.
Carbon intensity generally declines over time, but the speed and starting point differ substantially across income groups.
Electricity mix matters, but it does not fully explain differences in carbon efficiency by itself.
Clustering reveals country archetypes that combine economic scale, emissions intensity, energy structure, and outlier behavior.
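
The clustering finding can be pictured with a minimal sketch, assuming scikit-learn is available and using synthetic feature values rather than project data. The three features (log GDP per capita, CO2 per unit of GDP, green electricity share) and the choice of two clusters are assumptions for illustration. UMAP is left out here because it needs an additional package.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic country-level features (illustrative values only, not project data).
# Columns: log GDP per capita, CO2 per unit of GDP, green electricity share.
features = np.array([
    [10.8, 0.15, 0.60],
    [10.6, 0.18, 0.55],
    [10.9, 0.12, 0.70],
    [7.5, 0.60, 0.10],
    [7.8, 0.55, 0.15],
    [7.2, 0.65, 0.05],
])

# Standardise so that no single feature dominates the distance computation.
scaled = StandardScaler().fit_transform(features)

# Group countries into two clusters; random_state fixes the initialisation.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)

# Project onto two principal components for plotting the clusters.
coords = PCA(n_components=2).fit_transform(scaled)

print(labels)
print(coords.shape)
```

With real data, the number of clusters would normally be chosen by inspecting something like inertia or silhouette scores rather than fixed in advance.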

Data Sources

Dataset	Source	Use
CO2 emissions	Our World in Data	Main emissions panel and population values
GDP	World Bank	National GDP values reshaped from wide to long format
Electricity production / mix	Our World in Data	Green and non-green electricity share analysis
GNI per capita	Wikipedia table based on World Bank-style reporting	Income-group assignment for comparison
Energy use per capita	World Bank XML API	Energy-use extension section

Project Structure

carbon_analysis/
├── data/
│   ├── co2_data.csv
│   └── gdp_data.csv
├── notebooks/
│   ├── functions.py
│   ├── pandas/
│   │   ├── 01_data_prep_and_core_relationships.ipynb
│   │   ├── 02_income_groups_and_carbon_efficiency.ipynb
│   │   ├── 03_electricity_mix_and_energy_use.ipynb
│   │   ├── 04_clustering_and_cluster_dynamics.ipynb
│   │   ├── 05_summary_and_key_findings.ipynb
│   │   └── README.md
│   └── polars/
│       ├── 01_data_prep_and_core_relationships.ipynb
│       ├── 02_income_groups_and_carbon_efficiency.ipynb
│       ├── 03_electricity_mix_and_energy_use.ipynb
│       ├── 04_clustering_and_dimensionality_reduction.ipynb
│       └── README.md
├── requirements.txt
└── README.md

Setup

Use Python 3.10 or newer.

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
jupyter notebook notebooks/pandas/05_summary_and_key_findings.ipynb

On Windows, activate the environment with:

.venv\Scripts\activate

Reproducibility Notes

The summary notebook and early data-preparation sections run from the local files in data/.
Some supporting notebooks fetch live external data from Wikipedia, Our World in Data, and the World Bank API.
Internet access is therefore required for the full extended workflow.
External web tables and APIs can change structure over time; if that happens, the local CSV-based summary remains the most stable submission artifact.
Income groups are assigned from currently available country-level income information, so they should be interpreted as a practical comparison framework rather than a perfect historical classification.
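
Because external tables and APIs can change structure, one defensive option is to check for the expected columns right after loading. The helper below is a hypothetical sketch, not a function from `notebooks/functions.py`, and the required column names are assumptions.

```python
import pandas as pd

# Hypothetical set of columns the downstream analysis would rely on.
REQUIRED_COLUMNS = {"iso_code", "year", "co2"}


def check_columns(df: pd.DataFrame, required: set) -> pd.DataFrame:
    """Return df unchanged, or raise if any expected column is missing."""
    missing = sorted(set(required) - set(df.columns))
    if missing:
        raise ValueError(f"missing expected columns: {missing}")
    return df


# Usage: validate a freshly loaded table before merging it into the panel.
loaded = pd.DataFrame({"iso_code": ["AAA"], "year": [2000], "co2": [1.0]})
check_columns(loaded, REQUIRED_COLUMNS)
```

Failing early with a clear message makes it easier to tell a changed upstream source apart from a bug in the analysis.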

Why This Fits Data Analytics

This project demonstrates:

multi-source data handling
wide-to-long reshaping
country-year panel merging
missing-data inspection
feature engineering for comparability
visual analysis and interpretation
grouped aggregation and ranking
unsupervised clustering and dimensionality reduction
careful limitations around exploratory analysis

AI Use Disclosure

AI assistance was used for brainstorming, debugging, code-quality feedback, library usage questions, and improving explanatory wording. The analysis choices, implementation review, interpretation, and final submission decisions were checked by the author, and the work is presented as the author's own course project.