This is a student project analysing a country's emissions in relation to its GDP.

Carbon Emissions and Economic Development

Work in progress. This repository is being developed as a university data analytics project focused on clear data handling, thoughtful visualization, and methodical analysis.

Overview

This project examines whether economic growth can occur without a proportional increase in carbon emissions. More specifically, it studies the relationship between GDP development, CO2 emissions, and energy structure across countries over time.

The notebook combines multiple public datasets, reshapes and merges them at the country-year level, engineers per-capita and efficiency metrics, and then explores whether countries appear to remain tightly coupled to emissions growth or show signs of relative decoupling.

The project is exploratory rather than causal: the goal is to identify patterns, contrasts, and analytically useful country groupings, not to claim that any one factor alone explains national emission outcomes.

Research Question

The central question is:

Can countries grow economically while reducing their carbon intensity or weakening the link between GDP growth and CO2 emissions?

This question is explored through:

  • country-level GDP-CO2 correlation analysis
  • per-capita normalization
  • income-group comparisons
  • carbon-intensity trends
  • electricity-mix comparisons
  • selected clustering and dimensionality-reduction analyses

Current Scope

The notebook currently demonstrates the following workflow:

  1. Data loading and cleaning: CO2 and GDP data are loaded, reduced to relevant columns, and cleaned to remove aggregate entities and unusable rows.

  2. Reshaping and merging: GDP data is transformed from wide to long format, then merged with emissions data using ISO country codes and years.

  3. Feature engineering: The analysis constructs co2_per_capita, gdp_per_capita, and co2_per_gdp to make countries more comparable across population size and economic scale.

  4. Missing-data inspection: Missing values are visualized and summarized after merging to make data quality issues explicit rather than implicit.

  5. Correlation analysis: GDP per capita and CO2 per capita are compared over time at the country level using Pearson correlation.

  6. Income-group classification: Countries are assigned to income groups using available GNI-per-capita data and then compared across key climate-economic metrics.

  7. Comparative visualization: The project includes violin plots, faceted time series, dual-axis plots, bar charts, and scatter-style comparative views.

  8. Carbon-intensity analysis: CO2 per GDP is used as an emissions-efficiency measure, both across time and for ranking countries in recent years.

  9. Energy-structure analysis: Electricity-source data is used to compare green and non-green energy shares with carbon intensity.

  10. Extended exploratory analysis: Additional sections include z-score style outlier analysis, clustering of country profiles, and dimensionality reduction to inspect broader country archetypes.
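The reshaping, merging, and feature-engineering steps above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions rather than the notebook's code: the input column names (`iso_code`, `year`, `co2`, `population`, `gdp`) and all values are invented, and only the three derived column names come from the description above.

```python
import pandas as pd

# Toy wide-format GDP table, one column per year (values invented).
gdp_wide = pd.DataFrame({
    "iso_code": ["AAA", "BBB"],
    "2000": [100.0, 400.0],
    "2001": [110.0, 420.0],
})

# Toy long-format emissions table (values invented).
co2 = pd.DataFrame({
    "iso_code": ["AAA", "AAA", "BBB", "BBB"],
    "year": [2000, 2001, 2000, 2001],
    "co2": [10.0, 11.0, 20.0, 19.0],
    "population": [50.0, 50.0, 100.0, 100.0],
})

# Wide to long: one row per country-year.
gdp_long = gdp_wide.melt(id_vars="iso_code", var_name="year", value_name="gdp")
gdp_long["year"] = gdp_long["year"].astype(int)  # year labels arrive as strings

# Merge on the composite key (ISO code, year).
merged = co2.merge(gdp_long, on=["iso_code", "year"], how="inner")

# Derived metrics that make countries comparable across size and scale.
merged["co2_per_capita"] = merged["co2"] / merged["population"]
merged["gdp_per_capita"] = merged["gdp"] / merged["population"]
merged["co2_per_gdp"] = merged["co2"] / merged["gdp"]
```

An inner join is used here so that only country-years present in both tables survive. A left join would keep unmatched emissions rows instead, so the gaps would show up in a later missing-data inspection.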

Key Findings So Far

The current notebook suggests several recurring patterns:

  • High-income countries show the widest spread of GDP-CO2 correlations, including more cases consistent with decoupling.
  • Lower-income countries are more concentrated around positive correlations, indicating that growth and emissions often still move together.
  • Carbon intensity tends to decline over time across income groups, although the speed and level of improvement differ substantially.
  • Electricity mix matters, but it does not fully explain efficiency differences on its own.
  • Some countries behave as clear outliers, which makes anomaly detection and clustering analytically useful.

These findings should be read as exploratory patterns rather than final conclusions.
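As one illustration of the outlier point above, a z-score style check on carbon intensity might look like the sketch below. It assumes a single year of data and a cut-off of 2.0 standard deviations. Both are assumptions for the example, as are the country codes, the values, and the helper name `zscore_outliers`.

```python
from statistics import mean, pstdev

# Hypothetical carbon-intensity values (CO2 per unit of GDP) for one year.
# Country codes and numbers are invented for illustration.
co2_per_gdp = {
    "AAA": 0.20, "BBB": 0.22, "CCC": 0.18, "DDD": 0.21,
    "EEE": 0.19, "FFF": 0.20, "GGG": 0.23, "HHH": 1.00,
}


def zscore_outliers(values_by_country, threshold=2.0):
    """Return {country: z-score} for entries with |z| above the threshold."""
    values = list(values_by_country.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return {}  # all values identical, so nothing can stand out
    scores = {c: (v - mu) / sigma for c, v in values_by_country.items()}
    return {c: z for c, z in scores.items() if abs(z) > threshold}


outliers = zscore_outliers(co2_per_gdp)
```

Only `HHH` is flagged in this toy data. With small groups the largest attainable z-score is limited by the group size, so the threshold should be chosen with the number of countries in mind.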

Data Sources

| Dataset | Source | Coverage / Use |
| --- | --- | --- |
| CO2 emissions | Our World in Data | Main emissions data, filtered to modern country-year observations |
| GDP | World Bank | National GDP, reshaped from wide to long format |
| Electricity production / mix | Our World in Data | Used to derive green and non-green electricity shares |
| GNI per capita | Wikipedia / World Bank thresholds | Used for income-group assignment |
| Energy use per capita | World Bank XML API | Used in the extended exploratory energy-use section |
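For the XML-based source in the table above, ingestion with the standard library's ElementTree could look roughly like this. The sample document is hand-written and only loosely modelled on a World Bank style response, so the element names, the namespace URI, and the values are assumptions to check against the real API output. The function name `parse_energy_use` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written sample, NOT a real API response (structure and values assumed).
SAMPLE_XML = """
<wb:data xmlns:wb="http://www.worldbank.org">
  <wb:data>
    <wb:countryiso3code>DEU</wb:countryiso3code>
    <wb:date>2014</wb:date>
    <wb:value>3779.5</wb:value>
  </wb:data>
  <wb:data>
    <wb:countryiso3code>DEU</wb:countryiso3code>
    <wb:date>2015</wb:date>
    <wb:value />
  </wb:data>
</wb:data>
"""

NS = {"wb": "http://www.worldbank.org"}


def parse_energy_use(xml_text):
    """Turn the XML payload into a list of country-year dictionaries."""
    root = ET.fromstring(xml_text.strip())
    records = []
    for item in root.findall("wb:data", NS):
        raw = item.findtext("wb:value", default="", namespaces=NS)
        records.append({
            "iso_code": item.findtext("wb:countryiso3code", namespaces=NS),
            "year": int(item.findtext("wb:date", namespaces=NS)),
            # Empty <wb:value/> elements are kept as explicit missing values.
            "energy_use_per_capita": float(raw) if raw.strip() else None,
        })
    return records


records = parse_energy_use(SAMPLE_XML)
```

Keeping empty values as `None` instead of dropping the rows means the gaps stay visible when the result is later merged into the country-year table.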

Tech Stack

This project is designed to showcase practical data-analysis skills in Python. Depending on the current notebook version, the workflow may combine:

  • Pandas / Polars for tabular wrangling and reshaping
  • NumPy for numeric operations and derived metrics
  • Matplotlib / Seaborn for visual analysis
  • scikit-learn for scaling, clustering, and exploratory ML-style preprocessing
  • BeautifulSoup / ElementTree / requests for more explicit HTML and XML ingestion

The focus is less on model performance and more on analytical clarity, data preparation, and interpretable structure in the data.

Project Structure

carbon_analysis/
├── data/
│   ├── co2_data.csv
│   └── gdp_data.csv
├── notebooks/
│   ├── data_analytics_project_pandas.ipynb
│   ├── data_analytics_project_polars.ipynb
│   ├── functions.py
│   ├── pandas/
│   │   ├── 01_data_prep_and_core_relationships.ipynb
│   │   ├── 02_income_groups_and_carbon_efficiency.ipynb
│   │   ├── 03_electricity_mix_and_energy_use.ipynb
│   │   └── 04_clustering_and_cluster_dynamics.ipynb
│   └── polars/
│       ├── 01_data_prep_and_core_relationships.ipynb
│       ├── 02_income_groups_and_carbon_efficiency.ipynb
│       ├── 03_electricity_mix_and_energy_use.ipynb
│       └── 04_clustering_and_dimensionality_reduction.ipynb
├── requirements.txt
└── README.md

Installation

Prerequisites

  • Python 3.10+

Setup

git clone https://github.com/fabianfrank-dev/carbon_analysis
cd carbon_analysis

python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

pip install -r requirements.txt

Running the Notebooks

jupyter notebook notebooks/data_analytics_project_pandas.ipynb

The two top-level notebooks in notebooks/ now act as lightweight indexes:

  • notebooks/data_analytics_project_pandas.ipynb
  • notebooks/data_analytics_project_polars.ipynb

From those index notebooks you can open the smaller thematic parts, or open a part directly if you only want one section of the analysis.

Reproducibility Notes

  • Some notebook sections rely on live external data fetching such as Wikipedia tables, Our World in Data CSV endpoints, or World Bank API responses.
  • Because of that, internet access may be required to run the full notebook end to end.
  • External tables or APIs may change structure over time, so some ingestion steps may need small adjustments in the future.
  • Income groups are assigned using the available GNI-per-capita source used in the notebook; this is useful for comparison, but should not be interpreted as a perfect historical classification for every year in the panel.
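To make the last note concrete, income-group assignment from a single GNI-per-capita snapshot might be sketched as below. The cut-off values are illustrative assumptions modelled on World Bank style thresholds and are not necessarily the ones used in the notebook. The names `THRESHOLDS`, `LABELS`, and `income_group` are hypothetical.

```python
from bisect import bisect_left

# Illustrative GNI-per-capita upper bounds in US dollars (assumed values;
# the official thresholds are revised regularly and may differ).
THRESHOLDS = [1_135, 4_465, 13_845]
LABELS = [
    "Low income",
    "Lower middle income",
    "Upper middle income",
    "High income",
]


def income_group(gni_per_capita):
    """Map a GNI-per-capita value to an income-group label.

    Each threshold is treated as an inclusive upper bound for its group.
    Missing values stay unclassified.
    """
    if gni_per_capita is None:
        return None
    return LABELS[bisect_left(THRESHOLDS, gni_per_capita)]


groups = {iso: income_group(gni) for iso, gni in
          {"AAA": 500, "BBB": 5_000, "CCC": 50_000, "DDD": None}.items()}
```

Because the same label is applied to every year of a country's series, a country that changed income group during the panel is still shown under its snapshot group, which is the limitation described in the note above.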

Why This Project Fits Data Analytics

This repository is intended to demonstrate:

  • multi-source dataset handling
  • reshaping and merging on composite keys
  • groupby-based transformation and aggregation
  • feature engineering for comparability
  • visual communication of quantitative patterns
  • exploratory clustering and preprocessing workflows
  • careful interpretation of real-world, imperfect data

Next Steps

Planned improvements include:

  • strengthening the missing-data and preprocessing section
  • improving documentation of the more advanced exploratory analyses
  • refining the clustering / dimensionality-reduction interpretation
  • expanding reproducibility and dependency documentation
  • polishing the notebook narrative for professor review

On the usage of AI

I used AI to help me with this project. It helped me understand the data and the economic background, and interpret some plots. It also helped with debugging and gave me advice on improving my code quality. I brainstormed with AI to come up with new ideas and approaches to the problem. However, I always made sure that I understood the code and the reasoning behind it before implementing it, and that the work is my own and not plagiarized.

AI was also used when the documentation of certain libraries was unclear. For example, while migrating my codebase from pandas to polars, some functions had different names, so I asked questions like "What is the polars equivalent of melt/dropna/etc.?" Although the thoughts in the analysis were my own, I also used AI to improve my sentences and express more clearly what I meant.