Deforestation-Risk-Classifier

🌳 Deforestation Risk Classification Project


๐ŸŒ Why This Matters

Deforestation is one of the most pressing environmental challenges of our time. Every year, 10 million hectares of forest (an area roughly the size of Iceland) are lost to logging, agriculture, and urban expansion. This destruction doesn’t just harm trees; it:

The problem? By the time we detect deforestation through satellite imagery, the damage is already done.

Our solution? Use machine learning to predict deforestation risk BEFORE it happens, allowing conservationists to intervene proactively rather than reactively.

This project demonstrates how data science can transform environmental protection from a reactive practice into a predictive, preventative strategy.


📊 The Data

Dataset Overview

Key Features Used

After feature engineering and selection, we focused on:

Data Challenges


🔬 Methodology

1. Data Preprocessing

✓ Train/Test Split (80/20) with stratification
✓ Feature engineering (Latitude → Abs_Latitude)
✓ Feature selection (dropped 3 low-importance features)
✓ Standardization (for distance-based models)
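
The preprocessing steps above can be sketched in a few lines. This is a minimal, library-free illustration rather than the project's actual code: the toy latitudes and labels, and the helper names `stratified_split`, `fit_scaler`, and `standardize`, are assumptions made for the example. The detail it highlights is that the scaler's mean and standard deviation are learned from the training rows only and then reused on the test rows.

```python
import random
from collections import defaultdict
from statistics import mean, pstdev


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with a similar class ratio in each part."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def fit_scaler(column):
    """Learn the mean and standard deviation from training values only."""
    mu, sigma = mean(column), pstdev(column)
    return mu, (sigma if sigma > 0 else 1.0)


def standardize(column, mu, sigma):
    """Apply a previously fitted scaler to any column."""
    return [(x - mu) / sigma for x in column]


# Hypothetical toy data: latitude in degrees and a binary risk label.
latitudes = [-12.5, 3.1, -0.4, 45.0, -33.9, 10.2, -5.7, 60.3, 1.8, -22.0]
labels = [1, 1, 1, 0, 0, 1, 1, 0, 1, 0]

# Feature engineering: distance from the equator, regardless of hemisphere.
abs_latitude = [abs(lat) for lat in latitudes]

# 80/20 split that keeps both classes represented in the test part.
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)

# Fit the scaler on training rows, then apply it to both parts.
mu, sigma = fit_scaler([abs_latitude[i] for i in train_idx])
train_scaled = standardize([abs_latitude[i] for i in train_idx], mu, sigma)
test_scaled = standardize([abs_latitude[i] for i in test_idx], mu, sigma)
```

Fitting the scaler before splitting would let test-set statistics influence the training features, which is one of the leakage paths the split is meant to close.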

2. Models Compared

We implemented and rigorously tested 5 different algorithms:

| Model | Type | Complexity | Best For |
|---|---|---|---|
| Logistic Regression | Linear | Low | Baseline, interpretability |
| Linear Discriminant Analysis (LDA) | Linear | Low | Small datasets, assumes normality |
| Random Forest | Ensemble (Trees) | High | Non-linear patterns |
| Support Vector Machine (SVM) | Kernel-based | Medium | Small datasets, clear margins |
| Neural Networks | Deep Learning | High | Complex patterns (2 architectures tested) |

3. Rigorous Evaluation Protocol

To ensure fair comparison and prevent data leakage, we followed academic best practices:

Phase 1: Model Comparison (Cross-Validation Only)

Phase 2: Final Model Selection

Phase 3: Final Evaluation
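
The three phases can be sketched as follows. This is a simplified, library-free illustration under stated assumptions, not the project's pipeline: the two stand-in "models" (`fit_majority`, `fit_threshold`), the single toy feature, plain (unstratified) k-fold, and accuracy as the selection score are all hypothetical choices made to keep the example short. The point is the order of operations: candidates are compared using folds drawn from the training split only, and the test split is touched exactly once at the end.

```python
import random
from statistics import mean


def k_fold_indices(indices, k, seed=0):
    """Shuffle the given indices and deal them into k folds."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]


def fit_majority(xs, ys):
    """Baseline: always predict the most common training label."""
    majority = max(set(ys), key=ys.count)
    return lambda x: majority


def fit_threshold(xs, ys):
    """Cut at the midpoint between the two class means."""
    mean_pos = mean(x for x, y in zip(xs, ys) if y == 1)
    mean_neg = mean(x for x, y in zip(xs, ys) if y == 0)
    cut = (mean_pos + mean_neg) / 2
    positive_is_low = mean_pos < mean_neg
    return lambda x: int((x < cut) == positive_is_low)


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cv_score(fit, xs, ys, train_idx, k=5):
    """Phase 1: mean validation score over folds of the training split."""
    scores = []
    for held_out in k_fold_indices(train_idx, k):
        fit_idx = [i for i in train_idx if i not in held_out]
        predict = fit([xs[i] for i in fit_idx], [ys[i] for i in fit_idx])
        scores.append(accuracy([ys[i] for i in held_out],
                               [predict(xs[i]) for i in held_out]))
    return mean(scores)


# Hypothetical toy data: one feature, binary label (1 = high risk).
xs = [2, 5, 8, 11, 14, 17, 20, 23, 26, 29,
      35, 38, 41, 44, 47, 50, 53, 56, 59, 62,
      4, 16, 28, 40, 58]
ys = [1] * 10 + [0] * 10 + [1, 1, 1, 0, 0]
train_idx = list(range(20))      # used for all model comparison
test_idx = list(range(20, 25))   # held back until the very end

# Phase 1: compare candidates with cross-validation on training data only.
candidates = {"majority": fit_majority, "threshold": fit_threshold}
cv_scores = {name: cv_score(fit, xs, ys, train_idx)
             for name, fit in candidates.items()}

# Phase 2: pick the best candidate and refit it on the full training split.
best_name = max(cv_scores, key=cv_scores.get)
final_model = candidates[best_name]([xs[i] for i in train_idx],
                                    [ys[i] for i in train_idx])

# Phase 3: evaluate once on the untouched test split.
test_score = accuracy([ys[i] for i in test_idx],
                      [final_model(xs[i]) for i in test_idx])
```

Because the test indices never enter `cv_score`, the final number is not influenced by the model-selection step.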

4. Special Considerations for Small Datasets

Given our limited data (n=138), we took special precautions:
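
To give a feel for why a dataset of this size calls for caution, the sketch below computes a Wilson score interval for a proportion such as recall. It is a generic illustration, not a description of the project's own procedure: the helper name `wilson_interval` and the count of 10 positive test samples are assumptions made for the example.

```python
from math import sqrt


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


n_total = 138
n_test = round(n_total * 0.2)  # 27.6 rounds to 28 held-out samples

# Hypothetical: suppose 10 of the held-out samples are class 1
# and the model recovers 8 of them (recall = 0.8).
low, high = wilson_interval(successes=8, n=10)
```

With 8 of 10 positives recovered, the interval runs from roughly 0.49 to 0.94, so a single held-out score on this little data is better read as a range than as a point estimate.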


📈 Model Performance

Cross-Validation Results (Training Data)

| Model | Recall | Precision | F1-Score | Accuracy |
|---|---|---|---|---|
| SVM ⭐ | 80.0% | 68.5% | 72.3% | 81.9% |
| LDA | 80.0% | 65.4% | 70.8% | 79.1% |
| Logistic Regression | 80.0% | 46.5% | 58.8% | 62.8% |
| Random Forest | 56.0% | 67.3% | 57.7% | 76.3% |
| Neural Network (v1) | 56.0% | 65.5% | 54.8% | 74.0% |
| Neural Network (v2) | 56.0% | 51.7% | 52.0% | 67.6% |

โญ Winner: Linear SVM - Best overall F1-score and balanced performance (but in the end it is all contextual, we are just talking metrics for our main and target class aka the class 1)

Key Insights

✅ What Worked:

โŒ What Didnโ€™t Work:

๐Ÿ” Domain Context:


๐Ÿ› ๏ธ Technologies Used

Core Libraries

Interactive Components

Development Tools

Key Techniques


🎮 Interactive App: Model Arena

Want to play with the models and predict deforestation risk yourself?

We built a fun, interactive Streamlit app with two modes:

🥊 Mode 1: Model Arena

“Pick your fighters and watch them battle!”

Perfect for understanding model tradeoffs!

🔮 Mode 2: Risk Predictor

“Predict deforestation risk for any region!”

Perfect for testing hypothetical scenarios!

How to Launch

streamlit run model_arena_app.py

Or use the launcher script (double-click it, or run it from a terminal):

./launch_arena.sh

The app opens in your browser at localhost.


Project Files:


**Built with 💚 for the Planet**