Machine Learning, Data Science and AI Portfolio
This portfolio represents a comprehensive collection of predictive modeling, statistical analysis, and exploratory data science projects. Each repository focuses on extracting actionable insights from complex datasets.
Featured Projects
Deforestation Risk Classifier (Final Synthesis)
- Overview: The culmination of the classification series, synthesizing insights across linear models, tree-based ensembles, and deep learning to establish the most robust, deployment-ready ecological early-warning system.
- Technical Highlights: Comprehensive algorithm benchmarking, rigorous validation on an independent test set, and translation of evaluation metrics (recall vs. precision) into actionable environmental policy recommendations.
- Index & Deliverables:
🚀 Deforestation Multi-Model Classification
- Overview: Stress-testing diverse machine learning architectures (Random Forest, XGBoost, SVM, and Neural Networks) to determine which model family best captures the structure of global deforestation risk.
- Technical Highlights: Model-specific preprocessing pipelines, stratified cross-validation, Neural Network early stopping (val_loss), and minority-sensitive F1-score evaluation.
- Index & Deliverables:
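
The early stopping on val_loss mentioned in the highlights above can be illustrated with a minimal, framework-free sketch. The `patience` and `min_delta` parameters and the loss values are assumptions made for illustration; the repository's actual settings are not stated here.

```python
from typing import Sequence


def early_stopping_epoch(val_losses: Sequence[float], patience: int = 3,
                         min_delta: float = 0.0) -> int:
    """Return the 0-based epoch at which training would stop.

    Training stops once val_loss has failed to improve by more than
    `min_delta` for `patience` consecutive epochs. If that never happens,
    the index of the last epoch is returned.
    """
    best_loss = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(val_losses) - 1


# Hypothetical validation-loss curve (illustrative numbers only).
history = [0.90, 0.70, 0.65, 0.66, 0.67, 0.68, 0.60]
stop = early_stopping_epoch(history, patience=3)
# Epoch with the lowest val_loss seen before stopping.
best = min(range(stop + 1), key=lambda i: history[i])
print(f"stop at epoch {stop}, restore weights from epoch {best}")
```

Note that the final value in `history` is never reached: with a small patience, a later improvement can be missed, which is the usual trade-off when tuning this parameter.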
🌲 LDA & Decision Trees: Deforestation Risk
- Overview: A comparative study of Linear Discriminant Analysis (LDA) and Decision Trees to identify deforestation drivers and test whether two mathematically different algorithms converge on the same ecological signals.
- Technical Highlights: LDA assumption auditing (multicollinearity, KDE plots), Discriminant Scaling Analysis, leakage-free scaling, and Gini Feature Importance extraction.
- Index & Deliverables:
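
As a rough illustration of what sits behind Gini feature importance: tree-based importances are typically built from the impurity decrease each split achieves, accumulated per feature. The sketch below computes that quantity for a single split in plain Python. The function names and toy labels are hypothetical and are not taken from the project code.

```python
from collections import Counter
from typing import Sequence


def gini_impurity(labels: Sequence[int]) -> float:
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def impurity_decrease(parent: Sequence[int], left: Sequence[int],
                      right: Sequence[int]) -> float:
    """Gini decrease from splitting `parent` into `left` and `right`,
    with each child weighted by its share of the parent's samples."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Toy labels (1 = high risk, 0 = low risk); illustrative only.
parent = [1, 1, 1, 0, 0, 0, 0, 0]
left, right = [1, 1, 1, 0], [0, 0, 0, 0]
print(f"parent impurity:   {gini_impurity(parent):.4f}")
print(f"impurity decrease: {impurity_decrease(parent, left, right):.4f}")
```

A feature that repeatedly produces large decreases across a tree's splits ends up with a high importance score.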
🎯 Deforestation Logistic: Critical Risk Classification
- Overview: A binary classification approach using Logistic Regression to predict critical deforestation risk based on socioeconomic indicators and ecologically validated thresholds.
- Technical Highlights: Target variable engineering (literature-validated cutoffs), class imbalance handling (class_weight="balanced"), leakage-free pipelines, and operational decision threshold analysis to maximize crisis detection.
- Index & Deliverables:
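
The decision threshold analysis mentioned above can be sketched as a sweep over candidate cutoffs, trading precision for recall. This is a minimal illustration in plain Python, assuming a model that outputs a probability per country. The helper names, labels, and scores are hypothetical and do not come from the project.

```python
from typing import Sequence, Tuple


def precision_recall(y_true: Sequence[int], scores: Sequence[float],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when predicting positive for score >= threshold."""
    tp = fp = fn = 0
    for label, score in zip(y_true, scores):
        predicted_positive = score >= threshold
        if predicted_positive and label == 1:
            tp += 1
        elif predicted_positive and label == 0:
            fp += 1
        elif not predicted_positive and label == 1:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def highest_threshold_for_recall(y_true: Sequence[int],
                                 scores: Sequence[float],
                                 target_recall: float) -> float:
    """Largest observed score that, used as threshold, meets the recall target.

    Lowering the threshold can only keep or raise recall, so candidates
    are scanned from high to low and the first hit is returned.
    """
    for threshold in sorted(set(scores), reverse=True):
        _, recall = precision_recall(y_true, scores, threshold)
        if recall >= target_recall:
            return threshold
    return min(scores)


# Hypothetical labels and predicted probabilities (illustrative only).
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.90, 0.80, 0.55, 0.40, 0.35, 0.30, 0.20, 0.10]

for t in (0.5, highest_threshold_for_recall(y_true, scores, 1.0)):
    p, r = precision_recall(y_true, scores, t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

In this toy data, catching every critical case requires dropping the cutoff well below 0.5 and accepting more false alarms, which is the kind of trade-off an early-warning use case has to weigh.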
- Overview: A predictive analysis of global deforestation drivers and forest cover loss trends.
- Technical Highlights: Implemented linear regression models and designed a data-driven approach to environmental monitoring.
- Index & Deliverables:
- Overview: Investigation of socioeconomic factors (GDP, social support, life expectancy) and their impact on national happiness scores.
- Technical Highlights: Multi-variable linear regression and statistical significance testing.
- Index & Deliverables:
- Overview: A deep dive into health metrics and lifestyle habits to identify patterns in obesity across different demographics.
- Technical Highlights: Advanced data cleaning, feature correlation matrices, and multivariate visualization.
- Index & Deliverables:
- Overview: Utilizing Ordinary Least Squares (OLS) regression to predict student outcomes based on study habits and external variables.
- Technical Highlights: Model validation using R-squared metrics and residual analysis.
- Index & Deliverables:
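
For readers unfamiliar with the validation steps named above, here is a minimal sketch of R-squared and residual computation for a one-predictor OLS fit, written in plain Python. The project itself likely uses several predictors. The single-variable form, the function names, and the data below are assumptions chosen to keep the example short.

```python
from statistics import fmean
from typing import List, Sequence, Tuple


def fit_simple_ols(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Closed-form OLS for y = intercept + slope * x (one predictor)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def residuals_and_r2(x: Sequence[float], y: Sequence[float],
                     intercept: float, slope: float) -> Tuple[List[float], float]:
    """Residuals (observed minus fitted) and R^2 = 1 - SS_res / SS_tot."""
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    y_bar = fmean(y)
    ss_res = sum(r ** 2 for r in residuals)
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return residuals, 1.0 - ss_res / ss_tot


# Hypothetical data: weekly study hours vs. exam score.
hours = [2.0, 4.0, 6.0, 8.0, 10.0]
exam_scores = [55.0, 62.0, 70.0, 74.0, 84.0]

intercept, slope = fit_simple_ols(hours, exam_scores)
residuals, r2 = residuals_and_r2(hours, exam_scores, intercept, slope)
print(f"score = {intercept:.2f} + {slope:.2f} * hours, R^2 = {r2:.3f}")
print("residuals:", [round(r, 2) for r in residuals])
```

Residuals that scatter around zero without a visible pattern support the linear form. A curve or funnel shape in a residual plot would suggest a missing term or non-constant variance.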
- GitHub: rebeca-bc
- University: Universidad de Monterrey (UDEM)
- LinkedIn: www.linkedin.com/in/rebecaborregoc