
Housing Price Drivers Analysis

Urban data science study of how school quality and crime rates influence housing prices, achieving a 95% F1-score on price-category prediction with a Random Forest model.

2025-01 to 2025-05
95% F1-score

Key Highlights

  • Led 6-person team analyzing 30K+ records
  • Achieved 95% F1-score for price-category prediction
  • Identified price disparities up to 4.7× between neighborhoods

Overview

A comprehensive urban analytics project analyzing the relationship between school quality, crime rates, and housing prices across Atlanta ZIP codes.

Problem

Understanding what drives housing prices helps buyers make informed decisions and helps policymakers address inequities. Answering that question required integrating multiple data sources and building predictive models.

Solution

Led a team project that combined housing, crime, and education datasets to identify key price drivers and build accurate prediction models.

My Contributions

  • Led a 6-person team through the complete data science lifecycle
  • Integrated and cleaned 30K+ records from multiple sources
  • Performed spatial analysis across Atlanta ZIP codes
  • Built and tuned Random Forest, Logistic Regression, and KNN models
  • Achieved 95% F1-score for price-category prediction

Dataset & Evaluation

Dataset: 30K+ records combining housing sales, crime statistics, and school quality metrics

Evaluation: Train/test split with cross-validation; models were scored on F1-score, precision, and recall
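As a rough illustration of the metrics named above, the sketch below computes per-class precision, recall, and F1 and then averages the F1 values across classes (macro averaging). The three price categories, the example labels, and the choice of macro averaging are assumptions made for illustration; the project's actual categories and averaging scheme are not stated here.

```python
from collections import Counter


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that appears."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed the true class

    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_prf(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


# Hypothetical labels, not taken from the project data.
y_true = ["low", "low", "mid", "mid", "high", "high"]
y_pred = ["low", "mid", "mid", "mid", "high", "high"]

scores = per_class_prf(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
```

Reporting precision and recall alongside F1 shows whether errors come from over-predicting a category or from missing it, which a single F1 number hides.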

Results: The Random Forest model achieved a 95% F1-score, and the analysis identified price disparities of up to 4.7× between neighborhoods
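A disparity figure like the one above can be computed as the ratio between the highest and lowest typical price across areas. The sketch below is one way to do that, assuming the median sale price per area is the "typical" price; the page does not say which statistic or neighborhood definition the project used, and the area labels and prices here are made up.

```python
from collections import defaultdict
from statistics import median


def price_disparity(sales):
    """Given (area, price) pairs, return (ratio, highest_area, lowest_area).

    The ratio compares per-area median prices (an assumed choice of statistic).
    """
    prices_by_area = defaultdict(list)
    for area, price in sales:
        prices_by_area[area].append(price)

    medians = {area: median(prices) for area, prices in prices_by_area.items()}
    highest = max(medians, key=medians.get)
    lowest = min(medians, key=medians.get)
    return medians[highest] / medians[lowest], highest, lowest


# Made-up sales; "ZIP_A" etc. are placeholder area labels.
sales = [
    ("ZIP_A", 200_000), ("ZIP_A", 220_000), ("ZIP_A", 180_000),
    ("ZIP_B", 610_000), ("ZIP_B", 590_000),
    ("ZIP_C", 300_000),
]

ratio, top_area, bottom_area = price_disparity(sales)
```

Using the median rather than the mean keeps a handful of very expensive sales from inflating the ratio.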

Limitations

  • Analysis limited to Atlanta metro area
  • Historical data may not reflect current market conditions
  • School quality metrics rely on standardized test scores

Challenges & Tradeoffs

Challenge: Merging datasets with different geographic granularities (street addresses vs ZIP codes vs school districts).

Solution: Standardized all datasets to the ZIP code level as the common denominator, accepting some loss of spatial precision in exchange for a clean join.
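The ZIP-level standardization can be sketched as two steps: roll address-level sales up to one row per ZIP code, then join that with the ZIP-level crime and school tables. This is a minimal sketch rather than the project's pipeline. The field names, the placeholder ZIP codes, and the assumption that school scores have already been mapped from districts to ZIP codes are all hypothetical.

```python
from collections import defaultdict
from statistics import median


def aggregate_sales_by_zip(sales):
    """Collapse address-level sales into one summary record per ZIP code."""
    prices_by_zip = defaultdict(list)
    for sale in sales:
        prices_by_zip[sale["zip"]].append(sale["price"])
    return {
        zip_code: {"median_price": median(prices), "n_sales": len(prices)}
        for zip_code, prices in prices_by_zip.items()
    }


def join_on_zip(sales_by_zip, crime_by_zip, school_by_zip):
    """Inner join: keep only ZIP codes present in all three sources."""
    common = sales_by_zip.keys() & crime_by_zip.keys() & school_by_zip.keys()
    return {
        zip_code: {
            **sales_by_zip[zip_code],
            "crime_rate": crime_by_zip[zip_code],
            "school_score": school_by_zip[zip_code],
        }
        for zip_code in sorted(common)
    }


# Hypothetical inputs; ZIP codes and values are placeholders.
sales = [
    {"address": "1 Example St", "zip": "00001", "price": 250_000},
    {"address": "2 Example St", "zip": "00001", "price": 350_000},
    {"address": "3 Sample Ave", "zip": "00002", "price": 500_000},
    {"address": "4 Sample Ave", "zip": "00003", "price": 400_000},
]
crime_by_zip = {"00001": 12.5, "00002": 4.0}
school_by_zip = {"00001": 61.0, "00002": 88.0, "00003": 75.0}

features = join_on_zip(aggregate_sales_by_zip(sales), crime_by_zip, school_by_zip)
```

The inner join makes the tradeoff visible: any ZIP code missing from one source (here "00003", which has no crime record) drops out, and variation within a ZIP code is reduced to a single summary value.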