Key Highlights
- 65% high-confidence customer geolocation coverage
- Derived 6 sales territories, with one region accounting for ~75% of revenue
- End-to-end pipeline from QuickBooks exports to ML analysis
Overview
This data analytics project transforms raw QuickBooks financial exports into actionable business intelligence through geospatial clustering and time-series analysis.
Problem
Raw QuickBooks exports contain valuable business data but require significant processing to extract insights about customer distribution, revenue patterns, and optimal sales territories.
Solution
Built an end-to-end analytics pipeline that normalizes raw data, recovers customer locations, identifies sales territories, and analyzes seasonal revenue patterns.
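As one illustration of the seasonal revenue step, the following is a minimal sketch of a monthly seasonal index computed with Pandas. The column names `invoice_date` and `amount` and the function name are hypothetical, and the index definition (average monthly revenue divided by the overall monthly average) is one common choice, not necessarily the one used in this project.

```python
import pandas as pd


def monthly_seasonal_index(invoices: pd.DataFrame) -> pd.Series:
    """Average revenue per calendar month, scaled so 1.0 is a typical month.

    Assumes hypothetical columns `invoice_date` and `amount`.
    """
    dates = pd.to_datetime(invoices["invoice_date"])
    # Total revenue for each (year, month) period that has invoices.
    # Months with no invoices are absent here rather than counted as zero.
    monthly = invoices.groupby(dates.dt.to_period("M"))["amount"].sum()
    # Average those totals by calendar month (1-12) across years.
    by_month = monthly.groupby(monthly.index.month).mean()
    # Values above 1.0 mark months that run above the typical level.
    return by_month / by_month.mean()
```

A value of 1.3 for a given month would mean that month's revenue runs about 30% above the typical month in the data.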
My Contributions
Technical Details
Python with Pandas handles data wrangling and transformation. GeoPandas enables spatial analysis and visualization. Scikit-learn provides clustering algorithms. SQL manages the normalized relational structure.
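To make the Pandas-to-SQL handoff concrete, here is a minimal sketch that splits a flat export into a customers table and an invoices table and writes both to SQLite. The column names (`invoice_no`, `customer_name`, `billing_address`, `invoice_date`, `amount`), the two-table schema, and the choice of SQLite are assumptions for illustration; the actual export layout and database are not specified here.

```python
import sqlite3

import pandas as pd


def normalize_export(raw: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split a flat export into customers and invoices (hypothetical schema)."""
    # One row per distinct customer, with a surrogate integer key.
    customers = (
        raw[["customer_name", "billing_address"]]
        .drop_duplicates(subset="customer_name")
        .reset_index(drop=True)
    )
    customers.insert(0, "customer_id", list(range(1, len(customers) + 1)))
    # Replace the repeated customer fields on each invoice with the key.
    invoices = raw.merge(customers[["customer_id", "customer_name"]], on="customer_name")
    invoices = invoices[["invoice_no", "customer_id", "invoice_date", "amount"]]
    return customers, invoices


def write_tables(conn: sqlite3.Connection, customers: pd.DataFrame, invoices: pd.DataFrame) -> None:
    """Persist both tables so later stages can query them with SQL."""
    customers.to_sql("customers", conn, index=False, if_exists="replace")
    invoices.to_sql("invoices", conn, index=False, if_exists="replace")
```

Keying invoices by `customer_id` rather than by name means a recovered location can be stored once per customer and joined to every invoice.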
Dataset & Evaluation
Dataset: Real QuickBooks transaction exports with customer and invoice data
Evaluation: Hopkins statistic for cluster tendency, silhouette scores for cluster quality
Results: 6 distinct territories, one region accounting for ~75% of revenue
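The two evaluation metrics can be sketched as follows. This is an illustrative implementation, not the project's code: it assumes KMeans as the clustering algorithm (the text only says Scikit-learn) and plain Euclidean distance on the coordinates. The Hopkins convention used here is the one where values near 0.5 suggest spatially random data and values near 1 suggest clustered data; some references use the inverse convention.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.neighbors import NearestNeighbors


def hopkins_statistic(X, sample_frac=0.1, seed=0):
    """Hopkins statistic: ~0.5 for random data, near 1 for clustered data."""
    X = np.asarray(X, dtype=float)
    n, d = X.shape
    m = max(1, int(sample_frac * n))
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=2).fit(X)
    # u: distance from uniform points in the bounding box to the nearest real point.
    uniform = rng.uniform(X.min(axis=0), X.max(axis=0), size=(m, d))
    u = nn.kneighbors(uniform, n_neighbors=1)[0][:, 0]
    # w: distance from sampled real points to their nearest *other* real point
    # (column 0 is the point itself at distance 0, so take column 1).
    sample = X[rng.choice(n, size=m, replace=False)]
    w = nn.kneighbors(sample, n_neighbors=2)[0][:, 1]
    return float((u**d).sum() / ((u**d).sum() + (w**d).sum()))


def silhouette_by_k(X, k_values=range(2, 9), seed=0):
    """Mean silhouette score for each candidate number of clusters."""
    scores = {}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return scores
```

A typical use is to check the Hopkins value first, then pick the cluster count with the highest silhouette score, for example `max(scores, key=scores.get)`.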
Challenges & Tradeoffs
Challenge: Many customer records lacked explicit geolocation data.
Solution: Implemented multi-stage resolution using address parsing, zip code lookups, and city-level approximations, achieving 65% high-confidence coverage.
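The multi-stage resolution described above can be sketched as a simple fallback chain. Everything named here is hypothetical: the lookup tables, their sample entries, the `Location` type, and the mapping of stages to confidence tiers (address = high, zip = medium, city = low) are assumptions for illustration, since the exact tiering is not stated.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Location:
    lat: float
    lon: float
    confidence: str  # assumed tiers: "high", "medium", "low"
    source: str      # which stage produced the match


# Hypothetical lookup tables; a real pipeline would load reference data.
ADDRESS_COORDS = {"100 main st, springfield, il 62701": (39.8010, -89.6440)}
ZIP_CENTROIDS = {"62701": (39.8000, -89.6500)}
CITY_CENTROIDS = {("springfield", "il"): (39.7817, -89.6501)}

ZIP_RE = re.compile(r"\b(\d{5})(?:-\d{4})?\b")


def resolve(address: str) -> Optional[Location]:
    """Try full address, then zip code, then city; return None if all fail."""
    text = " ".join(address.lower().split())
    # Stage 1: exact match on the normalized full address.
    if text in ADDRESS_COORDS:
        return Location(*ADDRESS_COORDS[text], "high", "address")
    # Stage 2: zip centroid; use the last 5-digit run so a 5-digit
    # street number earlier in the string is not mistaken for the zip.
    zips = ZIP_RE.findall(text)
    if zips and zips[-1] in ZIP_CENTROIDS:
        return Location(*ZIP_CENTROIDS[zips[-1]], "medium", "zip")
    # Stage 3: city centroid from the trailing "city, state [zip]" parts.
    parts = [p.strip() for p in text.split(",")]
    if len(parts) >= 2 and parts[-1]:
        key = (parts[-2], parts[-1].split()[0])
        if key in CITY_CENTROIDS:
            return Location(*CITY_CENTROIDS[key], "low", "city")
    return None


def high_confidence_share(addresses) -> float:
    """Fraction of records resolved at the 'high' tier."""
    results = [resolve(a) for a in addresses]
    hits = sum(1 for r in results if r is not None and r.confidence == "high")
    return hits / len(results) if results else 0.0
```

Recording the `source` stage alongside each coordinate lets downstream clustering filter or down-weight approximate locations.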