
DoorDash EDA Challenge
Led a data science project predicting DoorDash delivery durations from a dataset of 200,000+ orders with 20+ columns. Used Python (Pandas, NumPy, Scikit-learn, Matplotlib, Seaborn, XGBoost) for data preprocessing, feature engineering, and modeling. Built an optimized XGBoost model whose predictions deviate from actual delivery times by 13.87 minutes on average and that explains 40% of the variability in delivery duration (R² ≈ 0.40). Key findings: peak hours (7 PM to 4 AM) account for 90.4% of orders, and weekend order volume is 177.2% higher than weekday volume. Identified 6 markets whose peak-to-off-peak delivery time ratios fall above the 75th percentile and 6 whose ratios fall below the 25th percentile. Created visualizations including correlation heatmaps, order density plots, and time series analyses. Findings informed market-specific recommendations for staffing, pricing, and operations aimed at improving DoorDash's delivery efficiency.
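The market classification described above can be sketched in pandas as follows. This is a minimal illustration under stated assumptions, not the project's actual code. The column names (`market_id`, `created_at`, `actual_delivery_time`), the peak window (hours 19 through 3, i.e. 7 PM up to but not including 4 AM), and the helper names `peak_offpeak_ratios` and `flag_markets` are all hypothetical. Each market's ratio is its mean peak delivery duration divided by its mean off-peak duration. Markets are then flagged against the 75th and 25th percentiles of those ratios.

```python
import pandas as pd

# Assumed peak window: 7 PM up to 4 AM (hours 19-23 and 0-3).
PEAK_HOURS = set(range(19, 24)) | set(range(0, 4))


def peak_offpeak_ratios(orders: pd.DataFrame) -> pd.Series:
    """Mean peak duration / mean off-peak duration per market (hypothetical helper)."""
    df = orders.copy()
    # Delivery duration in minutes, from order creation to actual delivery.
    df["duration_min"] = (
        df["actual_delivery_time"] - df["created_at"]
    ).dt.total_seconds() / 60
    # Label each order as peak or off-peak by the hour it was created.
    df["period"] = df["created_at"].dt.hour.isin(PEAK_HOURS).map(
        {True: "peak", False: "off_peak"}
    )
    means = df.groupby(["market_id", "period"])["duration_min"].mean().unstack("period")
    # Markets lacking either period produce NaN and are dropped.
    return (means["peak"] / means["off_peak"]).dropna().rename("peak_offpeak_ratio")


def flag_markets(ratios: pd.Series) -> tuple[list, list]:
    """Markets above the 75th percentile and below the 25th percentile of ratios."""
    high_cut = ratios.quantile(0.75)
    low_cut = ratios.quantile(0.25)
    high = ratios[ratios > high_cut].index.tolist()
    low = ratios[ratios < low_cut].index.tolist()
    return high, low


# Usage on a tiny synthetic dataset: one peak (8 PM) and one off-peak (noon)
# order per market, with (peak, off-peak) durations in minutes.
durations = {1: (40, 20), 2: (30, 20), 3: (24, 20), 4: (20, 20)}
rows = []
for market, (peak_min, off_min) in durations.items():
    for hour, minutes in ((20, peak_min), (12, off_min)):
        start = pd.Timestamp("2015-02-01") + pd.Timedelta(hours=hour)
        rows.append(
            {
                "market_id": market,
                "created_at": start,
                "actual_delivery_time": start + pd.Timedelta(minutes=minutes),
            }
        )
orders = pd.DataFrame(rows)

ratios = peak_offpeak_ratios(orders)
high, low = flag_markets(ratios)
print(high, low)
```

With the synthetic ratios 2.0, 1.5, 1.2, and 1.0, the 75th and 25th percentile cutoffs are 1.625 and 1.15, so market 1 is flagged as high and market 4 as low. Using strict inequalities keeps markets that sit exactly on a cutoff out of both groups.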



