From Forests to Funnels: A Complete ML Workflow Across Regression, Classification, and Churn Prediction

Week 16 of my data science internship at DataraFlow pushed everything to a new level. We moved beyond individual model building into end-to-end machine learning workflows, complete with multi-model comparisons, class imbalance handling, threshold tuning, and production deployment strategy. Here's how it all unfolded.

Overview

This week was structured in three layers of increasing complexity:

Part 1 (Tasks): Foundational model building - Random Forest regression, comprehensive metric evaluation, and binary spam classification.
Part 2 (Assignments): Comparative analysis across multiple models for house price prediction, imbalanced marketing campaign classification, and multi-class credit risk modeling.
Part 3 (Assessment): A full end-to-end churn prediction project with preprocessing pipelines, hyperparameter tuning, threshold optimization, feature importance analysis, and business recommendations.

Let's get into it.

Part 1 — The Foundations

Task 1: Random Forest Regression - Crop Yield Prediction

The first task introduced Random Forest Regression using a simple two-column dataset with a single Feature and a Target representing crop yield values.

from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

X = rf_data[['Feature']]
Y = rf_data['Target']

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=42)

rf_model = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=42)
rf_model.fit(X_train, Y_train)
Y_pred = rf_model.predict(X_test)

print(f'R² Score: {r2_score(Y_test, Y_pred):.3f}')  # R² = 0.878

Result: R² = 0.878, a strong result for only 20 data points and a single predictor, although a 20% test split on 20 rows means the score rests on just four test points. The feature importance confirmed that Feature was the sole driver (importance = 1.0), which is expected in a univariate setup because importances sum to 1.

Key takeaway: Even on minimal data, Random Forest can capture non-linear relationships effectively, though with very few samples the model's generalizability should always be questioned.
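One way to probe that generalizability concern is k-fold cross-validation, which scores the model on several held-out slices instead of a single split. The sketch below is a minimal illustration on synthetic data: the crop yield dataset is not reproduced here, so `X` and `y` are stand-ins, and the fold count is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in for a 20-row, single-feature dataset (assumption).
rng = np.random.default_rng(42)
X = rng.uniform(0, 10, size=(20, 1))
y = 5 * np.sqrt(X[:, 0]) + rng.normal(0, 0.5, size=20)

model = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=42)

# 5 folds of 4 test points each; shuffle so folds do not follow row order.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")

print("R² per fold:", np.round(scores, 3))
print(f"Mean R²: {scores.mean():.3f} (std {scores.std():.3f})")
```

A wide spread across folds is a hint that a single-split score such as 0.878 may depend heavily on which rows landed in the test set.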

Task 2: Comprehensive Model Evaluation - Salary Prediction

This task shifted focus from building to measuring, requiring a full set of standard regression metrics on a 3-feature salary dataset (Experience, Training Hours, Previous Projects).

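import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

salary_rf_model = RandomForestRegressor(n_estimators=50, random_state=42)
salary_rf_model.fit(X_train, Y_train)
sal_Y_pred = salary_rf_model.predict(X_test)

r2 = r2_score(Y_test, sal_Y_pred)
# n = number of test samples, k = number of predictors
n, k = X_test.shape
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
mae = mean_absolute_error(Y_test, sal_Y_pred)
# RMSE is the square root of MSE, so it is in the same units as the target
rmse = np.sqrt(mean_squared_error(Y_test, sal_Y_pred))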

Metric	Value
R² Score	0.9912
Adjusted R²	0.9890
MAE	2.1513
RMSE	2.5821

Near-perfect scores across the board. The model correctly learned that salary scales predictably with experience, training hours, and prior project exposure. The actual vs. predicted scatter plot showed points clustering tightly around the ideal diagonal line, a visual confirmation of model quality.

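Why Adjusted R² matters: Unlike R², Adjusted R² penalizes unnecessary features. It scales the unexplained variance (1 - R²) by (n - 1) / (n - k - 1), where n is the number of samples and k the number of predictors, so every extra predictor has to earn its place. When you have multiple predictors, Adjusted R² tells the truer story.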
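As a small worked example of that penalty, the helper below applies the same formula used in the salary code. The function name is hypothetical, and `n_samples=16` is an assumption: the post does not state the test-set size, but 16 rows with 3 predictors reproduces the reported pair of R² = 0.9912 and Adjusted R² = 0.9890.

```python
def adjusted_r2(r2: float, n_samples: int, n_features: int) -> float:
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - k - 1)."""
    if n_samples - n_features - 1 <= 0:
        raise ValueError("need n_samples > n_features + 1")
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)


# R² and k = 3 come from the salary task; n = 16 is an assumed test-set size.
salary_adj = adjusted_r2(0.9912, n_samples=16, n_features=3)
print(f"3 predictors: {salary_adj:.4f}")

# Same R² with more predictors: the penalty grows and the score drops.
more_features_adj = adjusted_r2(0.9912, n_samples=16, n_features=8)
print(f"8 predictors: {more_features_adj:.4f}")
```

The gap between R² and Adjusted R² widens as the number of predictors approaches the number of samples, which is why it matters most on small test sets.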

Task 3: Binary Classification - Spam Detection

For spam detection, Logistic Regression was applied to a 100-sample dataset of emails with features like word count, link count, sender reputation, capital ratio, and exclamation count.

from sklearn.linear_model import LogisticRegression

spamLog_reg = LogisticRegression(max_iter=1000, random_state=42)
spamLog_reg.fit(X_train, Y_train)
spamY_pred = spamLog_reg.predict(X_test)

Results: Accuracy, Precision, Recall, and F1-Score all returned 1.0, a perfect classifier on the test set.

While this sounds almost too good, the dataset was synthetically generated and well-separated. The confusion matrix heatmap showed zero misclassifications across both classes.

Interpreting the metrics: In spam detection, Precision tends to be the more critical metric. A False Positive (classifying a legitimate email as spam) is far costlier than a False Negative (spam slipping through), because losing an important work email to a spam folder is a real consequence. A high F1-Score, however, confirms that precision and recall are both strong, which is the ideal outcome.
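To make the False Positive vs. False Negative distinction concrete, here is a minimal sketch that derives precision, recall, and F1 from a confusion matrix. The labels are hypothetical and deliberately imperfect (they are not the spam task's data), so the metrics differ from one another.

```python
from sklearn.metrics import confusion_matrix, f1_score, precision_score, recall_score

# Hypothetical labels: 1 = spam, 0 = legitimate (not the post's data).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

# For binary labels [0, 1], ravel() yields TN, FP, FN, TP in that order.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

precision = tp / (tp + fp)  # of everything flagged as spam, how much was spam
recall = tp / (tp + fn)     # of all real spam, how much was caught
f1 = 2 * precision * recall / (precision + recall)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Here the two legitimate emails sent to spam pull precision down to 0.60, while the single missed spam leaves recall at 0.75.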

Part 2 — Deeper Analysis and Comparisons

Assignment 1: Comparative Regression - House Price Prediction

This assignment compared three regression approaches on a 150-sample house price dataset with features including Square Footage, Bedrooms, Bathrooms, Age, Distance to City, Garage, and Pool.

EDA Insights:

The price distribution was approximately normal, peaking in the 310–350 range. The correlation heatmap told a clean story:

Square_Feet dominated with a correlation of 0.80
Age showed a moderate negative correlation of -0.33 (older = cheaper)
Distance_to_City had a mild negative effect of -0.17

Other features like bedrooms, bathrooms, and amenities offered incremental contributions.

Model Comparison:

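from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

# Three models trained and evaluated
lin_reg = LinearRegression()
house_dtree_reg = DecisionTreeRegressor(max_depth=10, random_state=42)
house_rf_reg = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=42)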

Model	R²	Adj. R²	MAE	RMSE
Linear Regression	0.894	0.861	27.721	32.505
Random Forest	0.862	0.818	31.848	37.137
Decision Tree	0.563	0.424	53.913	66.076

The Surprising Winner: Linear Regression outperformed both tree-based models.

This is the "when simpler is better" lesson in action. When the dominant relationship in the data is linear (as Square Footage vs. Price clearly is), adding tree complexity doesn't help. Trees approximate a straight line with piecewise-constant steps, which adds approximation error and variance on a 150-sample dataset. Random Forest shines when relationships are non-linear and feature interactions are complex. Here, they weren't.
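The effect is easy to reproduce on data where the signal is linear by construction. The sketch below uses synthetic data, not the house price dataset; the feature names, coefficients, and noise level are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

# Synthetic data with a purely linear signal (assumption, not the house dataset).
rng = np.random.default_rng(0)
n = 150
square_feet = rng.uniform(800, 3500, n)
age = rng.uniform(0, 50, n)
X = np.column_stack([square_feet, age])
y = 0.12 * square_feet - 1.5 * age + rng.normal(0, 15, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(
        n_estimators=100, max_depth=10, random_state=42),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R² = {scores[name]:.3f}")
```

Adding a non-linear term or an interaction to the line that builds `y` is a quick way to see where the forest starts to pull ahead.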

Feature Importance (Random Forest):

Feature	Importance
Square_Feet	0.681
Age	0.161
Distance_to_City	0.063
Bathrooms	0.038
Bedrooms	0.028
Pool	0.018
Garage	0.011

Square footage alone accounts for about 68% of the Random Forest's total feature importance. Size is king.

Assignment 2: Binary Classification - Marketing Campaign Conversion

This assignment tackled class imbalance head-on. The dataset of 300 customers had a striking imbalance: 87% responded (Class 1) vs 13% did not respond (Class 0).

Two Logistic Regression models were compared:

# Model A: Default
log_modelA = LogisticRegression(random_state=42, max_iter=1000)

# Model B: Class-balanced
log_modelB = LogisticRegression(class_weight='balanced', random_state=42, max_iter=1000)

Results Comparison:

Metric	Model A (Default)	Model B (Balanced)
Accuracy	85.33%	64.00%
Precision	92.75%	95.74%
Recall	91.43%	64.29%
F1-Score	92.09%	76.92%
AUC	0.637	0.631

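Key Insight: class_weight='balanced' reweights each class inversely to its frequency, so here it up-weights the 13% of non-responders. With an 87/13 split, the imbalance is real but not extreme, and the reweighting pushed the model to predict "no response" more often: recall on responders fell from 91.43% to 64.29% in exchange for a small gain in precision. AUC barely moved (0.637 vs 0.631), so balancing shifted the decision boundary without improving how well the model ranks customers. Model A performs better on accuracy, recall, and F1.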
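To see how strong that reweighting is, the weights can be computed directly. This sketch assumes the 87/13 split corresponds to exactly 261 responders and 39 non-responders out of 300, which is inferred from the percentages rather than stated in the data.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# 300 customers at an 87/13 split: 261 responders (1), 39 non-responders (0).
y = np.array([1] * 261 + [0] * 39)

# 'balanced' uses n_samples / (n_classes * count_per_class).
weights = compute_class_weight(
    class_weight="balanced", classes=np.array([0, 1]), y=y)
manual = len(y) / (2 * np.bincount(y))

for label, weight in zip([0, 1], weights):
    print(f"class {label}: weight = {weight:.3f}")
# class 0 gets about 3.846, class 1 about 0.575
```

Under these counts each non-responder carries roughly 6.7 times the weight of a responder during training, which explains why the balanced model becomes so much more willing to predict Class 0.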

Exploratory boxplots revealed that responders tended to be younger, higher earners, with longer membership, more prior purchases, and stronger digital engagement (more email opens, more website visits). These are the segments to target with personalized marketing.

Business Recommendation: Deploy Model A for campaign targeting. Focus on younger, digitally active, high-income customers with established membership history.

Assignment 3: Multi-Class Classification - Credit Risk Prediction

Three risk levels were predicted (Low, Medium, High) across 400 customer applications using features like Credit Score, Income, Debt-to-Income ratio, Employment Years, and Previous Defaults.

Class Distribution:

Medium Risk: 40.5%
Low Risk: 32.0%
High Risk: 27.5%

Relatively balanced — no aggressive resampling needed.

Three classifiers were trained:

log_model = OneVsRestClassifier(LogisticRegression(random_state=42, max_iter=1000))
credit_dt_model = DecisionTreeClassifier(max_depth=10, random_state=42)
rf_model = RandomForestClassifier(n_estimators=100, max_depth=10, random_state=42)

Accuracy Summary:

Model	Overall Accuracy
Logistic Regression	78.33%
Random Forest	77.50%
Decision Tree	75.00%

All three models struggled most with the Medium Risk class, the transitional zone between Low and High where feature patterns overlap significantly.

Model choice depends on business priority:

Logistic Regression if High Risk detection is the non-negotiable priority (92% recall for High Risk)
Random Forest if balanced performance across all three risk levels is needed in production
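Choosing between models this way relies on per-class recall rather than overall accuracy. A minimal sketch of how to read it off with scikit-learn follows; the labels are hypothetical and only illustrate the mechanics, not the credit risk results.

```python
from sklearn.metrics import classification_report, recall_score

labels = ["Low", "Medium", "High"]

# Hypothetical predictions for illustration only (not the post's results).
y_true = ["Low"] * 4 + ["Medium"] * 4 + ["High"] * 4
y_pred = ["Low", "Low", "Low", "Medium",
          "Medium", "Medium", "Low", "High",
          "High", "High", "High", "Medium"]

# average=None returns one recall value per class, in the order of `labels`.
per_class_recall = recall_score(y_true, y_pred, labels=labels, average=None)
for label, value in zip(labels, per_class_recall):
    print(f"{label} recall: {value:.2f}")

# Full per-class precision, recall, and F1 in one table.
print(classification_report(y_true, y_pred, labels=labels))
```

In this toy example the Medium class has the lowest recall, mirroring the pattern of errors leaking into the neighbouring Low and High classes.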

Feature Importance (Random Forest):

Feature	Importance
Credit_Score	0.2046
Previous_Defaults	0.1926
Debt_to_Income	0.1644
Credit_History_Length	0.0953
Income	0.0947
Age	0.0898
Employment_Years	0.0818
Loan_Amount	0.0769

Credit behavior and repayment history dominate. The actual loan amount is the least predictive: it's not what you borrow but how reliably you've repaid in the past.

Part 3 — End-to-End Assessment: Customer Churn Prediction

This was the capstone: a full production-style ML workflow for a telecommunications company seeking to predict and prevent customer churn.

The Dataset

500 customer records with 19 features:

Demographics: Age, Gender
Account: Tenure, Contract Type, Payment Method
Services: Internet, Streaming, Online Security, Tech Support
Engagement: Support Calls, Customer Satisfaction Score
Target: Churn (0 = Active, 1 = Churned)

Churn rate: 45.6%, so the two classes are nearly balanced.

Phase 1: EDA Insights

Key churn patterns from the data:

Month-to-month contracts had significantly higher churn than one or two-year commitments
Fiber optic customers churned more, likely a price-vs-value perception gap
Electronic check users showed elevated churn, suggesting billing friction
Two-year contract holders and bank transfer/mailed check payers were the most stable

Gender analysis showed that female customers were retained at a higher rate; among male customers, the gap between churned and retained counts was smaller.

Phase 2: Preprocessing Pipeline

A ColumnTransformer + Pipeline approach handled encoding and scaling in a clean, reproducible way:

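from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Scale numeric columns, one-hot encode categoricals, pass binary flags through
preprocessor = ColumnTransformer(transformers=[
    ('num', StandardScaler(), numeric_features),
    ('cat', OneHotEncoder(drop='first'), categorical_features),
    ('bin', 'passthrough', binary_features)
])

# Preprocessing and model are fit together as one estimator
pipeline_log = Pipeline(steps=[
    ('preprocessor', preprocessor),
    ('classifier', LogisticRegression(random_state=42, max_iter=1000))
])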

Pipelines are production-friendly: the preprocessing steps are fit only on the training data and then applied unchanged to the test data, which prevents data leakage, and the whole object can be deployed as a single artifact.
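The leakage protection is easiest to see with cross-validation, where the pipeline is refit from scratch inside every fold. The sketch below is self-contained and uses a synthetic DataFrame; the column names and random labels are hypothetical stand-ins for the churn data.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for the churn data; column names are hypothetical.
rng = np.random.default_rng(42)
n = 200
df = pd.DataFrame({
    "tenure": rng.integers(1, 72, n),
    "support_calls": rng.integers(0, 8, n),
    "contract": rng.choice(["Month-to-month", "One year", "Two year"], n),
    "online_security": rng.integers(0, 2, n),
})
y = (rng.random(n) < 0.45).astype(int)

preprocessor = ColumnTransformer(transformers=[
    ("num", StandardScaler(), ["tenure", "support_calls"]),
    ("cat", OneHotEncoder(drop="first"), ["contract"]),
    ("bin", "passthrough", ["online_security"]),
])
pipeline = Pipeline(steps=[
    ("preprocessor", preprocessor),
    ("classifier", LogisticRegression(random_state=42, max_iter=1000)),
])

# Each fold refits the scaler and encoder on that fold's training rows only.
scores = cross_val_score(pipeline, df, y, cv=5, scoring="recall")
print("Recall per fold:", np.round(scores, 2))
```

Because the labels here are random, the recall values themselves are meaningless; the point is that no statistic from a held-out fold ever reaches the scaler or encoder.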

Phase 3: Model Building and Comparison

Five model configurations were built and evaluated:

Model	Accuracy	Churn Recall	Churn F1
Logistic Regression (Baseline)	0.63	0.52	0.56
Logistic Regression (GridSearchCV)	0.58	0.61	0.57
LR (Threshold = 0.3)	0.53	0.85	0.61
Random Forest	0.57	0.46	0.49
Decision Tree	0.56	0.54	0.53
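The GridSearchCV row comes from searching over Logistic Regression hyperparameters. The exact grid is not given here, so the sketch below is only one plausible setup: the parameter grid, the choice of recall as the scoring metric, and the synthetic data are all assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the 500-row churn set (assumption).
X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           weights=[0.55, 0.45], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

pipeline = Pipeline(steps=[
    ("scaler", StandardScaler()),
    ("classifier", LogisticRegression(max_iter=1000, random_state=42)),
])

# Hypothetical grid: "<step name>__<parameter>" addresses a pipeline step.
param_grid = {
    "classifier__C": [0.01, 0.1, 1, 10],
    "classifier__class_weight": [None, "balanced"],
}

# Optimize for churn recall with 5-fold cross-validation on the training set.
search = GridSearchCV(pipeline, param_grid, scoring="recall", cv=5)
search.fit(X_train, y_train)

print("Best params:", search.best_params_)
print(f"Best CV recall: {search.best_score_:.3f}")
print(f"Test recall: {search.score(X_test, y_test):.3f}")
```

Optimizing for recall rather than accuracy is one way a tuned model can end up with lower accuracy but higher churn recall than its baseline.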

The threshold trick: Instead of tuning model parameters, adjusting the decision threshold from 0.5 to 0.3 had the most dramatic impact. The model became much more sensitive to churn signals, achieving 85% recall at the cost of lower precision.

threshold = 0.3
Y_pred_thresh = (pipeline_log.predict_proba(X_test)[:, 1] >= threshold).astype(int)

Why this matters for business: In churn prediction, the asymmetry of error costs is critical. Missing a churner (false negative) means losing a customer entirely, a costly outcome. Incorrectly flagging a loyal customer (false positive) means sending them an unnecessary retention offer, a minor cost. Lowering the threshold is a deliberate business tradeoff, not a model failure.
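That tradeoff can be made explicit by attaching a cost to each error type and sweeping the threshold. In the sketch below the two cost figures are hypothetical placeholders, and the data is synthetic rather than the churn dataset.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical costs: a missed churner costs 10x an unnecessary offer.
COST_FALSE_NEGATIVE = 500
COST_FALSE_POSITIVE = 50

X, y = make_classification(n_samples=500, n_features=10, n_informative=4,
                           weights=[0.55, 0.45], random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000, random_state=42).fit(X_train, y_train)
churn_proba = model.predict_proba(X_test)[:, 1]

results = []
for threshold in np.arange(0.1, 0.91, 0.1):
    y_pred = (churn_proba >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    cost = fn * COST_FALSE_NEGATIVE + fp * COST_FALSE_POSITIVE
    recall = tp / (tp + fn)
    results.append((round(float(threshold), 1), recall, cost))
    print(f"threshold={threshold:.1f}  recall={recall:.2f}  cost={cost}")

# Pick the threshold with the lowest total error cost.
best_threshold, _, best_cost = min(results, key=lambda row: row[2])
print(f"Lowest-cost threshold: {best_threshold}")
```

In practice the threshold would be chosen on a validation split and then confirmed on the test set, so the reported recall is not tuned on the same rows it is measured on.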

Phase 4: Feature Importance

Using Logistic Regression coefficients to interpret drivers:

Feature	Coefficient	Direction
Two-Year Contract	-1.317	Reduces churn (strongest signal)
Streaming Movies	-0.588	Reduces churn
Fiber Optic Internet	+0.427	Increases churn
Credit Card Payment	+0.368	Increases churn
Customer Satisfaction Score	-0.323	Higher score = less churn
Support Calls	+0.251	More calls = more churn

The negative coefficient on Two-Year contracts is the standout: it's the largest single driver by magnitude. Locking customers into longer commitments is the most powerful lever for retention.
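Logistic Regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios that are easier to explain to a business audience. The sketch below simply applies `exp` to the coefficients reported in the table; the interpretation assumes all other features are held fixed.

```python
import math

# Coefficients as reported in the table above.
coefficients = {
    "Two-Year Contract": -1.317,
    "Streaming Movies": -0.588,
    "Fiber Optic Internet": 0.427,
    "Credit Card Payment": 0.368,
    "Customer Satisfaction Score": -0.323,
    "Support Calls": 0.251,
}

# exp(coefficient) = multiplicative change in the odds of churn.
odds_ratios = {name: math.exp(coef) for name, coef in coefficients.items()}

for name, ratio in sorted(odds_ratios.items(), key=lambda item: item[1]):
    print(f"{name:<28} odds ratio = {ratio:.2f}")
```

An odds ratio of about 0.27 for Two-Year Contract means the odds of churn are roughly 73% lower than for the reference contract category. For scaled numeric features such as Support Calls, the ratio applies per one standard deviation rather than per call.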

Phase 5: Business Recommendations

Six targeted actions emerged directly from the model insights:

1. Push Two-Year Contract Upgrades: Month-to-month customers are the highest churn risk. Offer discounted upgrades with loyalty perks. Expected impact: significant reduction in price-sensitive churners.

2. Address Fiber Optic Dissatisfaction: Fiber customers churn more, likely because of a perceived value gap. Launch targeted satisfaction surveys and offer loyalty discounts or service quality improvements.

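3. Migrate Electronic Check Users to Auto-Pay: Billing friction drives churn. Incentivize auto-pay adoption (bank transfer or credit card) with a small monthly discount. Bank transfer looks like the safer default, since the Credit Card Payment coefficient in Phase 4 is positive.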

4. Proactively Engage Low-Satisfaction Customers: Customer Satisfaction Score is among the strongest churn predictors. Trigger retention outreach when satisfaction scores drop below 3/5, before the decision to leave is made.

5. Bundle Tech Support and Online Security: Customers without these add-ons show higher churn. Offer free trials or discounted bundles to increase product attachment.

6. Proactive Callback for High-Support Customers: Customers with 3+ support calls are frustrated. Implement a dedicated outreach program for this segment, offering account managers and service credits.

Phase 6: Implementation Plan

A production deployment would follow this structure:

Retraining: Quarterly, plus trigger-based retraining if recall or AUC drops below thresholds on live data
Monitoring metrics: Churn recall, precision, F1, AUC-ROC monthly; data drift detection on feature distributions
Business impact measurement: A/B testing (treatment vs. control groups), revenue retained calculation, campaign ROI tracking
Next steps: Collect more data (1,000+ records), engineer time-series behavioral features, explore XGBoost and LightGBM for improved accuracy
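The retraining trigger and drift check in this plan could be sketched roughly as follows. The function names and thresholds are hypothetical, the tenure values are simulated, and using SciPy's two-sample Kolmogorov-Smirnov test is just one common way to compare a numeric feature's distribution between training and live data.

```python
import numpy as np
from scipy.stats import ks_2samp


def needs_retraining(recall, auc, min_recall=0.80, min_auc=0.60):
    """Trigger-based check; both thresholds are hypothetical placeholders."""
    return recall < min_recall or auc < min_auc


def feature_drifted(reference, live, alpha=0.01):
    """Two-sample Kolmogorov-Smirnov test on one numeric feature."""
    result = ks_2samp(reference, live)
    return bool(result.pvalue < alpha)


# Simulated feature values: live data shifted upward relative to training.
rng = np.random.default_rng(42)
training_tenure = rng.normal(loc=30, scale=10, size=500)
live_tenure = rng.normal(loc=45, scale=10, size=500)

print("Same data drifted?", feature_drifted(training_tenure, training_tenure))
print("Shifted data drifted?", feature_drifted(training_tenure, live_tenure))
print("Retrain on metrics?", needs_retraining(recall=0.72, auc=0.65))
```

A check like this would run per feature on each monitoring cycle, with the thresholds agreed with the business rather than hard-coded.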

The Week's Biggest Lessons

1. Simpler models often win when the signal is linear. Linear Regression outperformed Random Forest on house prices. Never assume complexity equals performance.

2. Class imbalance handling requires judgment, not automation. class_weight='balanced' lowered accuracy, recall, and F1 on the marketing model while leaving AUC essentially unchanged, so it moved the decision boundary without improving the ranking. Always check whether balancing actually improves the metrics that matter for your use case.

3. Threshold tuning is an underrated lever. Adjusting the decision threshold from 0.5 to 0.3 achieved 85% recall on churn without any architectural changes. Sometimes the right tool is already in your pipeline: you just need to calibrate it.

4. Feature importance is a business conversation starter. Credit Score, Previous Defaults, and Debt-to-Income ratio drove credit risk. Two-Year contracts and Customer Satisfaction drove churn. These aren't just model outputs: they're strategic insights that tell retention, risk, and product teams where to focus.

5. Pipelines are the bridge between experimentation and production. Wrapping preprocessing and modeling in Pipeline objects prevents data leakage, simplifies deployment, and makes the workflow reproducible. It's a habit worth building from day one.

Final Thoughts

Week 16 didn't just add more models to the toolkit; it forced a shift in thinking from "build a model" to "solve a business problem." Every metric decision, threshold choice, and feature insight was anchored to a real-world consequence: reducing churn, flagging credit risk, catching spam, or understanding what drives house prices.

The next stage involves deploying these insights beyond notebooks, into APIs, dashboards, and real-time scoring systems. That's where the real work begins.

If you found this useful, follow along for more weekly deep-dives into data science at DataraFlow.
