- MLflow + FastAPI + $12 DigitalOcean droplet serves 70k predictions/month at $2,100 revenue with 18ms median latency
- Gradient boosting (AUC 0.83) beats deep learning for tabular churn prediction on 50k samples with 2-minute training time
- Feature engineering (MRR per tenure month, tickets per month) improved AUC from 0.72 to 0.83—more impactful than model architecture
- Stripe webhooks + tiered pricing ($29-$199/month) is simpler than AWS usage metering for side project billing
- Skip Kubernetes, dashboards, and free tiers early—boring infrastructure and paid-only access made the project profitable
The Stack That Actually Works
I’ll cut to the chase: MLflow tracking + FastAPI serving + DigitalOcean droplet + Stripe webhooks. That’s the blueprint. The model is a gradient boosting classifier predicting customer churn for small SaaS companies at $0.03 per prediction. Monthly revenue hovers around $2,100 with 70k API calls.
Why this stack? Because it’s boring, well-documented, and doesn’t break at 3am. I tried the “modern” approach first—containerized everything with Kubernetes, set up auto-scaling, added Prometheus metrics. Burned through my first month’s revenue on infrastructure costs before a single paying customer.
Here’s the actual production setup running right now.
# app/main.py
from fastapi import FastAPI, HTTPException, Depends, Header
from pydantic import BaseModel, Field
import mlflow
import numpy as np
from typing import Optional
import hashlib
import time
from functools import lru_cache

app = FastAPI(title="Churn Prediction API", version="1.2.3")

# Load model once at startup - not on every request
@lru_cache(maxsize=1)
def get_model():
    model_uri = "models:/churn-predictor/production"
    model = mlflow.pyfunc.load_model(model_uri)
    return model

class ChurnRequest(BaseModel):
    customer_id: str
    monthly_charges: float = Field(..., gt=0, description="MRR in USD")
    tenure_days: int = Field(..., ge=0)
    support_tickets: int = Field(default=0, ge=0)
    feature_usage_pct: float = Field(..., ge=0, le=100)
    payment_failures: int = Field(default=0, ge=0)

    class Config:
        json_schema_extra = {
            "example": {
                "customer_id": "cus_ABC123",
                "monthly_charges": 49.99,
                "tenure_days": 180,
                "support_tickets": 3,
                "feature_usage_pct": 42.5,
                "payment_failures": 1
            }
        }

class ChurnResponse(BaseModel):
    customer_id: str
    churn_probability: float
    risk_category: str  # low, medium, high
    model_version: str
    inference_time_ms: float

# Dead simple API key check - good enough for now
async def verify_api_key(x_api_key: str = Header(...)):
    # In production this hits Redis with hashed keys
    # For the first 3 months I just had a hardcoded set in an env var
    if not hashlib.sha256(x_api_key.encode()).hexdigest().startswith("a7b3c"):
        raise HTTPException(status_code=401, detail="Invalid API key")
    return x_api_key

@app.post("/predict", response_model=ChurnResponse)
async def predict_churn(
    request: ChurnRequest,
    api_key: str = Depends(verify_api_key)
):
    start = time.perf_counter()
    model = get_model()

    # Feature engineering - same transformations as training
    features = np.array([[
        request.monthly_charges,
        request.tenure_days,
        request.support_tickets,
        request.feature_usage_pct / 100.0,  # normalize
        request.payment_failures,
        request.monthly_charges / max(request.tenure_days / 30, 1),  # MRR per month tenure
        1 if request.payment_failures > 0 else 0,  # had_payment_issue flag
        request.support_tickets / max(request.tenure_days / 30, 1)  # tickets per month
    ]])

    # MLflow models return weird shapes sometimes - this shouldn't happen but it does
    pred = model.predict(features)
    if pred.ndim > 1:
        pred = pred[0]
    churn_prob = float(pred[0]) if hasattr(pred[0], '__float__') else float(pred)

    # Business logic thresholds - tuned based on client feedback
    if churn_prob < 0.3:
        risk = "low"
    elif churn_prob < 0.6:
        risk = "medium"
    else:
        risk = "high"

    elapsed_ms = (time.perf_counter() - start) * 1000

    return ChurnResponse(
        customer_id=request.customer_id,
        churn_probability=round(churn_prob, 4),
        risk_category=risk,
        model_version=model.metadata.run_id[:8],  # shortened for cleanliness
        inference_time_ms=round(elapsed_ms, 2)
    )

@app.get("/health")
async def health_check():
    # Actually test model loading - not just return 200
    try:
        model = get_model()
        return {"status": "healthy", "model_loaded": True}
    except Exception as e:
        raise HTTPException(status_code=503, detail=f"Model load failed: {str(e)}")
The model loads once and stays cached via lru_cache for the life of the process. The first version loaded it on every request—median latency was 340ms. Now it’s 18ms.
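One nuance: lru_cache is lazy, so the very first request after a deploy still eats the ~800ms model load. A minimal sketch of warming it at startup (not in the code above; purely illustrative):

# Sketch: add to app/main.py to warm the lru_cache at process startup,
# so the first real request doesn't pay the ~800ms model load.
@app.on_event("startup")
async def warm_model_cache():
    get_model()  # populates the cache before any traffic arrives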

Why MLflow Instead of Just Pickling Models
Because you will retrain. And when you do, you need to know exactly which features, hyperparameters, and data version produced the current production model.
MLflow gives you that automatically. Here’s the training script:
# training/train.py
import mlflow
import mlflow.sklearn
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score, precision_recall_curve
import pandas as pd
import numpy as np

mlflow.set_tracking_uri("http://localhost:5000")  # MLflow server
mlflow.set_experiment("churn-prediction")

def engineer_features(df):
    """Same transformations as API - keep this DRY in real projects"""
    df['mrr_per_tenure_month'] = df['monthly_charges'] / np.maximum(df['tenure_days'] / 30, 1)
    df['had_payment_issue'] = (df['payment_failures'] > 0).astype(int)
    df['tickets_per_month'] = df['support_tickets'] / np.maximum(df['tenure_days'] / 30, 1)
    df['usage_normalized'] = df['feature_usage_pct'] / 100.0
    return df

def train_model(data_path, n_estimators=200, learning_rate=0.1, max_depth=5):
    with mlflow.start_run(run_name=f"gbm_n{n_estimators}_lr{learning_rate}"):
        # Log parameters
        mlflow.log_param("n_estimators", n_estimators)
        mlflow.log_param("learning_rate", learning_rate)
        mlflow.log_param("max_depth", max_depth)
        mlflow.log_param("data_source", data_path)

        # Load and split data
        df = pd.read_csv(data_path)
        mlflow.log_param("n_samples", len(df))
        mlflow.log_param("churn_rate", df['churned'].mean())

        df = engineer_features(df)
        feature_cols = [
            'monthly_charges', 'tenure_days', 'support_tickets',
            'usage_normalized', 'payment_failures', 'mrr_per_tenure_month',
            'had_payment_issue', 'tickets_per_month'
        ]
        X = df[feature_cols]
        y = df['churned']

        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=0.2, random_state=42, stratify=y
        )

        # Train model
        model = GradientBoostingClassifier(
            n_estimators=n_estimators,
            learning_rate=learning_rate,
            max_depth=max_depth,
            random_state=42
        )
        model.fit(X_train, y_train)

        # Evaluate
        y_pred_proba = model.predict_proba(X_test)[:, 1]
        auc = roc_auc_score(y_test, y_pred_proba)

        # Find optimal threshold for F1
        # precision/recall have one more element than thresholds, so drop the
        # final (threshold-less) point before computing F1
        precision, recall, thresholds = precision_recall_curve(y_test, y_pred_proba)
        f1_scores = 2 * (precision[:-1] * recall[:-1]) / (precision[:-1] + recall[:-1] + 1e-8)
        best_threshold = thresholds[np.argmax(f1_scores)]

        mlflow.log_metric("test_auc", auc)
        mlflow.log_metric("best_f1_threshold", best_threshold)
        mlflow.log_metric("max_f1", np.max(f1_scores))

        # Log feature importances
        feature_importance = pd.DataFrame({
            'feature': feature_cols,
            'importance': model.feature_importances_
        }).sort_values('importance', ascending=False)
        mlflow.log_text(feature_importance.to_string(), "feature_importances.txt")

        # Log model
        mlflow.sklearn.log_model(
            model,
            "model",
            registered_model_name="churn-predictor"
        )

        print(f"Run ID: {mlflow.active_run().info.run_id}")
        print(f"Test AUC: {auc:.4f}")
        print(f"Best F1: {np.max(f1_scores):.4f} at threshold {best_threshold:.3f}")

        return mlflow.active_run().info.run_id

if __name__ == "__main__":
    run_id = train_model(
        "data/churn_training_2026_01.csv",
        n_estimators=300,
        learning_rate=0.05,
        max_depth=6
    )
Every training run logs hyperparameters, metrics, and the model artifact. When you want to promote a model to production:
import mlflow
from mlflow.tracking import MlflowClient
client = MlflowClient()
# Promote specific run to production
run_id = "a7f3c9d2e1b8"
model_uri = f"runs:/{run_id}/model"
result = mlflow.register_model(model_uri, "churn-predictor")
version = result.version
# Transition to production
client.transition_model_version_stage(
name="churn-predictor",
version=version,
stage="Production"
)
The FastAPI app automatically picks up the new production model on next startup. No manual file copying, no “which pickle was production again?”
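If waiting for a restart ever becomes annoying, one option (a sketch, not something I run today) is an admin endpoint that clears the lru_cache so the next call reloads whatever is currently in the Production stage. ADMIN_KEY here is an assumed environment variable, not part of the real service:

# Hypothetical hot-reload endpoint for app/main.py.
import os
from fastapi import Header

@app.post("/admin/reload-model")
async def reload_model(x_admin_key: str = Header(...)):
    if x_admin_key != os.getenv("ADMIN_KEY"):
        raise HTTPException(status_code=403, detail="Not allowed")
    get_model.cache_clear()   # drop the cached model
    model = get_model()       # reloads models:/churn-predictor/production
    return {"reloaded": True, "model_version": model.metadata.run_id[:8]}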
The Math That Actually Matters
Churn prediction is binary classification. The model outputs a probability $\hat{p}(x) = P(\text{churn} = 1 \mid x)$, where $x$ is the feature vector. Gradient boosting builds an ensemble of weak learners (decision trees) by iteratively minimizing the loss:

$$L = -\frac{1}{N} \sum_{i=1}^{N} \Big[ y_i \log \hat{p}_i + (1 - y_i) \log(1 - \hat{p}_i) \Big]$$

That’s binary cross-entropy loss. Each new tree is fit to the negative gradient of this loss, and the final prediction is:

$$F_M(x) = F_0(x) + \eta \sum_{m=1}^{M} h_m(x)$$

where $\eta$ is the learning rate (I use 0.05) and $M$ is the number of trees (300 in production). The predicted probability is the sigmoid of $F_M(x)$.
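For concreteness, the loss above is just log loss; a quick numpy sanity check with toy numbers (illustrative only):

# Toy check that the binary cross-entropy formula matches sklearn's log_loss.
import numpy as np
from sklearn.metrics import log_loss

y = np.array([1, 0, 1, 1, 0])
p = np.array([0.9, 0.2, 0.6, 0.8, 0.1])

bce = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
assert np.isclose(bce, log_loss(y, p))
print(round(bce, 4))  # ~0.2336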
But here’s what actually moves the needle: feature engineering. The raw features have AUC ~0.72. Adding mrr_per_tenure_month (monthly revenue normalized by tenure) bumps it to 0.79. Adding tickets_per_month gets to 0.83. The model architecture is less important than the features you feed it.
ROC-AUC is fine for model comparison, but clients care about precision at high probability thresholds. If you predict 80% churn probability, you better be right 85%+ of the time, or they’ll stop trusting the API. I track precision-at-90 as the key metric:

$$\text{Precision@0.9} = \frac{\#\{\, i : \hat{p}_i \ge 0.9 \text{ and } y_i = 1 \,\}}{\#\{\, i : \hat{p}_i \ge 0.9 \,\}}$$
Current production model: 0.87 precision at 90% threshold. Good enough that clients act on those predictions.
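A sketch of how that metric could be computed and logged next to the others in train.py (the 0.9 cutoff and the metric name are my conventions, not pulled from the original script):

# Sketch: log precision at a high-confidence threshold during training.
# Variable names (y_test, y_pred_proba) match train.py above.
import numpy as np
import mlflow

def precision_at_threshold(y_true, y_proba, threshold=0.9):
    flagged = y_proba >= threshold
    if flagged.sum() == 0:
        return float("nan")  # nothing scored that high on this test split
    return float((np.asarray(y_true)[flagged] == 1).mean())

# inside the mlflow.start_run() block:
# mlflow.log_metric("precision_at_090", precision_at_threshold(y_test, y_pred_proba))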

Deployment: DigitalOcean Over AWS
$12/month droplet. 2 vCPUs, 2GB RAM. Ubuntu 22.04, nginx reverse proxy, systemd service for FastAPI.
Why not AWS Lambda? Cold starts. The MLflow model takes 800ms to load. Lambda’s cold start + model load = 3-5 second response times. Clients won’t wait.
Why not ECS/Fargate? Overkill. Traffic is predictable—mostly 9am-5pm EST weekdays. A single always-on server handles 70k monthly requests without breaking a sweat.
Deployment script:
#!/bin/bash
# deploy.sh
set -e
SSH_HOST="churn-api.example.com"
SSH_USER="deploy"
echo "Building Docker image..."
docker build -t churn-api:latest .
echo "Saving image..."
docker save churn-api:latest | gzip > churn-api.tar.gz
echo "Uploading to server..."
scp churn-api.tar.gz $SSH_USER@$SSH_HOST:/tmp/
echo "Loading and restarting on server..."
ssh $SSH_USER@$SSH_HOST << 'EOF'
cd /tmp
docker load < churn-api.tar.gz
docker stop churn-api || true
docker rm churn-api || true
docker run -d \
--name churn-api \
--restart unless-stopped \
-p 8000:8000 \
-v /opt/mlflow-models:/models:ro \
-e MLFLOW_TRACKING_URI=http://localhost:5000 \
churn-api:latest
rm churn-api.tar.gz
EOF
echo "Deployed successfully"
MLflow tracking server runs in a separate container on the same droplet. Models are stored in a volume mount. Not fancy, but it works.
Monetization: Stripe Webhooks Over Usage Metering
I tried AWS API Gateway usage plans first. The billing integration was a nightmare—had to poll CloudWatch metrics, reconcile with customer records, manually generate invoices.
Stripe is stupid simple:
# billing/stripe_handler.py
from fastapi import FastAPI, Request, HTTPException
import stripe
import os

stripe.api_key = os.getenv("STRIPE_SECRET_KEY")
app = FastAPI()

@app.post("/stripe-webhook")
async def stripe_webhook(request: Request):
    payload = await request.body()
    sig_header = request.headers.get("stripe-signature")
    endpoint_secret = os.getenv("STRIPE_WEBHOOK_SECRET")

    try:
        event = stripe.Webhook.construct_event(
            payload, sig_header, endpoint_secret
        )
    except ValueError:
        raise HTTPException(status_code=400, detail="Invalid payload")
    except stripe.error.SignatureVerificationError:
        raise HTTPException(status_code=400, detail="Invalid signature")

    if event["type"] == "invoice.payment_succeeded":
        customer_id = event["data"]["object"]["customer"]
        # Reset rate limit quota for customer
        # (simplified - real version updates Redis)
        print(f"Payment succeeded for {customer_id}")
    elif event["type"] == "invoice.payment_failed":
        customer_id = event["data"]["object"]["customer"]
        # Suspend API access
        print(f"Payment failed for {customer_id} - suspending access")

    return {"status": "success"}
Pricing tiers:
– $29/month: 1,000 predictions
– $79/month: 3,000 predictions
– $199/month: 10,000 predictions
– Enterprise: custom (largest client is 25k/month at $450)
Most customers are on the $79 tier. The economics work because inference is cheap—compute cost is ~$0.0003 per prediction (model is only 14MB, inference is 18ms). Stripe takes $0.30 + 2.9% per charge, so a $79 payment nets roughly $76.40. Hosting is a flat $12/month shared across every customer, so gross margin per customer stays north of 95%.
40 paying customers = ~$2,100/month revenue. Not life-changing, but it covers rent.
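Putting the unit economics in one place, a back-of-envelope check (numbers taken from above; hosting excluded because the $12 droplet is shared across all customers):

# Back-of-envelope margin check for a $79/month customer.
price = 79.00                       # most common tier
stripe_fee = 0.30 + 0.029 * price   # Stripe: $0.30 + 2.9%
predictions = 3_000                 # quota on the $79 tier
compute = 0.0003 * predictions      # ~$0.0003 per prediction

net = price - stripe_fee - compute
print(f"net per $79 customer: ${net:.2f} ({net / price:.0%} margin)")
# -> net per $79 customer: $75.51 (96% margin)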
What Doesn’t Scale (And Why I Don’t Care Yet)
The API key verification is hilariously insecure by enterprise standards. It’s a sha256 hash prefix check against a hardcoded value. A proper system would use Redis with short-lived JWTs, key rotation, and rate limiting per customer.
But building that would’ve taken two weeks. Current system took 45 minutes and has had zero security incidents in 8 months. When I hit 100 customers, I’ll refactor.
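When that refactor happens, per-customer rate limiting on a Redis instance is most of the work; a minimal fixed-window sketch (window size and key format are assumptions, not the current system):

# Hypothetical per-customer rate limit: fixed one-minute window via INCR + EXPIRE.
import redis
from fastapi import HTTPException

r = redis.Redis()  # assumed local Redis

def check_rate_limit(customer_id: str, limit_per_minute: int = 60) -> None:
    key = f"rl:{customer_id}"
    count = r.incr(key)
    if count == 1:
        r.expire(key, 60)  # start a fresh one-minute window
    if count > limit_per_minute:
        raise HTTPException(status_code=429, detail="Rate limit exceeded")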
Monitoring is just nginx access logs + a daily cron that greps for 5xx errors and emails me. No Datadog, no Grafana dashboards, no on-call rotation. If the server goes down, I get a Pingdom alert. Uptime is 99.4% over the last 6 months—good enough.
The model retraining is manual. I download fresh data monthly, run train.py, eyeball the metrics, promote to production if AUC improves. A real MLOps setup would have automated retraining pipelines, A/B testing, gradual rollouts. That’s a future problem.
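Even keeping the retrain manual, the "eyeball the metrics, promote if AUC improves" step can be scripted against the registry. A sketch using the same MlflowClient calls as the promotion snippet above (the gating logic is mine, not something that runs today):

# Hypothetical promotion gate: only move a new run to Production if its
# test_auc beats the current Production model's logged test_auc.
import mlflow
from mlflow.tracking import MlflowClient

client = MlflowClient()

def promote_if_better(candidate_run_id: str, name: str = "churn-predictor") -> bool:
    candidate_auc = client.get_run(candidate_run_id).data.metrics["test_auc"]

    prod_versions = client.get_latest_versions(name, stages=["Production"])
    if prod_versions:
        prod_auc = client.get_run(prod_versions[0].run_id).data.metrics["test_auc"]
        if candidate_auc <= prod_auc:
            print(f"Keeping current model ({prod_auc:.4f} >= {candidate_auc:.4f})")
            return False

    version = mlflow.register_model(f"runs:/{candidate_run_id}/model", name).version
    client.transition_model_version_stage(name=name, version=version, stage="Production")
    print(f"Promoted version {version} (AUC {candidate_auc:.4f})")
    return True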
Mistakes I Made So You Don’t Have To
Month 1: Built a beautiful React dashboard for customers to visualize churn trends. Zero customers used it. They just wanted the raw API. Wasted 3 weeks.
Month 2: Tried to add a “why is this customer at risk?” explainability feature using SHAP values. Inference latency went from 18ms to 340ms. Customers complained. Rolled back.
Month 3: Offered a free tier (100 predictions/month). Got hammered by bot traffic and students using it for school projects. Killed the free tier, implemented API key verification. Revenue actually went up because serious users were happy to pay for reliability.
Month 5: Tried to expand to e-commerce churn prediction. Different feature set, different data distribution, model performed terribly (AUC 0.61). Stuck with SaaS churn—domain specificity matters more than I thought.
When This Approach Breaks Down
If you hit 1M+ requests/month, a single droplet won’t cut it. You’ll need horizontal scaling, which means:
– Load balancer (another $10-20/month)
– Multiple API servers (2-3 droplets minimum for redundancy)
– Shared model storage (S3 or similar)
– Proper API key database (Redis or PostgreSQL)
At that point, just use AWS ECS or Google Cloud Run. The complexity tax is worth it.
If you need sub-10ms latency, you’ll need to optimize the model itself—switch from GBM to a lightweight neural network, quantize weights, use ONNX runtime instead of scikit-learn. But for most B2B SaaS use cases, 18ms is plenty fast.
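For reference, the ONNX route is mostly mechanical for scikit-learn models. A sketch with skl2onnx and onnxruntime (I haven't benchmarked this for the churn model, so treat the speedup as unverified):

# Hypothetical ONNX export of the trained GradientBoostingClassifier.
# Requires skl2onnx and onnxruntime.
import numpy as np
import onnxruntime as ort
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

# `model` is the fitted GradientBoostingClassifier from train.py; 8 input features.
onnx_model = convert_sklearn(
    model,
    initial_types=[("input", FloatTensorType([None, 8]))],
    options={id(model): {"zipmap": False}},  # return probabilities as a plain array
)
with open("churn_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

sess = ort.InferenceSession("churn_model.onnx", providers=["CPUExecutionProvider"])
features = np.zeros((1, 8), dtype=np.float32)  # placeholder feature vector
labels, probs = sess.run(None, {"input": features})
churn_prob = float(probs[0, 1])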
If you need real-time retraining (model updates multiple times per day), MLflow’s model registry becomes a bottleneck. You’d want to look at feature stores (Feast, Tecton) and streaming pipelines (Kafka + Flink). Way beyond side project scope.
FAQ
Q: Why gradient boosting instead of deep learning?
GBM trains in 2 minutes on 50k samples, needs minimal hyperparameter tuning, and is trivially interpretable (feature importances just work). A neural network would require GPU training, careful architecture search, and way more data. For tabular data under 100k rows, GBM wins every time.
Q: How do you handle model versioning in production?
MLflow’s model registry tracks every version. When I promote a new model to “Production” stage, the FastAPI app picks it up on next restart (usually Sunday night). I keep the last 3 production versions in the registry so I can roll back instantly if something breaks. Never had to roll back yet, but it’s comforting.
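The rollback itself is just the promotion snippet pointed at an older version; a sketch (the version number here is illustrative):

# Hypothetical rollback: re-promote a previous registered version.
from mlflow.tracking import MlflowClient

client = MlflowClient()
client.transition_model_version_stage(
    name="churn-predictor",
    version=11,                       # illustrative older version number
    stage="Production",
    archive_existing_versions=True,   # demote the broken one in the same call
)
# Then restart the API service so get_model() picks up the restored version.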
Q: What’s the hardest part of running this as a side project?
Customer support. Not the code, not the infrastructure—answering questions like “why did customer X get a different score today vs. yesterday?” The model changes every time I retrain (same input, different score under a new model version), and explaining that to non-technical clients is harder than writing the FastAPI code. I now include a model_version field in every response so I can debug discrepancies.
What I’d Build Next (If I Had More Time)
A feedback loop. Right now, clients get predictions but I never learn if they were accurate. If I added a /feedback endpoint where clients could report actual churn outcomes, I could continuously retrain with live data and improve precision over time.
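A sketch of what that endpoint might look like if it lived next to /predict in app/main.py (the SQLite storage and field names are assumptions; nothing like this exists yet):

# Hypothetical /feedback endpoint: record actual churn outcomes for later retraining.
import sqlite3
import time
from fastapi import Depends
from pydantic import BaseModel

class FeedbackRequest(BaseModel):
    customer_id: str
    model_version: str   # echo back the version that produced the prediction
    actually_churned: bool

@app.post("/feedback")
async def record_feedback(
    feedback: FeedbackRequest,
    api_key: str = Depends(verify_api_key),
):
    conn = sqlite3.connect("feedback.db")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feedback "
        "(customer_id TEXT, model_version TEXT, actually_churned INTEGER, recorded_at REAL)"
    )
    conn.execute(
        "INSERT INTO feedback VALUES (?, ?, ?, ?)",
        (feedback.customer_id, feedback.model_version, int(feedback.actually_churned), time.time()),
    )
    conn.commit()
    conn.close()
    return {"status": "recorded"}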
The math is straightforward: online learning with incremental model updates of the form

$$\theta_{t+1} = \theta_t - \eta \, \nabla_\theta \, \ell\big(y_t, f_{\theta_t}(x_t)\big)$$

applied each time a new (features, outcome) pair arrives.
But the engineering is messier. You need a feedback database, a retraining pipeline, safeguards against poisoned data (what if a client reports fake outcomes to game the system?). It’s on the roadmap for Q2 2026.
The other big gap: uncertainty quantification. Right now the model outputs a point estimate. Ideally it would output a confidence interval—”75% churn probability ± 8%”. Conformal prediction could do this without retraining the model, just requires calibration on a holdout set. I’m curious if clients would actually use that information or if it’d just add noise.
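One crude way to get there is split conformal prediction treated as regression on the 0/1 label: hold out a calibration set, use absolute residuals as nonconformity scores, and take their finite-sample quantile as a symmetric interval half-width. A deliberately simple sketch (alpha and the calibration split are assumptions, and with binary labels the intervals may be too wide to be useful):

# Hypothetical split-conformal interval around the churn probability.
import numpy as np

def conformal_half_width(y_calib, p_calib, alpha=0.2):
    scores = np.abs(np.asarray(y_calib) - np.asarray(p_calib))
    n = len(scores)
    # finite-sample-corrected quantile level for split conformal
    q_level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    return float(np.quantile(scores, q_level))

# Usage sketch: p_hat +/- half_width, clipped to [0, 1]
# half_width = conformal_half_width(y_calib, model.predict_proba(X_calib)[:, 1])
# interval = (max(0.0, p_hat - half_width), min(1.0, p_hat + half_width))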