AI & Machine Learning · CORTEX
Production ML for forecasting, recommendations, churn, anomaly detection, and risk modeling — with calibrated probabilities, drift monitoring, and the equity-aware evaluation regulated industries actually require.
The problem
The familiar pattern: a forecasting model that beats the legacy heuristic by 8% on backtest, then drops to parity under real load because the data pipeline lags by two days. A churn model that hits 0.84 AUC on validation, then fails the fair-lending review because nobody stratified the eval. A recommendation engine that improves CTR but tanks downstream revenue. An anomaly detector that fires on every Tuesday morning batch job because the baseline excluded weekend traffic.
Prosigns ships predictive analytics with operational reality as the primary scope. Calibrated probabilities (not just point estimates). Drift monitoring against the eval set, not the training set. Equity-aware evaluation stratified across the subgroups regulators actually examine. Production-grade feature stores so the model sees the same features at training and serving time. The model that ships is the model that survives the audit.
Where it ships
Specific applications we’ve built and operated. Not speculative — every example below is grounded in a real shipped engagement.
+22%
forecast accuracy
Demand and revenue forecasting
Multi-horizon forecasting with explicit calibration to the promotion calendar, seasonal patterns, and external-event sensitivity. Integration with replenishment and capacity-planning systems.
Customer churn prediction with calibrated probabilities, intervention-cost-aware ranking, and lift-curve evaluation. Equity-aware where retention spend is regulated.
+47%
premium-tier conversion
Real-time personalization with merchant-curated guardrails, A/B test infrastructure, and explainability dashboards. Privacy-respecting by design with explicit consent management.
Streaming anomaly detection for fraud, operational health, and security signals. Two-tier ensembles with fast linear scoring on the hot path and deeper models for review queues; the pattern is sketched after these examples.
Credit risk, claim risk, fraud risk — with SR 11-7-aligned model documentation, calibrated probabilities, adverse action reasoning where applicable, and disparate-impact testing in CI.
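A minimal sketch of the two-tier anomaly pattern named above: a cheap linear scorer clears or alerts on most events, and only borderline cases pay for the heavier model. The models, thresholds, and routing labels here are illustrative assumptions, not production configuration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 8))           # stand-in for historical events
y_train = (X_train[:, 0] > 2.5).astype(int)    # stand-in anomaly labels

# Tier 1: cheap linear scorer for the hot path.
fast = SGDClassifier(loss="log_loss", random_state=0).fit(X_train, y_train)
# Tier 2: heavier unsupervised model, run only on escalated events.
deep = IsolationForest(random_state=0).fit(X_train)

def score_event(x, clear_below=0.2, alert_above=0.9):
    """Route one event: clear it, alert on it, or escalate to the review queue."""
    p = fast.predict_proba(x.reshape(1, -1))[0, 1]
    if p < clear_below:
        return "clear"
    if p > alert_above:
        return "alert"
    # Borderline case: spend the extra compute. Lower scores are more anomalous.
    return "review" if deep.score_samples(x.reshape(1, -1))[0] < -0.5 else "clear"

print(score_event(X_train[0]))
```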
How we engage
Each phase has a deliverable, an owner, and an acceptance criterion. Not slogans — operating rules.
Curated eval set built before model selection — historical periods that represent the operating regime, edge cases the post-mortem corpus has documented, and equity-aware subgroup stratification. Eval is the contract; the model is what wins on the eval.
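What the contract looks like in practice: a minimal sketch, assuming a frozen eval table with hypothetical y_true, y_score, and segment columns (the regulator-relevant subgroup). Column names are illustrative.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def evaluate(eval_df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate and per-subgroup AUC; a candidate must win on both."""
    rows = [{"segment": "ALL", "n": len(eval_df),
             "auc": roc_auc_score(eval_df["y_true"], eval_df["y_score"])}]
    for seg, g in eval_df.groupby("segment"):
        rows.append({"segment": seg, "n": len(g),
                     "auc": roc_auc_score(g["y_true"], g["y_score"])})
    return pd.DataFrame(rows)
```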
Every production model ships with probability calibration (Platt scaling, isotonic regression, or temperature scaling). Stakeholders get well-calibrated probabilities they can compose with downstream cost models, not raw scores that look like probabilities but aren't.
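A hedged sketch of that step with scikit-learn: the stand-in dataset and model choice are illustrative, and method="sigmoid" would give Platt scaling instead of isotonic regression.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=6000, random_state=0)   # stand-in data
X_fit, X_test, y_fit, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Raw model versus the same model wrapped in cross-validated isotonic calibration.
raw = GradientBoostingClassifier(random_state=0).fit(X_fit, y_fit)
calibrated = CalibratedClassifierCV(
    GradientBoostingClassifier(random_state=0), method="isotonic", cv=3
).fit(X_fit, y_fit)

for name, model in (("raw", raw), ("calibrated", calibrated)):
    p = model.predict_proba(X_test)[:, 1]
    # Brier score is a proper scoring rule: lower means better-calibrated.
    print(f"{name:>10} Brier: {brier_score_loss(y_test, p):.4f}")
```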
Features are computed once and consumed at training and serving with identical semantics. Drift monitoring runs continuously against the eval set; alerts fire when subgroup performance regresses, not just when aggregate metrics shift.
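One way to implement the subgroup-level drift alert, sketched here with the Population Stability Index; the column names and the 0.2 threshold are illustrative assumptions, not the production monitor.

```python
import numpy as np
import pandas as pd

def psi(reference: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index of live scores against the reference eval set."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf                  # catch out-of-range scores
    ref_pct = np.histogram(reference, edges)[0] / len(reference) + 1e-6
    live_pct = np.histogram(live, edges)[0] / len(live) + 1e-6
    return float(np.sum((live_pct - ref_pct) * np.log(live_pct / ref_pct)))

def drift_alerts(ref_df: pd.DataFrame, live_df: pd.DataFrame,
                 threshold: float = 0.2) -> dict:
    """Subgroups whose live score distribution drifted past the PSI threshold."""
    alerts = {}
    for seg, live in live_df.groupby("segment"):
        ref = ref_df.loc[ref_df["segment"] == seg, "score"].to_numpy()
        value = psi(ref, live["score"].to_numpy())
        if value > threshold:        # fire per subgroup, not only on the aggregate
            alerts[seg] = round(value, 3)
    return alerts
```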
Production retraining cadence tuned to the workload: weekly for fast-moving recommendation systems, quarterly for slower-moving forecasting. Each retrain runs the full eval gate; regressions block deployment. Quarterly model risk review with second-line where applicable.
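A minimal sketch of the eval gate that blocks a regressing retrain; the metric tables and tolerance value are illustrative.

```python
def passes_gate(champion: dict, candidate: dict, tolerance: float = 0.005) -> bool:
    """Both dicts map segment name -> eval metric (higher is better)."""
    regressions = [seg for seg, baseline in champion.items()
                   if candidate.get(seg, float("-inf")) < baseline - tolerance]
    if regressions:
        print("BLOCKED: regression on", regressions)   # fail the CI step here
        return False
    return True

# A retrain that wins on aggregate but loses a subgroup does not ship.
champion  = {"ALL": 0.81, "segment_a": 0.79, "segment_b": 0.80}
candidate = {"ALL": 0.83, "segment_a": 0.84, "segment_b": 0.76}
assert not passes_gate(champion, candidate)
```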
Common questions
Documentation produced as a side-effect of building, not a deliverable assembled before second-line review. Development methodology, conceptual soundness, performance metrics, validation results, and ongoing monitoring captured continuously. We co-author with the second-line MRM function rather than handing off a black box.
Stakeholders compose model output with downstream cost models — campaign spend per likely-churn customer, intervention cost per high-risk patient, fraud-investigation cost per flagged transaction. Raw model scores look like probabilities but aren't well-calibrated; downstream decisions made against uncalibrated scores systematically over- or under-act on borderline cases.
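A minimal sketch of that composition for the churn case; every dollar figure and the uplift rate are assumptions for illustration, not client numbers.

```python
def should_intervene(p_churn: float,
                     customer_value: float = 1200.0,   # annual revenue at risk
                     campaign_cost: float = 40.0,      # cost per intervention
                     uplift: float = 0.25) -> bool:    # share of churners retained
    """Intervene only when expected saved revenue exceeds the campaign cost."""
    return p_churn * uplift * customer_value > campaign_cost

# With calibrated probabilities the break-even is meaningful:
# intervene when p_churn > 40 / (0.25 * 1200) ≈ 0.133. Against raw,
# uncalibrated scores the same rule over- or under-spends systematically.
print(should_intervene(0.10), should_intervene(0.20))   # False True
```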
Disparate-impact testing in CI on every model change. Per-subgroup metrics surfaced rather than averaged. Prohibited-basis variable handling enforced at the data layer. Adverse action reasoning generated from the model rather than overlaid post-hoc. We've supported clients through fair-lending exams and CFPB inquiries.
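A minimal version of the CI check, using the conventional four-fifths screening heuristic; the column names are illustrative, and the threshold is a screen, not a legal determination.

```python
import pandas as pd

def adverse_impact_ratios(df: pd.DataFrame, reference_group: str) -> pd.Series:
    """Selection rate of each group divided by the reference group's rate."""
    rates = df.groupby("group")["approved"].mean()
    return rates / rates[reference_group]

def ci_check(df: pd.DataFrame, reference_group: str, floor: float = 0.8) -> None:
    """Fail the build when any group's ratio drops below the floor."""
    ratios = adverse_impact_ratios(df, reference_group)
    failing = ratios[ratios < floor]
    assert failing.empty, f"Disparate-impact check failed: {failing.to_dict()}"
```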
Use case dependent. Prophet wins for medium-volume time series with clear seasonality and exogenous regressors. DeepAR / global models win when you have many related time series (per-SKU, per-region demand). Custom XGBoost / LightGBM with engineered features often beats both for irregular patterns. We benchmark on your workload before recommending.
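What "benchmark on your workload" means mechanically: a rolling-origin backtest behind a common fit/predict interface. The SeasonalNaive baseline and the interface are illustrative assumptions; the real candidates would be the libraries named above.

```python
import numpy as np

def rolling_backtest(y: np.ndarray, model, horizon: int = 28, folds: int = 5) -> float:
    """Mean absolute error over expanding-window splits of a single series."""
    errors = []
    for k in range(folds, 0, -1):
        cut = len(y) - k * horizon
        model.fit(y[:cut])                        # train on history up to the cut
        forecast = model.predict(horizon)         # forecast the next `horizon` steps
        errors.append(np.mean(np.abs(y[cut:cut + horizon] - forecast)))
    return float(np.mean(errors))

class SeasonalNaive:
    """The baseline every candidate has to beat: repeat last season's values."""
    def __init__(self, season: int = 7):
        self.season = season
    def fit(self, y: np.ndarray):
        self.history = y
    def predict(self, h: int) -> np.ndarray:
        tile = np.tile(self.history[-self.season:], h // self.season + 1)
        return tile[:h]

# Usage: a daily demand series with weekly seasonality, 4-week horizon.
y = np.sin(np.arange(400) * 2 * np.pi / 7) + np.random.default_rng(0).normal(0, 0.1, 400)
print(rolling_backtest(y, SeasonalNaive(season=7)))
```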
Yes — when latency demands it. Streaming architectures on Kafka / Flink for sub-100ms decisions; batch / micro-batch for slower-moving workloads where the operational complexity of streaming isn't justified. We tell you which fits during discovery.
Discovery + eval-set curation: 3–5 weeks, $50K–$120K. Single-use-case production deployment: 4–8 months, $300K–$1M. Multi-use-case ML platform builds: $1M–$3M. Managed Services for ongoing model operations: $30K–$150K monthly retainer.
Talk to us
A senior engineer plus the CORTEX department lead joins the first call. No discovery gauntlet, no junior reps.