AI & Machine Learning · CORTEX
Production ML for forecasting, recommendations, churn, anomaly detection, and risk modeling — with calibrated probabilities, drift monitoring, and the equity-aware evaluation regulated industries actually require.
The problem
The familiar pattern: a forecasting model that beats the legacy heuristic by 8% on backtest, then drops to parity under real load because the data pipeline lags by two days. A churn model that hits 0.84 AUC on validation, then fails the fair-lending review because nobody stratified the eval. A recommendation engine that improves CTR but tanks downstream revenue. An anomaly detector that fires on every Tuesday morning batch job because the baseline excluded weekend traffic.
Prosigns ships predictive analytics with operational reality as the primary scope. Calibrated probabilities (not just point estimates). Drift monitoring against the eval set, not the training set. Equity-aware evaluation stratified across the subgroups regulators actually examine. Production-grade feature stores so the model sees the same features at training and serving time. The model that ships is the model that survives the audit.
Where it ships
Specific applications we’ve built and operated. Not speculative — every example below is grounded in a real shipped engagement.
+22%
forecast accuracy
Demand and revenue forecasting
Multi-horizon forecasting with explicit calibration to the promotion calendar, seasonal patterns, and external-event sensitivity. Integration with replenishment and capacity-planning systems.
Customer churn prediction with calibrated probabilities, intervention-cost-aware ranking, and lift-curve evaluation. Equity-aware where retention spend is regulated.
+47%
premium-tier conversion
Real-time personalization with merchant-curated guardrails, A/B test infrastructure, and explainability dashboards. Privacy-respecting by design with explicit consent management.
Streaming anomaly detection for fraud, operational health, and security signals. Two-tier ensembles with fast linear scoring on the hot path and deeper models for review queues; the pattern is sketched after these examples.
Credit risk, claim risk, fraud risk — with SR 11-7-aligned model documentation, calibrated probabilities, adverse action reasoning where applicable, and disparate-impact testing in CI.
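A minimal sketch of the two-tier anomaly pattern named above: a cheap linear scorer clears or alerts on most events, and only borderline cases pay for the heavier model. The models, thresholds, and routing labels here are illustrative assumptions, not production configuration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(5000, 8))           # stand-in for historical events
y_train = (X_train[:, 0] > 2.5).astype(int)    # stand-in anomaly labels

# Tier 1: cheap linear scorer for the hot path.
fast = SGDClassifier(loss="log_loss", random_state=0).fit(X_train, y_train)
# Tier 2: heavier unsupervised model, run only on escalated events.
deep = IsolationForest(random_state=0).fit(X_train)

def score_event(x, clear_below=0.2, alert_above=0.9):
    """Route one event: clear it, alert on it, or escalate to the review queue."""
    p = fast.predict_proba(x.reshape(1, -1))[0, 1]
    if p < clear_below:
        return "clear"
    if p > alert_above:
        return "alert"
    # Borderline case: spend the extra compute. Lower scores are more anomalous.
    return "review" if deep.score_samples(x.reshape(1, -1))[0] < -0.5 else "clear"

print(score_event(X_train[0]))
```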
How we engage
Each phase has a deliverable, an owner, and an acceptance criterion. Not slogans — operating rules.
Curated eval set built before model selection — historical periods that represent the operating regime, edge cases the post-mortem corpus has documented, and equity-aware subgroup stratification. Eval is the contract; the model is what wins on the eval.
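What the contract looks like in practice: a minimal sketch, assuming a frozen eval table with hypothetical y_true, y_score, and segment columns (the regulator-relevant subgroup). Column names are illustrative.

```python
import pandas as pd
from sklearn.metrics import roc_auc_score

def evaluate(eval_df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate and per-subgroup AUC; a candidate must win on both."""
    rows = [{"segment": "ALL", "n": len(eval_df),
             "auc": roc_auc_score(eval_df["y_true"], eval_df["y_score"])}]
    for seg, g in eval_df.groupby("segment"):
        rows.append({"segment": seg, "n": len(g),
                     "auc": roc_auc_score(g["y_true"], g["y_score"])})
    return pd.DataFrame(rows)
```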
Every production model ships with probability calibration (Platt scaling, isotonic regression, or temperature scaling). Stakeholders get well-calibrated probabilities they can compose with downstream cost models, not raw scores that look like probabilities but aren't.
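A hedged sketch of that step with scikit-learn: the stand-in dataset and model choice are illustrative, and method="sigmoid" would give Platt scaling instead of isotonic regression.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=6000, random_state=0)   # stand-in data
X_fit, X_test, y_fit, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Raw model versus the same model wrapped in cross-validated isotonic calibration.
raw = GradientBoostingClassifier(random_state=0).fit(X_fit, y_fit)
calibrated = CalibratedClassifierCV(
    GradientBoostingClassifier(random_state=0), method="isotonic", cv=3
).fit(X_fit, y_fit)

for name, model in (("raw", raw), ("calibrated", calibrated)):
    p = model.predict_proba(X_test)[:, 1]
    # Brier score is a proper scoring rule: lower means better-calibrated.
    print(f"{name:>10} Brier: {brier_score_loss(y_test, p):.4f}")
```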
Features are computed once and consumed at training and serving with identical semantics. Drift monitoring runs continuously against the eval set; alerts fire when subgroup performance regresses, not just when aggregate metrics shift.
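One way to implement the subgroup-level drift alert, sketched here with the Population Stability Index; the column names and the 0.2 threshold are illustrative assumptions, not the production monitor.

```python
import numpy as np
import pandas as pd

def psi(reference: np.ndarray, live: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index of live scores against the reference eval set."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf                  # catch out-of-range scores
    ref_pct = np.histogram(reference, edges)[0] / len(reference) + 1e-6
    live_pct = np.histogram(live, edges)[0] / len(live) + 1e-6
    return float(np.sum((live_pct - ref_pct) * np.log(live_pct / ref_pct)))

def drift_alerts(ref_df: pd.DataFrame, live_df: pd.DataFrame,
                 threshold: float = 0.2) -> dict:
    """Subgroups whose live score distribution drifted past the PSI threshold."""
    alerts = {}
    for seg, live in live_df.groupby("segment"):
        ref = ref_df.loc[ref_df["segment"] == seg, "score"].to_numpy()
        value = psi(ref, live["score"].to_numpy())
        if value > threshold:        # fire per subgroup, not only on the aggregate
            alerts[seg] = round(value, 3)
    return alerts
```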
Production retraining cadence tuned to the workload: weekly for fast-moving recommendation systems, quarterly for slower-moving forecasting. Each retrain runs the full eval gate; regressions block deployment. Quarterly model risk review with second-line where applicable.
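A minimal sketch of the eval gate that blocks a regressing retrain; the metric tables and tolerance value are illustrative.

```python
def passes_gate(champion: dict, candidate: dict, tolerance: float = 0.005) -> bool:
    """Both dicts map segment name -> eval metric (higher is better)."""
    regressions = [seg for seg, baseline in champion.items()
                   if candidate.get(seg, float("-inf")) < baseline - tolerance]
    if regressions:
        print("BLOCKED: regression on", regressions)   # fail the CI step here
        return False
    return True

# A retrain that wins on aggregate but loses a subgroup does not ship.
champion  = {"ALL": 0.81, "segment_a": 0.79, "segment_b": 0.80}
candidate = {"ALL": 0.83, "segment_a": 0.84, "segment_b": 0.76}
assert not passes_gate(champion, candidate)
```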
Common questions
Documentation produced as a side-effect of building, not a deliverable assembled before second-line review. Development methodology, conceptual soundness, performance metrics, validation results, and ongoing monitoring captured continuously. We co-author with the second-line MRM function rather than handing off a black box.
Stakeholders compose model output with downstream cost models — campaign spend per likely-churn customer, intervention cost per high-risk patient, fraud-investigation cost per flagged transaction. Raw model scores look like probabilities but aren't well-calibrated; downstream decisions made against uncalibrated scores systematically over- or under-act on borderline cases.
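A minimal sketch of that composition for the churn case; every dollar figure and the uplift rate are assumptions for illustration, not client numbers.

```python
def should_intervene(p_churn: float,
                     customer_value: float = 1200.0,   # annual revenue at risk
                     campaign_cost: float = 40.0,      # cost per intervention
                     uplift: float = 0.25) -> bool:    # share of churners retained
    """Intervene only when expected saved revenue exceeds the campaign cost."""
    return p_churn * uplift * customer_value > campaign_cost

# With calibrated probabilities the break-even is meaningful:
# intervene when p_churn > 40 / (0.25 * 1200) ≈ 0.133. Against raw,
# uncalibrated scores the same rule over- or under-spends systematically.
print(should_intervene(0.10), should_intervene(0.20))   # False True
```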
Disparate-impact testing in CI on every model change. Per-subgroup metrics surfaced rather than averaged. Prohibited-basis variable handling enforced at the data layer. Adverse action reasoning generated from the model rather than overlaid post-hoc. We've supported clients through fair-lending exams and CFPB inquiries.
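A minimal version of the CI check, using the conventional four-fifths screening heuristic; the column names are illustrative, and the threshold is a screen, not a legal determination.

```python
import pandas as pd

def adverse_impact_ratios(df: pd.DataFrame, reference_group: str) -> pd.Series:
    """Selection rate of each group divided by the reference group's rate."""
    rates = df.groupby("group")["approved"].mean()
    return rates / rates[reference_group]

def ci_check(df: pd.DataFrame, reference_group: str, floor: float = 0.8) -> None:
    """Fail the build when any group's ratio drops below the floor."""
    ratios = adverse_impact_ratios(df, reference_group)
    failing = ratios[ratios < floor]
    assert failing.empty, f"Disparate-impact check failed: {failing.to_dict()}"
```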
Use case dependent. Prophet wins for medium-volume time series with clear seasonality and exogenous regressors. DeepAR / global models win when you have many related time series (per-SKU, per-region demand). Custom XGBoost / LightGBM with engineered features often beats both for irregular patterns. We benchmark on your workload before recommending.
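What "benchmark on your workload" means mechanically: a rolling-origin backtest behind a common fit/predict interface. The SeasonalNaive baseline and the interface are illustrative assumptions; the real candidates would be the libraries named above.

```python
import numpy as np

def rolling_backtest(y: np.ndarray, model, horizon: int = 28, folds: int = 5) -> float:
    """Mean absolute error over expanding-window splits of a single series."""
    errors = []
    for k in range(folds, 0, -1):
        cut = len(y) - k * horizon
        model.fit(y[:cut])                        # train on history up to the cut
        forecast = model.predict(horizon)         # forecast the next `horizon` steps
        errors.append(np.mean(np.abs(y[cut:cut + horizon] - forecast)))
    return float(np.mean(errors))

class SeasonalNaive:
    """The baseline every candidate has to beat: repeat last season's values."""
    def __init__(self, season: int = 7):
        self.season = season
    def fit(self, y: np.ndarray):
        self.history = y
    def predict(self, h: int) -> np.ndarray:
        tile = np.tile(self.history[-self.season:], h // self.season + 1)
        return tile[:h]

# Usage: a daily demand series with weekly seasonality, 4-week horizon.
y = np.sin(np.arange(400) * 2 * np.pi / 7) + np.random.default_rng(0).normal(0, 0.1, 400)
print(rolling_backtest(y, SeasonalNaive(season=7)))
```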
Yes — when latency demands it. Streaming architectures on Kafka / Flink for sub-100ms decisions; batch / micro-batch for slower-moving workloads where the operational complexity of streaming isn't justified. We tell you which fits during discovery.
Discovery + eval-set curation: 3–5 weeks, $50K–$120K. Single-use-case production deployment: 4–8 months, $300K–$1M. Multi-use-case ML platform builds: $1M–$3M. Managed Services for ongoing model operations: $30K–$150K monthly retainer.
Talk to us
A senior engineer plus the CORTEX department lead joins the first call. No discovery gauntlet, no junior reps.