The Innovation Layer

Contextualized backtesting infrastructure that works with 5-10 real events. Three innovations that no existing PM analytics tool has.

1
Structural Event Clustering
Group events by behavioral fingerprint, not by topic. Turn 8 data points into 200+.
Building
2
LLM Counterfactual Expansion
Generate 50 synthetic scenarios per event, calibrated against real outcomes.
Planned
3
Cross-Vertical Transfer Learning
Use football (306 matches/season) to calibrate finance models (8 Fed meetings/year).
Planned
Why this matters

Hashdive shows you whale trades. Oddpool shows you cross-platform odds. Verso builds a terminal. Nobody lets you TEST whether a PM trading idea actually works — with proper out-of-sample validation and small-N compensation. That's the gap we fill.

The N-Problem: too few events to backtest

The most interesting prediction markets have the fewest data points. The markets with abundant data have the least institutional interest.

Market: data points (backtestable?)
Bundesliga Matches: 600+ (YES)
Poly Whale Signals: 200+ (YES)
US Fed Decisions: ~16 (BARELY)
CPI Releases: ~24 (MARGINAL)
NVIDIA Earnings: ~8 (NO)
Tariff Announcements: 3-5 (NO)
US Elections: 2-3 (NO)

Counts are data points accumulated since prediction markets went mainstream (~2024). The paradox: the most tradeable events have the least testable data. Our infrastructure solves this.

Structural Event Clustering

Don't group events by topic. Group them by how prediction markets behave around them. A tariff announcement and a coaching dismissal are structurally identical — both are surprise-binary-fast events.

Five clustering dimensions

Predictability

Scheduled (Fed, CPI, Matchday) vs. Surprise (Tariffs, Injuries, Sanctions)

Binariness

Clean Yes/No (Win/Lose, Rate Cut) vs. Continuous (S&P level, vote share)

Info Dynamics

Fast (injury: minutes) vs. Slow (championship race: months)

Market Depth

Deep (US Election: $3B+) vs. Thin (Bundesliga transfer: <$50K)

Resolution

Oracle-automated (match result, CPI) vs. Human judgment (ambiguous politics)

Cluster examples

Cluster A: Scheduled Macro
Scheduled · Binary · Fast · Deep · Automated
Members:
Fed, ECB, BOE, BOJ, RBA decisions, CPI, NFP, PMI
All central bank and macro data events. Same microstructure: known date, binary outcome space, pre-event compression, institutional positioning, media anticipation cycle.
N = 80-100/year globally
Cluster B: Scheduled Sports
Scheduled · Binary · Slow · Deep · Automated
Members:
Bundesliga, CL, Premier League, NBA, NFL matches
All scheduled sporting events with known participants. Predictable timing, binary resolution, deep markets for big leagues, automated oracle resolution.
N = 1000+/year
Cluster C: Surprise Political
Surprise · Binary · Fast · Thin · Judgment
Members:
Tariffs, sanctions, executive orders, regulatory actions, coach dismissals
Surprise political/organizational decisions with immediate market impact. No advance scheduling, fast information cascade, thin liquidity, often ambiguous resolution.
N = 30-50/year across domains
Cluster D: Scheduled Continuous
Scheduled · Continuous · Slow · Deep · Automated
Members:
NVIDIA/Apple/TSMC earnings, GDP, election vote shares
Known date but continuous outcome space. Market prices bracket ranges, not binary outcomes. Requires different backtesting approach.
N = 200+/year
The key insight

A "tariff announcement" with only 3-5 historical data points becomes backtestable when clustered with 30-50 structurally similar surprise-binary-fast events. The clustering turns untestable into testable.
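The fingerprint idea above can be sketched in a few lines. This is an illustrative toy, not product code: the `Fingerprint` class, the dimension values, and the example events are our labels for the five dimensions described in this section.

```python
from dataclasses import dataclass
from collections import defaultdict

# Illustrative sketch: tag each event along the five structural dimensions,
# then cluster by exact fingerprint rather than by topic.
@dataclass(frozen=True)  # frozen -> hashable, so it can key a dict
class Fingerprint:
    predictability: str  # "scheduled" | "surprise"
    binariness: str      # "binary" | "continuous"
    info_dynamics: str   # "fast" | "slow"
    depth: str           # "deep" | "thin"
    resolution: str      # "automated" | "judgment"

events = [
    ("10% tariff announcement", Fingerprint("surprise", "binary", "fast", "thin", "judgment")),
    ("Bundesliga coach dismissal", Fingerprint("surprise", "binary", "fast", "thin", "judgment")),
    ("FOMC rate decision", Fingerprint("scheduled", "binary", "fast", "deep", "automated")),
    ("US CPI release", Fingerprint("scheduled", "binary", "fast", "deep", "automated")),
    ("NVIDIA earnings", Fingerprint("scheduled", "continuous", "slow", "deep", "automated")),
]

clusters = defaultdict(list)
for name, fp in events:
    clusters[fp].append(name)

for fp, members in clusters.items():
    print(fp.predictability, fp.binariness, "->", members)
```

Note how the tariff announcement and the coach dismissal fall into the same cluster despite sharing no topic: that is the whole point of clustering by behavior instead of by domain.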

LLM Counterfactual Expansion

Traditional synthetic data generates artificial price paths. We generate artificial event contexts — "what if the tariff was 25% instead of 10%?" — and model PM reactions.

1 real event: 10% tariff, PM 35→72%
5 parameters: Surprise · Magnitude · Retaliation · Media · Position
50 LLM scenarios: counterfactual contexts via Claude
50 calibrated reactions: estimated PM price paths
Backtest: 51 events from 1 real event

Why LLMs, not GANs?

Traditional synthetic data methods (GANs, Monte Carlo) operate on numbers. They generate plausible-looking price paths but cannot generate plausible-looking event contexts. A language model can understand that a 25% tariff is qualitatively different from 10% — not just numerically higher. It models second-order effects (retaliation chains), media reactions, and institutional positioning.

This is not hallucination. It is parametric scenario design with an LLM as domain expert, calibrated against real data points.
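The "parametric scenario design" step can be sketched as a prompt grid. Everything here is a hypothetical illustration: the parameter names and values are ours, and the actual LLM call is omitted; only the combinatorial expansion from one real event to 50 counterfactual prompts is shown.

```python
import itertools
import random

# One real, observed event: the anchor every counterfactual is calibrated against.
real_event = {"name": "10% tariff", "pm_move": (0.35, 0.72)}

# Five variation parameters (hypothetical values for illustration).
parameters = {
    "surprise":    ["fully unexpected", "partially leaked", "widely rumored"],
    "magnitude":   ["10%", "25%", "50%"],
    "retaliation": ["none", "targeted", "broad"],
    "media":       ["muted", "saturated"],
    "positioning": ["light", "crowded"],
}

random.seed(0)
grid = list(itertools.product(*parameters.values()))  # 3*3*3*2*2 = 108 combos
scenarios = random.sample(grid, k=50)                 # 50 counterfactuals per real event

def build_prompt(combo):
    # The LLM (e.g. Claude) would receive this and return an estimated price path.
    desc = ", ".join(f"{k}={v}" for k, v in zip(parameters, combo))
    return (f"Given the real event '{real_event['name']}' where the market moved "
            f"{real_event['pm_move'][0]:.0%} -> {real_event['pm_move'][1]:.0%}, "
            f"estimate the price path if instead: {desc}.")

prompts = [build_prompt(c) for c in scenarios]
print(len(prompts), "counterfactual prompts from 1 real event")
```

The design choice worth noting: the LLM never invents scenarios freely. It fills in a fixed parameter combination, which is what keeps the expansion auditable and calibratable.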

Calibration protocol

Synthetic scenarios are only valuable if they're calibrated against reality. For each cluster with 5+ real events: generate 50 counterfactuals per event, hold out 20% of real events as a test set, train on the remaining 80% of real events plus all synthetics, and validate against the held-out events. Track prediction error and publish calibration scores alongside all backtest results.

If the synthetic expansion doesn't improve predictions on held-out real events, it gets discarded. Radical transparency — the system earns trust through measurable calibration.
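The keep-or-discard gate reduces to a Brier score comparison on held-out events. A minimal sketch with invented numbers, assuming each held-out event has a realized 0/1 outcome and two forecast sets: one from a model trained on real events only, one from real + synthetic expansion.

```python
# Held-out real outcomes (20% test set) -- invented for illustration.
held_out = [1, 0, 1, 1, 0]

def brier(probs, outcomes):
    # Mean squared error between forecast probability and 0/1 outcome.
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)

# Hypothetical held-out forecasts from the two models.
p_baseline = [0.5, 0.5, 0.5, 0.5, 0.5]   # trained on real events only
p_expanded = [0.7, 0.4, 0.8, 0.6, 0.2]   # trained on real + synthetics

err_baseline = brier(p_baseline, held_out)
err_expanded = brier(p_expanded, held_out)

# The gate: keep the synthetic expansion only if it beats the baseline
# on held-out real events; otherwise discard it.
keep_expansion = err_expanded < err_baseline
print(f"baseline {err_baseline:.3f} vs expanded {err_expanded:.3f} -> keep={keep_expansion}")
```

A Brier score of 0.25 is the coin-flip baseline for binary events; the expansion earns its place only by measurably beating it on events it never saw.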

Cross-Vertical Transfer Learning

Certain PM behavioral patterns are domain-agnostic. Football (306 matches/season) calibrates finance models (8 Fed meetings/year). The high-N domain trains the low-N domain.

Football: 306 matches per season
High-N calibration laboratory. Statistically robust patterns. 22 club sites × 60 years of data.
Measurable: pre-event compression, news latency, surprise overreaction, thin-market drift.

→ Patterns transfer →

Finance: 8 Fed meetings per year
Low-N, high institutional interest. Insufficient data for standalone backtesting, but structurally similar PM dynamics.
Hypothesis: the same patterns hold, with domain-specific adjustments.

What transfers across verticals

Pre-Event Compression

PM spreads narrow 2-4h before kickoff as late info arrives. Same pattern before FOMC?

Measurable in football

News-to-Price Latency

Injury news: 15-45min to full pricing. Tariff tweet: similar latency adjusted for liquidity?

Measurable in football

Surprise Overreaction

Red card causes 15-20% overshoot that mean-reverts. Surprise tariff: same overshoot?

Hypothesis

Thin Market Drift

Low-liquidity transfer markets drift 5-8% on one large order over 24h. Niche politics: same?

Hypothesis

Expert vs Crowd Gap

kicker-Tipps vs Polymarket on Bundesliga. Superforecasters vs Polymarket on macro. Same gap structure?

Measurable both sides
The network effect

Every football prediction that runs through the system improves the accuracy of finance backtesting, and vice versa. This creates a compounding data asset that no single-vertical competitor can match.
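As a concrete sketch of one transfer, pre-event compression can be estimated in the high-N domain and reused as a prior in the low-N domain. All spread values below are invented for illustration; the real system would estimate this from the football data described above.

```python
import statistics

# Bid-ask spreads (in probability points) at T-4h, T-2h, and event start
# for three football matches -- invented numbers for illustration.
football_spreads = [
    [0.040, 0.025, 0.012],
    [0.038, 0.022, 0.010],
    [0.045, 0.030, 0.015],
]

# Compression ratio: how much the spread narrows from T-4h to kickoff.
ratios = [s[-1] / s[0] for s in football_spreads]
prior_compression = statistics.mean(ratios)

# Transfer: apply the football-calibrated prior to an FOMC market where
# only the T-4h spread has been observed so far.
fed_spread_t_minus_4h = 0.060
expected_fed_spread = fed_spread_t_minus_4h * prior_compression
print(f"expected FOMC spread at decision: {expected_fed_spread:.3f}")
```

The hypothesis is then testable: if observed FOMC spreads at decision time land near the football-derived prior, the pattern transfers; the residual becomes the domain-specific adjustment.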

Nobody is building the strategy layer

The PM infrastructure stack has three layers. Layer 1 (execution) and Layer 2 (data) are being built. Layer 3 (strategy) is wide open.

1
Execution Layer
Order routing, matching, settlement. Kalshi, Polymarket, ForecastEx, Robinhood.
Built
2
Data Layer
Market data, analytics, screening. Hashdive, Oddpool, Verso, PrediEdge.
Building
3
Strategy Layer
Backtesting, hypothesis testing, systematic strategy development. Altus Alpha.
Us
Why we can build this and others can't

1. Domain intelligence — 22 Bundesliga club dossiers, AI infrastructure research (31 entries), Soccer Economics. Years of accumulated content that powers meaningful event clustering and counterfactual generation.

2. Live prediction data — Finance and Football verticals with Brier Score tracking provide the real calibration data.

3. Cross-vertical bridge — No competitor operates across both finance and sports with prediction data in both.

4. The Strategy Factory — Working methodology since February 2026. Not an idea — a system.

The content is the moat. The infrastructure is the business.

We sell the Factory. We keep the recipes.
