The Innovation Layer

Contextualized backtesting infrastructure that works with 5-10 real events. Three innovations that no existing PM analytics tool has.

1
Structural Event Clustering
Group events by behavioral fingerprint, not by topic. Turn 8 data points into 200+.
Building
2
LLM Counterfactual Expansion
Generate 50 synthetic scenarios per event, calibrated against real outcomes.
Planned
3
Cross-Vertical Transfer Learning
Use football (306 matches/season) to calibrate finance models (8 Fed meetings/year).
Planned
Why this matters

Hashdive shows you whale trades. Oddpool shows you cross-platform odds. Verso builds a terminal. Nobody lets you TEST whether a PM trading idea actually works — with proper out-of-sample validation and small-N compensation. That's the gap we fill.

The N-Problem: too few events to backtest

The most interesting prediction markets have the fewest data points. The markets with abundant data have the least institutional interest.

Market: data points (backtestable?)
Bundesliga Matches: 600+ (YES)
Poly Whale Signals: 200+ (YES)
US Fed Decisions: ~16 (BARELY)
CPI Releases: ~24 (MARGINAL)
NVIDIA Earnings: ~8 (NO)
Tariff Announcements: 3-5 (NO)
US Elections: 2-3 (NO)

Counts are data points accumulated since prediction markets went mainstream (~2024). The paradox: the most tradeable events have the least testable data. Our infrastructure solves this.

Structural Event Clustering

Don't group events by topic. Group them by how prediction markets behave around them. A tariff announcement and a coaching dismissal are structurally identical — both are surprise-binary-fast events.

Five clustering dimensions

Predictability

Scheduled (Fed, CPI, Matchday) vs. Surprise (Tariffs, Injuries, Sanctions)

Binariness

Clean Yes/No (Win/Lose, Rate Cut) vs. Continuous (S&P level, vote share)

Info Dynamics

Fast (injury: minutes) vs. Slow (championship race: months)

Market Depth

Deep (US Election: $3B+) vs. Thin (Bundesliga transfer: <$50K)

Resolution

Oracle-automated (match result, CPI) vs. Human judgment (ambiguous politics)

Cluster examples

Cluster A: Scheduled Macro
Scheduled · Binary · Fast · Deep · Automated
Members:
Fed, ECB, BOE, BOJ, RBA decisions, CPI, NFP, PMI
All central bank and macro data events. Same microstructure: known date, binary outcome space, pre-event compression, institutional positioning, media anticipation cycle.
N = 80-100/year globally
Cluster B: Scheduled Sports
Scheduled · Binary · Slow · Deep · Automated
Members:
Bundesliga, CL, Premier League, NBA, NFL matches
All scheduled sporting events with known participants. Predictable timing, binary resolution, deep markets for big leagues, automated oracle resolution.
N = 1000+/year
Cluster C: Surprise Political
Surprise · Binary · Fast · Thin · Judgment
Members:
Tariffs, sanctions, executive orders, regulatory actions, coach dismissals
Surprise political/organizational decisions with immediate market impact. No advance scheduling, fast information cascade, thin liquidity, often ambiguous resolution.
N = 30-50/year across domains
Cluster D: Scheduled Continuous
Scheduled · Continuous · Slow · Deep · Automated
Members:
NVIDIA/Apple/TSMC earnings, GDP, election vote shares
Known date but continuous outcome space. Market prices bracket ranges, not binary outcomes. Requires different backtesting approach.
N = 200+/year
The key insight

A "tariff announcement" with only 3-5 historical data points becomes backtestable when clustered with 30-50 structurally similar surprise-binary-fast events. The clustering turns untestable into testable.
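The fingerprint idea above can be sketched in a few lines. This is an illustrative toy, not product code: the `Fingerprint` class, the dimension values, and the example events are our labels for the five dimensions described in this section.

```python
from dataclasses import dataclass
from collections import defaultdict

# Illustrative sketch: tag each event along the five structural dimensions,
# then cluster by exact fingerprint rather than by topic.
@dataclass(frozen=True)  # frozen -> hashable, so it can key a dict
class Fingerprint:
    predictability: str  # "scheduled" | "surprise"
    binariness: str      # "binary" | "continuous"
    info_dynamics: str   # "fast" | "slow"
    depth: str           # "deep" | "thin"
    resolution: str      # "automated" | "judgment"

events = [
    ("10% tariff announcement", Fingerprint("surprise", "binary", "fast", "thin", "judgment")),
    ("Bundesliga coach dismissal", Fingerprint("surprise", "binary", "fast", "thin", "judgment")),
    ("FOMC rate decision", Fingerprint("scheduled", "binary", "fast", "deep", "automated")),
    ("US CPI release", Fingerprint("scheduled", "binary", "fast", "deep", "automated")),
    ("NVIDIA earnings", Fingerprint("scheduled", "continuous", "slow", "deep", "automated")),
]

clusters = defaultdict(list)
for name, fp in events:
    clusters[fp].append(name)

for fp, members in clusters.items():
    print(fp.predictability, fp.binariness, "->", members)
```

Note how the tariff announcement and the coach dismissal fall into the same cluster despite sharing no topic: that is the whole point of clustering by behavior instead of by domain.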

LLM Counterfactual Expansion

Traditional synthetic data generates artificial price paths. We generate artificial event contexts — "what if the tariff was 25% instead of 10%?" — and model PM reactions.

1 real event: 10% tariff, PM 35→72%
5 parameters: Surprise · Magnitude · Retaliation · Media · Position
50 LLM scenarios: counterfactual contexts via Claude
50 calibrated reactions: estimated PM price paths
Backtest: 51 events from 1 real event

Why LLMs, not GANs?

Traditional synthetic data methods (GANs, Monte Carlo) operate on numbers. They generate plausible-looking price paths but cannot generate plausible-looking event contexts. A language model can understand that a 25% tariff is qualitatively different from 10% — not just numerically higher. It models second-order effects (retaliation chains), media reactions, and institutional positioning.

This is not hallucination. It is parametric scenario design with an LLM as domain expert, calibrated against real data points.
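The "parametric scenario design" step can be sketched as a prompt grid. Everything here is a hypothetical illustration: the parameter names and values are ours, and the actual LLM call is omitted; only the combinatorial expansion from one real event to 50 counterfactual prompts is shown.

```python
import itertools
import random

# One real, observed event: the anchor every counterfactual is calibrated against.
real_event = {"name": "10% tariff", "pm_move": (0.35, 0.72)}

# Five variation parameters (hypothetical values for illustration).
parameters = {
    "surprise":    ["fully unexpected", "partially leaked", "widely rumored"],
    "magnitude":   ["10%", "25%", "50%"],
    "retaliation": ["none", "targeted", "broad"],
    "media":       ["muted", "saturated"],
    "positioning": ["light", "crowded"],
}

random.seed(0)
grid = list(itertools.product(*parameters.values()))  # 3*3*3*2*2 = 108 combos
scenarios = random.sample(grid, k=50)                 # 50 counterfactuals per real event

def build_prompt(combo):
    # The LLM (e.g. Claude) would receive this and return an estimated price path.
    desc = ", ".join(f"{k}={v}" for k, v in zip(parameters, combo))
    return (f"Given the real event '{real_event['name']}' where the market moved "
            f"{real_event['pm_move'][0]:.0%} -> {real_event['pm_move'][1]:.0%}, "
            f"estimate the price path if instead: {desc}.")

prompts = [build_prompt(c) for c in scenarios]
print(len(prompts), "counterfactual prompts from 1 real event")
```

The design choice worth noting: the LLM never invents scenarios freely. It fills in a fixed parameter combination, which is what keeps the expansion auditable and calibratable.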

Calibration protocol

Synthetic scenarios are only valuable if they're calibrated against reality. For each cluster with 5+ real events: generate 50 counterfactuals per event, hold out 20% of real events as a test set, train on the remaining 80% of real events plus all synthetics, and validate against the held-out events. Track prediction error and publish calibration scores alongside all backtest results.

If the synthetic expansion doesn't improve predictions on held-out real events, it gets discarded. Radical transparency — the system earns trust through measurable calibration.
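The keep-or-discard gate reduces to a Brier score comparison on held-out events. A minimal sketch with invented numbers, assuming each held-out event has a realized 0/1 outcome and two forecast sets: one from a model trained on real events only, one from real + synthetic expansion.

```python
# Held-out real outcomes (20% test set) -- invented for illustration.
held_out = [1, 0, 1, 1, 0]

def brier(probs, outcomes):
    # Mean squared error between forecast probability and 0/1 outcome.
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)

# Hypothetical held-out forecasts from the two models.
p_baseline = [0.5, 0.5, 0.5, 0.5, 0.5]   # trained on real events only
p_expanded = [0.7, 0.4, 0.8, 0.6, 0.2]   # trained on real + synthetics

err_baseline = brier(p_baseline, held_out)
err_expanded = brier(p_expanded, held_out)

# The gate: keep the synthetic expansion only if it beats the baseline
# on held-out real events; otherwise discard it.
keep_expansion = err_expanded < err_baseline
print(f"baseline {err_baseline:.3f} vs expanded {err_expanded:.3f} -> keep={keep_expansion}")
```

A Brier score of 0.25 is the coin-flip baseline for binary events; the expansion earns its place only by measurably beating it on events it never saw.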

Cross-Vertical Transfer Learning

Certain PM behavioral patterns are domain-agnostic. Football (306 matches/season) calibrates finance models (8 Fed meetings/year). The high-N domain trains the low-N domain.

Football: 306 matches per season
High-N calibration laboratory. Statistically robust patterns. 22 club sites × 60 years of data.
Measurable: pre-event compression, news latency, surprise overreaction, thin-market drift.

→ Patterns transfer →

Finance: 8 Fed meetings per year
Low-N, high institutional interest. Insufficient data for standalone backtesting, but structurally similar PM dynamics.
Hypothesis: the same patterns hold, with domain-specific adjustments.

What transfers across verticals

Pre-Event Compression

PM spreads narrow 2-4h before kickoff as late info arrives. Same pattern before FOMC?

Measurable in football

News-to-Price Latency

Injury news: 15-45min to full pricing. Tariff tweet: similar latency adjusted for liquidity?

Measurable in football

Surprise Overreaction

Red card causes 15-20% overshoot that mean-reverts. Surprise tariff: same overshoot?

Hypothesis

Thin Market Drift

Low-liquidity transfer markets drift 5-8% on one large order over 24h. Niche politics: same?

Hypothesis

Expert vs Crowd Gap

kicker-Tipps vs Polymarket on Bundesliga. Superforecasters vs Polymarket on macro. Same gap structure?

Measurable both sides
The network effect

Every football prediction that runs through the system improves the accuracy of finance backtesting, and vice versa. This creates a compounding data asset that no single-vertical competitor can match.
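As a concrete sketch of one transfer, pre-event compression can be estimated in the high-N domain and reused as a prior in the low-N domain. All spread values below are invented for illustration; the real system would estimate this from the football data described above.

```python
import statistics

# Bid-ask spreads (in probability points) at T-4h, T-2h, and event start
# for three football matches -- invented numbers for illustration.
football_spreads = [
    [0.040, 0.025, 0.012],
    [0.038, 0.022, 0.010],
    [0.045, 0.030, 0.015],
]

# Compression ratio: how much the spread narrows from T-4h to kickoff.
ratios = [s[-1] / s[0] for s in football_spreads]
prior_compression = statistics.mean(ratios)

# Transfer: apply the football-calibrated prior to an FOMC market where
# only the T-4h spread has been observed so far.
fed_spread_t_minus_4h = 0.060
expected_fed_spread = fed_spread_t_minus_4h * prior_compression
print(f"expected FOMC spread at decision: {expected_fed_spread:.3f}")
```

The hypothesis is then testable: if observed FOMC spreads at decision time land near the football-derived prior, the pattern transfers; the residual becomes the domain-specific adjustment.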

Nobody is building the strategy layer

The PM infrastructure stack has three layers. Layer 1 (execution) and Layer 2 (data) are being built. Layer 3 (strategy) is wide open.

1
Execution Layer
Order routing, matching, settlement. Kalshi, Polymarket, ForecastEx, Robinhood.
Built
2
Data Layer
Market data, analytics, screening. Hashdive, Oddpool, Verso, PrediEdge.
Building
3
Strategy Layer
Backtesting, hypothesis testing, systematic strategy development. Altus Alpha.
Us
Why we can build this and others can't

1. Domain intelligence — 22 Bundesliga club dossiers, AI infrastructure research (31 entries), Soccer Economics. Years of accumulated content that powers meaningful event clustering and counterfactual generation.

2. Live prediction data — Finance and Football verticals with Brier Score tracking provide the real calibration data.

3. Cross-vertical bridge — No competitor operates across both finance and sports with prediction data in both.

4. The Strategy Factory — Working methodology since February 2026. Not an idea — a system.

The content is the moat. The infrastructure is the business.

We sell the Factory. We keep the recipes.
