The Innovation Layer

Contextualized backtesting infrastructure that works with 5-10 real events. Three innovations that no existing prediction-market (PM) analytics tool offers.

1
Structural Event Clustering
Group events by behavioral fingerprint, not by topic. Turn 8 data points into 200+.
In progress
2
LLM Counterfactual Expansion
Generate 50 synthetic scenarios per event, calibrated against real outcomes.
Planned
3
Cross-Vertical Transfer Learning
Use football (306 matches/season) to calibrate finance models (8 Fed meetings/year).
Planned
Why this matters

Hashdive shows whale trades. Oddpool shows cross-platform odds. Verso is building a terminal. Nobody lets you TEST whether a PM trading idea actually works, with proper out-of-sample validation and small-N compensation. We fill that gap.

The N problem: too few events to backtest

The most interesting prediction markets have the fewest data points. The markets with abundant data attract the least institutional interest.

Event type · Data points · Backtestable?
Bundesliga Matches · 600+ · YES
Poly Whale Signals · 200+ · YES
US Fed Decisions · ~16 · BARELY
CPI Releases · ~24 · MARGINAL
NVIDIA Earnings · ~8 · NO
Tariff Announcements · 3-5 · NO
US Elections · 2-3 · NO

Data points since prediction markets went mainstream (~2024). The paradox: the most tradeable events have the least testable data. Our infrastructure is built to solve this.
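To make the small-N problem concrete, the sketch below estimates how uncertain a measured hit rate is at some of the sample sizes listed above. It assumes independent events, a true hit rate of 50%, and the normal approximation; none of these assumptions come from the source, and the function name `hit_rate_margin` is illustrative.

```python
import math


def hit_rate_margin(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a hit rate estimated from n events.

    Uses the normal approximation z * sqrt(p * (1 - p) / n). The approximation
    is itself rough for very small n; the point is the order of magnitude.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    return z * math.sqrt(p * (1.0 - p) / n)


# Approximate event counts taken from the list above.
samples = {
    "Bundesliga Matches": 600,
    "US Fed Decisions": 16,
    "NVIDIA Earnings": 8,
    "US Elections": 3,
}

margins = {name: hit_rate_margin(n) for name, n in samples.items()}

for name, margin in margins.items():
    print(f"{name:20s} N={samples[name]:4d}  margin = +/-{margin:.1%}")
```

Under these assumptions the margin is roughly ±4 percentage points at N = 600 but roughly ±35 points at N = 8, so a strategy that wins 60% of 8 events cannot be told apart from a coin flip.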

Structural Event Clustering

Do not group events by topic. Group them by how prediction markets behave around them. A tariff announcement and a coaching dismissal are structurally identical: both are surprise-binary-fast events.

Five clustering dimensions

Predictability

Scheduled (Fed, CPI, Matchday) vs. Surprise (Tariffs, Injuries, Sanctions)

Binarity

Clean Yes/No (Win/Lose, Rate Cut) vs. Continuous (S&P level, vote share)

Information dynamics

Fast (injury: minutes) vs. Slow (championship race: months)

Market depth

Deep (US Election: $3B+) vs. Thin (Bundesliga transfer: <$50K)

Resolution

Oracle-automated (match result, CPI) vs. Human judgment (ambiguous politics)

Cluster examples

Cluster A: Scheduled Macro
Scheduled · Binary · Fast · Deep · Automated
Members:
Fed, ECB, BOE, BOJ, RBA decisions, CPI, NFP, PMI
All central bank and macro data events. Same microstructure: known date, binary outcome space, pre-event compression, institutional positioning, media anticipation cycle.
N = 80-100/year globally
Cluster B: Scheduled Sports
Scheduled · Binary · Slow · Deep · Automated
Members:
Bundesliga, CL, Premier League, NBA, NFL matches
All scheduled sporting events with known participants. Predictable timing, binary resolution, deep markets for big leagues, automated oracle resolution.
N = 1000+/year
Cluster C: Surprise Political
Surprise · Binary · Fast · Thin · Judgment
Members:
Tariffs, sanctions, executive orders, regulatory actions, coach dismissals
Surprise political/organizational decisions with immediate market impact. No advance scheduling, fast information cascade, thin liquidity, often ambiguous resolution.
N = 30-50/year across domains
Cluster D: Scheduled Continuous
Scheduled · Continuous · Slow · Deep · Automated
Members:
NVIDIA/Apple/TSMC earnings, GDP, election vote shares
Known date but continuous outcome space. Market prices bracket ranges, not binary outcomes. Requires different backtesting approach.
N = 200+/year
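The five dimensions and the cluster examples above can be sketched as a simple grouping by fingerprint. This is a minimal illustration, not the production method: it assumes each event has already been hand-labeled on all five dimensions and that clusters are formed by exact match, whereas a real system would presumably measure these dimensions from market data. The names `Fingerprint` and `cluster_events` are hypothetical.

```python
from collections import defaultdict
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class Fingerprint:
    predictability: str  # "scheduled" or "surprise"
    binarity: str        # "binary" or "continuous"
    info_dynamics: str   # "fast" or "slow"
    market_depth: str    # "deep" or "thin"
    resolution: str      # "automated" or "judgment"


def cluster_events(events):
    """Group event names by identical fingerprint (exact match on all five dimensions)."""
    clusters = defaultdict(list)
    for name, fingerprint in events.items():
        clusters[fingerprint].append(name)
    return dict(clusters)


# Fingerprints copied from the cluster examples above.
SCHEDULED_MACRO = Fingerprint("scheduled", "binary", "fast", "deep", "automated")
SCHEDULED_SPORTS = Fingerprint("scheduled", "binary", "slow", "deep", "automated")
SURPRISE_POLITICAL = Fingerprint("surprise", "binary", "fast", "thin", "judgment")

events = {
    "Tariff announcement": SURPRISE_POLITICAL,
    "Coach dismissal": SURPRISE_POLITICAL,
    "Sanctions": SURPRISE_POLITICAL,
    "Fed decision": SCHEDULED_MACRO,
    "CPI release": SCHEDULED_MACRO,
    "Bundesliga match": SCHEDULED_SPORTS,
}

clusters = cluster_events(events)
for fingerprint, names in clusters.items():
    print(" · ".join(astuple(fingerprint)), "->", names)
```

The tariff announcement and the coach dismissal end up in the same group because their fingerprints match, even though their topics have nothing in common.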
The key insight

A "tariff announcement" with only 3-5 historical data points becomes backtestable when clustered with 30-50 structurally similar surprise-binary-fast events. The clustering turns untestable into testable.

LLM Counterfactual Expansion

Traditional synthetic data generates artificial price paths. We generate artificial event contexts — "what if the tariff was 25% instead of 10%?" — and model PM reactions.

1 Real Event: 10% tariff, PM 35→72%
5 Parameters: Surprise · Magnitude · Retaliation · Media · Position
50 LLM Scenarios: Counterfactual contexts via Claude
50 Calibrated Reactions: Estimated PM price paths
Backtest: 51 events from 1 real event
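The first half of this pipeline, turning one real event into 50 counterfactual contexts, could be sketched as follows. Only the five parameter names, the 10% magnitude, and the 35% to 72% price move come from the example above; the discrete levels, the remaining attributes of the real event, the prompt wording, and the function names are assumptions made for illustration. The sketch stops before the LLM call and the calibration step.

```python
import itertools
import random

# Hypothetical discrete levels for the five parameters named above.
PARAMETER_LEVELS = {
    "surprise": ["leaked in advance", "partly expected", "fully unexpected"],
    "magnitude": ["5%", "10%", "25%", "50%"],
    "retaliation": ["none", "proportional", "escalating"],
    "media": ["muted", "moderate", "saturation"],
    "position": ["crowd long YES", "balanced", "crowd long NO"],
}

# The real event in the same parameter space (all values except magnitude are assumed).
REAL_EVENT = {
    "surprise": "fully unexpected",
    "magnitude": "10%",
    "retaliation": "none",
    "media": "moderate",
    "position": "balanced",
}
REAL_REACTION = (0.35, 0.72)  # PM price before and after the real event


def sample_scenarios(real_event, n=50, seed=0):
    """Draw n distinct parameter combinations that differ from the real event."""
    keys = list(PARAMETER_LEVELS)
    grid = [
        dict(zip(keys, combo))
        for combo in itertools.product(*(PARAMETER_LEVELS[k] for k in keys))
    ]
    candidates = [scenario for scenario in grid if scenario != real_event]
    return random.Random(seed).sample(candidates, n)


def build_prompt(real_event, reaction, scenario):
    """Render one counterfactual as a text prompt; the LLM call itself is out of scope."""
    changed = {k: v for k, v in scenario.items() if v != real_event[k]}
    return "\n".join([
        f"Real event: {real_event}; PM price moved {reaction[0]:.0%} -> {reaction[1]:.0%}.",
        f"Counterfactual changes: {changed}.",
        "Estimate the PM price path for the counterfactual.",
    ])


scenarios = sample_scenarios(REAL_EVENT)
prompts = [build_prompt(REAL_EVENT, REAL_REACTION, s) for s in scenarios]
backtest_set_size = 1 + len(scenarios)  # one real event plus its counterfactuals
print(backtest_set_size)
print(prompts[0])
```

Fixing the seed keeps the scenario set reproducible, which matters if calibration scores are to be published alongside backtest results.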

Why LLMs, not GANs?

Traditional synthetic data methods (GANs, Monte Carlo) operate on numbers. They generate plausible-looking price paths but cannot generate plausible-looking event contexts. A language model can understand that a 25% tariff is qualitatively different from 10% — not just numerically higher. It models second-order effects (retaliation chains), media reactions, and institutional positioning.

This is not hallucination. It is parametric scenario design with an LLM as domain expert, calibrated against real data points.

Calibration protocol

Synthetic scenarios are only valuable if they're calibrated against reality. For each cluster with 5+ real events: generate 50 counterfactuals per event, hold out 20% of real events as test set, train on 80% real + all synthetics, validate against held-out events. Track prediction error and publish calibration scores alongside all backtest results.

If the synthetic expansion doesn't improve predictions on held-out real events, it gets discarded. Radical transparency — the system earns trust through measurable calibration.
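A minimal sketch of this keep-or-discard check is shown below. Several things in it are assumptions rather than the documented method: the Brier score stands in for "prediction error", the model is a toy Laplace-smoothed frequency table over a single categorical feature, and only synthetics derived from training events are used, a deliberate deviation from "all synthetics" so that held-out events cannot leak into training. The data is made up, and the synthetic rows are placeholders for LLM-generated counterfactuals.

```python
import random
from collections import defaultdict


def brier_score(probs, outcomes):
    """Mean squared difference between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(outcomes)


def fit_bucket_model(rows):
    """rows: (bucket, outcome) pairs -> Laplace-smoothed P(outcome = 1) per bucket."""
    counts = defaultdict(lambda: [0, 0])  # bucket -> [n_yes, n_total]
    for bucket, outcome in rows:
        counts[bucket][0] += outcome
        counts[bucket][1] += 1
    return {b: (yes + 1) / (total + 2) for b, (yes, total) in counts.items()}


def evaluate_expansion(real_events, synthetics_by_event, test_fraction=0.2, seed=0):
    """Compare held-out Brier scores with and without synthetic training rows."""
    ids = sorted(real_events)
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(test_fraction * len(ids)))
    test_ids, train_ids = ids[:n_test], ids[n_test:]

    train_real = [real_events[i] for i in train_ids]
    # Only synthetics generated from training events, to avoid leakage.
    train_synth = [row for i in train_ids for row in synthetics_by_event.get(i, [])]
    test = [real_events[i] for i in test_ids]
    outcomes = [y for _, y in test]

    def held_out_score(rows):
        model = fit_bucket_model(rows)
        probs = [model.get(bucket, 0.5) for bucket, _ in test]  # 0.5 for unseen buckets
        return brier_score(probs, outcomes)

    baseline = held_out_score(train_real)
    expanded = held_out_score(train_real + train_synth)
    return {"baseline": baseline, "expanded": expanded,
            "keep_synthetics": expanded < baseline}


# Toy data: event id -> (feature bucket, binary outcome). Entirely made up.
real_events = {
    "e01": ("large", 1), "e02": ("large", 1), "e03": ("large", 0),
    "e04": ("small", 0), "e05": ("small", 0), "e06": ("small", 1),
    "e07": ("large", 1), "e08": ("small", 0), "e09": ("large", 1),
    "e10": ("small", 0),
}
# Placeholder synthetics: 50 rows per real event.
synthetics_by_event = {eid: [row] * 50 for eid, row in real_events.items()}

result = evaluate_expansion(real_events, synthetics_by_event)
print(result)
```

With a cluster of only 5 real events, a 20% hold-out is a single event, so in practice the comparison would likely need repeated splits or leave-one-out evaluation before the keep-or-discard decision means much.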

Cross-Vertical Transfer Learning

Certain PM behavioral patterns are domain-agnostic. Football (306 matches/season) calibrates finance models (8 Fed meetings/year). The high-N domain trains the low-N domain.

Football: 306 matches per season
High-N calibration laboratory
Statistically robust patterns
22 club sites × 60 years of data
Measurable: pre-event compression, news latency, surprise overreaction, thin-market drift

→ Transfer patterns →

Finance: 8 Fed meetings per year
Low-N, high institutional interest
Insufficient data for standalone backtesting
But structurally similar PM dynamics
Hypothesis: same patterns with domain-specific adjustments

What transfers across verticals

Pre-event compression

PM spreads narrow 2-4h before kickoff as late info arrives. Same pattern before FOMC?

Measurable in football

News-to-price latency

Injury news: 15-45min to full pricing. Tariff tweet: similar latency adjusted for liquidity?

Measurable in football

Surprise overreaction

Red card causes 15-20% overshoot that mean-reverts. Surprise tariff: same overshoot?

Hypothesis

Thin-market drift

Low-liquidity transfer markets drift 5-8% on one large order over 24h. Niche politics: same?

Hypothesis

Expert-vs-crowd gap

kicker tips vs. Polymarket on Bundesliga. Superforecasters vs. Polymarket on macro. Same gap structure?

Measurable on both sides
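One simple way to read "same patterns with domain-specific adjustments" is shrinkage: treat the high-N football estimate as a prior and let the few finance observations pull it toward their own mean. The source does not specify a transfer method, so the sketch below is only one possible interpretation. The prior mean is the midpoint of the 15-20% red-card overshoot range quoted above; the finance observations, the prior strength, and the function name are hypothetical.

```python
from statistics import mean


def shrink_to_prior(observations, prior_mean, prior_strength):
    """Weighted average of the low-N sample mean and a high-N prior mean.

    prior_strength acts as a pseudo-count: it says how many low-N observations
    the prior is worth. With no observations the prior is returned unchanged;
    with many observations the estimate approaches the sample mean.
    """
    n = len(observations)
    if n == 0:
        return prior_mean
    return (prior_strength * prior_mean + n * mean(observations)) / (prior_strength + n)


# Prior from the high-N domain: midpoint of the 15-20% overshoot range above.
football_overshoot = 0.175

# Hypothetical overshoot measurements from three surprise tariff events.
tariff_overshoots = [0.10, 0.14, 0.09]

estimate = shrink_to_prior(tariff_overshoots, football_overshoot, prior_strength=5)
print(f"Transferred overshoot estimate: {estimate:.3f}")
```

The estimate lands between the finance sample mean (0.11) and the football prior (0.175). The choice of `prior_strength` encodes how much the football pattern is trusted to carry over, which is precisely the part that remains a hypothesis until it is checked against held-out finance events.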
The network effect

Every football forecast that runs through the system improves the accuracy of the finance backtesting, and vice versa. This creates a compounding data asset that no single-vertical competitor can match.

Nobody is building the strategy layer

The PM infrastructure stack has three layers. Layer 1 (execution) and Layer 2 (data) are being built. Layer 3 (strategy) is wide open.

1
Execution layer
Order routing, matching, settlement. Kalshi, Polymarket, ForecastEx, Robinhood.
Built
2
Data layer
Market data, analytics, screening. Hashdive, Oddpool, Verso, PrediEdge.
In progress
3
Strategy layer
Backtesting, hypothesis testing, systematic strategy development. Altus Alpha.
Us
Why we can build this and others cannot

1. Domain intelligence: 22 Bundesliga club dossiers, AI infrastructure research (31 entries), Soccer Economics. Years of accumulated content that power meaningful event clustering and counterfactual generation.

2. Live forecast data: the finance and football verticals, both with Brier score tracking, supply the real calibration data.

3. Cross-vertical bridge: no competitor operates in both finance and sports with forecast data in each.

4. The Strategy Factory: a working methodology since February 2026. Not an idea, but a system.

The content is the moat. The infrastructure is the business.

We sell the factory. We keep the recipes.
