Calibration log
What we filter, what we score, and how the calls have aged.
Form 4 AI is a triage layer over SEC Form 4 insider filings. The first job is filtering mechanical noise — option exercises, tax withholding, scheduled grants — out of the day's flow. The second job is generating an AI thesis on the filings that survive that filter. This page is our open log of how that's going. The forward-return numbers below are research transparency for an early sample, not a performance claim.
Returns last computed 6/18/2026, 3:00:21 PM · 760 settled rows in store · 169 after current filter
Early calibration, not a performance claim
We publish every settled call so you can audit our progress as the sample grows. The cohort is small and most rows have only crossed the 30–60 day window so far. Treat the bucket numbers below as a view of how the scoring is calibrating against real outcomes, not as evidence of future performance.
Triage performance
What the system filters and surfaces.
Before any scoring, the first job is separating noise from signal. These are the live, defensible numbers: how much routine compensation activity we drop on the floor, how many filings get a full AI thesis, and how often the system flags a high-conviction call.
The mechanical-filter step is the largest concrete operational win: the bulk of incoming Form 4s have no economic content (option exercises, tax withholding, grants) and never reach the AI analysis layer. The forward-return calibration buckets below are still maturing — small samples are flagged as such rather than reported as if the numbers were stable.
Calibration buckets — excess return vs SPY
open-market buys · grouped by conviction score, aged across horizons. Cells with 30–49 settled rows render as preview · low n (dimmed); below 30 they render as calibrating; ≥50 render full numbers.
| Bucket | Total | 5d | 15d | 30d | 60d | 90d |
|---|---|---|---|---|---|---|
| 70-79 | 1 | calibrating n=1 | calibrating n=1 | calibrating n=1 | — | — |
| 50-69 | 56 | -1.36% med -2.00% · n=56 win 32% | -1.43% med -2.14% · n=56 win 38% | -2.75% med -3.16% · n=56 win 34% | calibrating n=23 | calibrating n=16 |
| 30-49 | 85 | -0.60% med -1.07% · n=85 win 38% | -1.88% med -2.83% · n=85 win 31% | +0.21% med -2.70% · n=85 win 40% (single outlier) | -7.74% med -7.77% · n=44 win 32% (preview · low n) | calibrating n=27 |
| <30 | 27 | calibrating n=27 | calibrating n=27 | calibrating n=27 | calibrating n=14 | calibrating n=11 |
Cells render in three tiers: published (n ≥ 50, full color), preview · low n (30 ≤ n < 50, muted color — directional only), calibrating (n < 30, no number rendered). Horizons with no matured rows yet (typically 90d / 180d on a young cohort) are hidden entirely.
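The tier rule above can be sketched in a few lines of Python. This is an illustration only, not the production renderer: the function names `display_tier` and `render_cell` are hypothetical, returns are assumed to be fractions (0.05 for 5%), and "win" is assumed to mean an excess return above zero, which this page does not define explicitly.

```python
from statistics import mean, median

# Floors taken from the tier description above.
PUBLISHED_MIN_N = 50
PREVIEW_MIN_N = 30


def display_tier(n):
    """Map a settled-row count to a display tier."""
    if n <= 0:
        return "hidden"        # no matured rows: the horizon is not shown
    if n >= PUBLISHED_MIN_N:
        return "published"
    if n >= PREVIEW_MIN_N:
        return "preview"
    return "calibrating"


def render_cell(excess_returns):
    """Format one bucket/horizon cell from a list of excess returns."""
    n = len(excess_returns)
    tier = display_tier(n)
    if tier == "hidden":
        return None
    if tier == "calibrating":
        return f"calibrating n={n}"  # below the floor: count only, no number
    avg = mean(excess_returns)
    med = median(excess_returns)
    # Assumption: a "win" is any row with excess return above zero.
    win_rate = sum(r > 0 for r in excess_returns) / n
    text = f"{avg:+.2%} med {med:+.2%} · n={n} win {win_rate:.0%}"
    if tier == "preview":
        text += " (preview · low n)"
    return text
```

Keeping the floors as named constants makes it easy to change them in one place if the thresholds are revised.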
Recent ≥50-conviction filings — performance vs SPY where settled
The actual filings behind the high-conviction bucket above. Each column is excess return vs SPY at that horizon — em-dash means the horizon hasn't matured yet for that filing (e.g. a filing from last week has no 30d settled return). Click any row to read the full thesis.
| Filed | Ticker | Dir | Conv. | 5d | 15d | 30d | 60d | Filing |
|---|---|---|---|---|---|---|---|---|
| 2026-05-05 | JEF | buy | 72 | +2.59% | +0.66% | +5.74% | — | Open → |
| 2026-03-02 | MYGN | buy | 68 | -2.81% | -2.32% | -3.85% | -9.18% | Open → |
| 2026-05-05 | CSGP | buy | 68 | -7.82% | -4.97% | -8.16% | — | Open → |
| 2026-05-11 | BOT | buy | 68 | -27.94% | -35.92% | -9.21% | — | Open → |
| 2026-02-06 | MANE | buy | 68 | +14.26% | +18.14% | +35.16% | +51.92% | Open → |
| 2026-03-13 | AMR | buy | 68 | +4.51% | +18.62% | -0.28% | -11.37% | Open → |
| 2026-05-06 | PS | buy | 67 | +57.10% | +13.09% | +7.83% | — | Open → |
| 2026-02-10 | BMI | buy | 66 | +5.35% | +1.64% | -1.95% | +6.06% | Open → |
| 2026-05-05 | OPCH | buy | 66 | -9.05% | +1.95% | -10.38% | — | Open → |
| 2026-05-05 | FND | buy | 65 | -1.19% | -5.02% | -3.36% | — | Open → |
| 2026-05-06 | WTW | buy | 65 | -1.81% | +1.09% | +3.90% | — | Open → |
| 2026-02-04 | DXC | buy | 65 | -0.64% | -6.14% | -7.11% | -6.48% | Open → |
Method: AI analysis. Sorted by conviction descending. Em-dash means the horizon has not yet matured for that filing (a filing from <30 calendar days ago will not yet have a 30d return). The cohort fills in over time as base_date + horizon crosses today.
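The maturity rule can be sketched as follows. This is a minimal illustration under stated assumptions: `matured_horizons` is a hypothetical name, `base_date` is treated as a plain calendar date, and "crosses today" is read as base_date + horizon being on or before today. A real pipeline would also need a closing price to exist on the exit date.

```python
from datetime import date, timedelta

# Horizons listed in the methodology, in calendar days.
HORIZONS_DAYS = (5, 15, 30, 60, 90, 180)


def matured_horizons(base_date, today):
    """Return the horizons whose target date is on or before `today`."""
    return [
        h for h in HORIZONS_DAYS
        if base_date + timedelta(days=h) <= today
    ]


# A filing with base date 2026-05-05, viewed on 2026-06-18:
# 5d, 15d and 30d have matured; 60d (2026-07-04) has not.
print(matured_horizons(date(2026, 5, 5), date(2026, 6, 18)))
```

This matches the table above, where the early-May filings show 5d, 15d and 30d values but not yet a 60d value.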
Methodology (v1)
Pinned, auditable, no lookahead.
- Triage filter (before scoring): filings whose transactions are entirely mechanical — option exercises, tax withholding, scheduled grants, gifts, derivative conversions — are tagged at parse time and never reach the AI analysis layer. The triage counts at the top of this page reflect that filter.
- Universe (for the calibration buckets): every Form 4 we score that has a tradable US ticker and a "buy" or "sell" primary direction. Synthetic / test filings are excluded.
- Entry price: first regular-trading-session close strictly after the filing's filed_at date. This rules out same-day lookahead: a filing published at 16:01 ET cannot share a "close" with a filing published at 09:35.
- Exit prices: first close on or after entry + 5/15/30/60/90/180 calendar days. Weekends and holidays roll forward.
- Excess return: (ticker_return − SPY_return) × direction_sign. Buy = +1, sell = −1. So a +5% excess on a buy and a +5% excess on a sell both mean "the insider's directional view paid off vs the broad market by 5%."
- Sells separately: insider sales are noisier than buys (10b5-1 plans, taxes, diversification). Use the "Open-market buys only" filter for the cleanest read.
- Three methodologies, no winner declared: we publish rule-based, AI, and combined side by side. The combined (avg) method is shown as experimental — at the current sample it does not outperform either single method, so we don't recommend it as a default read.
- Bucket display tiers: (a) published (n ≥ 50, full color, treated as a directional read); (b) preview · low n (30 ≤ n < 50, muted color, read as direction-of-travel only); (c) calibrating (n < 30, no number rendered). Both floors lower automatically as the cohort matures. Horizons with zero matured rows are hidden until they fill in.
- No deletions: calls that didn't work stay on the page. We never re-bucket retrospectively or hide the bad ones.
- Limitations: delisted tickers (~5–10% of small-cap filings) drop from the table when Yahoo can't price them, which could mask negative results from failed companies. Sample period is short. Sector benchmarks (XBI/XLF/XLK) are on the roadmap but not yet in.
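The entry, exit, and excess-return rules in the list above can be sketched as below. This is an illustration under assumptions, not the production code: the function names are hypothetical, `sessions` is assumed to be a sorted list of regular trading-session dates supplied by the caller, and prices are assumed to be closing prices on the chosen dates.

```python
from bisect import bisect_left, bisect_right
from datetime import date, timedelta


def entry_date(filed_date, sessions):
    """First trading session strictly after the filing date (no same-day close)."""
    i = bisect_right(sessions, filed_date)
    return sessions[i] if i < len(sessions) else None


def exit_date(entry, horizon_days, sessions):
    """First trading session on or after entry + horizon calendar days.

    Weekends and holidays roll forward because they are absent from `sessions`.
    """
    target = entry + timedelta(days=horizon_days)
    i = bisect_left(sessions, target)
    return sessions[i] if i < len(sessions) else None  # None: not yet matured


def excess_return(ticker_entry, ticker_exit, spy_entry, spy_exit, direction):
    """(ticker_return - SPY_return) * direction_sign, with buy = +1 and sell = -1."""
    sign = {"buy": 1, "sell": -1}[direction]
    ticker_return = ticker_exit / ticker_entry - 1
    spy_return = spy_exit / spy_entry - 1
    return (ticker_return - spy_return) * sign


# Hypothetical session calendar: Mon 2026-05-04 to Fri 2026-05-08, then Mon and Tue.
sessions = [date(2026, 5, d) for d in (4, 5, 6, 7, 8, 11, 12)]
entry = entry_date(date(2026, 5, 5), sessions)  # filed Tuesday, entry Wednesday
print(entry, exit_date(entry, 5, sessions))
print(excess_return(100.0, 110.0, 100.0, 105.0, "buy"))
```

With the direction sign applied, a buy that beats SPY by 5% and a sell on a stock that lags SPY by 5% both score as roughly +0.05.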