Calibration log

What we filter, what we score, and how the calls have aged.

Form 4 AI is a triage layer over SEC Form 4 insider filings. The first job is filtering mechanical noise — option exercises, tax withholding, scheduled grants — out of the day's flow. The second job is generating an AI thesis on the filings that survive that filter. This page is our open log of how that's going. The forward-return numbers below are research transparency for an early sample, not a performance claim.

Returns last computed 8/2/2026, 8:22:08 PM · 763 settled rows in store · 172 after current filter

Early calibration, not a performance claim

We publish every settled call so you can audit our progress as the sample grows. The cohort is small and most rows have only crossed the 30–60 day window so far. Treat the bucket numbers below as how the scoring is calibrating against real outcomes — not as evidence of future performance.

Triage performance

What the system filters and surfaces.

Before any scoring, the first job is separating noise from signal. These are the live, defensible numbers: how much routine compensation activity we drop on the floor, how many filings get a full AI thesis, and how often the system flags a high-conviction call.

Mechanical filings filtered

95%

5,479 of 5,776 filings · last 30 days

AI theses generated

297

filings with full AI thesis · last 30 days

High-conviction calls

155

rule ≥70 or AI ≥70 · last 90 days

The mechanical-filter step is the largest concrete operational win: the bulk of incoming Form 4s have no economic content (option exercises, tax withholding, grants) and never reach the AI analysis layer. The forward-return calibration buckets below are still maturing — small samples are flagged as such rather than reported as if the numbers were stable.
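The headline triage figures can be reproduced from the two counts on the cards above. This is a minimal arithmetic sketch; the variable names are illustrative, and the reading that every surviving filing receives one AI thesis is an assumption drawn from the matching counts, not a documented rule.

```python
# Counts copied from the triage cards above (last 30 days).
total_filings = 5_776
mechanical_filtered = 5_479

# Share of filings dropped before the AI analysis layer.
filter_rate = mechanical_filtered / total_filings

# Filings that survive the mechanical filter. This equals the 297 theses
# shown above, assuming each surviving filing gets exactly one thesis.
surviving = total_filings - mechanical_filtered

print(f"filtered: {filter_rate:.1%}")
print(f"surviving filings: {surviving}")
```

The unrounded rate sits just under 95%, which is why the card shows the rounded figure.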

Scoring method

Rule-based (deterministic) · AI analysis · Combined (avg), experimental

Direction (buys recommended)

Open-market buys only · Open-market sells only · All directions (incl. sells)

Research signal, not investment advice. Past performance does not indicate future results. Insider-filing backtests are notoriously noisy and path-dependent. Buckets are tiered by sample size: published (n ≥ 50), preview · low n (30–49, dimmed), or calibrating (n < 30, no number). Read the preview tier as a direction-of-travel hint only, not a settled estimate.

Calibration buckets — excess return vs SPY

open-market buys · grouped by conviction score, aged across horizons. Cells with 30–49 settled rows render as preview · low n (dimmed); below 30 they render as calibrating; ≥50 render full numbers.

method: AI analysis

Bucket	Total	5d	15d	30d	60d	90d
70-79	1	calibrating n=1	calibrating n=1	calibrating n=1	calibrating n=1	—
50-69	56	-1.36% med -2.00% · n=56 win 32%	-1.43% med -2.14% · n=56 win 38%	-2.75% med -3.16% · n=56 win 34%	-3.08% med -2.71% · n=56 win 43%	-0.22% med -10.15% · n=30 win 20% · preview · low n
30-49	86	-0.61% med -1.08% · n=86 win 37%	-1.88% med -2.41% · n=86 win 30%	+0.18% med -2.30% · n=86 win 40% · single outlier	-2.32% med -1.69% · n=85 win 47%	-8.99% med -8.23% · n=51 win 29%
<30	29	calibrating n=29	calibrating n=29	calibrating n=29	calibrating n=27	calibrating n=16

Cells render in three tiers: published (n ≥ 50, full color), preview · low n (30 ≤ n < 50, muted color — directional only), calibrating (n < 30, no number rendered). Horizons with no matured rows yet (typically 90d / 180d on a young cohort) are hidden entirely.
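The tier rule above can be sketched as a small function. This is an illustration only: `display_tier` and its parameter names are hypothetical, and the 30 and 50 floors are passed as arguments because the methodology says they change as the cohort matures.

```python
def display_tier(n: int, preview_floor: int = 30, published_floor: int = 50) -> str:
    """Map a cell's settled-row count to the display tier described above.

    The default floors are the ones currently stated on this page.
    """
    if n <= 0:
        return "hidden"        # horizon has no matured rows yet
    if n < preview_floor:
        return "calibrating"   # no number rendered
    if n < published_floor:
        return "preview"       # muted color, directional only
    return "published"         # full color


# Settled-row counts taken from cells in the calibration table above.
for label, n in [("50-69 @ 60d", 56), ("50-69 @ 90d", 30), ("<30 @ 5d", 29)]:
    print(label, n, display_tier(n))
```

Note that n = 30 lands in the preview tier and n = 29 in calibrating, which matches how the 50-69 bucket at 90d and the <30 bucket render in the table.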

Recent ≥50-conviction filings — performance vs SPY where settled

The individual filings behind the ≥50-conviction buckets above. Each column is excess return vs SPY at that horizon. An em-dash means the horizon hasn't matured yet for that filing (e.g. a filing from last week has no 30d settled return). Click any row to read the full thesis.

Filed	Ticker	Dir	Conv.	5d	15d	30d	60d	Filing
2026-05-05	JEF	buy	72	+2.59%	+0.66%	+5.74%	+5.75%	Open →
2026-02-06	MANE	buy	68	+14.26%	+18.14%	+35.16%	+51.92%	Open →
2026-03-02	MYGN	buy	68	-2.81%	-2.32%	-3.85%	-9.18%	Open →
2026-05-05	CSGP	buy	68	-7.82%	-4.97%	-8.16%	-21.57%	Open →
2026-05-11	BOT	buy	68	-27.94%	-35.92%	-9.21%	-18.44%	Open →
2026-03-13	AMR	buy	68	+4.51%	+18.62%	-0.28%	-11.37%	Open →
2026-05-06	PS	buy	67	+57.10%	+13.09%	+7.83%	+4.83%	Open →
2026-02-10	BMI	buy	66	+5.35%	+1.64%	-1.95%	+6.06%	Open →
2026-05-05	OPCH	buy	66	-9.05%	+1.95%	-10.38%	-3.52%	Open →
2026-02-04	DXC	buy	65	-0.64%	-6.14%	-7.11%	-6.48%	Open →
2026-05-05	FND	buy	65	-1.19%	-5.02%	-3.36%	+13.88%	Open →
2026-05-06	WTW	buy	65	-1.81%	+1.09%	+3.90%	+11.59%	Open →

Method: AI analysis. Sorted by conviction descending. Em-dash means the horizon has not yet matured for that filing (a filing from <30 calendar days ago will not yet have a 30d return). The cohort fills in over time as base_date + horizon crosses today.

Methodology (v1)

Pinned, auditable, no lookahead.

Triage filter (before scoring): filings whose transactions are entirely mechanical — option exercises, tax withholding, scheduled grants, gifts, derivative conversions — are tagged at parse time and never reach the AI analysis layer. The triage counts at the top of this page reflect that filter.
Universe (for the calibration buckets): every Form 4 we score that has a tradable US ticker and a "buy" or "sell" primary direction. Synthetic / test filings are excluded.
Entry price: first regular-trading-session close strictly after the filing's filed_at date. This rules out same-day lookahead — a filing published at 16:01 ET cannot share a "close" with a filing published at 09:35.
Exit prices: first close on or after entry + 5/15/30/60/90/180 calendar days. Weekends and holidays roll forward.
Excess return: (ticker_return − SPY_return) × direction_sign. Buy = +1, sell = −1. So a +5% excess on a buy and a +5% excess on a sell both mean "the insider's directional view paid off vs the broad market by 5%."
Sells separately: insider sales are noisier than buys (10b5-1 plans, taxes, diversification). Use the "Open-market buys only" filter for the cleanest read.
Three methodologies, no winner declared: we publish rule-based, AI, and combined side by side. The combined (avg) method is shown as experimental — at the current sample it does not outperform either single method, so we don't recommend it as a default read.
Bucket display tiers: (a) published: n ≥ 50, full color, treated as a directional read; (b) preview · low n: 30 ≤ n < 50, muted color, read as direction-of-travel only; (c) calibrating: n < 30, no number rendered. Both floors lower automatically as the cohort matures. Horizons with zero matured rows are hidden until they fill in.
No deletions: calls that didn't work stay on the page. We never re-bucket retrospectively or hide the bad ones.
Limitations: delisted tickers (~5–10% of small-cap filings) drop from the table when Yahoo can't price them, which could mask negative results from failed companies. Sample period is short. Sector benchmarks (XBI/XLF/XLK) are on the roadmap but not yet in.
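The entry, exit, and excess-return rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the production code: the function names are hypothetical, the prices are made up, the horizon is counted from the entry date, and SPY is assumed to be priced on the same entry and exit dates as the ticker.

```python
from datetime import date, timedelta


def first_close(closes: dict[date, float], start: date, strict: bool) -> tuple[date, float]:
    """Return the first (date, close) strictly after `start`, or on/after it."""
    for day in sorted(closes):
        if day > start or (not strict and day == start):
            return day, closes[day]
    raise ValueError("no close available yet; horizon has not matured")


def excess_return(ticker: dict[date, float], spy: dict[date, float],
                  filed: date, horizon_days: int, direction: str) -> float:
    """(ticker_return - SPY_return) * direction_sign, per the rules above."""
    sign = {"buy": 1, "sell": -1}[direction]
    # Entry: first close strictly after the filing date (no same-day lookahead).
    entry_day, entry_px = first_close(ticker, filed, strict=True)
    # Exit: first close on or after entry + N calendar days; weekends roll forward.
    target = entry_day + timedelta(days=horizon_days)
    exit_day, exit_px = first_close(ticker, target, strict=False)
    ticker_ret = exit_px / entry_px - 1
    # Assumption: SPY has closes on the same entry and exit dates.
    spy_ret = spy[exit_day] / spy[entry_day] - 1
    return (ticker_ret - spy_ret) * sign


# Made-up closes. Filed Friday 2026-01-02, so entry is Monday 2026-01-05.
# Entry + 5 days is Saturday 2026-01-10, which rolls to Monday 2026-01-12.
ticker = {date(2026, 1, 2): 99.0, date(2026, 1, 5): 100.0, date(2026, 1, 12): 110.0}
spy = {date(2026, 1, 2): 498.0, date(2026, 1, 5): 500.0, date(2026, 1, 12): 510.0}

buy = excess_return(ticker, spy, date(2026, 1, 2), 5, "buy")
sell = excess_return(ticker, spy, date(2026, 1, 2), 5, "sell")
print(f"buy excess: {buy:.2%}")    # ticker +10%, SPY +2%
print(f"sell excess: {sell:.2%}")  # same move, sign flipped
```

In this example the filing-day close of 99.0 is never used, which is the point of the strict-after rule. A filing whose exit date has no close yet raises an error in this sketch, corresponding to the unmatured horizons shown as em-dashes on the page.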