This page is for the curious — what case-calendar actually does between “new docket entry” and “calendar event”. You don’t need it to run the tool, but if you’re going to modify it (or just want to understand the trade-offs), this is the map.

The exhaustive design-decisions reference lives in AGENTS.md in the repo. This page is the concise version.

The pipeline at a glance

CourtListener docket
        │
        ▼
┌───────────────────┐
│ regex pre-filter  │  cheap. drops 80%+ of entries before any LLM call.
└─────────┬─────────┘
          │ hearings, deadlines, briefing schedules, etc.
          ▼
┌───────────────────┐
│ LLM extractor     │  small/fast tier (Claude Haiku, gpt-5.4-nano, Gemini Flash Lite).
│ per docket entry  │  returns ADD / RESCHEDULE / CANCEL / MARK_HELD / ...
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│ SQLite store      │  stable (case_id, hearing_key) rows.
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│ end-of-sync       │  verify-pass LLM checks each live hearing /
│ confidence checks │  deadline against the docket. Catches missed
└─────────┬─────────┘  reschedules, etc.
          │
          ▼
┌───────────────────┐
│ renderers         │  ICS, Google Calendar, M365 Outlook, index.html.
└───────────────────┘

Two delivery modes feed the pipeline:

  1. Polling — the sync process periodically fetches each tracked docket from CourtListener and processes whatever is new.
  2. Webhooks — CourtListener pushes docket-update notifications to the long-running serve process as they land.

Both paths share the same code beneath the entry processor. A hearing extracted via webhook is byte-identical to one extracted via polling.

Two LLM tracks

case-calendar uses large-language-model calls for two distinct jobs. Throughout the codebase and these docs:

  1. Extraction — the per-entry call that classifies a docket entry into ADD / RESCHEDULE / CANCEL / MARK_HELD actions, with dates and keys.
  2. Summarization — the per-docket call that reads the case's primary document and related filings and produces the human-readable case summary.

The two jobs have different cost / quality trade-offs, so they're wired to independent provider and model knobs:

  1. Extraction — high volume: one call per relevant entry. Default models: Claude Haiku / gpt-5.4-nano / Gemini Flash Lite. Why the cheap tier: structured-output classification (date, key, significance) is well within its reach, and the per-case cost stays in the cents-per-day range.
  2. Summarization — low volume: one call per docket, rarely re-run. Default models: Sonnet / GPT-5.4 / Gemini Pro. Why the upgrade: synthesis from 30-100k tokens of legal prose is worth it, and it still costs only pennies per docket.

The two tracks have independent provider / model knobs (LLM_PROVIDER / LLM_MODEL for the extractor; LLM_SUMMARY_PROVIDER / LLM_SUMMARY_MODEL for summaries) so changing one doesn’t affect the other.
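As a minimal sketch of what "independent knobs" means, the following reads the four environment variables the page names. The function name and the placeholder defaults are illustrative assumptions, not the project's actual config code:

```python
import os

def resolve_llm_config(track: str) -> tuple[str, str]:
    """Return (provider, model) for a track: 'extract' or 'summary'.

    Each track reads its own variables, so changing the summary knobs
    never affects the extractor, and vice versa. Defaults here are
    placeholders for illustration only.
    """
    prefix = "LLM_SUMMARY_" if track == "summary" else "LLM_"
    provider = os.environ.get(prefix + "PROVIDER", "anthropic")  # placeholder default
    model = os.environ.get(prefix + "MODEL", "claude-haiku")     # placeholder default
    return provider, model
```

Setting only `LLM_PROVIDER` / `LLM_MODEL` leaves the summary track on its own defaults, and vice versa.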

Why LLM-driven extraction, not regex?

Courts describe hearings inconsistently; the same event can appear under many different phrasings depending on the district and the clerk.

Maintaining regexes per court is a treadmill — and a new clerk’s habits break them silently. Instead, the LLM sees the entry plus the case’s known-hearings list, and decides ADD vs RESCHEDULE vs UPDATE vs CANCEL in one call. A cheap regex pre-filter still runs before the LLM to drop the obvious non-hearings (briefs, attorney appearances, sealed placeholders) for free.
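A minimal sketch of what such a pre-filter can look like. The keyword list here is illustrative, not the project's actual pattern set:

```python
import re

# Illustrative hint keywords; the real pre-filter's patterns live in the codebase.
HEARING_HINT = re.compile(
    r"\b(hearing|trial|sentencing|arraignment|conference|deadline"
    r"|briefing schedule|continued|rescheduled|vacated)\b",
    re.IGNORECASE,
)

def might_be_hearing(entry_text: str) -> bool:
    """Cheap pre-filter: only entries matching a hint keyword reach the LLM."""
    return bool(HEARING_HINT.search(entry_text))
```

The filter only needs high recall, not precision — anything it lets through still gets classified properly by the LLM, so a false positive costs one cheap call rather than a wrong calendar event.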

Stable hearing keys

Each logical hearing — say, “sentencing for Smith” — gets a stable hearing_key (kebab-case, e.g. smith-sentencing) assigned on first observation. Reschedules and detail updates land on the same row. The Google Calendar event id is derived deterministically from sha1(case_id::hearing_key), so the same logical hearing is the same calendar event across syncs, reschedules, and database restores.
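The derivation above can be sketched in a few lines. The function name is hypothetical; the `::` separator and the SHA-1 scheme are from the description above:

```python
import hashlib

def calendar_event_id(case_id: str, hearing_key: str) -> str:
    """Derive a stable calendar event id from the logical hearing.

    A SHA-1 hex digest uses only [0-9a-f], which fits inside Google
    Calendar's base32hex event-id alphabet ([a-v0-9]), so the digest
    can be used directly as the event id.
    """
    return hashlib.sha1(f"{case_id}::{hearing_key}".encode("utf-8")).hexdigest()
```

Because the id depends only on `(case_id, hearing_key)` — never on the date — a reschedule updates the existing calendar event in place instead of spawning a duplicate.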

Filing deadlines work the same way, in a parallel deadlines table with a separate deadline_key. Renderers don’t care which is which — both are projected into the same shape before the ICS / gcal layer ever sees them.

Three-tier short-circuit

Quiet days cost almost nothing because the syncer short-circuits at three levels:

  1. Per-docket — if the docket’s date_modified hasn’t advanced since the last sync, skip everything. No entries API call, no LLM.
  2. Per-entry — iter_entries(modified_after=cutoff) filters server-side to entries newer than the local high-water mark.
  3. Per-fingerprint — even if an entry comes back, dedup against (docket_id, entry_id, content_fingerprint) skips re-LLM-ing entries whose substantive content didn’t change.

On a busy docket with a real update, this still pays for one LLM call. On a quiet day across 30 dockets, it pays for one cheap CourtListener request per docket and zero LLM calls.
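The three tiers can be sketched as a single control-flow skeleton. Everything here — the `Store` class, the fingerprint over a single `text` field, the list standing in for the extractor — is an illustrative stand-in for the real SQLite-backed code, not the project's implementation:

```python
import hashlib

class Store:
    """Minimal in-memory stand-in for the SQLite store (illustrative)."""
    def __init__(self):
        self.modified = {}      # docket_id -> last seen date_modified
        self.fingerprints = set()

    def last_modified(self, docket_id):
        return self.modified.get(docket_id, "")

    def seen(self, docket_id, entry_id, fp):
        return (docket_id, entry_id, fp) in self.fingerprints

    def remember(self, docket_id, entry_id, fp):
        self.fingerprints.add((docket_id, entry_id, fp))

def content_fingerprint(entry):
    return hashlib.sha1(entry["text"].encode("utf-8")).hexdigest()

def sync_docket(docket, entries, store, llm_calls):
    # Tier 1: unchanged docket -> no entries request, no LLM.
    if docket["date_modified"] <= store.last_modified(docket["id"]):
        return
    # Tier 2 (simulated): the real code asks the API only for entries
    # modified after the local high-water mark.
    for entry in entries:
        fp = content_fingerprint(entry)
        # Tier 3: dedup on (docket_id, entry_id, content_fingerprint).
        if store.seen(docket["id"], entry["id"], fp):
            continue
        llm_calls.append(entry["id"])  # stands in for the extractor call
        store.remember(docket["id"], entry["id"], fp)
    store.modified[docket["id"]] = docket["date_modified"]
```

Running this twice on the same state makes zero "LLM calls" the second time — which is the whole point of the layering.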

What’s in the fingerprint

The third short-circuit is the interesting one. case-calendar can’t trust “have we seen this entry_id before?” alone, because RECAP entries evolve after they first appear — a sealed PDF gets unsealed, or a previously-missing PDF finally gets uploaded to RECAP. We want to re-process those entries, but ignore cosmetic churn that didn’t change anything meaningful.

The fingerprint is a SHA-1 over just the entry state that matters: the entry's substantive content in one group, and its document-availability flags (is a PDF present on RECAP, is it sealed) in a second.

Those second-group flags are what makes “PDF finally appeared on RECAP” or “sealed PDF was unsealed” re-trigger processing automatically: the flag flips → the fingerprint changes → the entry no longer matches its cached row → the syncer re-runs the LLM on it. Everything else — re-sorted metadata fields, unrelated audit columns — leaves the fingerprint stable, so the re-sync is a no-op.
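The mechanism can be sketched as follows. The specific field names (`description`, `date_filed`, `is_sealed`, `has_pdf`) are illustrative assumptions about which state "matters", not the project's exact field list:

```python
import hashlib
import json

def content_fingerprint(entry: dict) -> str:
    """SHA-1 over only the fields that matter (field names illustrative).

    Availability flags are included so "PDF finally appeared" or
    "sealed PDF was unsealed" flips the fingerprint; every other key
    on the entry is deliberately excluded, so cosmetic metadata churn
    leaves the digest stable.
    """
    material = {
        "description": entry.get("description", ""),
        "date_filed": entry.get("date_filed", ""),
        "is_sealed": entry.get("is_sealed", False),
        "has_pdf": entry.get("has_pdf", False),
    }
    blob = json.dumps(material, sort_keys=True)  # canonical ordering
    return hashlib.sha1(blob.encode("utf-8")).hexdigest()
```

`sort_keys=True` matters: the digest must depend on field values, never on the order a dict happened to be built in.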

End-of-sync confidence pass

After per-entry extraction, every scheduled or recently-changed hearing gets a separate, focused LLM call (verify_hearing). The model sees just the candidate hearing plus the last 15 hearing-relevant entries on its docket, and returns a verdict on whether the hearing still matches the docket record.

This catches the classes of bug that per-entry extraction can’t see: reschedules across multiple entries, trials that got mooted by a plea but never explicitly vacated, and (rare) hallucinated rows.

There’s a parallel verify pass for filing deadlines when those are enabled on the case.

The data model

The SQLite store is built around five operational tables, including the hearings and deadlines tables described above.

WAL journaling + a 5-second busy_timeout let the polling sync process and the long-running serve process safely share the same SQLite file. The webhook server also serializes its own worker threads with a server-wide lock.

Why “primary document”?

The summary pipeline talks about each docket’s primary document — the indictment, superseding indictment, information, complaint, amended complaint, or petition that establishes what the case is about. Earlier in the project this was called “operative pleading”, which is a real civil-practice term but reads oddly when applied to criminal indictments. “Primary document” connects to the established “primary source” concept and works across criminal and civil practice. See case summaries for what gets matched and how it’s used.

Data quality guardrails

Several of the codebase’s stricter behaviors exist to prevent specific failure modes seen on real dockets — hallucinations, false-positive “held” verdicts, calendar drift across timezones, cross-docket contamination. They look conservative on first read, and that’s the point: a wrong event on a public calendar erodes subscriber trust far more than a missing one.

AGENTS.md and the runtime prompts

The full set of those guardrails — plus the reasoning behind each one, the architectural conventions every module follows, and the testing philosophy — lives in AGENTS.md at the repo root. That file is the project’s contract with any agentic AI programmer working in the codebase: Claude Code, GitHub Copilot, Cursor, Codex, Aider, or any other tool that has a “follow this project’s conventions” surface. The reason rules live in AGENTS.md (rather than in each agent’s private memory) is portability — every collaborator, human or otherwise, picks them up the same way, and the rules survive when one agent’s session ends or a different agent joins the project. The same file is @-included from CLAUDE.md so Claude Code reads it on every invocation; other agents read it the same way under their own conventions.

The data-quality guardrails described above were the source material for the LLM prompts the project uses at runtime. Same rules, encoded in two places: once as English for the human and agent contributors who write the code, and once as English for the model that’s about to classify a real docket entry or read a real indictment. When a rule gets sharpened (e.g., the no-fabrication refusal, or the “trial-date-is-not-evidence-of-a-trial” invariant), it gets sharpened in both places.

The runtime prompts all live in case_calendar/llm.py, one per job: extraction, the verify pass, and summarization.

Reading any of those alongside the corresponding entry in AGENTS.md is the fastest way to see how a particular guardrail moves from “rule for the human / agent writing this code” to “rule the model follows when processing a docket”.

See also