A subscribable calendar of dates is useful. A subscribable calendar that also tells you what each case is about in a couple of sentences is more useful — especially when you’re tracking 30 cases and can’t remember which Wang is which.
When enabled, case-calendar generates a 2-4 sentence prose summary for each docket and renders it on the public index page next to the case row. Summaries are opt-in, off by default, and only run on dockets where the source documents actually support a confident answer (the LLM is instructed to refuse rather than fabricate when they don’t).
## What gets summarized
The summary pipeline pulls three sets of source documents for each docket:
- Primary document — the latest indictment / superseding indictment / information for criminal dockets; the latest amended complaint / complaint / petition for civil. Establishes who’s involved and what the case is about.
- Disposition documents — judgments, plea agreements, verdict forms, orders of dismissal, dispositive memoranda. Anything that materially changes “where does the case stand”. A dispositive order in a busy civil case can be a hundred pages back from the latest entry, so the pipeline walks several pages of the docket newest-first to find it.
- Operator-provided documents (optional, see `extra_documents` below) — anything you’ve manually pointed the pipeline at to fill a CourtListener data gap.
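The selection pass above can be sketched roughly as follows. Everything here is illustrative — the entry structure, field names (`type`, `date_filed`), and document-type lists are assumptions, not case-calendar's real schema:

```python
# Hypothetical sketch of the document-selection pass. Field names and
# document-type labels are illustrative, not case-calendar's real schema.
PRIMARY_CRIMINAL = ["superseding indictment", "indictment", "information"]
PRIMARY_CIVIL = ["amended complaint", "complaint", "petition"]
DISPOSITIVE = {"judgment", "plea agreement", "verdict form", "order of dismissal"}

def pick_primary(entries, criminal=True):
    """Latest entry whose type matches the primary-document list."""
    wanted = PRIMARY_CRIMINAL if criminal else PRIMARY_CIVIL
    candidates = [e for e in entries if e["type"] in wanted]
    # ISO-8601 date strings sort correctly as plain strings
    return max(candidates, key=lambda e: e["date_filed"], default=None)

def find_dispositions(pages):
    """Walk docket pages newest-first; a dispositive order can sit many
    pages back from the latest entry."""
    found = []
    for page in pages:  # pages already ordered newest-first
        found.extend(e for e in page if e["type"] in DISPOSITIVE)
    return found
```

The key behaviors the sketch captures: the primary document is the *latest* match (a superseding indictment beats the original), and dispositions are collected by paging backwards rather than looking only at the newest entries.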
Those documents — plus a structured scaffold of the hearings and deadlines the extractor already recorded — go into a single LLM call. The model returns prose; case-calendar persists it to the `case_summaries` table.
The page-rendered output looks like:
> Mr. Jones is charged in the Northern District of Texas with one count of wire fraud conspiracy and five counts of wire fraud for his alleged role in a $2.6 million online romance-scam scheme. He pled guilty to the conspiracy count on January 14, 2025 pursuant to a plea agreement; sentencing is scheduled for May 28, 2026. Co-defendant Smith remains a fugitive abroad.
A live deployment with real summaries on real federal-court dockets is at casecalendar.net.
## Enabling summaries
Add a top-level block to `config.yaml`:

```yaml
case_summaries:
  enabled: true
  # provider: anthropic
  # model: claude-sonnet-4-6
  # allow_ocr: true
  # debounce_seconds: 300
```
| Key | Required | Purpose |
|---|---|---|
| `enabled` | yes | Master switch. Defaults to `false`. |
| `provider` | no | Force a specific provider (`anthropic` / `openai` / `gemini`). Defaults to whichever LLM key is set, or `LLM_SUMMARY_PROVIDER`. |
| `model` | no | Override the model. Defaults to Sonnet / GPT-5.4 / Gemini Pro depending on provider. |
| `allow_ocr` | no | Run local OCR fallback on PDFs CourtListener hasn’t extracted. Defaults to `true`. Set to `false` to skip tesseract entirely. |
| `debounce_seconds` | no | Webhook-only. How many seconds of quiet to wait after the last summary-relevant entry before re-running the LLM. Defaults to 300. Polling syncs ignore this — they regenerate immediately. |
With `enabled: true`, summaries auto-refresh as part of `sync` and `serve`: whenever the syncer sees a new primary document or disposition, it flips the row’s stale flag. At the end of the sync (or after the debounce timer fires in `serve`), the pipeline regenerates every stale row before re-emitting the index. The page reflects the case’s current posture without you running anything manually.
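The stale-flag and debounce behavior can be sketched as below. This is a minimal model of the logic described above, not case-calendar’s actual code; the class and method names are made up:

```python
# Minimal sketch of the stale-flag + debounce behavior described above.
# Names are illustrative; case-calendar's real implementation differs.
import time

class SummaryScheduler:
    def __init__(self, debounce_seconds=300):
        self.debounce_seconds = debounce_seconds
        self.stale = set()    # docket ids whose summary needs regeneration
        self.last_event = {}  # docket id -> time of last relevant entry

    def mark_stale(self, docket_id, now=None):
        """Called when sync sees a new primary document or disposition."""
        self.stale.add(docket_id)
        self.last_event[docket_id] = time.time() if now is None else now

    def due(self, now=None):
        """Webhook mode: dockets quiet for at least debounce_seconds."""
        now = time.time() if now is None else now
        return {d for d in self.stale
                if now - self.last_event[d] >= self.debounce_seconds}

    def regenerate(self, docket_id):
        """Clear the flag once the LLM has produced a fresh summary."""
        self.stale.discard(docket_id)
```

The debounce matters because a single docket event often arrives as a burst of entries; waiting for quiet avoids paying for several LLM calls where one would do.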
## Cost
Summaries run on a higher-tier model than the extractor pipeline — the synthesis task warrants the upgrade. Defaults:
| Provider | Default model |
|---|---|
| Anthropic | Claude Sonnet 4.6 |
| OpenAI | GPT-5.4 |
| Gemini | Gemini 2.5 Pro |
Budget roughly $0.10–0.60 per docket for the first run, near-zero on subsequent runs (existing rows are reused unless the docket got a new primary document or disposition). On a 30-case calendar you’ll probably spend a few dollars to backfill and pennies a week thereafter.
To force a regeneration after a model upgrade or prompt change:
```sh
uv run case-calendar summarize --force
# or, bundled into a polling sync to share the CourtListener session:
uv run case-calendar sync --force-summaries
```
## The “insufficient documents” refusal
The summary LLM is instructed to refuse rather than fabricate when its inputs are too sparse to support a confident summary. If the primary document text is empty (an image-only PDF that didn’t OCR), garbled (custom font subsets — see the installation page), or otherwise lacks the substance needed to identify the parties and the gist of the charges or claims, the model emits this sentence verbatim:
> Documents available for this docket are insufficient to generate a reliable summary.
That gets stored and rendered like any other summary. Subscribers see the honest acknowledgement instead of a plausible-sounding hallucination, and operators can grep for the sentence in the database to find dockets that need attention (typically: install poppler/tesseract for local OCR, or point `extra_documents` at an out-of-band source).
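Because the refusal sentence is emitted verbatim, finding affected dockets is an exact-match query. Assuming the `case_summaries` table lives in a SQLite file with `docket_id` and `summary` columns — the column names and storage engine are guesses, adjust to the real schema — the check might look like:

```python
# Find dockets whose stored summary is the verbatim refusal sentence.
# Table and column names are assumptions, not case-calendar's real schema.
import sqlite3

REFUSAL = ("Documents available for this docket are insufficient "
           "to generate a reliable summary.")

def dockets_needing_attention(db_path):
    """Return docket ids whose summary is the exact refusal sentence."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT docket_id FROM case_summaries WHERE summary = ?",
            (REFUSAL,),
        ).fetchall()
    return [r[0] for r in rows]
```

Exact equality (rather than a `LIKE` pattern) works precisely because the prompt requires the sentence verbatim; any paraphrase would be a prompt-compliance bug worth noticing.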
This rule is one of several guardrails baked into the prompt. The model is also told:
- A trial date in a scheduling order is not proof a trial occurred. Don’t say “tried before a jury” unless there’s a verdict form or judgment-after-trial.
- If you mention a hearing, state the date. Vague phrasing (“a hearing is scheduled”) that hides whether the date is past or unverified is forbidden.
- Past-dated hearings that the docket hasn’t confirmed as held or vacated must be described as past-but-unconfirmed, not as upcoming.
- When a judgment is provided, the imposed sentence (term of imprisonment, supervised release, fine, restitution) must appear in the summary verbatim.
- The system prompt does NOT render the legal disclaimers (“AI-generated, may contain mistakes” + presumption of innocence) — those are baked into the page template so the language stays stable regardless of model output.
## Multi-docket aggregation
For cases that span multiple dockets (district + appellate; co-defendants on separate dockets; parallel filings), the AI summary is generated per docket, then rendered as a labeled paragraph block on the index page:
3:24-cv-00100 (N.D. Cal.): The district court suit alleges …
24-12345 (9th Cir.): The Ninth Circuit appeal challenges …
To frame the litigation strategy for the model, add an `aggregation_note` on the case:
```yaml
- id: anthropic-v-dow
  name: "Anthropic v. DOW"
  calendar: tech
  dockets: [72380208, 72379655, 73136734]
  aggregation_note: >-
    Parallel suits challenging separate Department of War actions taken
    under distinct statutory authorities, each filed in the proper venue
    for the action it targets.
```
The note is only shown to the summarizer. It’s not rendered to subscribers. Keep it short and factual — the model uses it as framing, not as text to copy.
## `extra_documents`
CourtListener and PACER sometimes don’t surface documents the public should be able to see. Two real failure modes the project has hit:
- Sealed-then-unsealed entries that the clerk hasn’t yet unhidden or re-uploaded. The indictment is technically public (the seal was lifted in connection with extradition), but it still shows as missing in PACER.
- CourtListener metadata bugs. A PDF is in CourtListener’s storage bucket, but the v4 API reports `is_available: false` because the file was uploaded under an older `pacer_case_id` than the docket’s current one (CourtListener bug #7345).
For those cases, point case-calendar at the document directly. Each entry needs three fields:
```yaml
- id: us-v-zewei
  name: "United States v. Zewei"
  calendar: cybercrime
  dockets: [70789744]
  extra_documents:
    - docket: 70789744
      url: https://www.justice.gov/opa/media/1407196/dl
      note: >-
        This PDF is the unsealed indictment in S.D. Tex. case
        4:23-cr-00523 (United States v. Xu Zewei).
```
| Field | Purpose |
|---|---|
| `docket` | Must be one of this case’s docket ids. |
| `url` | Absolute `https://` URL to a PDF. Anywhere — DoJ press releases, archived storage URLs, court websites. |
| `note` | Required. Tells the summary LLM what the document is and why it was added. The note rides into the prompt as trusted operator metadata; the document text itself is still treated as untrusted (the same way CourtListener / PACER text is). |
case-calendar fetches the bytes through the same pypdf → OCR fallback chain it uses for CourtListener documents, then feeds them to the LLM as their own labeled section. Each entry’s LLM block is headed `OPERATOR-PROVIDED DOCUMENT (sourced outside CourtListener)`, with the operator’s note line beneath it.
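The fallback chain itself can be sketched as below. The extractor callables are injected stand-ins for the real passes (pypdf’s text-layer extraction and a tesseract OCR run), and the `min_chars` threshold is a made-up heuristic, not case-calendar’s actual cutoff:

```python
# Sketch of the text-extraction fallback chain. The extractor callables
# are injected so the logic is testable without pypdf or tesseract installed.
def extract_text(pdf_bytes, pypdf_extract, ocr_extract, allow_ocr=True,
                 min_chars=200):
    """Try the fast text-layer extraction first; fall back to OCR when the
    result is empty or suspiciously short (e.g. an image-only PDF)."""
    text = pypdf_extract(pdf_bytes) or ""
    if len(text.strip()) >= min_chars:
        return text
    if allow_ocr:
        return ocr_extract(pdf_bytes) or ""
    return text  # allow_ocr: false — return whatever pypdf produced
```

This mirrors the `allow_ocr` config key: with OCR disabled, a short or empty text layer flows straight through, which is what ultimately triggers the “insufficient documents” refusal downstream.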
Keep the note short — one sentence that identifies the document by name plus a case citation. The note is data fed to the summary LLM. Bug numbers, workaround details, or “remove this once CourtListener fixes it” all belong in a `#` comment in `config.yaml`, not in `note`. The LLM is summarizing the case for public subscribers; any mention of CourtListener internals or tooling state in its output would be both off-topic and a leak of internal context.
Remove each `extra_documents` entry once the upstream gap closes.
## Legal disclaimers
The index page renders a static `<footer>` block carrying two disclaimers:
> Case descriptions are generated by AI and may contain mistakes.
>
> Criminal defendants are presumed innocent unless and until convicted in a court of law.
Both are rendered by the page template, not by the LLM. The legally-loaded text is stable regardless of model output or prompt revision. The summary prompt explicitly tells the model NOT to include these — they’re the renderer’s responsibility.
## Next steps
- Public index page — how summaries get rendered.
- Configuration — the complete `config.yaml` reference.