A subscribable calendar of dates is useful. A subscribable calendar that also tells you what each case is about in a couple of sentences is more useful — especially when you’re tracking 30 cases and can’t remember which Wang is which.
When enabled, case-calendar generates a 2-4 sentence prose summary for each docket and renders it on the public index page next to the case row. Summaries are opt-in, off by default, and only run on dockets where the source documents actually support a confident answer (the LLM is instructed to refuse rather than fabricate when they don’t).
## What gets summarized
The summary pipeline pulls three sets of source documents for each docket:
- Primary document — the latest indictment / superseding indictment / information for criminal dockets; the latest amended complaint / complaint / petition for civil. Establishes who’s involved and what the case is about.
- Disposition documents — judgments, plea agreements, verdict forms, orders of dismissal, dispositive memoranda. Anything that materially changes “where does the case stand”. A dispositive order in a busy civil case can be a hundred pages back from the latest entry, so the pipeline walks several pages of the docket newest-first to find it.
- Operator-provided documents (optional, see `extra_documents` below) — anything you’ve manually pointed the pipeline at to fill a CourtListener data gap.
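The selection pass above can be sketched roughly as follows. Everything here is illustrative — the entry structure, field names (`type`, `date_filed`), and document-type lists are assumptions, not case-calendar's real schema:

```python
# Hypothetical sketch of the document-selection pass. Field names and
# document-type labels are illustrative, not case-calendar's real schema.
PRIMARY_CRIMINAL = ["superseding indictment", "indictment", "information"]
PRIMARY_CIVIL = ["amended complaint", "complaint", "petition"]
DISPOSITIVE = {"judgment", "plea agreement", "verdict form", "order of dismissal"}

def pick_primary(entries, criminal=True):
    """Latest entry whose type matches the primary-document list."""
    wanted = PRIMARY_CRIMINAL if criminal else PRIMARY_CIVIL
    candidates = [e for e in entries if e["type"] in wanted]
    # ISO-8601 date strings sort correctly as plain strings
    return max(candidates, key=lambda e: e["date_filed"], default=None)

def find_dispositions(pages):
    """Walk docket pages newest-first; a dispositive order can sit many
    pages back from the latest entry."""
    found = []
    for page in pages:  # pages already ordered newest-first
        found.extend(e for e in page if e["type"] in DISPOSITIVE)
    return found
```

The key behaviors the sketch captures: the primary document is the *latest* match (a superseding indictment beats the original), and dispositions are collected by paging backwards rather than looking only at the newest entries.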
Those documents — plus a structured scaffold of the hearings and deadlines the extractor already recorded — go into a single LLM call. The model returns prose; case-calendar persists it to the `case_summaries` table.
The page-rendered output looks like:
> Mr. Jones is charged in the Northern District of Texas with one count of wire fraud conspiracy and five counts of wire fraud for his alleged role in a $2.6 million online romance-scam scheme. He pled guilty to the conspiracy count on January 14, 2025 pursuant to a plea agreement; sentencing is scheduled for May 28, 2026. Co-defendant Smith remains a fugitive abroad.
A live deployment with real summaries on real federal-court dockets is at casecalendar.net.
## Enabling summaries
Add a top-level block to `config.yaml`:

```yaml
case_summaries:
  enabled: true
  # provider: anthropic
  # model: claude-sonnet-4-6
  # allow_ocr: true
  # debounce_seconds: 300
```
| Key | Required | Purpose |
|---|---|---|
| `enabled` | yes | Master switch. Defaults to `false`. |
| `provider` | no | Force a specific provider (`anthropic` / `openai` / `gemini`). Defaults to whichever LLM key is set, or `LLM_SUMMARY_PROVIDER`. |
| `model` | no | Override the model. Defaults to Sonnet / GPT-5.4 / Gemini Pro depending on provider. |
| `allow_ocr` | no | Run local OCR fallback on PDFs CourtListener hasn’t extracted. Defaults to `true`. Set to `false` to skip tesseract entirely. |
| `debounce_seconds` | no | Webhook-only. How many seconds of quiet to wait after the last summary-relevant entry before re-running the LLM. Defaults to 300. Polling syncs ignore this — they regenerate immediately. |
With `enabled: true`, summaries auto-refresh as part of `sync` and `serve`: whenever the syncer sees a new primary document or disposition, it flips the row’s stale flag. At the end of the sync (or after the debounce timer fires in `serve`), the pipeline regenerates every stale row before re-emitting the index. The page reflects the case’s current posture without you running anything manually.
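The stale-flag and debounce behavior can be sketched as below. This is a minimal model of the logic described above, not case-calendar’s actual code; the class and method names are made up:

```python
# Minimal sketch of the stale-flag + debounce behavior described above.
# Names are illustrative; case-calendar's real implementation differs.
import time

class SummaryScheduler:
    def __init__(self, debounce_seconds=300):
        self.debounce_seconds = debounce_seconds
        self.stale = set()    # docket ids whose summary needs regeneration
        self.last_event = {}  # docket id -> time of last relevant entry

    def mark_stale(self, docket_id, now=None):
        """Called when sync sees a new primary document or disposition."""
        self.stale.add(docket_id)
        self.last_event[docket_id] = time.time() if now is None else now

    def due(self, now=None):
        """Webhook mode: dockets quiet for at least debounce_seconds."""
        now = time.time() if now is None else now
        return {d for d in self.stale
                if now - self.last_event[d] >= self.debounce_seconds}

    def regenerate(self, docket_id):
        """Clear the flag once the LLM has produced a fresh summary."""
        self.stale.discard(docket_id)
```

The debounce matters because a single docket event often arrives as a burst of entries; waiting for quiet avoids paying for several LLM calls where one would do.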
## Cost
Summaries run on a higher-tier model than the extractor pipeline — the synthesis task warrants the upgrade. Defaults:
| Provider | Default model |
|---|---|
| Anthropic | Claude Sonnet 4.6 |
| OpenAI | GPT-5.4 |
| Gemini | Gemini 2.5 Pro |
Budget roughly $0.10–0.60 per docket for the first run, near-zero on subsequent runs (existing rows are reused unless the docket got a new primary document or disposition). On a 30-case calendar you’ll probably spend a few dollars to backfill and pennies a week thereafter.
To force a regeneration after a model upgrade or prompt change:
```sh
uv run case-calendar summarize --force
# or, bundled into a polling sync to share the CourtListener session:
uv run case-calendar sync --force-summaries
```
## The “insufficient documents” refusal
The summary LLM is instructed to refuse rather than fabricate when its inputs are too sparse to support a confident summary. If the primary document text is empty (an image-only PDF that didn’t OCR), garbled (custom font subsets — see the installation page), or otherwise lacks the substance needed to identify the parties and the gist of the charges or claims, the model emits this sentence verbatim:
> Documents available for this docket are insufficient to generate a reliable summary.
That gets stored and rendered like any other summary. Subscribers see the honest acknowledgement instead of a plausible-sounding hallucination, and operators can grep for the sentence in the database to find dockets that need attention (typically: install poppler/tesseract for local OCR, or point `extra_documents` at an out-of-band source).
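Because the refusal sentence is emitted verbatim, finding affected dockets is an exact-match query. Assuming the `case_summaries` table lives in a SQLite file with `docket_id` and `summary` columns — the column names and storage engine are guesses, adjust to the real schema — the check might look like:

```python
# Find dockets whose stored summary is the verbatim refusal sentence.
# Table and column names are assumptions, not case-calendar's real schema.
import sqlite3

REFUSAL = ("Documents available for this docket are insufficient "
           "to generate a reliable summary.")

def dockets_needing_attention(db_path):
    """Return docket ids whose summary is the exact refusal sentence."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT docket_id FROM case_summaries WHERE summary = ?",
            (REFUSAL,),
        ).fetchall()
    return [r[0] for r in rows]
```

Exact equality (rather than a `LIKE` pattern) works precisely because the prompt requires the sentence verbatim; any paraphrase would be a prompt-compliance bug worth noticing.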
This rule is one of several guardrails baked into the prompt. The model is also told:
- A trial date in a scheduling order is not proof a trial occurred. Don’t say “tried before a jury” unless there’s a verdict form or judgment-after-trial.
- If you mention a hearing, state the date. Vague phrasing (“a hearing is scheduled”) that hides whether the date is past or unverified is forbidden.
- Past-dated hearings that the docket hasn’t confirmed as held or vacated must be described as past-but-unconfirmed, not as upcoming.
- When a judgment is provided, the imposed sentence (term of imprisonment, supervised release, fine, restitution) must appear in the summary verbatim.
- The system prompt does NOT render the legal disclaimers (“AI-generated, may contain mistakes” + presumption of innocence) — those are baked into the page template so the language stays stable regardless of model output.
## Multi-docket aggregation
For cases that span multiple dockets (district + appellate; co-defendants on separate dockets; parallel filings), the AI summary is generated per docket, then rendered as a labeled paragraph block on the index page:
3:24-cv-00100 (N.D. Cal.): The district court suit alleges …
24-12345 (9th Cir.): The Ninth Circuit appeal challenges …
To frame the litigation strategy for the model, add an `aggregation_note` on the case:
```yaml
- id: anthropic-v-dow
  name: "Anthropic v. DOW"
  calendar: tech
  dockets: [72380208, 72379655, 73136734]
  aggregation_note: >-
    Parallel suits challenging separate Department of War actions taken
    under distinct statutory authorities, each filed in the proper venue
    for the action it targets.
```
The note is only shown to the summarizer. It’s not rendered to subscribers. Keep it short and factual — the model uses it as framing, not as text to copy.
## `extra_documents`
CourtListener and PACER sometimes don’t surface documents the public should be able to see. Two real failure modes the project has hit:
- Sealed-then-unsealed entries that the clerk hasn’t yet unhidden or re-uploaded. The indictment is technically public (the seal was lifted in connection with extradition), but it still shows as missing in PACER.
- CourtListener metadata bugs. A PDF is in CourtListener’s storage bucket, but the v4 API reports `is_available: false` because the file was uploaded under an older `pacer_case_id` than the docket’s current one (CourtListener bug #7345).
For those cases, point case-calendar at the document directly. Each entry needs three fields:
```yaml
- id: us-v-zewei
  name: "United States v. Zewei"
  calendar: cybercrime
  dockets: [70789744]
  extra_documents:
    - docket: 70789744
      url: https://www.justice.gov/opa/media/1407196/dl
      note: >-
        This PDF is the unsealed indictment in S.D. Tex. case
        4:23-cr-00523 (United States v. Xu Zewei).
```
| Field | Purpose |
|---|---|
| `docket` | Must be one of this case’s docket ids. |
| `url` | Absolute `https://` URL to a PDF. Anywhere — DoJ press releases, archived storage URLs, court websites. |
| `note` | Required. Tells the summary LLM what the document is and why it was added. The note rides into the prompt as trusted operator metadata; the document text itself is still treated as untrusted (the same way CourtListener / PACER text is). |
case-calendar fetches the bytes through the same pypdf → OCR fallback chain it uses for CourtListener documents, then feeds them to the LLM as their own labeled section. Each entry’s LLM block is headed `OPERATOR-PROVIDED DOCUMENT (sourced outside CourtListener)`, with the operator’s note line beneath it.
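The fallback chain itself can be sketched as below. The extractor callables are injected stand-ins for the real passes (pypdf’s text-layer extraction and a tesseract OCR run), and the `min_chars` threshold is a made-up heuristic, not case-calendar’s actual cutoff:

```python
# Sketch of the text-extraction fallback chain. The extractor callables
# are injected so the logic is testable without pypdf or tesseract installed.
def extract_text(pdf_bytes, pypdf_extract, ocr_extract, allow_ocr=True,
                 min_chars=200):
    """Try the fast text-layer extraction first; fall back to OCR when the
    result is empty or suspiciously short (e.g. an image-only PDF)."""
    text = pypdf_extract(pdf_bytes) or ""
    if len(text.strip()) >= min_chars:
        return text
    if allow_ocr:
        return ocr_extract(pdf_bytes) or ""
    return text  # allow_ocr: false — return whatever pypdf produced
```

This mirrors the `allow_ocr` config key: with OCR disabled, a short or empty text layer flows straight through, which is what ultimately triggers the “insufficient documents” refusal downstream.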
Keep the note short — one sentence that identifies the document by name plus a case citation. The note is data fed to the summary LLM. Bug numbers, workaround details, or “remove this once CourtListener fixes it” all belong in a `#` comment in `config.yaml`, not in `note`. The LLM is summarizing the case for public subscribers; any mention of CourtListener internals or tooling state in its output would be both off-topic and a leak of internal context.
Remove each `extra_documents` entry once the upstream gap closes.
## Legal disclaimers
The index page renders a static `<footer>` block carrying two disclaimers:
> Case descriptions are generated by AI and may contain mistakes.
>
> Criminal defendants are presumed innocent unless and until convicted in a court of law.
Both are rendered by the page template, not by the LLM. The legally-loaded text is stable regardless of model output or prompt revision. The summary prompt explicitly tells the model NOT to include these — they’re the renderer’s responsibility.
## Next steps
- Public index page — how summaries get rendered.
- Configuration — the complete `config.yaml` reference.