
A subscribable calendar of dates is useful. A subscribable calendar that also tells you what each case is about in a couple of sentences is more useful — especially when you’re tracking 30 cases and can’t remember which Wang is which.

When enabled, case-calendar generates a 2-4 sentence prose summary for each docket and renders it on the public index page next to the case row. Summaries are opt-in, off by default, and only run on dockets where the source documents actually support a confident answer (the LLM is instructed to refuse rather than fabricate when they don’t).


What gets summarized

The summary pipeline pulls three sets of source documents for each docket:

  1. Primary document — the latest indictment / superseding indictment / information for criminal dockets; the latest amended complaint / complaint / petition for civil. Establishes who’s involved and what the case is about.
  2. Disposition documents — judgments, plea agreements, verdict forms, orders of dismissal, dispositive memoranda. Anything that materially changes “where does the case stand”. A dispositive order in a busy civil case can be a hundred pages back from the latest entry, so the pipeline walks several pages of the docket newest-first to find it.
  3. Operator-provided documents (optional, see extra_documents below) — anything you’ve manually pointed the pipeline at to fill a CourtListener data gap.
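
The newest-first disposition walk in step 2 can be sketched as follows. This is illustrative only: the helper name, entry shape, and keyword list are assumptions, not the project's actual API.

```python
# Hypothetical sketch of the newest-first disposition scan described above.
DISPOSITIVE_KEYWORDS = ("judgment", "plea agreement", "verdict", "dismiss")

def find_disposition(pages):
    """Scan docket entries newest-first; return the first dispositive hit.

    `pages` is an iterable of entry lists, newest page first, so a
    dispositive order buried a hundred entries back is still found.
    """
    for page in pages:
        for entry in page:  # entries within a page, newest first
            desc = entry["description"].lower()
            if any(kw in desc for kw in DISPOSITIVE_KEYWORDS):
                return entry
    return None  # no disposition yet; the case is still pending

pages = [
    [{"description": "NOTICE of appearance"}],
    [{"description": "JUDGMENT as to defendant Jones"}],
]
print(find_disposition(pages)["description"])  # JUDGMENT as to defendant Jones
```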

Those documents — plus a structured scaffold of the hearings and deadlines the extractor already recorded — go into a single LLM call. The model returns prose; case-calendar persists it to the case_summaries table.
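
A minimal sketch of the persistence step, assuming a SQLite backing store. The case_summaries column names here are assumptions, not the project's real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE case_summaries (
    docket_id INTEGER PRIMARY KEY,
    summary   TEXT NOT NULL,
    stale     INTEGER NOT NULL DEFAULT 0
)""")

def persist_summary(conn, docket_id, summary):
    # Upsert so a regenerated summary replaces the old row and clears stale.
    conn.execute(
        "INSERT INTO case_summaries (docket_id, summary, stale) VALUES (?, ?, 0) "
        "ON CONFLICT(docket_id) DO UPDATE SET summary = excluded.summary, stale = 0",
        (docket_id, summary),
    )

persist_summary(conn, 70789744, "Mr. Jones is charged ...")
row = conn.execute(
    "SELECT summary, stale FROM case_summaries WHERE docket_id = 70789744"
).fetchone()
print(row)  # ('Mr. Jones is charged ...', 0)
```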

The page-rendered output looks like:

Mr. Jones is charged in the Northern District of Texas with one count of wire fraud conspiracy and five counts of wire fraud for his alleged role in a $2.6 million online romance-scam scheme. He pled guilty to the conspiracy count on January 14, 2025 pursuant to a plea agreement; sentencing is scheduled for May 28, 2026. Co-defendant Smith remains a fugitive abroad.

A live deployment with real summaries on real federal-court dockets is at casecalendar.net.

Enabling summaries

Add a top-level block to config.yaml:

case_summaries:
  enabled: true
  # provider: anthropic
  # model: claude-sonnet-4-6
  # allow_ocr: true
  # debounce_seconds: 300
| Key | Required | Purpose |
| --- | --- | --- |
| enabled | yes | Master switch. Defaults to false. |
| provider | no | Force a specific provider (anthropic / openai / gemini). Defaults to whichever LLM key is set, or LLM_SUMMARY_PROVIDER. |
| model | no | Override the model. Defaults to Sonnet / GPT-5.4 / Gemini Pro depending on provider. |
| allow_ocr | no | Run local OCR fallback on PDFs CourtListener hasn’t extracted. Defaults to true. Set to false to skip tesseract entirely. |
| debounce_seconds | no | Webhook-only. How many seconds of quiet to wait after the last summary-relevant entry before re-running the LLM. Defaults to 300. Polling syncs ignore this — they regenerate immediately. |
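
How those defaults might resolve in practice. The helper and dict shape are illustrative; only the default values come from the reference above.

```python
DEFAULTS = {
    "enabled": False,          # master switch
    "provider": None,          # auto-detect from whichever LLM key is set
    "model": None,             # provider-specific default (see Cost below)
    "allow_ocr": True,
    "debounce_seconds": 300,
}

def resolve_summary_config(raw):
    # Operator-supplied keys override defaults; omitted keys fall back.
    return {**DEFAULTS, **(raw or {})}

cfg = resolve_summary_config({"enabled": True})
print(cfg["enabled"], cfg["debounce_seconds"])  # True 300
```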

When enabled: true, summaries auto-refresh as part of sync and serve: whenever the syncer sees a new primary document or disposition, it flips the row’s stale flag. At the end of the sync (or after the debounce timer fires in serve), the pipeline regenerates every stale row before re-emitting the index. The page reflects the case’s current posture without you running anything manually.
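
The debounce behavior can be sketched as a small timer: each summary-relevant entry resets the clock, and regeneration runs only once the docket has been quiet for the full window. This is a sketch, not the project's actual code.

```python
import time

class SummaryDebouncer:
    def __init__(self, debounce_seconds=300):
        self.debounce_seconds = debounce_seconds
        self.last_event = None

    def note_relevant_entry(self, now=None):
        # Called whenever the syncer flips a row's stale flag.
        self.last_event = now if now is not None else time.monotonic()

    def ready(self, now=None):
        # True once the docket has been quiet for the full debounce window.
        if self.last_event is None:
            return False
        now = now if now is not None else time.monotonic()
        return now - self.last_event >= self.debounce_seconds

d = SummaryDebouncer(debounce_seconds=300)
d.note_relevant_entry(now=0)
print(d.ready(now=100))  # False: still inside the quiet window
print(d.ready(now=300))  # True: window elapsed, regenerate stale rows
```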

Cost

Summaries run on a higher-tier model than the extractor pipeline — the synthesis task warrants the upgrade. Defaults:

| Provider | Default model |
| --- | --- |
| Anthropic | Claude Sonnet 4.6 |
| OpenAI | GPT-5.4 |
| Gemini | Gemini 2.5 Pro |

Budget roughly $0.10–0.60 per docket for the first run, near-zero on subsequent runs (existing rows are reused unless the docket got a new primary document or disposition). On a 30-case calendar you’ll probably spend a few dollars to backfill and pennies a week thereafter.
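
A back-of-envelope check of the backfill figure:

```python
cases = 30
low, high = 0.10, 0.60                               # per-docket first-run cost
backfill_low, backfill_high = cases * low, cases * high
print(f"${backfill_low:.2f}-${backfill_high:.2f}")   # $3.00-$18.00
```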

To force a regeneration after a model upgrade or prompt change:

uv run case-calendar summarize --force
# or, bundled into a polling sync to share the CourtListener session:
uv run case-calendar sync --force-summaries

The “insufficient documents” refusal

The summary LLM is instructed to refuse rather than fabricate when its inputs are too sparse to support a confident summary. If the primary document text is empty (image-only PDF that didn’t OCR), garbled (custom font subsets — see the installation page), or otherwise lacks the substance needed to identify the parties and the gist of the charges or claims, the model emits this exact sentence verbatim:

Documents available for this docket are insufficient to generate a reliable summary.

That gets stored and rendered like any other summary. Subscribers see the honest acknowledgement instead of a plausible-sounding hallucination, and operators can grep for the sentence in the database to find dockets that need attention (typically: install poppler/tesseract for local OCR, or point extra_documents at an out-of-band source).
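
Because the refusal is a fixed sentence, finding affected dockets is a single exact-match query. The sketch below assumes a SQLite store and illustrative column names:

```python
import sqlite3

REFUSAL = ("Documents available for this docket are insufficient "
           "to generate a reliable summary.")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE case_summaries (docket_id INTEGER, summary TEXT)")
conn.executemany(
    "INSERT INTO case_summaries VALUES (?, ?)",
    [(1, "Mr. Jones is charged ..."), (2, REFUSAL)],
)

# Dockets needing attention: install OCR tooling or add extra_documents.
needs_attention = [row[0] for row in conn.execute(
    "SELECT docket_id FROM case_summaries WHERE summary = ?", (REFUSAL,))]
print(needs_attention)  # [2]
```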

This refusal rule is one of several guardrails baked into the prompt.

Multi-docket aggregation

For cases that span multiple dockets (district + appellate; co-defendants on separate dockets; parallel filings), the AI summary is generated per docket, then rendered as a labeled paragraph block on the index page:

3:24-cv-00100 (N.D. Cal.): The district court suit alleges …

24-12345 (9th Cir.): The Ninth Circuit appeal challenges …

To frame the litigation strategy for the model, add an aggregation_note on the case:

- id: anthropic-v-dow
  name: "Anthropic v. DOW"
  calendar: tech
  dockets: [72380208, 72379655, 73136734]
  aggregation_note: >-
    Parallel suits challenging separate Department of War actions taken
    under distinct statutory authorities, each filed in the proper venue
    for the action it targets.

The note is only shown to the summarizer. It’s not rendered to subscribers. Keep it short and factual — the model uses it as framing, not as text to copy.
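
The per-docket rendering described above might look like this sketch; the field names are assumptions, not the project's actual data model:

```python
def render_case_summaries(summaries):
    """Render each docket's summary as its own labeled paragraph."""
    return "\n\n".join(
        f"{s['docket_number']} ({s['court']}): {s['text']}" for s in summaries
    )

blocks = render_case_summaries([
    {"docket_number": "3:24-cv-00100", "court": "N.D. Cal.",
     "text": "The district court suit alleges ..."},
    {"docket_number": "24-12345", "court": "9th Cir.",
     "text": "The Ninth Circuit appeal challenges ..."},
])
print(blocks)
```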

extra_documents

CourtListener and PACER sometimes don’t surface documents the public should be able to see (an unsealed indictment posted only in a DoJ press release, for example).

For those cases, point case-calendar at the document directly. Each entry needs three fields:

- id: us-v-zewei
  name: "United States v. Zewei"
  calendar: cybercrime
  dockets: [70789744]
  extra_documents:
    - docket: 70789744
      url: https://www.justice.gov/opa/media/1407196/dl
      note: >-
        This PDF is the unsealed indictment in S.D. Tex. case
        4:23-cr-00523 (United States v. Xu Zewei).
| Field | Purpose |
| --- | --- |
| docket | Must be one of this case’s docket IDs. |
| url | Absolute https:// URL to a PDF. Anywhere — DoJ press releases, archived storage URLs, court websites. |
| note | Required. Tells the summary LLM what the document is and why it was added. The note rides into the prompt as trusted operator metadata; the document text itself is still treated as untrusted (the same way CourtListener / PACER text is). |

case-calendar fetches the bytes through the same pypdf → OCR fallback chain as it does for CourtListener documents, then feeds them to the LLM as their own labeled section. Each entry’s LLM block is headed OPERATOR-PROVIDED DOCUMENT (sourced outside CourtListener) with the operator’s note line beneath it.
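
The fallback decision itself can be sketched independently of pypdf and tesseract by injecting the two extractors. The stub lambdas below stand in for the real libraries; none of these names are the project's actual API.

```python
def extract_text(pdf_bytes, pdf_extract, ocr_extract, allow_ocr=True):
    """Try the cheap text-layer extraction first; OCR only as a fallback."""
    text = pdf_extract(pdf_bytes)
    if text and text.strip():
        return text
    if allow_ocr:
        return ocr_extract(pdf_bytes)  # tesseract path: slow, handles scans
    return ""  # image-only PDF with OCR disabled -> may trigger the refusal

# Stub extractors standing in for pypdf and tesseract:
print(extract_text(b"...", lambda b: "INDICTMENT ...", lambda b: "ocr text"))
print(extract_text(b"...", lambda b: "", lambda b: "ocr text"))
```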

Keep the note short — one sentence that identifies the document by name plus a case citation. The note is data fed to the summary LLM. Bug numbers, workaround details, or “remove this once CourtListener fixes it” all belong in a # comment in config.yaml, not in note. The LLM is summarizing the case for public subscribers; any mention of CourtListener internals or tooling state in its output would be both off-topic and a leak of internal context.

Remove each extra_documents entry once the upstream gap closes.

Disclaimers

The index page renders a static <footer> block carrying two disclaimers:

Case descriptions are generated by AI and may contain mistakes.

Criminal defendants are presumed innocent unless and until convicted in a court of law.

Both are rendered by the page template, not by the LLM. The legally-loaded text is stable regardless of model output or prompt revision. The summary prompt explicitly tells the model NOT to include these — they’re the renderer’s responsibility.

Next steps