Skip to content
ccrawl

v0.2.1

New guides for all v2 command groups.

Released 2026-06-18

v0.2.1 is a documentation release. No binary changes were made.

New guides

Six new task-oriented guides cover the command groups introduced in v0.2.0:

  • Host graph and enrichment — enumerate 262M hosts, join web-graph topology, aggregate per-host CDX statistics.
  • Building a recrawl engine — seed from the CC host list, fetch live pages, and write WARC or JSONL output.
  • Recrawl scheduling — score URLs by harmonic rank and change rate, diff two CDX snapshots to find changed pages.
  • Building a search index — build a BM25 inverted index from a page corpus and query it in milliseconds.
  • Content signals — extract text, measure quality, and map outlinks from live pages or stored WARCs.
  • API server — serve a local index over HTTP with four REST endpoints.

Documentation fixes

  • Quick-start "Where to next" section now links to all six new guides.
  • Guides index description updated to cover the full v2 command surface.