ccrawl

v0.2.5

Fix web-graph release lookup when commoncrawl.org uses absolute hrefs.

Bug fixes

Web-graph release lookup (ccrawl host dataset, ccrawl host list, ccrawl rank): the commoncrawl.org/web-graphs page changed its release links from relative hrefs (href="cc-main-...") to absolute URLs (href="https://data.commoncrawl.org/projects/hyperlinkgraph/cc-main-.../index.html"). The previous regex matched only the old relative form. This release updates it to match the hyperlinkgraph/cc-main-... path segment present in both forms.
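To illustrate the kind of matching involved, here is a minimal sketch of a pattern that accepts both href forms. It is not the regex shipped in ccrawl, which these notes do not show. The names RELEASE_HREF and find_releases and the sample release IDs are hypothetical, chosen only for this example.

```python
import re

# Hypothetical pattern, NOT the one used by ccrawl. It allows an optional
# URL or path prefix ending in "/" before the cc-main-... segment, so a
# bare relative href and a full absolute URL both match.
RELEASE_HREF = re.compile(r'href="(?:[^"]*/)?(cc-main-[^"/]+)')


def find_releases(html):
    """Return the cc-main-... release IDs found in html, in page order, deduplicated."""
    ids = (m.group(1) for m in RELEASE_HREF.finditer(html))
    return list(dict.fromkeys(ids))


# Placeholder release names, invented for illustration only.
OLD_FORM = '<a href="cc-main-2099-jan-feb-mar">older</a>'
NEW_FORM = (
    '<a href="https://data.commoncrawl.org/projects/hyperlinkgraph/'
    'cc-main-2099-apr-may-jun/index.html">newer</a>'
)

if __name__ == "__main__":
    print(find_releases(OLD_FORM + NEW_FORM))
```

Capturing only the cc-main-... segment, rather than the whole href, keeps the result the same whichever form the page uses, so a caller would not need to know which one it received.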