archivebox.cli.archivebox_snapshot
archivebox snapshot
Manage Snapshot records.
Actions:
    create - Create Snapshots from URLs or Crawl JSONL
    list   - List Snapshots as JSONL (with optional filters)
    update - Update Snapshots from stdin JSONL
    delete - Delete Snapshots from stdin JSONL
Examples:
# Create
archivebox snapshot create https://example.com --tag=news
archivebox crawl create https://example.com | archivebox snapshot create
# List with filters
archivebox snapshot list --status=queued
archivebox snapshot list --url__icontains=example.com
# Update
archivebox snapshot list --tag=old | archivebox snapshot update --tag=new
# Delete
archivebox snapshot list --url__icontains=spam.com | archivebox snapshot delete --yes
Module Contents
Functions
Create Snapshots from URLs or stdin JSONL (Crawl or Snapshot records). Pass-through: Records that are not Crawl/Snapshot/URL are output unchanged.
List Snapshots as JSONL with optional filters.
Update Snapshots from stdin JSONL.
Delete Snapshots from stdin JSONL.
Manage Snapshot records.
Create Snapshots from URLs or stdin JSONL.
List Snapshots as JSONL.
Update Snapshots from stdin JSONL.
Delete Snapshots from stdin JSONL.
Data
API
- archivebox.cli.archivebox_snapshot.create_snapshots(urls: collections.abc.Iterable[str], tag: str = '', status: str = 'queued', depth: int = 0, created_by_id: int | None = None) -> int
Create Snapshots from URLs or stdin JSONL (Crawl or Snapshot records). Pass-through: Records that are not Crawl/Snapshot/URL are output unchanged.
Exit codes:
    0: Success
    1: Failure
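The pass-through rule described for create_snapshots can be pictured with a small standalone sketch. It does not use ArchiveBox itself: the "type" field, the treatment of bare lines as URLs, and the helper name split_records are all assumptions made for illustration, and the real record schema may differ.

```python
import json

# Assumption: each JSONL record carries a "type" field naming its kind.
HANDLED_TYPES = {"Crawl", "Snapshot"}


def split_records(lines):
    """Split input lines into records to process and records to pass through.

    Hypothetical helper: bare (non-JSON) lines are treated as plain URLs,
    Crawl/Snapshot records are kept for processing, and every other record
    is returned unchanged in the pass-through list.
    """
    handled, passed_through = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if not line.startswith("{"):
            # Not a JSON object, so assume it is a plain URL.
            handled.append({"type": "URL", "url": line})
            continue
        record = json.loads(line)
        if record.get("type") in HANDLED_TYPES:
            handled.append(record)
        else:
            passed_through.append(record)
    return handled, passed_through
```

Keeping unrecognized records intact is what lets several commands share one pipe: each stage acts only on the record types it understands and forwards the rest.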
- archivebox.cli.archivebox_snapshot.list_snapshots(status: str | None = None, url__icontains: str | None = None, url__istartswith: str | None = None, tag: str | None = None, crawl_id: str | None = None, limit: int | None = None, sort: str | None = None, csv: str | None = None, with_headers: bool = False, search: str | None = None, query: str | None = None) -> int
List Snapshots as JSONL with optional filters.
Exit codes:
    0: Success (even if no results)
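The url__icontains and url__istartswith parameters follow the naming of Django field lookups, where icontains is a case-insensitive substring match and istartswith is a case-insensitive prefix match. The sketch below mimics those semantics on plain dictionaries, assuming that is how the filters behave here. The helpers matches and filter_records, the record keys, and the status values are hypothetical and are not part of ArchiveBox.

```python
def matches(record, url__icontains=None, url__istartswith=None, status=None):
    """Return True if the record satisfies every filter that was supplied."""
    url = record.get("url", "").lower()
    if url__icontains is not None and url__icontains.lower() not in url:
        return False
    if url__istartswith is not None and not url.startswith(url__istartswith.lower()):
        return False
    if status is not None and record.get("status") != status:
        return False
    return True


def filter_records(records, limit=None, **filters):
    """Apply the filters, then truncate to `limit` results if one is given."""
    selected = [r for r in records if matches(r, **filters)]
    return selected if limit is None else selected[:limit]
```

A filter left as None is simply skipped, so calling filter_records with no filters returns every record, and an empty result is still a normal outcome rather than an error.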
- archivebox.cli.archivebox_snapshot.update_snapshots(status: str | None = None, tag: str | None = None) -> int
Update Snapshots from stdin JSONL.
Reads Snapshot records from stdin and applies updates. Uses PATCH semantics - only specified fields are updated.
Exit codes:
    0: Success
    1: No input or error
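PATCH semantics mean that an option left unset does not touch the corresponding field. A minimal sketch of that idea on a plain dictionary follows. The helper apply_patch and the "status" and "tag" keys are assumptions for illustration and do not reflect how ArchiveBox stores Snapshots.

```python
def apply_patch(record, status=None, tag=None):
    """Return a copy of `record` with only the explicitly given fields changed.

    Hypothetical helper: None means "not specified", so the existing value
    is kept as-is.
    """
    updated = dict(record)  # leave the caller's record untouched
    if status is not None:
        updated["status"] = status
    if tag is not None:
        updated["tag"] = tag
    return updated
```

With this shape, passing only tag="new" rewrites the tag and leaves the status and every other field as they were.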
- archivebox.cli.archivebox_snapshot.delete_snapshots(yes: bool = False, dry_run: bool = False) -> int
Delete Snapshots from stdin JSONL.
Requires the --yes flag to confirm deletion.
Exit codes:
    0: Success
    1: No input or missing --yes flag
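The confirmation gate and exit codes above can be sketched as a small standalone function. This is not the ArchiveBox implementation: the helper delete_records, the in-memory store, the "id" key, and the choice to let dry_run report without requiring yes are all assumptions.

```python
def delete_records(records, store, yes=False, dry_run=False):
    """Delete records from `store` and return a process-style exit code.

    Hypothetical helper. Returns 1 when there is no input or when deletion
    was not confirmed, and 0 otherwise.
    """
    if not records:
        return 1  # no input
    if dry_run:
        # Assumption: a dry run only reports what would be removed.
        for record in records:
            print(f"would delete {record['id']}")
        return 0
    if not yes:
        return 1  # missing confirmation
    for record in records:
        store.pop(record["id"], None)
    return 0
```

Checking for confirmation before touching the store keeps a forgotten flag from causing data loss. The failure mode is a non-zero exit code, not a partial delete.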
- archivebox.cli.archivebox_snapshot.create_cmd(urls: tuple, tag: str, status: str, depth: int)
Create Snapshots from URLs or stdin JSONL.
- archivebox.cli.archivebox_snapshot.list_cmd(status: str | None, url__icontains: str | None, url__istartswith: str | None, tag: str | None, crawl_id: str | None, limit: int | None, sort: str | None, csv: str | None, with_headers: bool, search: str | None, query: tuple[str, ...])
List Snapshots as JSONL.