archivebox.cli.archivebox_extract

archivebox extract [snapshot_ids…] [–plugins=NAMES]

Run plugins on Snapshots. Accepts snapshot IDs as arguments, from stdin, or via JSONL.

Input formats: - Snapshot UUIDs (one per line) - JSONL: {“type”: “Snapshot”, “id”: “…”, “url”: “…”} - JSONL: {“type”: “ArchiveResult”, “snapshot_id”: “…”, “plugin”: “…”}

Output (JSONL): {“type”: “ArchiveResult”, “id”: “…”, “snapshot_id”: “…”, “plugin”: “…”, “status”: “…”}

Examples: # Extract specific snapshot archivebox extract 01234567-89ab-cdef-0123-456789abcdef

# Pipe from snapshot command
archivebox snapshot https://example.com | archivebox extract

# Run specific plugins only
archivebox extract --plugins=screenshot,singlefile 01234567-89ab-cdef-0123-456789abcdef

# Chain commands
archivebox crawl https://example.com | archivebox snapshot | archivebox extract

Module Contents

Functions

process_archiveresult_by_id

Re-run extraction for a single ArchiveResult by ID.

run_plugins

Run plugins on Snapshots from input.

is_archiveresult_id

Check if value looks like an ArchiveResult UUID.

main

Run plugins on Snapshots, or process existing ArchiveResults by ID

Data

__command__

API

archivebox.cli.archivebox_extract.__command__[source]

‘archivebox extract’

archivebox.cli.archivebox_extract.process_archiveresult_by_id(archiveresult_id: str) int[source]

Re-run extraction for a single ArchiveResult by ID.

ArchiveResults are projected status rows, not queued work items. Re-running a single result means resetting that row and queueing its parent snapshot through the shared crawl runner with the corresponding plugin selected.

archivebox.cli.archivebox_extract.run_plugins(args: tuple, records: list[dict] | None = None, plugins: str = '', wait: bool = True, emit_results: bool = True) int[source]

Run plugins on Snapshots from input.

Reads Snapshot IDs or JSONL from args/stdin, runs plugins, outputs JSONL.

Exit codes: 0: Success 1: Failure

archivebox.cli.archivebox_extract.is_archiveresult_id(value: str) bool[source]

Check if value looks like an ArchiveResult UUID.

archivebox.cli.archivebox_extract.main(plugins: str, wait: bool, args: tuple)[source]

Run plugins on Snapshots, or process existing ArchiveResults by ID