`archivebox.cli.archivebox_extract`

archivebox extract [snapshot_ids…] [–plugins=NAMES]

Run plugins on Snapshots. Accepts snapshot IDs as arguments, from stdin, or via JSONL.

Input formats: - Snapshot UUIDs (one per line) - JSONL: {“type”: “Snapshot”, “id”: “…”, “url”: “…”} - JSONL: {“type”: “ArchiveResult”, “snapshot_id”: “…”, “plugin”: “…”}

Output (JSONL): {“type”: “ArchiveResult”, “id”: “…”, “snapshot_id”: “…”, “plugin”: “…”, “status”: “…”}

Examples: # Extract specific snapshot archivebox extract 01234567-89ab-cdef-0123-456789abcdef

# Pipe from snapshot command
archivebox snapshot https://example.com | archivebox extract

# Run specific plugins only
archivebox extract --plugins=screenshot,singlefile 01234567-89ab-cdef-0123-456789abcdef

# Chain commands
archivebox crawl https://example.com | archivebox snapshot | archivebox extract

Module Contents

Functions

`process_archiveresult_by_id`	Re-run extraction for a single ArchiveResult by ID.
`run_plugins`	Run plugins on Snapshots from input.
`is_archiveresult_id`	Check if value resolves to an ArchiveResult ID.
`main`	Run plugins on Snapshots, or process existing ArchiveResults by ID

Data

__command__

API

archivebox.cli.archivebox_extract.__command__[source]: ‘archivebox extract’

archivebox.cli.archivebox_extract.process_archiveresult_by_id(archiveresult_id: str) → int[source]

Re-run extraction for a single ArchiveResult by ID.

ArchiveResults are projected status rows, not queued work items. Re-running a single result means resetting that row and queueing its parent snapshot through the shared crawl runner with the corresponding plugin selected.

archivebox.cli.archivebox_extract.run_plugins(args: tuple, records: list[dict] | None = None, plugins: str = '', wait: bool = True, emit_results: bool = True, show_progress: bool = True, preserve_queued: bool = False) → int[source]

Run plugins on Snapshots from input.

Reads Snapshot IDs or JSONL from args/stdin, runs plugins, outputs JSONL.

Exit codes: 0: Success 1: Failure

archivebox.cli.archivebox_extract.is_archiveresult_id(value: str) → bool[source]: Check if value resolves to an ArchiveResult ID.

archivebox.cli.archivebox_extract.main(plugins: str, wait: bool, args: tuple)[source]: Run plugins on Snapshots, or process existing ArchiveResults by ID

archivebox.cli.archivebox_extract

Module Contents

Functions

Data

API

`archivebox.cli.archivebox_extract`