archivebox.misc.logging_util

Module Contents

Classes

RuntimeStats

mutable stats counter for logging archiving timing info to CLI output

TimedProgress

Show a progress bar and measure elapsed time until .end() is called

Functions

progress_bar

show a timer in the form of a progress bar, with percentage and seconds remaining

log_cli_command

log_importing_started

log_source_saved

log_parsing_finished

log_deduping_finished

log_crawl_started

log_indexing_process_started

log_indexing_process_finished

log_indexing_started

log_indexing_finished

log_archiving_started

log_archiving_paused

log_archiving_finished

log_snapshot_archiving_started

log_snapshot_archiving_finished

log_archive_method_started

log_archive_method_finished

quote any argument containing whitespace in a command so the user can copy-paste the printed string directly to run the cmd

log_list_started

log_list_finished

log_removal_started

log_removal_finished

log_index_started

pretty_path

convert paths like …/ArchiveBox/archivebox/../output/abc into output/abc

printable_filesize

format_duration

Format duration in human-readable form.

truncate_url

Truncate URL to max_length, keeping domain and adding ellipsis.

log_worker_event

Log a worker event with structured metadata and indentation.

printable_folders

printable_config

printable_folder_status

printable_dependency_version

Data

_LAST_RUN_STATS

API

class archivebox.misc.logging_util.RuntimeStats

mutable stats counter for logging archiving timing info to CLI output

skipped: int

0

succeeded: int

0

failed: int

0

parse_start_ts: datetime.datetime | None

None

parse_end_ts: datetime.datetime | None

None

index_start_ts: datetime.datetime | None

None

index_end_ts: datetime.datetime | None

None

archiving_start_ts: datetime.datetime | None

None

archiving_end_ts: datetime.datetime | None

None
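The attributes above suggest a plain counter object that is mutated as a run progresses. A minimal sketch of that pattern follows; the field names come from the listing, but the dataclass form, the `RuntimeStatsSketch` name, and the `archiving_seconds` helper are assumptions for illustration and not ArchiveBox's actual implementation.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class RuntimeStatsSketch:
    """Hypothetical stand-in mirroring a subset of the documented fields."""
    skipped: int = 0
    succeeded: int = 0
    failed: int = 0
    archiving_start_ts: Optional[datetime] = None
    archiving_end_ts: Optional[datetime] = None


def archiving_seconds(stats: RuntimeStatsSketch) -> Optional[float]:
    """Return the archiving duration in seconds, or None if a timestamp is unset."""
    if stats.archiving_start_ts is None or stats.archiving_end_ts is None:
        return None
    return (stats.archiving_end_ts - stats.archiving_start_ts).total_seconds()


# Mutate the counter as work happens, then read the totals at the end.
stats = RuntimeStatsSketch()
stats.archiving_start_ts = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
stats.succeeded += 2
stats.failed += 1
stats.archiving_end_ts = datetime(2024, 1, 1, 12, 0, 30, tzinfo=timezone.utc)
duration = archiving_seconds(stats)
```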

archivebox.misc.logging_util._LAST_RUN_STATS

‘RuntimeStats(…)’

class archivebox.misc.logging_util.TimedProgress(seconds, prefix='')

Show a progress bar and measure elapsed time until .end() is called

Initialization

end()

immediately end progress, clear the progressbar line, and save end_ts
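The start-then-`end()` lifecycle described above is commonly wrapped in `try`/`finally` so the timer is always closed. The sketch below illustrates only the timing half of that idea; `TimedProgressSketch` is a hypothetical class, it draws no progress bar, and its attributes are assumptions rather than the real `TimedProgress` internals.

```python
import time
from datetime import datetime, timezone


class TimedProgressSketch:
    """Hypothetical timer that records start and end timestamps."""

    def __init__(self, seconds, prefix=''):
        self.seconds = seconds          # expected maximum duration
        self.prefix = prefix            # text printed before the bar (unused here)
        self.start_ts = datetime.now(timezone.utc)
        self.end_ts = None

    def end(self):
        """Save end_ts on the first call and return the elapsed seconds."""
        if self.end_ts is None:
            self.end_ts = datetime.now(timezone.utc)
        return (self.end_ts - self.start_ts).total_seconds()


timer = TimedProgressSketch(seconds=60, prefix='  ')
try:
    time.sleep(0.01)  # stand-in for the real work being timed
finally:
    elapsed = timer.end()
```

Calling `end()` a second time in this sketch returns the same elapsed value, since the end timestamp is only recorded once.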

archivebox.misc.logging_util.progress_bar(seconds: int, prefix: str = '', ANSI: dict[str, str] = ANSI) -> None

show a timer in the form of a progress bar, with percentage and seconds remaining
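To make the "percentage and seconds remaining" idea concrete, here is a small sketch that renders one frame of such a bar as a string. The `render_bar` function, its `width` parameter, and the exact layout are assumptions for illustration; the real `progress_bar` prints to the terminal and may look different.

```python
def render_bar(elapsed: float, total: int, width: int = 20) -> str:
    """Render one text frame: filled bar, percent complete, seconds left."""
    # Clamp the completed fraction to [0, 1]; treat a zero total as finished.
    fraction = min(max(elapsed / total, 0.0), 1.0) if total > 0 else 1.0
    filled = int(fraction * width)
    bar = '#' * filled + '-' * (width - filled)
    remaining = max(total - elapsed, 0.0)
    return f'[{bar}] {fraction * 100:5.1f}% ({remaining:.0f}s left)'


frame = render_bar(elapsed=15, total=60)
```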

archivebox.misc.logging_util.log_cli_command(subcommand: str, subcommand_args: collections.abc.Iterable[str] = (), stdin: str | IO | None = None, pwd: str = '.')
archivebox.misc.logging_util.log_importing_started(urls: str | list[str], depth: int, index_only: bool)
archivebox.misc.logging_util.log_source_saved(source_file: str)
archivebox.misc.logging_util.log_parsing_finished(num_parsed: int, parser_name: str)
archivebox.misc.logging_util.log_deduping_finished(num_new_links: int)
archivebox.misc.logging_util.log_crawl_started(new_links)
archivebox.misc.logging_util.log_indexing_process_started(num_links: int)
archivebox.misc.logging_util.log_indexing_process_finished()
archivebox.misc.logging_util.log_indexing_started(out_path: str)
archivebox.misc.logging_util.log_indexing_finished(out_path: str)
archivebox.misc.logging_util.log_archiving_started(num_links: int, resume: float | None = None)
archivebox.misc.logging_util.log_archiving_paused(num_links: int, idx: int, timestamp: str)
archivebox.misc.logging_util.log_archiving_finished(num_links: int)
archivebox.misc.logging_util.log_snapshot_archiving_started(snapshot: archivebox.core.models.Snapshot, out_dir: str, is_new: bool)
archivebox.misc.logging_util.log_snapshot_archiving_finished(snapshot: archivebox.core.models.Snapshot, out_dir: str, is_new: bool, stats: dict, start_ts: datetime.datetime)
archivebox.misc.logging_util.log_archive_method_started(method: str)
archivebox.misc.logging_util.log_archive_method_finished(result: dict)

quote any argument containing whitespace in a command so the user can copy-paste the printed string directly to run the cmd
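The quoting behaviour mentioned in that docstring can be approximated with the standard library. The sketch below uses `shlex.quote`, which is an assumption about one reasonable way to do it and not necessarily how ArchiveBox quotes commands; `copy_pasteable` and the sample command are hypothetical.

```python
import shlex


def copy_pasteable(cmd: list[str]) -> str:
    """Join argv into one shell-safe string, quoting args that need it."""
    return ' '.join(shlex.quote(arg) for arg in cmd)


printed = copy_pasteable(['wget', '--user-agent=My Agent', 'https://example.com'])
```

Arguments without whitespace or shell metacharacters pass through unchanged, so the printed command stays readable.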

archivebox.misc.logging_util.log_list_started(filter_patterns: list[str] | None, filter_type: str)
archivebox.misc.logging_util.log_list_finished(snapshots)
archivebox.misc.logging_util.log_removal_started(snapshots, yes: bool, delete: bool)
archivebox.misc.logging_util.log_removal_finished(remaining_links: int, removed_links: int)
archivebox.misc.logging_util.log_index_started(url: str)
archivebox.misc.logging_util.pretty_path(path: pathlib.Path | str, pwd: pathlib.Path | str = DATA_DIR, color: bool = True) -> str

convert paths like …/ArchiveBox/archivebox/../output/abc into output/abc
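As an illustration of that conversion, the following sketch collapses `..` segments and then makes the path relative to a base directory. `pretty_path_sketch` is hypothetical, the `/srv/ArchiveBox` base is a made-up example location, and the real `pretty_path` (including its `color` handling) may behave differently.

```python
import posixpath
from pathlib import PurePosixPath


def pretty_path_sketch(path: str, pwd: str) -> str:
    """Normalize `path` and express it relative to `pwd` when possible."""
    normalized = PurePosixPath(posixpath.normpath(path))
    base = PurePosixPath(posixpath.normpath(pwd))
    try:
        return str(normalized.relative_to(base))
    except ValueError:
        # The path lives outside pwd, so keep the normalized absolute form.
        return str(normalized)


short = pretty_path_sketch('/srv/ArchiveBox/archivebox/../output/abc', '/srv/ArchiveBox')
```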

archivebox.misc.logging_util.printable_filesize(num_bytes: int | float) -> str
archivebox.misc.logging_util.format_duration(seconds: float) -> str

Format duration in human-readable form.
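The page does not state the exact output format, so the sketch below shows only one plausible way to turn seconds into a human-readable string. `format_duration_sketch` and its thresholds and unit labels are assumptions, not the documented behaviour of `format_duration`.

```python
def format_duration_sketch(seconds: float) -> str:
    """Format a duration as seconds, minutes+seconds, or hours+minutes."""
    if seconds < 60:
        return f'{seconds:.1f}s'
    minutes, secs = divmod(int(seconds), 60)
    if minutes < 60:
        return f'{minutes}m {secs}s'
    hours, minutes = divmod(minutes, 60)
    return f'{hours}h {minutes}m'


examples = [format_duration_sketch(s) for s in (2.5, 125, 3700)]
```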

archivebox.misc.logging_util.truncate_url(url: str, max_length: int = 60) -> str

Truncate URL to max_length, keeping domain and adding ellipsis.
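One way to keep the domain while shortening a URL is to reserve space for the scheme and host, then cut the remainder. The sketch below follows that approach; `truncate_url_sketch` is hypothetical, and the three-dot ASCII ellipsis and fallback behaviour are assumptions rather than the actual `truncate_url` logic.

```python
from urllib.parse import urlsplit


def truncate_url_sketch(url: str, max_length: int = 60) -> str:
    """Shorten `url` to at most `max_length` chars, preserving scheme and host."""
    if len(url) <= max_length:
        return url
    parts = urlsplit(url)
    prefix = f'{parts.scheme}://{parts.netloc}' if parts.netloc else ''
    rest = url[len(prefix):]
    budget = max_length - len(prefix) - 3  # reserve room for '...'
    if budget <= 0:
        # The host alone is too long, so fall back to a plain cut.
        return url[:max(max_length - 3, 0)] + '...'
    return prefix + rest[:budget] + '...'


long_url = 'https://example.com/' + 'a' * 100
shortened = truncate_url_sketch(long_url, max_length=40)
```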

archivebox.misc.logging_util.log_worker_event(worker_type: str, event: str, indent_level: int = 0, pid: int | None = None, worker_id: str | None = None, url: str | None = None, plugin: str | None = None, metadata: dict[str, Any] | None = None, error: Exception | None = None) -> None

Log a worker event with structured metadata and indentation.

Args:
    worker_type: Type of worker (Orchestrator, CrawlWorker, SnapshotWorker)
    event: Event name (Starting, Completed, Failed, etc.)
    indent_level: Indentation level (0=Orchestrator, 1=CrawlWorker, 2=SnapshotWorker)
    pid: Process ID
    worker_id: Worker ID (UUID for workers)
    url: URL being processed (for SnapshotWorker)
    plugin: Plugin name (for hook processes)
    metadata: Dict of metadata to show in curly braces
    error: Exception if event is an error
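To show how indentation level and curly-brace metadata might combine into a single log line, here is a small formatting sketch. `format_worker_event_sketch` is hypothetical: it returns a string instead of printing, covers only some of the documented parameters, and its layout is an assumption, not the output of `log_worker_event`.

```python
from typing import Any, Optional


def format_worker_event_sketch(
    worker_type: str,
    event: str,
    indent_level: int = 0,
    pid: Optional[int] = None,
    url: Optional[str] = None,
    metadata: Optional[dict[str, Any]] = None,
    error: Optional[Exception] = None,
) -> str:
    """Build one indented log line with optional pid, url, metadata, and error."""
    indent = '  ' * indent_level  # two spaces per nesting level (assumed)
    label = f'{worker_type}[{pid}]' if pid is not None else worker_type
    parts = [f'{indent}{label} {event}']
    if url:
        parts.append(url)
    if metadata:
        pairs = ', '.join(f'{key}={value}' for key, value in metadata.items())
        parts.append('{' + pairs + '}')
    if error is not None:
        parts.append(f'{type(error).__name__}: {error}')
    return ' '.join(parts)


line = format_worker_event_sketch(
    'SnapshotWorker', 'Completed', indent_level=2, pid=1234,
    url='https://example.com', metadata={'status': 'succeeded'},
)
```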

archivebox.misc.logging_util.printable_folders(folders: dict[str, Optional[archivebox.core.models.Snapshot]], with_headers: bool = False) -> str
archivebox.misc.logging_util.printable_config(config: dict, prefix: str = '') -> str
archivebox.misc.logging_util.printable_folder_status(name: str, folder: dict) -> str
archivebox.misc.logging_util.printable_dependency_version(name: str, dependency: dict) -> str