archivebox.config.common

Module Contents

Classes

ShellConfig

StorageConfig

GeneralConfig

ServerConfig

DatabaseConfig

ArchivingConfig

SearchBackendConfig

ArchiveBoxBaseConfig

Merged, typed ArchiveBox config.

Functions

_legacy_bool

permissions_from_legacy_public_flags

resolve_delete_after_config_value

_plugin_sensitive_config_keys

is_sensitive_config_key

True if a config key names a credential and must be write-only in the UI.

redact_sensitive_config

Return a copy of config with credential values replaced by ********.

normalize_runtime_config

Return config filtered for runtime/frozen usage, optionally JSON-safe.

build_crawl_config_snapshot

Build the frozen crawl config stored on Crawl.config at creation time.

rprint

parse_delete_after

_plugin_user_config_value

_plugin_user_config

_discover_plugin_config_schemas

_plugin_config_properties

_plugin_config_model

_archivebox_config_input_names

_build_archivebox_config_model

_normalize_plugins_config_value

_plugin_enabled_config_keys

_plugins_with_required_plugins

get_live_config_url

config_field_metadata

Return one centralized metadata map for core and plugin config fields.

find_config_section

find_config_default

config_field_type

find_config_type

find_config_source

Determine where a config value comes from.

get_request_config

Return the per-request ArchiveBox config, upgrading to plugin resolution if needed.

get_config

Get merged config from all sources.

get_all_configs

Get all config section objects as a dictionary.

Data

ConfigOverrides

ConfigPayload

PluginSchemaDocuments

LIVE_CONFIG_BASE_URL

_STDOUT_CONSOLE

_STDERR_CONSOLE

_WARNED_ARCHIVING_CONFIGS

_SENSITIVE_CONFIG_KEY_NEEDLES

SENSITIVE_CONFIG_VALUE_REDACTED

_SCOPE_CRAWL_FROZEN

_SCOPE_CRAWL_EXECUTION

_SCOPE_SERVER

PLUGIN_CONFIG_SCHEMAS

ArchiveBoxConfig

API

archivebox.config.common.ConfigOverrides[source]

None

archivebox.config.common.ConfigPayload[source]

None

archivebox.config.common.PluginSchemaDocuments[source]

None

archivebox.config.common.LIVE_CONFIG_BASE_URL[source]

‘/admin/environment/config/’

archivebox.config.common._STDOUT_CONSOLE[source]

‘Console(…)’

archivebox.config.common._STDERR_CONSOLE[source]

‘Console(…)’

archivebox.config.common._WARNED_ARCHIVING_CONFIGS: set[tuple[int, bool]][source]

‘set(…)’

archivebox.config.common._legacy_bool(value: object) -> bool | None

archivebox.config.common.permissions_from_legacy_public_flags(raw_config: collections.abc.Mapping[str, object]) -> str | None

archivebox.config.common.resolve_delete_after_config_value(*configs: collections.abc.Mapping[str, Any] | None) -> str

archivebox.config.common._SENSITIVE_CONFIG_KEY_NEEDLES

(‘TOKEN’, ‘SECRET’, ‘API_KEY’, ‘APIKEY’, ‘PASSWORD’)

archivebox.config.common.SENSITIVE_CONFIG_VALUE_REDACTED[source]

‘********’

archivebox.config.common._SCOPE_CRAWL_FROZEN[source]

‘crawl_frozen’

archivebox.config.common._SCOPE_CRAWL_EXECUTION[source]

‘crawl_execution’

archivebox.config.common._SCOPE_SERVER[source]

‘server’

archivebox.config.common._plugin_sensitive_config_keys() -> frozenset[str]

archivebox.config.common.is_sensitive_config_key(key: str) -> bool

True if a config key names a credential and must be write-only in the UI.

Matches any key whose uppercase form contains TOKEN, SECRET, API_KEY, APIKEY, or PASSWORD — covers SECRET_KEY, OPENAI_API_KEY, TWOCAPTCHA_APIKEY, GITHUB_TOKEN, ADMIN_PASSWORD, etc. Centralized here so the KeyValueWidget (Machine/Crawl/Snapshot/Persona admin forms), the plugin config grid, REST API responses, and any future surface that round-trips raw config values all agree on which keys to redact.
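The matching rule described above can be sketched in a few lines. This is a minimal illustration, not the module's actual implementation: `looks_sensitive` is a hypothetical stand-in, and the real function may also consult plugin-declared keys (see `_plugin_sensitive_config_keys`), which this sketch ignores.

```python
# Needles copied from the documented value of _SENSITIVE_CONFIG_KEY_NEEDLES.
SENSITIVE_NEEDLES = ("TOKEN", "SECRET", "API_KEY", "APIKEY", "PASSWORD")


def looks_sensitive(key: str) -> bool:
    """Hypothetical stand-in: substring match against the uppercased key."""
    upper_key = key.upper()
    return any(needle in upper_key for needle in SENSITIVE_NEEDLES)


for key in ("SECRET_KEY", "OPENAI_API_KEY", "github_token", "TIMEOUT"):
    print(f"{key}: {looks_sensitive(key)}")
```

Because the check is a substring match, any key that merely contains one of the needles is treated as a credential, which errs on the side of hiding too much rather than too little.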

archivebox.config.common.redact_sensitive_config(config: collections.abc.Mapping[str, Any] | None) -> dict[str, Any]

Return a copy of config with credential values replaced by ********.

Used wherever a config dict crosses an API/export/debug-dump boundary. The widget-side write-only treatment handles the form-render path; this helper handles every JSON-response path (REST schemas, to_json exports, admin debug views, etc.). Empty values are passed through unchanged so callers can still tell “unset” from “set-but-hidden.”
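A rough sketch of the redaction behaviour described above, assuming that "empty" means `None` or the empty string (the source does not spell this out). The helper name `redact` and the sample values are hypothetical; only the needle list and the `********` placeholder come from this page.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any

REDACTED = "********"  # documented value of SENSITIVE_CONFIG_VALUE_REDACTED
NEEDLES = ("TOKEN", "SECRET", "API_KEY", "APIKEY", "PASSWORD")


def redact(config: Mapping[str, Any] | None) -> dict[str, Any]:
    """Hypothetical stand-in: return a redacted copy, leaving the input untouched."""
    redacted = dict(config or {})
    for key, value in redacted.items():
        is_credential = any(needle in key.upper() for needle in NEEDLES)
        # Assumption: None and "" count as "empty" and pass through unchanged,
        # so callers can still tell "unset" from "set-but-hidden".
        if is_credential and value not in (None, ""):
            redacted[key] = REDACTED
    return redacted


original = {"ADMIN_PASSWORD": "hunter2", "SECRET_KEY": "", "TIMEOUT": 60}
print(redact(original))
```

Returning a copy rather than mutating in place means the same config object can still be used internally after a redacted version has been sent across an API or export boundary.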

archivebox.config.common.normalize_runtime_config(config: archivebox.config.configset.BaseConfigSet | collections.abc.Mapping[str, Any] | str | None, *, only_crawl_execution: bool = False, exclude_runtime_derived: bool = False, exclude_crawl_execution: bool = False, json_safe: bool = True) -> dict[str, Any]

Return config filtered for runtime/frozen usage, optionally JSON-safe.

archivebox.config.common.build_crawl_config_snapshot(*, persona: Any = None, overrides: collections.abc.Mapping[str, Any] | None = None, base_config: ArchiveBoxBaseConfig | collections.abc.Mapping[str, object] | None = None) -> dict[str, Any]

Build the frozen crawl config stored on Crawl.config at creation time.

archivebox.config.common.rprint(*args, file=None, **kwargs)

class archivebox.config.common.ShellConfig

Bases: archivebox.config.configset.BaseConfigSet

toml_section_header: str[source]

‘SHELL_CONFIG’

_scope: str[source]

‘PrivateAttr(…)’

DEBUG: bool[source]

‘Field(…)’

IS_TTY: bool[source]

‘Field(…)’

USE_COLOR: bool[source]

‘Field(…)’

SHOW_PROGRESS: bool[source]

‘Field(…)’

IN_DOCKER: bool[source]

‘Field(…)’

IN_QEMU: bool[source]

‘Field(…)’

ANSI: dict[str, str][source]

‘Field(…)’

property TERM_WIDTH: int

property COMMIT_HASH: str | None

property BUILD_TIME: str

class archivebox.config.common.StorageConfig

Bases: archivebox.config.configset.BaseConfigSet

toml_section_header: str[source]

‘STORAGE_CONFIG’

_scope: str[source]

‘PrivateAttr(…)’

TMP_DIR: pathlib.Path[source]

‘Field(…)’

LIB_DIR: pathlib.Path[source]

‘Field(…)’

LIB_BIN_DIR: pathlib.Path[source]

‘Field(…)’

OUTPUT_PERMISSIONS: str[source]

‘Field(…)’

ENFORCE_ATOMIC_WRITES: bool[source]

‘Field(…)’

ALLOW_NO_UNIX_SOCKETS: bool[source]

‘Field(…)’

class archivebox.config.common.GeneralConfig[source]

Bases: archivebox.config.configset.BaseConfigSet

toml_section_header: str[source]

‘GENERAL_CONFIG’

_scope: str[source]

‘PrivateAttr(…)’

TAG_SEPARATOR_PATTERN: str[source]

‘Field(…)’

class archivebox.config.common.ServerConfig[source]

Bases: archivebox.config.configset.BaseConfigSet

toml_section_header: str[source]

‘SERVER_CONFIG’

_scope: str[source]

‘PrivateAttr(…)’

SERVER_SECURITY_MODES: ClassVar[tuple[str, ...]][source]

(‘safe-subdomains-fullreplay’, ‘safe-onedomain-nojsreplay’, ‘unsafe-onedomain-noadmin’, ‘danger-oned…

SECRET_KEY: str[source]

‘Field(…)’

BIND_ADDR: str[source]

‘Field(…)’

BASE_URL: str[source]

‘Field(…)’

ALLOWED_HOSTS: str[source]

‘Field(…)’

CSRF_TRUSTED_ORIGINS: str[source]

‘Field(…)’

SERVER_SECURITY_MODE: str[source]

‘Field(…)’

SNAPSHOTS_PER_PAGE: int[source]

‘Field(…)’

FOOTER_INFO: str[source]

‘Field(…)’

PUBLIC_INDEX: bool[source]

‘Field(…)’

PUBLIC_ADD_VIEW: bool[source]

‘Field(…)’

ADMIN_USERNAME: str | None[source]

‘Field(…)’

ADMIN_PASSWORD: str | None[source]

‘Field(…)’

REVERSE_PROXY_USER_HEADER: str[source]

‘Field(…)’

REVERSE_PROXY_WHITELIST: str[source]

‘Field(…)’

LOGOUT_REDIRECT_URL: str[source]

‘Field(…)’

validate_server_security_mode(v: str) -> str

property USES_SUBDOMAIN_ROUTING: bool

property ENABLES_FULL_JS_REPLAY: bool

property CONTROL_PLANE_ENABLED: bool

property BLOCK_UNSAFE_METHODS: bool

property SHOULD_NEUTER_RISKY_REPLAY: bool

property IS_UNSAFE_MODE: bool

property IS_DANGEROUS_MODE: bool

property IS_LOWER_SECURITY_MODE: bool

class archivebox.config.common.DatabaseConfig

Bases: archivebox.config.configset.BaseConfigSet

toml_section_header: str[source]

‘DATABASE_CONFIG’

_scope: str[source]

‘PrivateAttr(…)’

DATABASE_NAME: str[source]

‘Field(…)’

SQLITE_JOURNAL_MODE: str[source]

‘Field(…)’

SQLITE_MMAP_SIZE: int[source]

‘Field(…)’

SQLITE_BUSY_TIMEOUT: int[source]

‘Field(…)’

SQLITE_LOCK_RETRY_TIMEOUT: float[source]

‘Field(…)’

SQLITE_LOCK_RETRY_INTERVAL: float[source]

‘Field(…)’

class archivebox.config.common.ArchivingConfig[source]

Bases: archivebox.config.configset.BaseConfigSet

toml_section_header: str[source]

‘ARCHIVING_CONFIG’

_scope: str[source]

‘PrivateAttr(…)’

PLUGINS: str[source]

‘Field(…)’

ONLY_NEW: bool[source]

‘Field(…)’

INDEX_ONLY: bool[source]

‘Field(…)’

TIMEOUT: int[source]

‘Field(…)’

CRAWL_MAX_URLS: int[source]

‘Field(…)’

CRAWL_MAX_SIZE: int[source]

‘Field(…)’

CRAWL_TIMEOUT: int[source]

‘Field(…)’

CRAWL_MAX_CONCURRENT_SNAPSHOTS: int[source]

‘Field(…)’

SNAPSHOT_MAX_SIZE: int[source]

‘Field(…)’

RESOLUTION: str[source]

‘Field(…)’

CHECK_SSL_VALIDITY: bool[source]

‘Field(…)’

USER_AGENT: str[source]

‘Field(…)’

COOKIES_FILE: pathlib.Path | None[source]

‘Field(…)’

URL_DENYLIST: str[source]

‘Field(…)’

URL_ALLOWLIST: str | None[source]

‘Field(…)’

DEFAULT_PERSONA: str[source]

‘Field(…)’

PERMISSIONS: str[source]

‘Field(…)’

DELETE_AFTER: str[source]

‘Field(…)’

warn_if_invalid() -> None

validate_check_ssl_validity(v)

SIDE EFFECT: disables the “you really shouldn't disable SSL” warnings emitted by requests.

classmethod validate_delete_after(value)

classmethod validate_permissions(value)

property URL_ALLOWLIST_PTN: re.Pattern | None

property URL_DENYLIST_PTN: re.Pattern

archivebox.config.common.parse_delete_after(value) -> datetime.timedelta | None

class archivebox.config.common.SearchBackendConfig

Bases: archivebox.config.configset.BaseConfigSet

toml_section_header: str[source]

‘SEARCH_BACKEND_CONFIG’

_scope: str[source]

‘PrivateAttr(…)’

SEARCH_BACKEND_ENGINE: str[source]

‘Field(…)’

archivebox.config.common._plugin_user_config_value(value: Any) -> str

archivebox.config.common._plugin_user_config(config: collections.abc.Mapping[str, object]) -> dict[str, str]

archivebox.config.common._discover_plugin_config_schemas() -> archivebox.config.common.PluginSchemaDocuments

archivebox.config.common._plugin_config_properties(plugin_schemas: archivebox.config.common.PluginSchemaDocuments) -> dict[str, dict[str, Any]]

archivebox.config.common._plugin_config_model(plugin_schemas: archivebox.config.common.PluginSchemaDocuments) -> type[pydantic.BaseModel]

archivebox.config.common._archivebox_config_input_names() -> set[str]

class archivebox.config.common.ArchiveBoxBaseConfig

Bases: archivebox.config.common.ShellConfig, archivebox.config.common.StorageConfig, archivebox.config.common.GeneralConfig, archivebox.config.common.ServerConfig, archivebox.config.common.DatabaseConfig, archivebox.config.common.ArchivingConfig, archivebox.config.common.SearchBackendConfig, archivebox.config.ldap.LDAPConfig

Merged, typed ArchiveBox config.

Core ArchiveBox fields are declared above. Plugin-owned fields are added to the concrete ArchiveBoxConfig model from plugin JSONSchema below, so ArchiveBox does not hardcode any individual plugin config names.
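The idea of deriving fields from plugin schemas instead of hardcoding them can be illustrated with the standard library alone. This is only a conceptual sketch: the real module builds a pydantic model (see `_build_archivebox_config_model`), whereas this example substitutes `dataclasses.make_dataclass`. The names `CoreConfig`, `build_config_class`, the `example_plugin` schema, and its keys are all hypothetical, and the schema shape (a top-level `properties` map with `type` and `default`) is assumed from ordinary JSONSchema conventions.

```python
from dataclasses import dataclass, field, make_dataclass

# Assumed mapping from JSONSchema primitive types to Python types.
JSON_TYPE_TO_PYTHON = {"string": str, "integer": int, "number": float, "boolean": bool}


@dataclass
class CoreConfig:
    """Hypothetical stand-in for the statically declared core fields."""
    TIMEOUT: int = 60


def build_config_class(plugin_schemas: dict[str, dict]) -> type:
    """Create a subclass of CoreConfig with one field per plugin schema property."""
    extra_fields = []
    for schema in plugin_schemas.values():
        for name, spec in schema.get("properties", {}).items():
            py_type = JSON_TYPE_TO_PYTHON.get(spec.get("type"), object)
            # Scalar defaults only; mutable defaults would need default_factory.
            extra_fields.append((name, py_type, field(default=spec.get("default"))))
    return make_dataclass("MergedConfig", extra_fields, bases=(CoreConfig,))


schemas = {
    "example_plugin": {  # hypothetical plugin, not a real ArchiveBox plugin name
        "properties": {
            "EXAMPLE_PLUGIN_ENABLED": {"type": "boolean", "default": True},
            "EXAMPLE_PLUGIN_TIMEOUT": {"type": "integer", "default": 30},
        }
    }
}

MergedConfig = build_config_class(schemas)
config = MergedConfig(EXAMPLE_PLUGIN_TIMEOUT=45)
print(config)
```

The point of the pattern is that the core class never mentions a plugin key by name: adding or removing a plugin schema changes the generated class without touching the core definitions.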

model_config[source]

‘SettingsConfigDict(…)’

computed_config_keys: ClassVar[tuple[str, ...]][source]

None

classmethod _core_config_classes() -> tuple[type[archivebox.config.configset.BaseConfigSet], ...]

classmethod _core_field_scope(key: str) -> str | None

classmethod _plugin_field_scope(key: str) -> str | None

classmethod scope_for_key(key: str) -> str

classmethod _scope_by_key() -> dict[str, str]

classmethod _crawl_frozen_keys() -> frozenset[str]

classmethod _crawl_runtime_keys() -> frozenset[str]

classmethod runtime_derived_config_keys() -> frozenset[str]

_scoped_config(*, include_execution: bool) -> dict[str, Any]

for_crawl() -> dict[str, Any]

Config scoped to crawl execution, without runtime object overlays.

for_crawl_frozen(*, persona: Any = None) -> dict[str, Any]

Config safe to persist permanently on Crawl.config.

for_crawl_runtime(*, crawl: Any = None, snapshot: Any = None, persona: Any = None, runtime_overrides: collections.abc.Mapping[str, Any] | None = None, extra_context: collections.abc.Mapping[str, Any] | None = None, crawl_output_dir: Any = None, snapshot_output_dir: Any = None) -> dict[str, Any]

Config payload safe to pass to crawl/snapshot hook execution.

resolve_runtime_paths()

derive_plugin_enabled_config()

archivebox.config.common._build_archivebox_config_model(plugin_schemas: archivebox.config.common.PluginSchemaDocuments) -> type[archivebox.config.common.ArchiveBoxBaseConfig]

archivebox.config.common.PLUGIN_CONFIG_SCHEMAS

‘_discover_plugin_config_schemas(…)’

archivebox.config.common.ArchiveBoxConfig[source]

‘_build_archivebox_config_model(…)’

archivebox.config.common._normalize_plugins_config_value(value: Any) -> set[str]

archivebox.config.common._plugin_enabled_config_keys() -> dict[str, str]

archivebox.config.common._plugins_with_required_plugins(plugin_names: set[str]) -> set[str]

archivebox.config.common.get_live_config_url(key: str) -> str

archivebox.config.common.config_field_metadata() -> dict[str, dict[str, Any]]

Return one centralized metadata map for core and plugin config fields.

archivebox.config.common.find_config_section(key: str) -> str

archivebox.config.common.find_config_default(key: str) -> str

archivebox.config.common.config_field_type(key: str) -> str

archivebox.config.common.find_config_type(key: str) -> str

archivebox.config.common.find_config_source(key: str, merged_config: collections.abc.Mapping[str, Any]) -> str

Determine where a config value comes from.

archivebox.config.common.get_request_config(request: Any, *, resolve_plugins: bool = False) -> archivebox.config.common.ArchiveBoxBaseConfig

Return the per-request ArchiveBox config, upgrading to plugin resolution if needed.

archivebox.config.common.get_config(defaults: archivebox.config.common.ConfigOverrides | None = None, overrides: archivebox.config.common.ConfigOverrides | None = None, base_config: archivebox.config.common.ArchiveBoxBaseConfig | collections.abc.Mapping[str, object] | None = None, persona: Any = None, crawl: Any = None, snapshot: Any = None, machine: Any = None, include_machine: bool = True, resolve_plugins: bool = True, redact_sensitive: bool = False) -> archivebox.config.common.ArchiveBoxBaseConfig

Get merged config from all sources.

Defaults are hydrated by pydantic from core/plugin defaults, ArchiveBox.conf, and environment variables. Persisted Machine/Persona values then apply for live crawl-execution scope, while Crawl/Snapshot rows apply their frozen crawl-scope values. Explicit overrides win last.

Crawl-execution config is not persisted on Crawl.config. It is rederived from current Machine/Persona state and hydrated process defaults each time.
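The "later source wins" layering described above can be pictured as a sequence of dictionary updates. This sketch does not call `get_config`. `merge_layers` is a hypothetical helper, the sample values are invented, and the assignment of keys to the machine and crawl layers is arbitrary; the relative order of the Machine/Persona and Crawl/Snapshot layers is also an assumption, since the text only states that explicit overrides come last.

```python
from __future__ import annotations

from collections.abc import Mapping
from typing import Any


def merge_layers(*layers: Mapping[str, Any] | None) -> dict[str, Any]:
    """Hypothetical helper: apply layers in order so later layers win; skip None."""
    merged: dict[str, Any] = {}
    for layer in layers:
        if layer:
            merged.update(layer)
    return merged


# Invented sample values, for illustration only.
hydrated_defaults = {"TIMEOUT": 60, "USER_AGENT": "default-agent", "ONLY_NEW": True}
machine_values = {"TIMEOUT": 90}          # live values, re-read on every call
persona_values = None                      # no persona supplied in this example
crawl_frozen_values = {"ONLY_NEW": False}  # values frozen at crawl creation
explicit_overrides = {"TIMEOUT": 10}       # always applied last

effective = merge_layers(
    hydrated_defaults,
    machine_values,
    persona_values,
    crawl_frozen_values,
    explicit_overrides,
)
print(effective)
```

In this toy run the explicit override replaces the machine-level `TIMEOUT`, the frozen crawl value replaces the default `ONLY_NEW`, and `USER_AGENT` falls through from the defaults untouched.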

archivebox.config.common.get_all_configs() -> dict[str, archivebox.config.configset.BaseConfigSet]

Get all config section objects as a dictionary.