archivebox.config.common

Module Contents

Classes

ShellConfig

StorageConfig

GeneralConfig

ServerConfig

ArchivingConfig

SearchBackendConfig

ArchiveBoxBaseConfig

Merged, typed ArchiveBox config.

Functions

rprint

_print_server_security_mode_warning

_plugin_user_config_value

_plugin_user_config

_discover_plugin_config_schemas

_plugin_config_properties

_plugin_config_model

_archivebox_config_input_names

_build_archivebox_config_model

get_config

Get merged config from all sources.

get_all_configs

Get all config section objects as a dictionary.

Data

ConfigOverrides

ConfigPayload

PluginSchemaDocuments

_STDOUT_CONSOLE

_STDERR_CONSOLE

_WARNED_SERVER_SECURITY_MODES

_WARNED_ARCHIVING_CONFIGS

PLUGIN_CONFIG_SCHEMAS

ArchiveBoxConfig

API

archivebox.config.common.ConfigOverrides[source]

None

archivebox.config.common.ConfigPayload[source]

None

archivebox.config.common.PluginSchemaDocuments[source]

None

archivebox.config.common._STDOUT_CONSOLE[source]

‘Console(…)’

archivebox.config.common._STDERR_CONSOLE[source]

‘Console(…)’

archivebox.config.common._WARNED_SERVER_SECURITY_MODES: set[str][source]

‘set(…)’

archivebox.config.common._WARNED_ARCHIVING_CONFIGS: set[tuple[int, bool]][source]

‘set(…)’

archivebox.config.common.rprint(*args, file=None, **kwargs)[source]

class archivebox.config.common.ShellConfig[source]

Bases: archivebox.config.configset.BaseConfigSet

toml_section_header: str[source]

‘SHELL_CONFIG’

DEBUG: bool[source]

‘Field(…)’

IS_TTY: bool[source]

‘Field(…)’

USE_COLOR: bool[source]

‘Field(…)’

SHOW_PROGRESS: bool[source]

‘Field(…)’

IN_DOCKER: bool[source]

‘Field(…)’

IN_QEMU: bool[source]

‘Field(…)’

ANSI: dict[str, str][source]

‘Field(…)’

property TERM_WIDTH: int[source]

property COMMIT_HASH: str | None[source]

property BUILD_TIME: str[source]

class archivebox.config.common.StorageConfig[source]

Bases: archivebox.config.configset.BaseConfigSet

toml_section_header: str[source]

‘STORAGE_CONFIG’

ARCHIVE_DIR: pathlib.Path[source]

‘Field(…)’

USERS_DIR: pathlib.Path[source]

‘Field(…)’

TMP_DIR: pathlib.Path[source]

‘Field(…)’

LIB_DIR: pathlib.Path[source]

‘Field(…)’

LIB_BIN_DIR: pathlib.Path[source]

‘Field(…)’

CUSTOM_TEMPLATES_DIR: pathlib.Path[source]

‘Field(…)’

OUTPUT_PERMISSIONS: str[source]

‘Field(…)’

RESTRICT_FILE_NAMES: str[source]

‘Field(…)’

ENFORCE_ATOMIC_WRITES: bool[source]

‘Field(…)’

DIR_OUTPUT_PERMISSIONS: str[source]

‘Field(…)’

class archivebox.config.common.GeneralConfig[source]

Bases: archivebox.config.configset.BaseConfigSet

toml_section_header: str[source]

‘GENERAL_CONFIG’

TAG_SEPARATOR_PATTERN: str[source]

‘Field(…)’

class archivebox.config.common.ServerConfig[source]

Bases: archivebox.config.configset.BaseConfigSet

toml_section_header: str[source]

‘SERVER_CONFIG’

SERVER_SECURITY_MODES: ClassVar[tuple[str, ...]][source]

(‘safe-subdomains-fullreplay’, ‘safe-onedomain-nojsreplay’, ‘unsafe-onedomain-noadmin’, ‘danger-oned…

SECRET_KEY: str[source]

‘Field(…)’

BIND_ADDR: str[source]

‘Field(…)’

LISTEN_HOST: str[source]

‘Field(…)’

ADMIN_BASE_URL: str[source]

‘Field(…)’

ARCHIVE_BASE_URL: str[source]

‘Field(…)’

ALLOWED_HOSTS: str[source]

‘Field(…)’

CSRF_TRUSTED_ORIGINS: str[source]

‘Field(…)’

SERVER_SECURITY_MODE: str[source]

‘Field(…)’

SNAPSHOTS_PER_PAGE: int[source]

‘Field(…)’

PREVIEW_ORIGINALS: bool[source]

‘Field(…)’

FOOTER_INFO: str[source]

‘Field(…)’

PUBLIC_INDEX: bool[source]

‘Field(…)’

PUBLIC_SNAPSHOTS: bool[source]

‘Field(…)’

PUBLIC_SNAPSHOTS_LIST: bool | None[source]

‘Field(…)’

PUBLIC_ADD_VIEW: bool[source]

‘Field(…)’

ADMIN_USERNAME: str | None[source]

‘Field(…)’

ADMIN_PASSWORD: str | None[source]

‘Field(…)’

REVERSE_PROXY_USER_HEADER: str[source]

‘Field(…)’

REVERSE_PROXY_WHITELIST: str[source]

‘Field(…)’

LOGOUT_REDIRECT_URL: str[source]

‘Field(…)’

validate_server_security_mode(v: str) → str[source]

property USES_SUBDOMAIN_ROUTING: bool[source]

property ENABLES_FULL_JS_REPLAY: bool[source]

property CONTROL_PLANE_ENABLED: bool[source]

property BLOCK_UNSAFE_METHODS: bool[source]

property SHOULD_NEUTER_RISKY_REPLAY: bool[source]

property IS_UNSAFE_MODE: bool[source]

property IS_DANGEROUS_MODE: bool[source]

property IS_LOWER_SECURITY_MODE: bool[source]

archivebox.config.common._print_server_security_mode_warning(config: archivebox.config.common.ServerConfig) → None[source]

class archivebox.config.common.ArchivingConfig[source]

Bases: archivebox.config.configset.BaseConfigSet

toml_section_header: str[source]

‘ARCHIVING_CONFIG’

PLUGINS: str[source]

‘Field(…)’

ENABLED_PLUGINS: str[source]

‘Field(…)’

ENABLED_EXTRACTORS: str[source]

‘Field(…)’

ONLY_NEW: bool[source]

‘Field(…)’

OVERWRITE: bool[source]

‘Field(…)’

TIMEOUT: int[source]

‘Field(…)’

MAX_URL_ATTEMPTS: int[source]

‘Field(…)’

MAX_DEPTH: int[source]

‘Field(…)’

MAX_URLS: int[source]

‘Field(…)’

MAX_SIZE: int[source]

‘Field(…)’

RESOLUTION: str[source]

‘Field(…)’

CHECK_SSL_VALIDITY: bool[source]

‘Field(…)’

USER_AGENT: str[source]

‘Field(…)’

COOKIES_FILE: pathlib.Path | None[source]

‘Field(…)’

URL_DENYLIST: str[source]

‘Field(…)’

URL_ALLOWLIST: str | None[source]

‘Field(…)’

SAVE_ALLOWLIST: dict[str, list[str]][source]

‘Field(…)’

SAVE_DENYLIST: dict[str, list[str]][source]

‘Field(…)’

DEFAULT_PERSONA: str[source]

‘Field(…)’

warn_if_invalid() → None[source]

validate_check_ssl_validity(v)[source]

Side effect: disables the “you really shouldn't disable SSL” warnings emitted by requests.

property URL_ALLOWLIST_PTN: re.Pattern | None[source]

property URL_DENYLIST_PTN: re.Pattern[source]

property SAVE_ALLOWLIST_PTNS: dict[re.Pattern, list[str]][source]

property SAVE_DENYLIST_PTNS: dict[re.Pattern, list[str]][source]

class archivebox.config.common.SearchBackendConfig[source]

Bases: archivebox.config.configset.BaseConfigSet

toml_section_header: str[source]

‘SEARCH_BACKEND_CONFIG’

USE_INDEXING_BACKEND: bool[source]

‘Field(…)’

USE_SEARCHING_BACKEND: bool[source]

‘Field(…)’

SEARCH_BACKEND_ENGINE: str[source]

‘Field(…)’

SEARCH_PROCESS_HTML: bool[source]

‘Field(…)’

archivebox.config.common._plugin_user_config_value(value: Any) → str[source]

archivebox.config.common._plugin_user_config(config: collections.abc.Mapping[str, object]) → dict[str, str][source]

archivebox.config.common._discover_plugin_config_schemas() → archivebox.config.common.PluginSchemaDocuments[source]

archivebox.config.common._plugin_config_properties(plugin_schemas: archivebox.config.common.PluginSchemaDocuments) → dict[str, dict[str, Any]][source]

archivebox.config.common._plugin_config_model(plugin_schemas: archivebox.config.common.PluginSchemaDocuments) → type[pydantic.BaseModel][source]

archivebox.config.common._archivebox_config_input_names() → set[str][source]

class archivebox.config.common.ArchiveBoxBaseConfig[source]

Bases: archivebox.config.common.ShellConfig, archivebox.config.common.StorageConfig, archivebox.config.common.GeneralConfig, archivebox.config.common.ServerConfig, archivebox.config.common.ArchivingConfig, archivebox.config.common.SearchBackendConfig, archivebox.config.ldap.LDAPConfig

Merged, typed ArchiveBox config.

Core ArchiveBox fields are declared above. Plugin-owned fields are added to the concrete ArchiveBoxConfig model from plugin JSONSchema below, so ArchiveBox does not hardcode any individual plugin config names.
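
The idea of deriving plugin-owned fields from schema documents, rather than hardcoding them, can be illustrated with a small sketch. This is not ArchiveBox's implementation: it swaps in the standard library's dataclasses.make_dataclass in place of a pydantic model, and the names CoreConfig, build_config_class, EXAMPLE_PLUGIN_ENABLED, and EXAMPLE_PLUGIN_BINARY are hypothetical.

```python
from dataclasses import dataclass, field, fields, make_dataclass

# Map JSON Schema type names to Python types (a small, incomplete subset).
JSON_TYPES = {"string": str, "integer": int, "boolean": bool, "number": float}


@dataclass
class CoreConfig:
    """Hypothetical stand-in for the statically declared core fields."""
    TIMEOUT: int = 60


def build_config_class(name, base, schema_properties):
    """Create a subclass of `base` with one extra field per schema property.

    Assumes every property default is immutable (str, int, bool, None);
    mutable defaults would need `default_factory` instead.
    """
    extra_fields = [
        (key, JSON_TYPES.get(prop.get("type"), object), field(default=prop.get("default")))
        for key, prop in schema_properties.items()
    ]
    return make_dataclass(name, extra_fields, bases=(base,))


# Hypothetical plugin schema properties, for illustration only.
plugin_properties = {
    "EXAMPLE_PLUGIN_ENABLED": {"type": "boolean", "default": True},
    "EXAMPLE_PLUGIN_BINARY": {"type": "string", "default": "example"},
}

ExampleConfig = build_config_class("ExampleConfig", CoreConfig, plugin_properties)
config = ExampleConfig()
print([f.name for f in fields(ExampleConfig)])
```

The generated class inherits the core fields and gains the plugin fields at import time, so the core code never needs to mention a plugin's setting by name.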

model_config[source]

‘SettingsConfigDict(…)’

DATA_DIR: pathlib.Path[source]

‘Field(…)’

ABX_RUNTIME: str[source]

‘Field(…)’

CRAWL_DIR: pathlib.Path | None[source]

‘Field(…)’

CRAWL_OUTPUT_DIR: pathlib.Path | None[source]

‘Field(…)’

SNAP_DIR: pathlib.Path | None[source]

‘Field(…)’

computed_config_keys: ClassVar[tuple[str, ...]][source]

None

resolve_runtime_paths()[source]

archivebox.config.common._build_archivebox_config_model(plugin_schemas: archivebox.config.common.PluginSchemaDocuments) → type[archivebox.config.common.ArchiveBoxBaseConfig][source]

archivebox.config.common.PLUGIN_CONFIG_SCHEMAS[source]

‘_discover_plugin_config_schemas(…)’

archivebox.config.common.ArchiveBoxConfig[source]

‘_build_archivebox_config_model(…)’

archivebox.config.common.get_config(defaults: archivebox.config.common.ConfigOverrides | None = None, overrides: archivebox.config.common.ConfigOverrides | None = None, persona: Any = None, user: Any = None, crawl: Any = None, snapshot: Any = None, archiveresult: Any = None, machine: Any = None) → archivebox.config.common.ArchiveBoxBaseConfig[source]

Get merged config from all sources.

Priority (highest to lowest):

  1. Explicit overrides

  2. Per-snapshot config and output path

  3. Per-crawl config and output path

  4. Per-user config

  5. Per-persona derived config

  6. Current machine derived config

  7. Environment variables

  8. Config file (ArchiveBox.conf)

  9. Plugin schema defaults

  10. Core config defaults
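
The priority order above amounts to a first-match-wins lookup across layers. The following is a minimal sketch of that general pattern using collections.ChainMap, not the actual body of get_config; the helper merge_config_layers and all layer values are hypothetical.

```python
from collections import ChainMap


def merge_config_layers(*layers):
    """Merge mappings listed from highest to lowest priority.

    Layers that are None or empty are skipped, so optional sources
    (for example a missing per-crawl config) can be passed as-is.
    """
    present = [layer for layer in layers if layer]
    # ChainMap resolves each key from the first mapping that contains it.
    return dict(ChainMap(*present))


# Hypothetical layer contents, for illustration only.
core_defaults = {"TIMEOUT": 60, "ONLY_NEW": True}
config_file = {"TIMEOUT": 120}
environment = {"ONLY_NEW": False}
explicit_overrides = {"TIMEOUT": 30}

merged = merge_config_layers(
    explicit_overrides,  # highest priority
    None,                # an absent layer is ignored
    environment,
    config_file,
    core_defaults,       # lowest priority
)
print(merged)
```

Here TIMEOUT resolves to the explicit override and ONLY_NEW to the environment value, because each key is taken from the highest-priority layer that defines it.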

archivebox.config.common.get_all_configs() → dict[str, archivebox.config.configset.BaseConfigSet][source]

Get all config section objects as a dictionary.