archivebox.config.configset

Simplified config system for ArchiveBox.

This replaces the complex abx_spec_config/base_configset.py with a simpler approach that still supports environment variables, config files, and per-object overrides.

Module Contents

Classes

CaseConfigParser

IniConfigSettingsSource

Custom settings source that reads from ArchiveBox.conf (INI format). Flattens all sections into a single namespace.

BaseConfigSet

Base class for config sections.

Functions

get_config

Get merged config from all sources.

get_flat_config

Get a flat dictionary of all config values.

get_all_configs

Get all config section objects as a dictionary.

_parse_env_value

Parse an environment variable value based on expected type.

get_worker_concurrency

Get worker concurrency settings.

Data

DEFAULT_WORKER_CONCURRENCY

API

class archivebox.config.configset.CaseConfigParser(defaults=None, dict_type=_default_dict, allow_no_value=False, *, delimiters=('=', ':'), comment_prefixes=('#', ';'), inline_comment_prefixes=None, strict=True, empty_lines_in_values=True, default_section=DEFAULTSECT, interpolation=_UNSET, converters=_UNSET)

Bases: configparser.ConfigParser

optionxform(optionstr: str) → str

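By default, configparser.ConfigParser.optionxform lowercases every option name. CaseConfigParser overrides that method, and its name suggests the override keeps keys such as TIMEOUT in their original case. The sketch below assumes exactly that behaviour; the class name CasePreservingParser is hypothetical and only stands in for CaseConfigParser.

```python
import configparser


class CasePreservingParser(configparser.ConfigParser):
    """Hypothetical stand-in for CaseConfigParser: keep option names as written."""

    def optionxform(self, optionstr: str) -> str:
        # The base class returns optionstr.lower(); returning it unchanged preserves case.
        return optionstr


default = configparser.ConfigParser()
default.read_string("[SERVER_CONFIG]\nTIMEOUT = 60\n")
print(list(default["SERVER_CONFIG"]))  # ['timeout']

cased = CasePreservingParser()
cased.read_string("[SERVER_CONFIG]\nTIMEOUT = 60\n")
print(list(cased["SERVER_CONFIG"]))  # ['TIMEOUT']
```

Preserving case matters here because config keys are matched against upper-case field names such as DEBUG and USE_COLOR.
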
class archivebox.config.configset.IniConfigSettingsSource

Bases: pydantic_settings.PydanticBaseSettingsSource

Custom settings source that reads from ArchiveBox.conf (INI format). Flattens all sections into a single namespace.

get_field_value(field: Any, field_name: str) → tuple[Any, str, bool]

__call__() → dict[str, Any]

_load_config_file() → dict[str, Any]

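The "flattens all sections into a single namespace" behaviour can be illustrated with the standard library alone. This is a minimal sketch, not the class's actual code: the helper name flatten_ini is hypothetical, and letting a later section overwrite a repeated key from an earlier one is an assumption.

```python
import configparser
from typing import Any


def flatten_ini(text: str) -> dict[str, Any]:
    """Hypothetical helper: merge every INI section into one flat dict."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep keys in their original (upper) case
    parser.read_string(text)

    flat: dict[str, Any] = {}
    for section in parser.sections():
        # Assumption: a key repeated in a later section overwrites the earlier value.
        for key, value in parser.items(section):
            flat[key] = value
    return flat


sample = """
[SERVER_CONFIG]
SECRET_KEY = abc

[ARCHIVING_CONFIG]
TIMEOUT = 60
"""
print(flatten_ini(sample))  # {'SECRET_KEY': 'abc', 'TIMEOUT': '60'}
```

Note that every value comes back as a string; converting to the field's declared type is left to the settings model.
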
class archivebox.config.configset.BaseConfigSet

Bases: pydantic_settings.BaseSettings

Base class for config sections.

Automatically loads values from (highest to lowest priority):

  1. Environment variables

  2. ArchiveBox.conf file (INI format, flattened)

  3. Default values
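The priority order above behaves like a layered lookup in which the first layer that defines a key wins. The sketch below models it with collections.ChainMap; the three dictionaries and their values are illustrative only, and in practice the environment would be filtered down to known field names first.

```python
from collections import ChainMap

# Illustrative layers, lowest priority last (values are made up for the example).
defaults = {"DEBUG": "False", "USE_COLOR": "True", "TIMEOUT": "60"}
file_values = {"TIMEOUT": "120", "USE_COLOR": "False"}  # as if read from ArchiveBox.conf
env_values = {"TIMEOUT": "30"}                          # as if read from os.environ

# ChainMap searches its maps left to right, so the first map listed has the highest priority.
effective = ChainMap(env_values, file_values, defaults)

print(effective["TIMEOUT"])    # 30    (environment wins)
print(effective["USE_COLOR"])  # False (config file beats the default)
print(effective["DEBUG"])      # False (only the default defines it)
```
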

Subclasses define fields with defaults and types:

```python
from pydantic import Field

from archivebox.config.configset import BaseConfigSet


class ShellConfig(BaseConfigSet):
    DEBUG: bool = Field(default=False)
    USE_COLOR: bool = Field(default=True)
```

model_config

‘SettingsConfigDict(…)’

classmethod settings_customise_sources(settings_cls: type[pydantic_settings.BaseSettings], init_settings: pydantic_settings.PydanticBaseSettingsSource, env_settings: pydantic_settings.PydanticBaseSettingsSource, dotenv_settings: pydantic_settings.PydanticBaseSettingsSource, file_secret_settings: pydantic_settings.PydanticBaseSettingsSource) → tuple[pydantic_settings.PydanticBaseSettingsSource, ...]

Define the order of settings sources (first = highest priority).
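Because the method returns a tuple in which the first source has the highest priority, reordering the tuple changes which value wins. The stdlib-only sketch below models sources as zero-argument callables and merges them lowest-priority first; merge_sources, env_source and ini_source are hypothetical names, and this is an illustration of the ordering rule rather than pydantic-settings' internal merge code.

```python
from typing import Callable

# Each "source" is a zero-argument callable returning the values it knows about.
Source = Callable[[], dict[str, str]]


def merge_sources(sources: tuple[Source, ...]) -> dict[str, str]:
    merged: dict[str, str] = {}
    # Apply the lowest-priority source first so higher-priority ones overwrite it.
    for source in reversed(sources):
        merged.update(source())
    return merged


def env_source() -> dict[str, str]:
    return {"TIMEOUT": "30"}


def ini_source() -> dict[str, str]:
    return {"TIMEOUT": "120", "USE_COLOR": "False"}


print(merge_sources((env_source, ini_source)))  # {'TIMEOUT': '30', 'USE_COLOR': 'False'}
print(merge_sources((ini_source, env_source)))  # {'TIMEOUT': '120', 'USE_COLOR': 'False'}
```
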

classmethod load_from_file(config_path: pathlib.Path) → dict[str, str]

Load config values from an INI file.

update_in_place(warn: bool = True, persist: bool = False, **kwargs) → None

Update config values in place.

This allows config values to be changed at runtime without reloading.
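Updating in place means every existing reference to the config object sees the new value, since no new object is constructed. The sketch below uses a plain dataclass, ToyConfig, which is hypothetical. How the real warn flag behaves is not documented on this page; here it is assumed to control a warning for unknown keys, which are then skipped. The persist option (presumably writing back to ArchiveBox.conf) is omitted.

```python
import warnings
from dataclasses import dataclass, fields


@dataclass
class ToyConfig:
    """Hypothetical stand-in for a BaseConfigSet subclass."""

    TIMEOUT: int = 60
    DEBUG: bool = False

    def update_in_place(self, warn: bool = True, **kwargs) -> None:
        known = {f.name for f in fields(self)}
        for key, value in kwargs.items():
            if key not in known:
                # Assumed behaviour: warn about (and skip) keys the section does not define.
                if warn:
                    warnings.warn(f"unknown config key: {key}")
                continue
            setattr(self, key, value)  # mutate the existing object, no reload


cfg = ToyConfig()
alias = cfg                      # a second reference to the same object
cfg.update_in_place(TIMEOUT=120)
print(alias.TIMEOUT)             # 120
```
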

archivebox.config.configset.get_config(defaults: dict | None = None, persona: Any = None, user: Any = None, crawl: Any = None, snapshot: Any = None, archiveresult: Any = None, machine: Any = None) → dict[str, Any]

Get merged config from all sources.

Priority (highest to lowest):

  1. Per-snapshot config (snapshot.config JSON field)

  2. Per-crawl config (crawl.config JSON field)

  3. Per-user config (user.config JSON field)

  4. Per-persona config (persona.get_derived_config() - includes CHROME_USER_DATA_DIR etc.)

  5. Environment variables

  6. Config file (ArchiveBox.conf)

  7. Plugin schema defaults (config.json)

  8. Core config defaults

Args:

  defaults: Default values to start with.
  persona: Persona object (provides derived paths like CHROME_USER_DATA_DIR).
  user: User object with a config JSON field.
  crawl: Crawl object with a config JSON field.
  snapshot: Snapshot object with a config JSON field.
  archiveresult: ArchiveResult object (its snapshot is auto-fetched).
  machine: Unused legacy argument, kept for call compatibility.

Note: Objects are auto-fetched from relationships if not provided:

  - snapshot is auto-fetched from archiveresult.snapshot
  - crawl is auto-fetched from snapshot.crawl
  - user is auto-fetched from crawl.created_by

Returns: Merged config dict.

archivebox.config.configset.get_flat_config() → dict[str, Any]

Get a flat dictionary of all config values.

Replaces abx.pm.hook.get_FLAT_CONFIG()

archivebox.config.configset.get_all_configs() → dict[str, archivebox.config.configset.BaseConfigSet]

Get all config section objects as a dictionary.

Replaces abx.pm.hook.get_CONFIGS()
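get_all_configs returns section objects keyed by name, while get_flat_config returns one dictionary of values. One plausible relationship between them is sketched below using dataclasses so it runs without pydantic; the section classes and their key names are hypothetical, and with real BaseConfigSet (pydantic) instances, model_dump() would play the role of dataclasses.asdict.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class ShellConfigSketch:  # hypothetical section
    DEBUG: bool = False
    USE_COLOR: bool = True


@dataclass
class ArchivingConfigSketch:  # hypothetical section
    TIMEOUT: int = 60


def all_configs_sketch() -> dict[str, Any]:
    # Hypothetical section names; the real keys are not listed on this page.
    return {"SHELL_CONFIG": ShellConfigSketch(), "ARCHIVING_CONFIG": ArchivingConfigSketch()}


def flat_config_sketch() -> dict[str, Any]:
    flat: dict[str, Any] = {}
    for section in all_configs_sketch().values():
        flat.update(asdict(section))  # merge each section's fields into one namespace
    return flat


print(flat_config_sketch())  # {'DEBUG': False, 'USE_COLOR': True, 'TIMEOUT': 60}
```
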

archivebox.config.configset._parse_env_value(value: str, default: Any = None) → Any

Parse an environment variable value based on its expected type.
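Environment variables are always strings, so they must be converted to the type a field expects. The sketch below assumes the expected type is inferred from default; the function name parse_env_value_sketch, the set of accepted true-strings, and the use of JSON for lists and dicts are all assumptions rather than documented behaviour.

```python
import json
from typing import Any

TRUTHY = {"1", "true", "yes", "on"}  # assumption: accepted spellings of True


def parse_env_value_sketch(value: str, default: Any = None) -> Any:
    # Check bool before int: bool is a subclass of int in Python.
    if isinstance(default, bool):
        return value.strip().lower() in TRUTHY
    if isinstance(default, int):
        return int(value)
    if isinstance(default, float):
        return float(value)
    if isinstance(default, (list, dict)):
        return json.loads(value)  # assumption: structured values are JSON
    return value  # unknown or string type: leave unchanged


print(parse_env_value_sketch("True", False))   # True
print(parse_env_value_sketch("42", 0))         # 42
print(parse_env_value_sketch('["a"]', []))     # ['a']
```
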

archivebox.config.configset.DEFAULT_WORKER_CONCURRENCY

None

archivebox.config.configset.get_worker_concurrency() → dict[str, int]

Get worker concurrency settings.

Can be configured via the WORKER_CONCURRENCY environment variable, given as a JSON object.
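Configuring concurrency through a JSON object in an environment variable amounts to parsing the object and overlaying it on defaults. In the sketch below, worker_concurrency_sketch is hypothetical, the default keys and counts are invented (the real DEFAULT_WORKER_CONCURRENCY value is not shown on this page), and overlaying onto defaults is an assumption. Invalid JSON simply raises json.JSONDecodeError here.

```python
import json
import os
from typing import Mapping

# Hypothetical defaults: the real keys and counts are not documented on this page.
HYPOTHETICAL_DEFAULTS = {"crawl": 2, "snapshot": 3}


def worker_concurrency_sketch(environ: Mapping[str, str] = os.environ) -> dict[str, int]:
    result = dict(HYPOTHETICAL_DEFAULTS)
    raw = environ.get("WORKER_CONCURRENCY")
    if raw:
        overrides = json.loads(raw)  # e.g. '{"snapshot": 8}'
        result.update({str(k): int(v) for k, v in overrides.items()})
    return result


print(worker_concurrency_sketch({"WORKER_CONCURRENCY": '{"snapshot": 8}'}))
# {'crawl': 2, 'snapshot': 8}
```

A shell user would set, for example, WORKER_CONCURRENCY='{"snapshot": 8}' before starting the workers.
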