archivebox.personas.models

Persona management for ArchiveBox.

A Persona represents a browser profile/identity used for archiving. Each persona has its own:

  • Chrome user data directory (for cookies, localStorage, extensions, etc.)

  • Chrome extensions directory

  • Cookies file

  • Config overrides

Module Contents

Classes

Persona

Browser persona/profile for archiving sessions.

Data

_fcntl

VOLATILE_PROFILE_DIR_NAMES

VOLATILE_PROFILE_FILE_NAMES

API

archivebox.personas.models._fcntl: Any | None[source]

None

archivebox.personas.models.VOLATILE_PROFILE_DIR_NAMES[source]

None

archivebox.personas.models.VOLATILE_PROFILE_FILE_NAMES[source]

None

class archivebox.personas.models.Persona[source]

Bases: archivebox.base_models.models.ModelWithConfig

Browser persona/profile for archiving sessions.

Each persona provides:

  • CHROME_USER_DATA_DIR: Chrome profile directory

  • CHROME_EXTENSIONS_DIR: Installed extensions directory

  • CHROME_DOWNLOADS_DIR: Chrome downloads directory

  • COOKIES_FILE: Cookies file for wget/curl

  • config: JSON field with persona-specific config overrides

Usage:

    # Get persona and its derived config
    config = get_config(persona=crawl.persona, crawl=crawl, snapshot=snapshot)
    chrome_dir = config['CHROME_USER_DATA_DIR']

    # Or access directly from persona
    persona = Persona.objects.get(name='Default')
    persona.CHROME_USER_DATA_DIR  # -> Path to chrome_user_data

id[source]

‘UUIDField(…)’

name[source]

‘CharField(…)’

created_at[source]

‘DateTimeField(…)’

created_by[source]

‘ForeignKey(…)’

class Meta[source]

Bases: archivebox.base_models.models.ModelWithConfig.Meta

app_label[source]

‘personas’

__str__() → str[source]

property path: pathlib.Path[source]

Path to persona directory under PERSONAS_DIR.

property CHROME_USER_DATA_DIR: str[source]

Derived path to Chrome user data directory for this persona.

property CHROME_EXTENSIONS_DIR: str[source]

Derived path to Chrome extensions directory for this persona.

property CHROME_DOWNLOADS_DIR: str[source]

Derived path to Chrome downloads directory for this persona.

property COOKIES_FILE: str[source]

Derived path to cookies.txt file for this persona (if it exists).

property AUTH_STORAGE_FILE: str[source]

Derived path to auth.json for this persona (if it exists).

get_derived_config() → dict[source]

Get config dict with derived paths filled in.

Returns dict with:

  • All values from self.config JSONField

  • CHROME_USER_DATA_DIR (derived from persona path)

  • CHROME_EXTENSIONS_DIR (derived from persona path)

  • CHROME_DOWNLOADS_DIR (derived from persona path)

  • COOKIES_FILE (derived from persona path, if the file exists)

  • AUTH_STORAGE_FILE (derived from persona path, if the file exists)

  • ACTIVE_PERSONA (set to this persona’s name)
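The merge described by the list above can be sketched as a standalone function. This is an illustration, not the ArchiveBox implementation: `derive_config` is a hypothetical stand-in, the subdirectory names `chrome_extensions` and `chrome_downloads` are assumptions (only `chrome_user_data`, `cookies.txt`, and `auth.json` appear in the documentation), and letting derived paths take precedence over stored overrides is also an assumption.

```python
from pathlib import Path


def derive_config(name: str, persona_dir: Path, overrides: dict) -> dict:
    """Hypothetical stand-in for Persona.get_derived_config()."""
    # Start from the persona's stored config overrides.
    derived = dict(overrides)

    # Paths that are always derived from the persona directory.
    # Assumption: derived paths replace any same-named override.
    derived["CHROME_USER_DATA_DIR"] = str(persona_dir / "chrome_user_data")
    derived["CHROME_EXTENSIONS_DIR"] = str(persona_dir / "chrome_extensions")  # assumed name
    derived["CHROME_DOWNLOADS_DIR"] = str(persona_dir / "chrome_downloads")    # assumed name

    # Optional files are only included when they exist on disk.
    cookies_file = persona_dir / "cookies.txt"
    if cookies_file.exists():
        derived["COOKIES_FILE"] = str(cookies_file)

    auth_file = persona_dir / "auth.json"
    if auth_file.exists():
        derived["AUTH_STORAGE_FILE"] = str(auth_file)

    # Record which persona produced this config.
    derived["ACTIVE_PERSONA"] = name
    return derived
```

Under these assumptions, a persona without an `auth.json` file yields a dict with no `AUTH_STORAGE_FILE` key at all, so callers would check for the key rather than for an empty value.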

ensure_dirs() → None[source]

Create persona directories if they don’t exist.

cleanup_chrome_profile(profile_dir: pathlib.Path) → bool[source]

Remove volatile Chrome state that should never be reused across launches.
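As an illustration of this kind of cleanup, the sketch below removes a small set of volatile entries from a profile directory. The actual contents of `VOLATILE_PROFILE_DIR_NAMES` and `VOLATILE_PROFILE_FILE_NAMES` are not shown in this reference, so the example tuples are assumptions (Chrome's `Singleton*` lock files are a common case), as is the meaning of the boolean return value. The function `remove_volatile_state` is hypothetical.

```python
import shutil
from pathlib import Path

# Assumed example values; the real module-level constants are not documented here.
EXAMPLE_VOLATILE_DIR_NAMES = ("ShaderCache",)
EXAMPLE_VOLATILE_FILE_NAMES = ("SingletonLock", "SingletonSocket", "SingletonCookie")


def remove_volatile_state(profile_dir: Path) -> bool:
    """Hypothetical sketch: delete volatile entries, return True if any were removed."""
    if not profile_dir.is_dir():
        return False

    removed = False
    for name in EXAMPLE_VOLATILE_DIR_NAMES:
        target = profile_dir / name
        if target.is_dir():
            shutil.rmtree(target, ignore_errors=True)
            removed = True

    for name in EXAMPLE_VOLATILE_FILE_NAMES:
        target = profile_dir / name
        # SingletonLock is often a dangling symlink, so exists() alone is not enough.
        if target.exists() or target.is_symlink():
            target.unlink()
            removed = True

    return removed
```

Persistent state such as cookies and preferences is left in place; only entries named in the two tuples are touched.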

cleanup_chrome() → bool[source]

Clean up volatile Chrome state for this persona’s base profile.

lock_runtime_for_crawl()[source]

runtime_root_for_crawl(crawl) → pathlib.Path[source]

runtime_profile_dir_for_crawl(crawl) → pathlib.Path[source]

runtime_downloads_dir_for_crawl(crawl) → pathlib.Path[source]

copy_chrome_profile(source_dir: pathlib.Path, destination_dir: pathlib.Path) → None[source]

prepare_runtime_for_crawl(crawl, chrome_binary: str = '') → dict[str, str][source]

cleanup_runtime_for_crawl(crawl) → None[source]

classmethod get_or_create_default() → archivebox.personas.models.Persona[source]

Get or create the Default persona.

classmethod cleanup_chrome_all() → int[source]

Clean up Chrome state files for all personas.