archivebox.personas

Submodules

  • archivebox.personas.views
  • archivebox.personas.apps
  • archivebox.personas.importers
  • archivebox.personas.models
  • archivebox.personas.admin
  • archivebox.personas.forms

© Copyright 2026 ArchiveBox.