Configuration
ArchiveBox can be configured in three ways: with the archivebox config command, by editing the ArchiveBox.conf file in the data folder, or with environment variables. All three methods work equivalently when using Docker.
Equivalent ways of setting the same configuration option:
archivebox config --set CHROME_BINARY=google-chrome-stable
# OR
echo "CHROME_BINARY=google-chrome-stable" >> ArchiveBox.conf
# OR
env CHROME_BINARY=google-chrome-stable archivebox add ~/Downloads/bookmarks_export.html
Environment variables take precedence over the config file, which is useful if you only want to use a certain option temporarily during a single run. For more examples, see Usage: Configuration…
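The precedence rule can be pictured with a minimal Python sketch. This is not ArchiveBox's actual config loader: the function name resolve_option, the simplified KEY=VALUE parsing, and the "last assignment wins" behavior are all assumptions made for illustration.

```python
import os


def resolve_option(name, conf_text, default=None, environ=None):
    """Return the value of `name`: environment first, then config file, then default.

    Hypothetical helper; `conf_text` is the raw text of an ArchiveBox.conf-style file.
    """
    environ = os.environ if environ is None else environ
    if name in environ:
        return environ[name]  # an environment variable overrides the file

    value = default
    for line in conf_text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, INI-style section headers, and non-assignments.
        if not line or line[0] in "#;[" or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        if key.strip() == name:
            value = raw.strip()  # assumption: the last assignment in the file wins
    return value


conf = "CHROME_BINARY=google-chrome-stable\n"
print(resolve_option("CHROME_BINARY", conf, default="chromium", environ={}))
print(resolve_option("CHROME_BINARY", conf, environ={"CHROME_BINARY": "chromium-browser"}))
```

The first call reads the value from the file text; the second shows the environment overriding it for a single run.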
Available Configuration Options:
General Settings: Archiving process, output format, and timing.
Server Settings: Web UI, authentication, and reverse proxy options.
Storage Settings: File layout, permissions, and temp directories.
Search Settings: Full-text search backend configuration.
Shell Options: Format & behavior of CLI output.
Plugin Settings: Per-plugin configuration options.
In case this document is ever out of date, check the source code for config definitions: archivebox/config/common.py ➡️
General Settings
General options around the archiving process, output format, and timing.
ONLY_NEW
Possible Values: [True]/False
Toggle whether or not to attempt rechecking old links when adding new ones, or leave old incomplete links alone and only archive the new links.
By default, ArchiveBox will only archive new links on each import. If you want it to go back through all links in the index and download any missing files on every run, set this to False.
Note: Regardless of how this is set, ArchiveBox will never re-download sites that have already succeeded previously. When this is False, it only attempts to fix previous pages that have missing archive extractor outputs; it does not re-archive pages that have already been successfully archived.
OVERWRITE
Possible Values: [False]/True
When set to True, ArchiveBox will re-archive URLs even if they have already been successfully archived before, overwriting any existing output.
TIMEOUT
Possible Values: [60]/120/…
Maximum allowed download time per archive method for each link in seconds. If you have a slow network connection or are seeing frequent timeout errors, you can raise this value.
Note: Do not set this to anything less than 5 seconds as it will cause Chrome to hang indefinitely and many sites to fail completely.
MAX_URL_ATTEMPTS
Possible Values: [50]/100/…
Maximum number of times ArchiveBox will attempt to archive a URL before giving up. Useful for handling transient failures.
RESOLUTION
Possible Values: [1440,2000]/1024,768/…
Default screenshot/PDF resolution in pixels width,height. Used as the fallback for SCREENSHOT_RESOLUTION, PDF_RESOLUTION, and CHROME_RESOLUTION.
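As a rough illustration of the width,height format, the following Python sketch parses a RESOLUTION value into two integers. The helper name parse_resolution is hypothetical, and mapping the result onto Chromium's --window-size flag is an assumption about how such a value could be used, not a description of ArchiveBox internals.

```python
def parse_resolution(value):
    """Parse a 'width,height' string such as '1440,2000' into a tuple of ints."""
    # Unpacking raises ValueError if there are not exactly two comma-separated parts.
    width, height = (int(part.strip()) for part in value.split(","))
    if width <= 0 or height <= 0:
        raise ValueError(f"resolution must be positive: {value!r}")
    return width, height


width, height = parse_resolution("1440,2000")
# Assumption: a Chromium-based extractor could receive the size like this.
print(f"--window-size={width},{height}")
```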
CHECK_SSL_VALIDITY
Possible Values: [True]/False
Whether to enforce HTTPS certificate and HSTS chain of trust when archiving sites. Set this to False if you want to archive pages even if they have expired or invalid certificates. Be aware that when False you cannot guarantee that you have not been man-in-the-middle'd while archiving content, so the content cannot be verified to be what's on the original site.
USER_AGENT
Possible Values: [Mozilla/5.0 ... ArchiveBox/{VERSION} ...]/"Mozilla/5.0 ..."/…
The default user agent string used during archiving. Individual extractors (wget, Chrome, curl, etc.) can override this with their own *_USER_AGENT settings, or fall back to this value.
DEFAULT_PERSONA
Possible Values: [Default]/personal/work/…
The persona profile to use by default when archiving. Personas allow you to have separate sets of cookies, Chrome profiles, and user agent strings for different archiving contexts.
URL_DENYLIST
Possible Values: [\.(css|js|otf|ttf|woff|woff2|gstatic\.com|googleapis\.com/css)(\?.*)?$]/.+\.exe$/…
A regular expression used to exclude certain URLs from archiving.
Related options:
URL_ALLOWLIST, SAVE_ALLOWLIST, SAVE_DENYLIST
URL_ALLOWLIST
Possible Values: [None]/^http(s)?:\/\/(.+)?example\.com\/?.*$/…
A regular expression used to exclude all URLs that don't match the given pattern from archiving. Useful for recursive crawling within a single domain.
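To see how the two patterns interact, here is a minimal Python sketch that applies the default URL_DENYLIST value and the example URL_ALLOWLIST value shown above. The function should_archive, the use of re.search, and the order of the checks are assumptions for illustration; ArchiveBox's own matching logic may differ.

```python
import re

# Default URL_DENYLIST value as listed in this document.
DEFAULT_DENYLIST = r"\.(css|js|otf|ttf|woff|woff2|gstatic\.com|googleapis\.com/css)(\?.*)?$"
# Example URL_ALLOWLIST value as listed in this document.
EXAMPLE_ALLOWLIST = r"^http(s)?:\/\/(.+)?example\.com\/?.*$"


def should_archive(url, allowlist=None, denylist=DEFAULT_DENYLIST):
    """Hypothetical filter: reject denylisted URLs, then require an allowlist match."""
    if denylist and re.search(denylist, url):
        return False
    if allowlist and not re.search(allowlist, url):
        return False
    return True


for url in (
    "https://example.com/style.css?v=2",
    "https://blog.example.com/post/1",
    "https://other.org/post/1",
):
    print(url, should_archive(url, allowlist=EXAMPLE_ALLOWLIST))
```

Note that the example allowlist pattern also accepts hosts that merely end in example.com (for instance notexample.com), so a stricter pattern may be preferable when crawling a single domain.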
SAVE_ALLOWLIST
Possible Values: [{}]/{".*example\\.com.*": ["screenshot", "pdf"]}/…
A JSON dictionary mapping URL regex patterns to lists of archive methods. Only the specified methods will be used for URLs matching each pattern.
SAVE_DENYLIST
Possible Values: [{}]/{".*\\.pdf$": ["screenshot", "dom"]}/…
A JSON dictionary mapping URL regex patterns to lists of archive methods to skip.
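The following Python sketch shows one plausible reading of these two dictionaries, using the example values above. The function methods_for_url and the list of method names are hypothetical, and the way multiple matching patterns combine is an assumption rather than documented behavior.

```python
import json
import re

# Example values from this document; raw strings keep the JSON escapes intact.
SAVE_ALLOWLIST = json.loads(r'{".*example\\.com.*": ["screenshot", "pdf"]}')
SAVE_DENYLIST = json.loads(r'{".*\\.pdf$": ["screenshot", "dom"]}')

ALL_METHODS = ["wget", "screenshot", "pdf", "dom"]  # illustrative subset only


def methods_for_url(url, all_methods, allowlist, denylist):
    """Hypothetical helper: narrow the method list using both dictionaries."""
    methods = list(all_methods)
    for pattern, allowed in allowlist.items():
        if re.search(pattern, url):
            methods = [m for m in methods if m in allowed]
    for pattern, denied in denylist.items():
        if re.search(pattern, url):
            methods = [m for m in methods if m not in denied]
    return methods


print(methods_for_url("https://www.example.com/about", ALL_METHODS, SAVE_ALLOWLIST, {}))
print(methods_for_url("https://site.org/paper.pdf", ALL_METHODS, {}, SAVE_DENYLIST))
```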
TAG_SEPARATOR_PATTERN
Possible Values: [[,]]/[,;]/…
Regex pattern used to split tag strings into individual tags.
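A short Python sketch of what splitting on this pattern looks like, using the default [,] and the alternative [,;] listed above. The helper split_tags, and the choice to trim whitespace and drop empty tags, are assumptions for illustration.

```python
import re


def split_tags(tag_string, separator_pattern="[,]"):
    """Split a tag string on the separator regex, trimming whitespace and empties."""
    parts = re.split(separator_pattern, tag_string)
    return [part.strip() for part in parts if part.strip()]


print(split_tags("news, tech,python"))
print(split_tags("news;tech, python", separator_pattern="[,;]"))
```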
Server Settings
Options for the web UI, authentication, and reverse proxy configuration.
ADMIN_USERNAME / ADMIN_PASSWORD
Possible Values: [None]/"admin"/…
Only used on first run / initial setup in Docker. ArchiveBox will create an admin user with the specified username and password when these options are found in the environment.
PUBLIC_INDEX / PUBLIC_SNAPSHOTS / PUBLIC_ADD_VIEW
Possible Values: [True]/False
Configure whether or not login is required to use each area of ArchiveBox.
archivebox config --set PUBLIC_INDEX=True # True = allow viewing the snapshots list without login
archivebox config --set PUBLIC_SNAPSHOTS=True # True = allow viewing snapshot content without login
archivebox config --set PUBLIC_ADD_VIEW=False # False = require login to submit new URLs
SECRET_KEY
Possible Values: auto-generated random string
Django's secret key for cryptographic signing (sessions, CSRF tokens, etc.). Automatically generated on first run.
BIND_ADDR
Possible Values: [127.0.0.1:8000]/0.0.0.0:8000/…
Address and port for the ArchiveBox web server to listen on.
LISTEN_HOST
Possible Values: [archivebox.localhost:8000]/archive.example.com:443/…
The public hostname and port that ArchiveBox is accessible at.
ALLOWED_HOSTS
Possible Values: [*]/archive.example.com,localhost/…
Comma-separated list of allowed HTTP Host header values. Set this to your domain name(s) in production.
CSRF_TRUSTED_ORIGINS
Possible Values: [http://admin.archivebox.localhost:8000]/https://archive.example.com/…
Comma-separated list of trusted origins for CSRF validation. Must include the scheme (http/https).
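The scheme requirement can be checked mechanically. The Python sketch below splits a comma-separated value and rejects entries without an http or https scheme. The helper names are hypothetical and the validation is an illustration only; it is not how ArchiveBox or Django validates this setting.

```python
from urllib.parse import urlsplit


def parse_csv(value):
    """Split a comma-separated setting into trimmed, non-empty items."""
    return [item.strip() for item in value.split(",") if item.strip()]


def parse_trusted_origins(value):
    """Return the list of origins, raising ValueError if one lacks a scheme."""
    origins = parse_csv(value)
    for origin in origins:
        parts = urlsplit(origin)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            raise ValueError(f"origin must start with http:// or https://: {origin!r}")
    return origins


print(parse_trusted_origins("https://archive.example.com, http://admin.archivebox.localhost:8000"))
```

Passing a bare hostname such as archive.example.com to parse_trusted_origins raises ValueError in this sketch, whereas the same bare hostname is the expected form for ALLOWED_HOSTS.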
ADMIN_BASE_URL
Possible Values: [""]//admin//…
Base URL path for the Django admin interface.
ARCHIVE_BASE_URL
Possible Values: [""]//archive//…
Base URL path for serving archived content.
SNAPSHOTS_PER_PAGE
Possible Values: [40]/100/…
Maximum number of Snapshots to show per page on Snapshot list pages.
PREVIEW_ORIGINALS
Possible Values: [True]/False
Whether to show inline previews of the original URL on snapshot detail pages.
CUSTOM_TEMPLATES_DIR
Possible Values: [data/custom_templates]//path/to/custom_templates/…
Path to a directory containing custom html/css/images for overriding the default UI styling.
REVERSE_PROXY_USER_HEADER
Possible Values: [Remote-User]/X-Remote-User/…
HTTP header containing the username supplied by the authenticating reverse proxy.
Related options:
REVERSE_PROXY_WHITELIST, LOGOUT_REDIRECT_URL
REVERSE_PROXY_WHITELIST
Possible Values: [<empty string>]/172.16.0.0/16/…
Comma-separated list of IP CIDRs which are allowed to use reverse proxy authentication.
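As an illustration of CIDR matching, this Python sketch checks a client address against a comma-separated whitelist using the standard ipaddress module. The function proxy_ip_allowed is hypothetical, and treating an empty whitelist as "nobody allowed" is an assumption based on the empty default.

```python
import ipaddress


def proxy_ip_allowed(remote_ip, whitelist):
    """Return True if remote_ip falls inside any CIDR in the comma-separated whitelist."""
    networks = [
        ipaddress.ip_network(cidr.strip())
        for cidr in whitelist.split(",")
        if cidr.strip()
    ]
    address = ipaddress.ip_address(remote_ip)
    return any(address in network for network in networks)


print(proxy_ip_allowed("172.16.5.4", "172.16.0.0/16"))
print(proxy_ip_allowed("10.0.0.1", "172.16.0.0/16"))
```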
LOGOUT_REDIRECT_URL
Possible Values: [/]/https://example.com/some/other/app/…
URL to redirect users back to on logout when using reverse proxy authentication.
LDAP Settings
Options for LDAP/Active Directory authentication. Requires pip install archivebox[ldap].
LDAP_ENABLED
Possible Values: [False]/True
Whether to use an external LDAP server for authentication.
pip install archivebox[ldap]
Then set these configuration values:
LDAP_ENABLED: True
LDAP_SERVER_URI: "ldap://ldap.example.com:3389"
LDAP_BIND_DN: "ou=archivebox,ou=services,dc=ldap.example.com"
LDAP_BIND_PASSWORD: "secret-bind-user-password"
LDAP_USER_BASE: "ou=users,ou=archivebox,ou=services,dc=ldap.example.com"
LDAP_USER_FILTER: "(uid=%(user)s)"
LDAP_USERNAME_ATTR: "username"
LDAP_FIRSTNAME_ATTR: "givenName"
LDAP_LASTNAME_ATTR: "sn"
LDAP_EMAIL_ATTR: "mail"
LDAP_CREATE_SUPERUSER: False
LDAP_SERVER_URI
Default: [None]
LDAP server URI (e.g. ldap://ldap.example.com:389).
LDAP_BIND_DN
Default: [None]
DN to bind for searching.
LDAP_BIND_PASSWORD
Default: [None]
Password for bind DN.
LDAP_USER_BASE
Default: [None]
Base DN for user searches.
LDAP_USER_FILTER
Default: [(uid=%(user)s)]
LDAP search filter for users.
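The %(user)s placeholder in the default filter uses Python's named %-formatting syntax. The sketch below only demonstrates that substitution; the helper build_user_filter is hypothetical, and a real LDAP backend is expected to escape special characters in the username before substituting it, which this sketch does not do.

```python
LDAP_USER_FILTER = "(uid=%(user)s)"  # default value listed above


def build_user_filter(template, username):
    """Substitute the login name into the filter template (no escaping performed)."""
    return template % {"user": username}


print(build_user_filter(LDAP_USER_FILTER, "alice"))
```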
LDAP_USERNAME_ATTR
Default: [username]
LDAP attribute for username.
LDAP_FIRSTNAME_ATTR
Default: [givenName]
LDAP attribute for first name.
LDAP_LASTNAME_ATTR
Default: [sn]
LDAP attribute for last name.
LDAP_EMAIL_ATTR
Default: [mail]
LDAP attribute for email.
LDAP_CREATE_SUPERUSER
Default: [False]
Auto-create superuser accounts for LDAP users.
Storage Settings
Options for file layout, permissions, and temp/lib directories.
OUTPUT_PERMISSIONS
Possible Values: [644]/755/…
Permissions to set output files to.
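Values such as 644 and 755 are octal permission modes. The Python sketch below shows how such a string can be converted and applied to a file on a POSIX system; the helper apply_output_permissions is hypothetical and does not reflect how ArchiveBox applies the setting internally.

```python
import os
import stat
import tempfile


def apply_output_permissions(path, permissions="644"):
    """Interpret the setting as an octal mode string and apply it to `path`."""
    mode = int(permissions, 8)  # "644" -> 0o644
    os.chmod(path, mode)
    return mode


with tempfile.NamedTemporaryFile() as tmp:
    apply_output_permissions(tmp.name, "644")
    # Show the resulting permission bits of the temporary file.
    print(oct(stat.S_IMODE(os.stat(tmp.name).st_mode)))
```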
Related options:
PUID / PGID
PUID / PGID
Possible Values: [911]/1000/…
Note: Only applicable for Docker users, settable via environment variables only.
User and Group ID that the data directory should be owned by.
RESTRICT_FILE_NAMES
Possible Values: [windows]/unix/ascii/…
Restrict output filenames to be compatible with the given filesystem type.
ENFORCE_ATOMIC_WRITES
Possible Values: [True]/False
Whether to use atomic writes when saving files.
TMP_DIR
Possible Values: [data/tmp/<machine_id>]//tmp/archivebox/abc5d851/…
Path for temporary files, unix sockets, and supervisor config. Must be a local, fast, short-path directory.
LIB_DIR
Possible Values: [data/lib/<arch>-<os>]//usr/local/share/archivebox/abc5/…
Path for installed binary dependencies.
LIB_BIN_DIR
Possible Values: [LIB_DIR/bin]
Path where installed binaries are symlinked for easy PATH management.
Search Settings
Options for full-text search backend configuration.
USE_INDEXING_BACKEND
Possible Values: [True]/False
Enable the search indexing backend.
USE_SEARCHING_BACKEND
Possible Values: [True]/False
Enable the search querying backend.
SEARCH_BACKEND_ENGINE
Possible Values: [ripgrep]/sqlite/sonic
Which search backend engine to use. ripgrep (default) requires no setup. sqlite uses FTS5. sonic requires a running Sonic instance.
SEARCH_PROCESS_HTML
Possible Values: [True]/False
Whether to strip HTML tags before indexing content for search.
Shell Options
Options around the format of the CLI output.
DEBUG
Possible Values: [False]/True
Enable debug mode. Automatically set to True if --debug is passed on the command line.
IS_TTY
Possible Values: auto-detected
Whether stdout is a TTY (interactive terminal).
USE_COLOR
Possible Values: [True]/False
Colorize console output. Defaults to True if stdin is a TTY.
SHOW_PROGRESS
Possible Values: [True]/False
Show real-time progress bar in console output. Defaults to True if stdin is a TTY.
IN_DOCKER
Possible Values: [False]/True
Whether ArchiveBox is running inside a Docker container.
IN_QEMU
Possible Values: [False]/True
Whether ArchiveBox is running inside QEMU emulation.
Plugin Settings
ArchiveBox uses a plugin system where each extractor defines its own configuration via config.json files. All plugin config options can be set the same way as core options: via environment variables, ArchiveBox.conf, or archivebox config --set.
archivebox config # see all available config options
archivebox config --set SCREENSHOT_TIMEOUT=120 # set a plugin option
For the full list of plugins and their config schemas, see the abx-plugins repository.
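Many plugin options in this document are marked as falling back to another option, for example SCREENSHOT_TIMEOUT to TIMEOUT, or CLAUDECODEEXTRACT_TIMEOUT to CLAUDECODE_TIMEOUT and from there to TIMEOUT. The Python sketch below shows one way such a chain could be resolved. The FALLBACKS table, the function name, and the rule that only explicitly set values count are assumptions; the exact resolution semantics are defined by ArchiveBox and the plugin schemas, not by this sketch.

```python
# Hypothetical subset of the fallback relationships listed in this document.
FALLBACKS = {
    "SCREENSHOT_TIMEOUT": "TIMEOUT",
    "CLAUDECODE_TIMEOUT": "TIMEOUT",
    "CLAUDECODEEXTRACT_TIMEOUT": "CLAUDECODE_TIMEOUT",
}


def resolve_with_fallback(name, explicit, fallbacks=FALLBACKS):
    """Walk the fallback chain until an explicitly set, non-empty value is found."""
    seen = set()
    while name is not None and name not in seen:
        seen.add(name)  # guard against accidental cycles in the table
        value = explicit.get(name)
        if value not in (None, ""):
            return value
        name = fallbacks.get(name)
    return None


settings = {"TIMEOUT": 60, "CLAUDECODE_TIMEOUT": 120}
print(resolve_with_fallback("SCREENSHOT_TIMEOUT", settings))
print(resolve_with_fallback("CLAUDECODEEXTRACT_TIMEOUT", settings))
```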
Title Settings
TITLE_ENABLED
Default: [True]
Enable title extraction
TITLE_TIMEOUT
Default: [30] (falls back to TIMEOUT)
Timeout for title extraction in seconds
Favicon Settings
FAVICON_ENABLED
Default: [True]
Enable favicon downloading
FAVICON_TIMEOUT
Default: [30] (falls back to TIMEOUT)
Timeout for favicon fetch in seconds
FAVICON_USER_AGENT
Default: [""] (falls back to USER_AGENT)
User agent string
Wget Settings
WGET_ARGS
Default: [see defaults]
Default wget arguments
WGET_ARGS_EXTRA
Default: [[]]
Extra arguments to append to wget command
WGET_BINARY
Default: [wget]
Path to wget binary
WGET_CHECK_SSL_VALIDITY
Default: [True] (falls back to CHECK_SSL_VALIDITY)
Whether to verify SSL certificates
WGET_ENABLED
Default: [True]
Enable wget archiving
WGET_TIMEOUT
Default: [60] (falls back to TIMEOUT)
Timeout for wget in seconds
WGET_USER_AGENT
Default: [""] (falls back to USER_AGENT)
User agent string for wget
WGET_WARC_ENABLED
Default: [True]
Save WARC archive file
Screenshot Settings
SCREENSHOT_ENABLED
Default: [True]
Enable screenshot capture
SCREENSHOT_RESOLUTION
Default: [1440,2000] (falls back to RESOLUTION)
Screenshot resolution (width,height)
SCREENSHOT_TIMEOUT
Default: [60] (falls back to TIMEOUT)
Timeout for screenshot capture in seconds
PDF Settings
PDF_ENABLED
Default: [True]
Enable PDF generation
PDF_RESOLUTION
Default: [1440,2000] (falls back to RESOLUTION)
PDF page resolution (width,height)
PDF_TIMEOUT
Default: [60] (falls back to TIMEOUT)
Timeout for PDF generation in seconds
DOM Settings
DOM_ENABLED
Default: [True]
Enable DOM capture
DOM_TIMEOUT
Default: [60] (falls back to TIMEOUT)
Timeout for DOM capture in seconds
SingleFile Settings
SINGLEFILE_ARGS
Default: [['--browser-headless']]
Default single-file arguments
SINGLEFILE_ARGS_EXTRA
Default: [[]]
Extra arguments to append to single-file command
SINGLEFILE_BINARY
Default: [single-file]
Path to single-file binary
SINGLEFILE_CHECK_SSL_VALIDITY
Default: [True] (falls back to CHECK_SSL_VALIDITY)
Whether to verify SSL certificates
SINGLEFILE_CHROME_ARGS
Default: [[]] (falls back to CHROME_ARGS)
Chrome command-line arguments for SingleFile
SINGLEFILE_ENABLED
Default: [True]
Enable SingleFile archiving
SINGLEFILE_TIMEOUT
Default: [60] (falls back to TIMEOUT)
Timeout for SingleFile in seconds
SINGLEFILE_USER_AGENT
Default: [""] (falls back to USER_AGENT)
User agent string
Readability Settings
READABILITY_ARGS
Default: [[]]
Default Readability arguments
READABILITY_ARGS_EXTRA
Default: [[]]
Extra arguments to append to Readability command
READABILITY_BINARY
Default: [readability-extractor]
Path to readability-extractor binary
READABILITY_ENABLED
Default: [True]
Enable Readability text extraction
READABILITY_TIMEOUT
Default: [30] (falls back to TIMEOUT)
Timeout for Readability in seconds
Mercury Settings
MERCURY_ARGS
Default: [[]]
Default Mercury parser arguments
MERCURY_ARGS_EXTRA
Default: [[]]
Extra arguments to append to Mercury parser command
MERCURY_BINARY
Default: [postlight-parser]
Path to Mercury/Postlight parser binary
MERCURY_ENABLED
Default: [True]
Enable Mercury text extraction
MERCURY_TIMEOUT
Default: [30] (falls back to TIMEOUT)
Timeout for Mercury in seconds
Defuddle Settings
DEFUDDLE_ARGS
Default: [[]]
Default Defuddle arguments
DEFUDDLE_ARGS_EXTRA
Default: [[]]
Extra arguments to append to Defuddle command
DEFUDDLE_BINARY
Default: [defuddle]
Path to defuddle binary
DEFUDDLE_ENABLED
Default: [True]
Enable Defuddle text extraction
DEFUDDLE_TIMEOUT
Default: [30] (falls back to TIMEOUT)
Timeout for Defuddle in seconds
HTML to Text Settings
HTMLTOTEXT_ENABLED
Default: [True]
Enable HTML to text conversion
HTMLTOTEXT_TIMEOUT
Default: [30] (falls back to TIMEOUT)
Timeout for HTML to text conversion in seconds
Trafilatura Settings
TRAFILATURA_BINARY
Default: [trafilatura]
Path to trafilatura binary
TRAFILATURA_ENABLED
Default: [True]
Enable Trafilatura extraction
TRAFILATURA_OUTPUT_CSV
Default: [False]
Write CSV output (content.csv)
TRAFILATURA_OUTPUT_HTML
Default: [True]
Write HTML output (content.html)
TRAFILATURA_OUTPUT_JSON
Default: [False]
Write JSON output (content.json)
TRAFILATURA_OUTPUT_MARKDOWN
Default: [True]
Write markdown output (content.md)
TRAFILATURA_OUTPUT_TXT
Default: [True]
Write plain text output (content.txt)
TRAFILATURA_OUTPUT_XML
Default: [False]
Write XML output (content.xml)
TRAFILATURA_OUTPUT_XMLTEI
Default: [False]
Write XML TEI output (content.xmltei)
TRAFILATURA_TIMEOUT
Default: [30] (falls back to TIMEOUT)
Timeout for Trafilatura in seconds
Git Settings
GIT_ARGS
Default: [['clone', '--depth=1', '--recursive']]
Default git arguments
GIT_ARGS_EXTRA
Default: [[]]
Extra arguments to append to git command
GIT_BINARY
Default: [git]
Path to git binary
GIT_DOMAINS
Default: [see defaults]
Comma-separated list of domains to treat as git repositories
GIT_ENABLED
Default: [True]
Enable git repository cloning
GIT_TIMEOUT
Default: [120] (falls back to TIMEOUT)
Timeout for git operations in seconds
yt-dlp Settings
YTDLP_ARGS
Default: [see defaults]
Default yt-dlp arguments
YTDLP_ARGS_EXTRA
Default: [[]]
Extra arguments to append to yt-dlp command
YTDLP_BINARY
Default: [yt-dlp]
Path to yt-dlp binary
YTDLP_CHECK_SSL_VALIDITY
Default: [True] (falls back to CHECK_SSL_VALIDITY)
Whether to verify SSL certificates
YTDLP_ENABLED
Default: [True]
Enable video/audio downloading with yt-dlp
YTDLP_MAX_SIZE
Default: [750m]
Maximum file size for yt-dlp downloads
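To give a feel for what a value like 750m means in bytes, here is a small Python sketch that converts such a size string. The helper parse_size is hypothetical, and the use of 1024-based multipliers is an assumption; the value is ultimately interpreted by yt-dlp itself, not by this code.

```python
import re

# Assumption: k/m/g suffixes are treated as powers of 1024.
_UNITS = {"": 1, "k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(value):
    """Convert a size string such as '750m' into a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([kmg]?)b?\s*", value.lower())
    if not match:
        raise ValueError(f"unrecognized size: {value!r}")
    number, unit = match.groups()
    return int(float(number) * _UNITS[unit])


print(parse_size("750m"))
```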
YTDLP_TIMEOUT
Default: [3600] (falls back to TIMEOUT)
Timeout for yt-dlp downloads in seconds
gallery-dl Settings
GALLERYDL_ARGS
Default: [['--write-metadata', '--write-info-json']]
Default gallery-dl arguments
GALLERYDL_ARGS_EXTRA
Default: [[]]
Extra arguments to append to gallery-dl command
GALLERYDL_BINARY
Default: [gallery-dl]
Path to gallery-dl binary
GALLERYDL_CHECK_SSL_VALIDITY
Default: [True] (falls back to CHECK_SSL_VALIDITY)
Whether to verify SSL certificates
GALLERYDL_ENABLED
Default: [True]
Enable gallery downloading with gallery-dl
GALLERYDL_TIMEOUT
Default: [3600] (falls back to TIMEOUT)
Timeout for gallery downloads in seconds
forum-dl Settings
FORUMDL_ARGS
Default: [[]]
Default forum-dl arguments
FORUMDL_ARGS_EXTRA
Default: [[]]
Extra arguments to append to forum-dl command
FORUMDL_BINARY
Default: [forum-dl]
Path to forum-dl binary
FORUMDL_ENABLED
Default: [True]
Enable forum downloading with forum-dl
FORUMDL_OUTPUT_FORMAT
Default: [jsonl]
Output format for forum downloads
FORUMDL_TIMEOUT
Default: [3600] (falls back to TIMEOUT)
Timeout for forum downloads in seconds
papers-dl Settings
PAPERSDL_ARGS
Default: [['fetch']]
Default papers-dl arguments
PAPERSDL_ARGS_EXTRA
Default: [[]]
Extra arguments to append to papers-dl command
PAPERSDL_BINARY
Default: [papers-dl]
Path to papers-dl binary
PAPERSDL_ENABLED
Default: [True]
Enable paper downloading with papers-dl
PAPERSDL_TIMEOUT
Default: [300] (falls back to TIMEOUT)
Timeout for paper downloads in seconds
Archive.org Settings
ARCHIVEDOTORG_ENABLED
Default: [True]
Submit URLs to archive.org Wayback Machine
ARCHIVEDOTORG_TIMEOUT
Default: [60] (falls back to TIMEOUT)
Timeout for archive.org submission in seconds
ARCHIVEDOTORG_USER_AGENT
Default: [""] (falls back to USER_AGENT)
User agent string
Chrome Settings
CHROME_ARGS
Default: [see defaults]
Default Chrome command-line arguments (static flags only; dynamic args like --user-data-dir are added at runtime)
CHROME_ARGS_EXTRA
Default: [[]]
Extra arguments to append to Chrome command (for user customization)
CHROME_BINARY
Default: [chromium]
Path to Chromium binary
CHROME_CHECK_SSL_VALIDITY
Default: [True] (falls back to CHECK_SSL_VALIDITY)
Whether to verify SSL certificates (disable for self-signed certs)
CHROME_DELAY_AFTER_LOAD
Default: [0]
Extra delay in seconds after page load completes before archiving (useful for JS-heavy SPAs)
CHROME_ENABLED
Default: [True]
Enable Chromium browser integration for archiving
CHROME_HEADLESS
Default: [True]
Run Chrome in headless mode
CHROME_PAGELOAD_TIMEOUT
Default: [60] (falls back to CHROME_TIMEOUT)
Timeout for page navigation/load in seconds
CHROME_RESOLUTION
Default: [1440,2000] (falls back to RESOLUTION)
Browser viewport resolution (width,height)
CHROME_SANDBOX
Default: [True]
Enable Chrome sandbox (disable in Docker with --no-sandbox)
CHROME_TIMEOUT
Default: [60] (falls back to TIMEOUT)
Timeout for Chrome operations in seconds
CHROME_USER_AGENT
Default: [""] (falls back to USER_AGENT)
User agent string for Chrome
CHROME_USER_DATA_DIR
Default: [""]
Path to Chrome user data directory for persistent sessions (derived from ACTIVE_PERSONA if not set)
CHROME_WAIT_FOR
Default: [networkidle2]
Page load completion condition (domcontentloaded, load, networkidle0, networkidle2)
DNS Settings
DNS_ENABLED
Default: [True]
Enable DNS traffic recording during page load
DNS_TIMEOUT
Default: [30] (falls back to TIMEOUT)
Timeout for DNS recording in seconds
SSL Settings
SSL_ENABLED
Default: [True]
Enable SSL certificate capture
SSL_TIMEOUT
Default: [30] (falls back to TIMEOUT)
Timeout for SSL capture in seconds
Headers Settings
HEADERS_ENABLED
Default: [True]
Enable HTTP headers capture
HEADERS_TIMEOUT
Default: [30] (falls back to TIMEOUT)
Timeout for headers capture in seconds
Redirects Settings
REDIRECTS_ENABLED
Default: [True]
Enable redirect chain capture
REDIRECTS_TIMEOUT
Default: [30] (falls back to TIMEOUT)
Timeout for redirect capture in seconds
Responses Settings
RESPONSES_ENABLED
Default: [True]
Enable HTTP response capture
RESPONSES_TIMEOUT
Default: [30] (falls back to TIMEOUT)
Timeout for response capture in seconds
Console Log Settings
CONSOLELOG_ENABLED
Default: [True]
Enable console log capture
CONSOLELOG_TIMEOUT
Default: [30] (falls back to TIMEOUT)
Timeout for console log capture in seconds
Accessibility Settings
ACCESSIBILITY_ENABLED
Default: [True]
Enable accessibility tree capture
ACCESSIBILITY_TIMEOUT
Default: [30] (falls back to TIMEOUT)
Timeout for accessibility capture in seconds
SEO Settings
SEO_ENABLED
Default: [True]
Enable SEO metadata capture
SEO_TIMEOUT
Default: [30] (falls back to TIMEOUT)
Timeout for SEO capture in seconds
Hashes Settings
HASHES_ENABLED
Default: [True]
Enable merkle tree hash generation
HASHES_TIMEOUT
Default: [30] (falls back to TIMEOUT)
Timeout for merkle tree generation in seconds
Static File Settings
STATICFILE_ENABLED
Default: [True]
Enable static file detection
STATICFILE_TIMEOUT
Default: [30] (falls back to TIMEOUT)
Timeout for static file detection in seconds
uBlock Origin Settings
UBLOCK_ENABLED
Default: [True]
Enable uBlock Origin browser extension for ad blocking
2captcha Settings
TWOCAPTCHA_API_KEY
Default: [""]
2captcha API key for CAPTCHA solving service (get from https://2captcha.com)
TWOCAPTCHA_AUTO_SUBMIT
Default: [False]
Automatically submit forms after CAPTCHA is solved
TWOCAPTCHA_ENABLED
Default: [True]
Enable 2captcha browser extension for automatic CAPTCHA solving
TWOCAPTCHA_RETRY_COUNT
Default: [3]
Number of times to retry CAPTCHA solving on error
TWOCAPTCHA_RETRY_DELAY
Default: [5]
Delay in seconds between CAPTCHA solving retries
TWOCAPTCHA_TIMEOUT
Default: [60] (falls back to TIMEOUT)
Timeout for CAPTCHA solving in seconds
Modal Closer Settings
MODALCLOSER_ENABLED
Default: [True]
Enable automatic modal and dialog closing
MODALCLOSER_POLL_INTERVAL
Default: [500]
How often to check for CSS modals (ms)
MODALCLOSER_TIMEOUT
Default: [1250]
Delay before auto-closing dialogs (ms)
Infinite Scroll Settings
INFINISCROLL_ENABLED
Default: [True]
Enable infinite scroll page expansion
INFINISCROLL_EXPAND_DETAILS
Default: [True]
Expand
INFINISCROLL_MIN_HEIGHT
Default: [16000]
Minimum page height to scroll to in pixels
INFINISCROLL_SCROLL_DELAY
Default: [2000]
Delay between scrolls in milliseconds
INFINISCROLL_SCROLL_DISTANCE
Default: [1600]
Distance to scroll per step in pixels
INFINISCROLL_SCROLL_LIMIT
Default: [10]
Maximum number of scroll steps
INFINISCROLL_TIMEOUT
Default: [120] (falls back to TIMEOUT)
Maximum timeout for scrolling in seconds
DOM Outlinks Parser Settings
PARSE_DOM_OUTLINKS_ENABLED
Default: [True]
Enable DOM outlinks parsing from archived pages
PARSE_DOM_OUTLINKS_TIMEOUT
Default: [30] (falls back to TIMEOUT)
Timeout for DOM outlinks parsing in seconds
HTML URL Parser Settings
PARSE_HTML_URLS_ENABLED
Default: [True]
Enable HTML URL parsing
JSONL URL Parser Settings
PARSE_JSONL_URLS_ENABLED
Default: [True]
Enable JSON Lines URL parsing
Netscape URL Parser Settings
PARSE_NETSCAPE_URLS_ENABLED
Default: [True]
Enable Netscape bookmarks HTML URL parsing
Text URL Parser Settings
PARSE_TXT_URLS_ENABLED
Default: [True]
Enable plain text URL parsing
RSS URL Parser Settings
PARSE_RSS_URLS_ENABLED
Default: [True]
Enable RSS/Atom feed URL parsing
Claude Code Settings
ANTHROPIC_API_KEY
Default: [""]
Anthropic API key for Claude Code authentication
CLAUDECODE_BINARY
Default: [claude]
Path to Claude Code CLI binary
CLAUDECODE_ENABLED
Default: [False]
Enable Claude Code AI agent integration. Controls whether the claudecode plugin participates in crawl-time extraction; child plugins still need the claudecode plugin installed and a working Claude binary.
CLAUDECODE_MAX_TURNS
Default: [10]
Maximum number of agentic turns per invocation
CLAUDECODE_MODEL
Default: [sonnet]
Claude model to use (e.g. sonnet, opus, haiku)
CLAUDECODE_TIMEOUT
Default: [120] (falls back to TIMEOUT)
Timeout for Claude Code operations in seconds
Claude Chrome Settings
CLAUDECHROME_ENABLED
Default: [False]
Enable Claude for Chrome browser extension for AI-driven page interaction
CLAUDECHROME_MAX_ACTIONS
Default: [15]
Maximum number of agentic loop iterations (screenshots + actions) per page
CLAUDECHROME_MODEL
Default: [sonnet]
Claude model to use (e.g. sonnet, opus, haiku). Availability depends on your plan.
CLAUDECHROME_PROMPT
Default: [see defaults]
Prompt for Claude to execute on the page. Claude can click buttons, fill forms, download files, and interact with any page element.
CLAUDECHROME_TIMEOUT
Default: [120] (falls back to TIMEOUT)
Timeout for Claude for Chrome operations in seconds
Claude Code Extract Settings
CLAUDECODEEXTRACT_ENABLED
Default: [False]
Enable Claude Code AI extraction
CLAUDECODEEXTRACT_MAX_TURNS
Default: [10] (falls back to CLAUDECODE_MAX_TURNS)
Maximum number of agentic turns for extraction
CLAUDECODEEXTRACT_MODEL
Default: [sonnet] (falls back to CLAUDECODE_MODEL)
Claude model to use for extraction (e.g. sonnet, opus, haiku)
CLAUDECODEEXTRACT_PROMPT
Default: [see defaults]
Custom prompt for Claude Code extraction. Use this to define what Claude should extract or generate from the snapshot.
CLAUDECODEEXTRACT_TIMEOUT
Default: [120] (falls back to CLAUDECODE_TIMEOUT)
Timeout for Claude Code extraction in seconds
Claude Code Cleanup Settings
CLAUDECODECLEANUP_ENABLED
Default: [False]
Enable Claude Code AI cleanup of snapshot files
CLAUDECODECLEANUP_MAX_TURNS
Default: [15] (falls back to CLAUDECODE_MAX_TURNS)
Maximum number of agentic turns for cleanup
CLAUDECODECLEANUP_MODEL
Default: [sonnet] (falls back to CLAUDECODE_MODEL)
Claude model to use for cleanup (e.g. sonnet, opus, haiku)
CLAUDECODECLEANUP_PROMPT
Default: [see defaults]
Custom prompt for Claude Code cleanup. Defines what Claude should clean up and how to determine which duplicates to keep.
CLAUDECODECLEANUP_TIMEOUT
Default: [120] (falls back to CLAUDECODE_TIMEOUT)
Timeout for Claude Code cleanup in seconds
Ripgrep Search Settings
RIPGREP_ARGS
Default: [['--files-with-matches', '--no-messages', '--ignore-case']]
Default ripgrep arguments
RIPGREP_ARGS_EXTRA
Default: [[]]
Extra arguments to append to ripgrep command
RIPGREP_BINARY
Default: [rg]
Path to ripgrep binary
RIPGREP_TIMEOUT
Default: [90] (falls back to TIMEOUT)
Search timeout in seconds
Sonic Search Settings
SEARCH_BACKEND_SONIC_BUCKET
Default: [snapshots]
Sonic bucket name
SEARCH_BACKEND_SONIC_COLLECTION
Default: [archivebox]
Sonic collection name
SEARCH_BACKEND_SONIC_HOST_NAME
Default: [127.0.0.1]
Sonic server hostname
SEARCH_BACKEND_SONIC_PASSWORD
Default: [SecretPassword]
Sonic server password
SEARCH_BACKEND_SONIC_PORT
Default: [1491]
Sonic server port
SQLite FTS Search Settings
SEARCH_BACKEND_SQLITE_DB
Default: [search.sqlite3]
SQLite FTS database filename
SEARCH_BACKEND_SQLITE_SEPARATE_DATABASE
Default: [True]
Use separate database file for FTS index
SEARCH_BACKEND_SQLITE_TOKENIZERS
Default: [porter unicode61 remove_diacritics 2]
FTS5 tokenizer configuration
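The effect of the default tokenizer string can be tried out with Python's built-in sqlite3 module, assuming it is linked against an SQLite build with FTS5 enabled. The table name docs, the column name body, and both helper functions are hypothetical; this is a standalone sketch, not ArchiveBox's search schema.

```python
import sqlite3

TOKENIZER = "porter unicode61 remove_diacritics 2"  # default value listed above


def build_index(documents, tokenizer=TOKENIZER):
    """Create an in-memory FTS5 table and index {rowid: text} documents."""
    conn = sqlite3.connect(":memory:")
    # The tokenizer string is interpolated directly; it must not contain quotes.
    conn.execute(f"CREATE VIRTUAL TABLE docs USING fts5(body, tokenize = '{tokenizer}')")
    conn.executemany("INSERT INTO docs(rowid, body) VALUES (?, ?)", list(documents.items()))
    return conn


def search(conn, query):
    """Return the rowids of documents matching the full-text query."""
    rows = conn.execute("SELECT rowid FROM docs WHERE docs MATCH ? ORDER BY rowid", (query,))
    return [row[0] for row in rows]


conn = build_index({1: "Archiving the web", 2: "Café menus"})
print(search(conn, "archive"))  # porter stemming relates "archive" and "Archiving"
print(search(conn, "cafe"))     # remove_diacritics relates "cafe" and "Café"
```

With this tokenizer chain, porter applies English stemming on top of unicode61, which handles case folding and diacritic removal.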