archivebox.main

Module Contents

Functions

help

Print the ArchiveBox help message and usage

version

Print the ArchiveBox version and dependency information

run

Run a given ArchiveBox subcommand with the given list of args

init

Initialize a new ArchiveBox collection in the current directory

status

Print out some info and statistics about the archive collection

oneshot

Create a single URL archive folder with an index.json and index.html, and all the archive method outputs. You can run this to archive single pages without needing to create a whole collection with archivebox init.

add

Add a new URL or list of URLs to your archive

remove

Remove the specified URLs from the archive

update

Import any new links from subscriptions and retry any previously failed/skipped links

list_all

List, filter, and export information about archive entries

list_links

list_folders

install

Automatically install all ArchiveBox dependencies and extras

config

Get and set your ArchiveBox project configuration values

schedule

Set ArchiveBox to regularly import URLs at specific times using cron

server

Run the ArchiveBox HTTP server

manage

Run an ArchiveBox Django management command

shell

Enter an interactive ArchiveBox Django shell

Data

setup

API

archivebox.main.help(out_dir: pathlib.Path = DATA_DIR) -> None

Print the ArchiveBox help message and usage

archivebox.main.version(quiet: bool = False, out_dir: pathlib.Path = DATA_DIR, binproviders: Optional[List[str]] = None, binaries: Optional[List[str]] = None) -> None

Print the ArchiveBox version and dependency information

archivebox.main.run(subcommand: str, subcommand_args: Optional[List[str]], stdin: Optional[IO] = None, out_dir: pathlib.Path = DATA_DIR) -> None

Run a given ArchiveBox subcommand with the given list of args
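The general pattern behind a function like this, looking up a handler by subcommand name and passing it the argument list, can be sketched as follows. This is a minimal illustration only, not ArchiveBox's implementation: `SUBCOMMANDS`, `run_subcommand`, and the two `fake_*` handlers are hypothetical names invented for the example.

```python
from typing import Callable, Dict, List, Optional


# Hypothetical stand-ins for real subcommand handlers (not ArchiveBox code).
def fake_version(args: List[str]) -> str:
    return "version called"


def fake_status(args: List[str]) -> str:
    return "status called with " + " ".join(args)


# Hypothetical registry mapping a subcommand name to its handler.
SUBCOMMANDS: Dict[str, Callable[[List[str]], str]] = {
    "version": fake_version,
    "status": fake_status,
}


def run_subcommand(subcommand: str, subcommand_args: Optional[List[str]] = None) -> str:
    """Look up a handler by name and call it with the argument list."""
    handler = SUBCOMMANDS.get(subcommand)
    if handler is None:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    # Treat a missing argument list as an empty one.
    return handler(subcommand_args or [])
```

Raising on an unknown name keeps a typo from silently doing nothing; a real command-line tool would more likely print its help text instead.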

archivebox.main.init(force: bool = False, quick: bool = False, install: bool = False, out_dir: pathlib.Path = DATA_DIR) -> None

Initialize a new ArchiveBox collection in the current directory

archivebox.main.status(out_dir: pathlib.Path = DATA_DIR) -> None

Print out some info and statistics about the archive collection

archivebox.main.oneshot(url: str, extractors: str = '', out_dir: pathlib.Path = DATA_DIR, created_by_id: int | None = None) -> List[archivebox.index.schema.Link]

Create a single URL archive folder with an index.json and index.html, and all the archive method outputs. You can run this to archive single pages without needing to create a whole collection with archivebox init.

archivebox.main.add(urls: Union[str, List[str]], tag: str = '', depth: int = 0, update: bool = not ARCHIVING_CONFIG.ONLY_NEW, update_all: bool = False, index_only: bool = False, overwrite: bool = False, init: bool = False, extractors: str = '', parser: str = 'auto', created_by_id: int | None = None, out_dir: pathlib.Path = DATA_DIR) -> List[archivebox.index.schema.Link]

Add a new URL or list of URLs to your archive
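Because `urls` accepts either a single string or a list of strings, calling code may want a thin wrapper that normalizes its input first. The sketch below is one hedged way to do that. `archive_urls` is a hypothetical helper, and the `add`-like callable is passed in as a parameter so the example does not depend on an installed or initialized ArchiveBox collection.

```python
from typing import Any, Callable, List, Union


def archive_urls(
    add_func: Callable[..., Any],
    urls: Union[str, List[str]],
    tag: str = "",
    depth: int = 0,
) -> Any:
    """Call an add-like function with keyword names taken from the signature above.

    `archive_urls` is a hypothetical wrapper, not part of ArchiveBox.
    """
    # Wrap a lone string in a list so the callee always receives one shape.
    url_list = [urls] if isinstance(urls, str) else list(urls)
    return add_func(urls=url_list, tag=tag, depth=depth)
```

With ArchiveBox installed, `archivebox.main.add` could presumably be passed as `add_func`. Whatever environment setup that requires (working directory, Django configuration) is not shown here.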

archivebox.main.remove(filter_str: Optional[str] = None, filter_patterns: Optional[List[str]] = None, filter_type: str = 'exact', snapshots: Optional[django.db.models.QuerySet] = None, after: Optional[float] = None, before: Optional[float] = None, yes: bool = False, delete: bool = False, out_dir: pathlib.Path = DATA_DIR) -> List[archivebox.index.schema.Link]

Remove the specified URLs from the archive

archivebox.main.update(resume: Optional[float] = None, only_new: bool = ARCHIVING_CONFIG.ONLY_NEW, index_only: bool = False, overwrite: bool = False, filter_patterns_str: Optional[str] = None, filter_patterns: Optional[List[str]] = None, filter_type: Optional[str] = None, status: Optional[str] = None, after: Optional[str] = None, before: Optional[str] = None, extractors: str = '', out_dir: pathlib.Path = DATA_DIR) -> List[archivebox.index.schema.Link]

Import any new links from subscriptions and retry any previously failed/skipped links

archivebox.main.list_all(filter_patterns_str: Optional[str] = None, filter_patterns: Optional[List[str]] = None, filter_type: str = 'exact', status: Optional[str] = None, after: Optional[float] = None, before: Optional[float] = None, sort: Optional[str] = None, csv: Optional[str] = None, json: bool = False, html: bool = False, with_headers: bool = False, out_dir: pathlib.Path = DATA_DIR)

List, filter, and export information about archive entries
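The `after` and `before` parameters are typed as floats. Assuming they are Unix timestamps in seconds (an assumption, since the signature does not state the unit), a date range could be prepared as in the sketch below. `to_timestamp` and `list_all_kwargs` are hypothetical names used only for illustration.

```python
from datetime import datetime, timezone


def to_timestamp(dt: datetime) -> float:
    """Convert a datetime to seconds since the Unix epoch, treating naive values as UTC."""
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.timestamp()


# Assumption: `after`/`before` are Unix timestamps; the signature only says float.
after = to_timestamp(datetime(2024, 1, 1))
before = to_timestamp(datetime(2024, 2, 1))

# Keyword arguments named after the parameters in the signature above.
list_all_kwargs = {"filter_type": "exact", "after": after, "before": before}
```

Pinning naive datetimes to UTC avoids results that shift with the local timezone of the machine running the script.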

archivebox.main.list_folders(links: List[archivebox.index.schema.Link], status: str, out_dir: pathlib.Path = DATA_DIR) -> Dict[str, Optional[archivebox.index.schema.Link]]

archivebox.main.install(out_dir: pathlib.Path = DATA_DIR, binproviders: Optional[List[str]] = None, binaries: Optional[List[str]] = None, dry_run: bool = False) -> None

Automatically install all ArchiveBox dependencies and extras

archivebox.main.setup

None

archivebox.main.config(config_options_str: Optional[str] = None, config_options: Optional[List[str]] = None, get: bool = False, set: bool = False, search: bool = False, reset: bool = False, out_dir: pathlib.Path = DATA_DIR) -> None

Get and set your ArchiveBox project configuration values

archivebox.main.schedule(add: bool = False, show: bool = False, clear: bool = False, foreground: bool = False, run_all: bool = False, quiet: bool = False, every: Optional[str] = None, tag: str = '', depth: int = 0, overwrite: bool = False, update: bool = not ARCHIVING_CONFIG.ONLY_NEW, import_path: Optional[str] = None, out_dir: pathlib.Path = DATA_DIR)

Set ArchiveBox to regularly import URLs at specific times using cron

archivebox.main.server(runserver_args: Optional[List[str]] = None, reload: bool = False, debug: bool = False, init: bool = False, quick_init: bool = False, createsuperuser: bool = False, daemonize: bool = False, out_dir: pathlib.Path = DATA_DIR) -> None

Run the ArchiveBox HTTP server

archivebox.main.manage(args: Optional[List[str]] = None, out_dir: pathlib.Path = DATA_DIR) -> None

Run an ArchiveBox Django management command

archivebox.main.shell(out_dir: pathlib.Path = DATA_DIR) -> None

Enter an interactive ArchiveBox Django shell