archivebox.main
Module Contents
Functions
- help: Print the ArchiveBox help message and usage
- version: Print the ArchiveBox version and dependency information
- run: Run a given ArchiveBox subcommand with the given list of args
- init: Initialize a new ArchiveBox collection in the current directory
- status: Print out some info and statistics about the archive collection
- oneshot: Create a single URL archive folder with an index.json and index.html, and all the archive method outputs. You can run this to archive single pages without needing to create a whole collection with archivebox init.
- add: Add a new URL or list of URLs to your archive
- remove: Remove the specified URLs from the archive
- update: Import any new links from subscriptions and retry any previously failed/skipped links
- list_all: List, filter, and export information about archive entries
- install: Automatically install all ArchiveBox dependencies and extras
- config: Get and set your ArchiveBox project configuration values
- schedule: Set ArchiveBox to regularly import URLs at specific times using cron
- server: Run the ArchiveBox HTTP server
- manage: Run an ArchiveBox Django management command
- shell: Enter an interactive ArchiveBox Django shell
Data
API
- archivebox.main.help(out_dir: pathlib.Path = DATA_DIR) -> None
Print the ArchiveBox help message and usage
- archivebox.main.version(quiet: bool = False, out_dir: pathlib.Path = DATA_DIR, binproviders: Optional[List[str]] = None, binaries: Optional[List[str]] = None) -> None
Print the ArchiveBox version and dependency information
- archivebox.main.run(subcommand: str, subcommand_args: Optional[List[str]], stdin: Optional[IO] = None, out_dir: pathlib.Path = DATA_DIR) -> None
Run a given ArchiveBox subcommand with the given list of args
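As a rough illustration of what "run a given subcommand with the given list of args" can look like, here is a minimal name-to-handler dispatcher. It is a sketch only, not ArchiveBox's implementation: `SUBCOMMANDS`, `dispatch`, and the two stub handlers are hypothetical names invented for this example.

```python
from typing import Callable, Dict, List, Optional


# Hypothetical stand-ins for real subcommand handlers.
def fake_version(args: List[str]) -> str:
    return "version called"


def fake_status(args: List[str]) -> str:
    return "status called with " + repr(args)


# Hypothetical lookup table mapping subcommand names to handlers.
SUBCOMMANDS: Dict[str, Callable[[List[str]], str]] = {
    "version": fake_version,
    "status": fake_status,
}


def dispatch(subcommand: str, subcommand_args: Optional[List[str]] = None) -> str:
    """Look up a subcommand by name and call it with its argument list."""
    handler = SUBCOMMANDS.get(subcommand)
    if handler is None:
        raise ValueError(f"unknown subcommand: {subcommand!r}")
    # Treat a missing argument list as empty and copy it so the
    # handler cannot mutate the caller's list.
    return handler(list(subcommand_args or []))
```

Unknown names fail loudly with a `ValueError` instead of being silently ignored, which is usually the behavior a command-line front end wants.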
- archivebox.main.init(force: bool = False, quick: bool = False, install: bool = False, out_dir: pathlib.Path = DATA_DIR) -> None
Initialize a new ArchiveBox collection in the current directory
- archivebox.main.status(out_dir: pathlib.Path = DATA_DIR) -> None
Print out some info and statistics about the archive collection
- archivebox.main.oneshot(url: str, extractors: str = '', out_dir: pathlib.Path = DATA_DIR, created_by_id: int | None = None) -> List[archivebox.index.schema.Link]
Create a single URL archive folder with an index.json and index.html, and all the archive method outputs. You can run this to archive single pages without needing to create a whole collection with archivebox init.
- archivebox.main.add(urls: Union[str, List[str]], tag: str = '', depth: int = 0, update: bool = not ARCHIVING_CONFIG.ONLY_NEW, update_all: bool = False, index_only: bool = False, overwrite: bool = False, init: bool = False, extractors: str = '', parser: str = 'auto', created_by_id: int | None = None, out_dir: pathlib.Path = DATA_DIR) -> List[archivebox.index.schema.Link]
Add a new URL or list of URLs to your archive
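Because `add` takes many optional parameters, calling it with keyword arguments keeps call sites readable. The sketch below shows that pattern with the function injected as a parameter, so the example runs without ArchiveBox installed. `archive_with_defaults` is a hypothetical helper, not part of ArchiveBox; only the keyword names `urls`, `tag`, and `depth` come from the signature above.

```python
from typing import Any, Callable, List, Union


def archive_with_defaults(
    add_fn: Callable[..., Any],
    urls: Union[str, List[str]],
    tag: str = "",
    depth: int = 0,
) -> Any:
    """Call an add()-style function using keyword arguments only.

    add_fn is expected to accept the keyword names documented for
    archivebox.main.add; every other parameter keeps its default.
    """
    return add_fn(urls=urls, tag=tag, depth=depth)


# Possible usage, assuming ArchiveBox is installed and the current
# directory is an initialized collection (not verified here):
#
#   from archivebox.main import add
#   links = archive_with_defaults(add, "https://example.com", tag="docs")
```

Passing the callable in also makes the wrapper easy to exercise in tests with a stub in place of the real function.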
- archivebox.main.remove(filter_str: Optional[str] = None, filter_patterns: Optional[List[str]] = None, filter_type: str = 'exact', snapshots: Optional[django.db.models.QuerySet] = None, after: Optional[float] = None, before: Optional[float] = None, yes: bool = False, delete: bool = False, out_dir: pathlib.Path = DATA_DIR) -> List[archivebox.index.schema.Link]
Remove the specified URLs from the archive
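The `after` and `before` parameters are typed as floats. The sketch below assumes they are Unix epoch timestamps in seconds (an assumption, since the signature alone does not say so) and shows one way to derive them from timezone-aware datetimes. `removal_window` is a hypothetical helper, not part of ArchiveBox.

```python
from datetime import datetime, timezone
from typing import Dict


def removal_window(start: datetime, end: datetime) -> Dict[str, float]:
    """Build after/before keyword arguments from two datetimes.

    Assumes the target function interprets both values as Unix epoch
    seconds; check that against the ArchiveBox version in use.
    """
    if start >= end:
        raise ValueError("start must be earlier than end")
    return {"after": start.timestamp(), "before": end.timestamp()}


# Example: a one-day window, expressed in UTC to avoid local-time surprises.
window = removal_window(
    datetime(2024, 1, 1, tzinfo=timezone.utc),
    datetime(2024, 1, 2, tzinfo=timezone.utc),
)
```

The resulting dictionary could then be unpacked into a call such as `remove(filter_patterns=[...], **window)`.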
- archivebox.main.update(resume: Optional[float] = None, only_new: bool = ARCHIVING_CONFIG.ONLY_NEW, index_only: bool = False, overwrite: bool = False, filter_patterns_str: Optional[str] = None, filter_patterns: Optional[List[str]] = None, filter_type: Optional[str] = None, status: Optional[str] = None, after: Optional[str] = None, before: Optional[str] = None, extractors: str = '', out_dir: pathlib.Path = DATA_DIR) -> List[archivebox.index.schema.Link]
Import any new links from subscriptions and retry any previously failed/skipped links
- archivebox.main.list_all(filter_patterns_str: Optional[str] = None, filter_patterns: Optional[List[str]] = None, filter_type: str = 'exact', status: Optional[str] = None, after: Optional[float] = None, before: Optional[float] = None, sort: Optional[str] = None, csv: Optional[str] = None, json: bool = False, html: bool = False, with_headers: bool = False, out_dir: pathlib.Path = DATA_DIR)
List, filter, and export information about archive entries
- archivebox.main.list_links(snapshots: Optional[django.db.models.QuerySet] = None, filter_patterns: Optional[List[str]] = None, filter_type: str = 'exact', after: Optional[float] = None, before: Optional[float] = None, out_dir: pathlib.Path = DATA_DIR) -> Iterable[archivebox.index.schema.Link]
- archivebox.main.list_folders(links: List[archivebox.index.schema.Link], status: str, out_dir: pathlib.Path = DATA_DIR) -> Dict[str, Optional[archivebox.index.schema.Link]]
- archivebox.main.install(out_dir: pathlib.Path = DATA_DIR, binproviders: Optional[List[str]] = None, binaries: Optional[List[str]] = None, dry_run: bool = False) -> None
Automatically install all ArchiveBox dependencies and extras
- archivebox.main.config(config_options_str: Optional[str] = None, config_options: Optional[List[str]] = None, get: bool = False, set: bool = False, search: bool = False, reset: bool = False, out_dir: pathlib.Path = DATA_DIR) -> None
Get and set your ArchiveBox project configuration values
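The `config_options` parameter is a list of strings. The sketch below assumes each entry has the form `KEY=VALUE` when setting values (an assumption not stated by the signature) and builds such a list from a dictionary. `to_config_options` is a hypothetical helper, and the keys used in the example are illustrative only.

```python
from typing import Dict, List


def to_config_options(settings: Dict[str, object]) -> List[str]:
    """Render a mapping as a list of KEY=VALUE strings (assumed format)."""
    options: List[str] = []
    for key, value in settings.items():
        # Reject keys that would make the KEY=VALUE string ambiguous.
        if not key or "=" in key:
            raise ValueError(f"invalid config key: {key!r}")
        options.append(f"{key.upper()}={value}")
    return options


# Illustrative keys; dict insertion order is preserved in the output.
example_options = to_config_options({"timeout": 120, "SAVE_WGET": False})
```

Under that assumption, the list could be passed along as `config(config_options=example_options, set=True)`.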
- archivebox.main.schedule(add: bool = False, show: bool = False, clear: bool = False, foreground: bool = False, run_all: bool = False, quiet: bool = False, every: Optional[str] = None, tag: str = '', depth: int = 0, overwrite: bool = False, update: bool = not ARCHIVING_CONFIG.ONLY_NEW, import_path: Optional[str] = None, out_dir: pathlib.Path = DATA_DIR)
Set ArchiveBox to regularly import URLs at specific times using cron
- archivebox.main.server(runserver_args: Optional[List[str]] = None, reload: bool = False, debug: bool = False, init: bool = False, quick_init: bool = False, createsuperuser: bool = False, daemonize: bool = False, out_dir: pathlib.Path = DATA_DIR) -> None
Run the ArchiveBox HTTP server