archivebox package
Subpackages
- archivebox.cli package
  - Submodules
    - archivebox.cli.archivebox module
    - archivebox.cli.archivebox_add module
    - archivebox.cli.archivebox_config module
    - archivebox.cli.archivebox_help module
    - archivebox.cli.archivebox_info module
    - archivebox.cli.archivebox_init module
    - archivebox.cli.archivebox_list module
    - archivebox.cli.archivebox_manage module
    - archivebox.cli.archivebox_remove module
    - archivebox.cli.archivebox_schedule module
    - archivebox.cli.archivebox_server module
    - archivebox.cli.archivebox_shell module
    - archivebox.cli.archivebox_update module
    - archivebox.cli.archivebox_version module
    - archivebox.cli.logging module
    - archivebox.cli.tests module
  - Module contents
- archivebox.config package
- archivebox.core package
  - Subpackages
  - Submodules
    - archivebox.core.admin module
    - archivebox.core.apps module
    - archivebox.core.models module
    - archivebox.core.settings module
    - archivebox.core.tests module
    - archivebox.core.urls module
    - archivebox.core.views module
    - archivebox.core.welcome_message module
    - archivebox.core.wsgi module
  - Module contents
- archivebox.extractors package
  - Submodules
    - archivebox.extractors.archive_org module
    - archivebox.extractors.dom module
    - archivebox.extractors.favicon module
    - archivebox.extractors.git module
    - archivebox.extractors.media module
    - archivebox.extractors.pdf module
    - archivebox.extractors.screenshot module
    - archivebox.extractors.title module
    - archivebox.extractors.wget module
  - Module contents
- archivebox.index package
- archivebox.parsers package
  - Submodules
    - archivebox.parsers.generic_json module
    - archivebox.parsers.generic_rss module
    - archivebox.parsers.generic_txt module
    - archivebox.parsers.medium_rss module
    - archivebox.parsers.netscape_html module
    - archivebox.parsers.pinboard_rss module
    - archivebox.parsers.pocket_html module
    - archivebox.parsers.shaarli_rss module
  - Module contents
Submodules
archivebox.main module
- archivebox.main.help(out_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/v0.4.20/docs') → None
  Print the ArchiveBox help message and usage.
- archivebox.main.version(quiet: bool = False, out_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/v0.4.20/docs') → None
  Print the ArchiveBox version and dependency information.
- archivebox.main.run(subcommand: str, subcommand_args: Optional[List[str]], stdin: Optional[IO] = None, out_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/v0.4.20/docs') → None
  Run the given ArchiveBox subcommand with the given list of arguments.
- archivebox.main.init(force: bool = False, out_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/v0.4.20/docs') → None
  Initialize a new ArchiveBox collection in the current directory.
- archivebox.main.status(out_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/v0.4.20/docs') → None
  Print information and statistics about the archive collection.
- archivebox.main.oneshot(url: str, out_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/v0.4.20/docs')
  Create a single-URL archive folder containing an index.json, an index.html, and the outputs of all archive methods. Use this to archive a single page without creating a whole collection with archivebox init.
- archivebox.main.add(urls: Union[str, List[str]], depth: int = 0, update_all: bool = False, index_only: bool = False, overwrite: bool = False, init: bool = False, out_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/v0.4.20/docs') → List[archivebox.index.schema.Link]
  Add a new URL or list of URLs to your archive.
- archivebox.main.remove(filter_str: Optional[str] = None, filter_patterns: Optional[List[str]] = None, filter_type: str = 'exact', links: Optional[List[archivebox.index.schema.Link]] = None, after: Optional[float] = None, before: Optional[float] = None, yes: bool = False, delete: bool = False, out_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/v0.4.20/docs') → List[archivebox.index.schema.Link]
  Remove the specified URLs from the archive.
- archivebox.main.update(resume: Optional[float] = None, only_new: bool = True, index_only: bool = False, overwrite: bool = False, filter_patterns_str: Optional[str] = None, filter_patterns: Optional[List[str]] = None, filter_type: Optional[str] = None, status: Optional[str] = None, after: Optional[str] = None, before: Optional[str] = None, out_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/v0.4.20/docs') → List[archivebox.index.schema.Link]
  Import any new links from subscriptions and retry any previously failed or skipped links.
- archivebox.main.list_all(filter_patterns_str: Optional[str] = None, filter_patterns: Optional[List[str]] = None, filter_type: str = 'exact', status: Optional[str] = None, after: Optional[float] = None, before: Optional[float] = None, sort: Optional[str] = None, csv: Optional[str] = None, json: bool = False, out_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/v0.4.20/docs') → Iterable[archivebox.index.schema.Link]
  List, filter, and export information about archive entries.
- archivebox.main.list_links(filter_patterns: Optional[List[str]] = None, filter_type: str = 'exact', after: Optional[float] = None, before: Optional[float] = None, out_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/v0.4.20/docs') → Iterable[archivebox.index.schema.Link]
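list_all(), list_links() and remove() all narrow the index with filter_patterns, filter_type, after and before. The sketch below illustrates one plausible reading of those parameters using FakeLink, a hypothetical stand-in for archivebox.index.schema.Link. Only 'exact' appears above as a documented default; the 'substring', 'regex' and 'domain' filter types, the inclusive timestamp bounds and the string timestamp field are assumptions, not ArchiveBox's actual implementation.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Iterator, List, Optional
from urllib.parse import urlparse


@dataclass
class FakeLink:
    """Hypothetical stand-in for archivebox.index.schema.Link."""
    url: str
    timestamp: str  # assumption: unix timestamp stored as a string


# Assumed filter types; only 'exact' is documented as the default.
FILTERS: Dict[str, Callable[[str, FakeLink], bool]] = {
    'exact': lambda pattern, link: link.url == pattern,
    'substring': lambda pattern, link: pattern in link.url,
    'regex': lambda pattern, link: re.search(pattern, link.url) is not None,
    'domain': lambda pattern, link: urlparse(link.url).netloc == pattern,
}


def filter_links(links: Iterable[FakeLink],
                 filter_patterns: Optional[List[str]] = None,
                 filter_type: str = 'exact',
                 after: Optional[float] = None,
                 before: Optional[float] = None) -> Iterator[FakeLink]:
    """Yield links within [after, before] that match any of the patterns."""
    matches = FILTERS[filter_type]  # KeyError for an unknown filter type
    for link in links:
        ts = float(link.timestamp)
        # Assumption: both bounds are inclusive
        if after is not None and ts < after:
            continue
        if before is not None and ts > before:
            continue
        if filter_patterns and not any(matches(p, link) for p in filter_patterns):
            continue
        yield link
```

Yielding lazily keeps memory flat for a large index; a caller that needs a list can wrap the result in list().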
- archivebox.main.list_folders(links: List[archivebox.index.schema.Link], status: str, out_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/v0.4.20/docs') → Dict[str, Optional[archivebox.index.schema.Link]]
- archivebox.main.config(config_options_str: Optional[str] = None, config_options: Optional[List[str]] = None, get: bool = False, set: bool = False, reset: bool = False, out_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/v0.4.20/docs') → None
  Get and set your ArchiveBox project configuration values.
- archivebox.main.schedule(add: bool = False, show: bool = False, clear: bool = False, foreground: bool = False, run_all: bool = False, quiet: bool = False, every: Optional[str] = None, depth: int = 0, import_path: Optional[str] = None, out_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/v0.4.20/docs')
  Set ArchiveBox to regularly import URLs at specific times using cron.
- archivebox.main.server(runserver_args: Optional[List[str]] = None, reload: bool = False, debug: bool = False, init: bool = False, out_dir: str = '/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/v0.4.20/docs') → None
  Run the ArchiveBox HTTP server.
archivebox.manage module
archivebox.system module
- archivebox.system.run(*args, input=None, capture_output=True, text=False, **kwargs)
  Patched version of subprocess.run that fixes blocking I/O, which otherwise made timeout= ineffective.
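The docstring only says that subprocess.run is patched so that timeout= is actually honored. One way a timeout can stall is calling communicate() a second time after killing the child: if a grandchild process inherited the output pipes and is still running, that second read can block. The sketch below, under the hypothetical name run_with_timeout, avoids the second read by calling kill() and then wait(). It illustrates the idea and is not ArchiveBox's patched code.

```python
from subprocess import PIPE, CompletedProcess, Popen, TimeoutExpired


def run_with_timeout(args, input=None, capture_output=True, timeout=None, **kwargs):
    """Hypothetical sketch of a subprocess.run() variant whose timeout cannot stall.

    After a timeout the child is killed and reaped with wait() instead of a
    second communicate(), so a grandchild that still holds the output pipes
    open cannot block the caller.
    """
    if capture_output:
        if 'stdout' in kwargs or 'stderr' in kwargs:
            raise ValueError('stdout and stderr may not be used with capture_output')
        kwargs['stdout'] = PIPE
        kwargs['stderr'] = PIPE
    if input is not None:
        kwargs['stdin'] = PIPE

    with Popen(args, **kwargs) as process:
        try:
            stdout, stderr = process.communicate(input, timeout=timeout)
        except TimeoutExpired:
            process.kill()  # SIGKILL on POSIX, TerminateProcess on Windows
            process.wait()  # reap the child without reading from the pipes again
            raise
        return CompletedProcess(process.args, process.returncode, stdout, stderr)
```

The trade-off is that any output still buffered in the pipes after the kill is discarded rather than read.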
- archivebox.system.atomic_write(path: Union[pathlib.Path, str], contents: Union[dict, str, bytes], overwrite: bool = True) → None
  Safe atomic write to the filesystem: write to a temp file, then atomically rename it into place.
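The temp-file-then-rename pattern described above can be sketched with the standard library. The temporary file is created in the destination's own directory so that os.replace() stays on one filesystem, where the rename is atomic on POSIX. The function name, the JSON formatting for dict contents and the FileExistsError raised when overwrite=False are assumptions about how such a helper might behave, not ArchiveBox's exact implementation.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Union


def atomic_write_sketch(path: Union[Path, str], contents: Union[dict, str, bytes],
                        overwrite: bool = True) -> None:
    """Hypothetical sketch: write contents to path via a temp file + atomic rename."""
    path = Path(path)
    if not overwrite and path.exists():
        raise FileExistsError(str(path))  # assumed behavior for overwrite=False

    # Serialize to bytes up front (JSON for dicts is an assumption)
    if isinstance(contents, dict):
        data = json.dumps(contents, indent=4, sort_keys=True).encode('utf-8')
    elif isinstance(contents, str):
        data = contents.encode('utf-8')
    else:
        data = contents

    # Create the temp file next to the target so the rename stays on one filesystem
    fd, tmp_path = tempfile.mkstemp(dir=str(path.parent), prefix=f'.{path.name}.', suffix='.tmp')
    try:
        with os.fdopen(fd, 'wb') as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push the bytes to disk before the rename
        # On POSIX, readers now see either the old file or the new one, never a partial write
        os.replace(tmp_path, path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)  # don't leave a half-written temp file behind
        raise
```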
- archivebox.system.chmod_file(path: str, cwd: str = '.', permissions: str = '755') → None
  Equivalent to chmod -R <permissions> <cwd>/<path>.
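The docstring describes the helper as the shell command chmod -R <permissions> <cwd>/<path>. A pure-Python equivalent, assuming permissions is an octal string such as '755', might look like this (chmod_recursive is a hypothetical name). It walks the tree bottom-up so that a restrictive mode on a directory cannot lock the walker out of that directory's children before they are updated.

```python
import os


def chmod_recursive(path: str, cwd: str = '.', permissions: str = '755') -> None:
    """Hypothetical pure-Python version of `chmod -R <permissions> <cwd>/<path>`."""
    mode = int(permissions, 8)  # e.g. '755' -> 0o755
    root = os.path.join(cwd, path)
    if not os.path.exists(root):
        raise FileNotFoundError(root)
    if os.path.isdir(root):
        # Bottom-up walk: children are updated before their parent directory
        for dirpath, dirnames, filenames in os.walk(root, topdown=False):
            for name in dirnames + filenames:
                os.chmod(os.path.join(dirpath, name), mode)
    os.chmod(root, mode)  # the top-level path itself goes last
```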
- archivebox.system.copy_and_overwrite(from_path: str, to_path: str)
  Copy a given file or directory to a given path, overwriting the destination.
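One way to implement "copy and overwrite" with shutil, assuming the destination should end up as an exact copy of the source, is to remove any existing destination first, because shutil.copytree() raises an error when the target directory already exists (unless dirs_exist_ok=True is passed on Python 3.8+). The function name copy_and_overwrite_sketch is hypothetical.

```python
import os
import shutil


def copy_and_overwrite_sketch(from_path: str, to_path: str) -> None:
    """Hypothetical sketch: make to_path an exact copy of from_path."""
    # Clear whatever is at the destination so no stale files survive
    if os.path.isdir(to_path) and not os.path.islink(to_path):
        shutil.rmtree(to_path)
    elif os.path.lexists(to_path):
        os.remove(to_path)

    if os.path.isdir(from_path):
        shutil.copytree(from_path, to_path)
    else:
        shutil.copy2(from_path, to_path)  # copies contents and metadata
```

Deleting first is what makes stale files in the old destination disappear; a merge-style copy would leave them in place.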
archivebox.util module
- archivebox.util.detect_encoding(rawdata)
- archivebox.util.scheme(url)
- archivebox.util.without_scheme(url)
- archivebox.util.without_query(url)
- archivebox.util.without_fragment(url)
- archivebox.util.without_path(url)
- archivebox.util.path(url)
- archivebox.util.basename(url)
- archivebox.util.domain(url)
- archivebox.util.query(url)
- archivebox.util.fragment(url)
- archivebox.util.extension(url)
- archivebox.util.base_url(url)
- archivebox.util.without_www(url)
- archivebox.util.without_trailing_slash(url)
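The URL helpers above are listed without docstrings. Judging only by their names, most of them can be expressed with urllib.parse. The sketch below is a hypothetical reimplementation of a few of them to show the idea; ArchiveBox's own versions may differ in edge cases such as case handling, trailing slashes and empty components.

```python
from urllib.parse import urlparse


def scheme(url: str) -> str:
    return urlparse(url).scheme.lower()


def domain(url: str) -> str:
    return urlparse(url).netloc


def basename(url: str) -> str:
    # Last path segment, e.g. 'page.html' for https://example.com/docs/page.html
    return urlparse(url).path.rsplit('/', 1)[-1]


def extension(url: str) -> str:
    name = basename(url)
    return name.rsplit('.', 1)[-1].lower() if '.' in name else ''


def without_scheme(url: str) -> str:
    # Dropping the scheme leaves '//host/...', so strip the leading slashes
    return urlparse(url)._replace(scheme='').geturl().lstrip('/')


def without_query(url: str) -> str:
    return urlparse(url)._replace(query='').geturl()


def without_fragment(url: str) -> str:
    return urlparse(url)._replace(fragment='').geturl()


def without_www(url: str) -> str:
    return url.replace('://www.', '://', 1)


def without_trailing_slash(url: str) -> str:
    return url[:-1] if url.endswith('/') else url
```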
- archivebox.util.hashurl(url)
- archivebox.util.urlencode(s)
- archivebox.util.urldecode(s)
- archivebox.util.htmlencode(s)
- archivebox.util.htmldecode(s)
- archivebox.util.short_ts(ts)
- archivebox.util.ts_to_date(ts)
- archivebox.util.ts_to_iso(ts)
- archivebox.util.enforce_types(func)
  Enforce function argument and keyword-argument types at runtime using the function's Python 3 type hints.
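Runtime enforcement from type hints generally combines inspect.signature(), which maps positional and keyword arguments onto parameter names, with typing.get_type_hints(), which reads the annotations. The decorator below is a simplified sketch of that technique under the hypothetical name enforce_types_sketch: it only checks hints that are plain classes such as str or int and skips generic hints like Optional[List[str]], which the real decorator may well handle differently.

```python
import inspect
from functools import wraps
from typing import get_type_hints


def enforce_types_sketch(func):
    """Hypothetical sketch: raise TypeError if an argument doesn't match a plain-class hint."""
    signature = inspect.signature(func)
    hints = get_type_hints(func)
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)

    @wraps(func)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)  # map args onto parameter names
        for name, value in bound.arguments.items():
            if signature.parameters[name].kind in variadic:
                continue  # *args / **kwargs hints describe elements, not the tuple/dict
            expected = hints.get(name)
            # Only plain classes are checked; typing constructs such as Optional[str] are skipped
            if isinstance(expected, type) and getattr(expected, '__origin__', None) is None:
                if not isinstance(value, expected):
                    raise TypeError(
                        f'{func.__name__}() got {name}={value!r}, expected {expected.__name__}'
                    )
        return func(*args, **kwargs)

    return wrapper
```

Checking happens on every call, so this kind of decorator trades a little call overhead for failing fast on bad arguments, for example a string passed where an int is expected.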
- archivebox.util.docstring(text: Optional[str])
  Attach the given docstring to the decorated function.
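A decorator that attaches a docstring only needs to set __doc__ and return the function unchanged. Below is a minimal sketch under the hypothetical name docstring_sketch, applied to an equally hypothetical CLI wrapper that reuses another function's documentation.

```python
from typing import Callable, Optional, TypeVar

F = TypeVar('F', bound=Callable)


def docstring_sketch(text: Optional[str]) -> Callable[[F], F]:
    """Hypothetical sketch: return a decorator that sets func.__doc__ to text."""
    def decorator(func: F) -> F:
        func.__doc__ = text
        return func
    return decorator


def add_urls() -> None:
    """Add a new URL or list of URLs to your archive."""


@docstring_sketch(add_urls.__doc__)
def cli_add() -> None:
    pass  # hypothetical CLI entry point that reuses add_urls' docstring
```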
- archivebox.util.str_between(string: str, start: str, end: str = None) → str
  Return the text between start and end, e.g. ('<abc>12345</def>', '<abc>', '</def>') → '12345'.
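The docstring's example shows str_between() returning the text between a start marker and an end marker. A minimal sketch of that behavior, assuming the first occurrence of each marker is used and that omitting end returns everything after start (str_between_sketch is a hypothetical name):

```python
from typing import Optional


def str_between_sketch(string: str, start: str, end: Optional[str] = None) -> str:
    """Hypothetical sketch: ('<abc>12345</def>', '<abc>', '</def>') -> '12345'."""
    # Everything after the first occurrence of start (the whole string if start is absent)
    content = string.split(start, 1)[-1]
    if end is not None:
        # Everything before the first occurrence of end
        content = content.split(end, 1)[0]
    return content
```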
- archivebox.util.parse_date(date: Any) → Optional[datetime.datetime]
  Parse unix timestamps, ISO-format strings, and human-readable date strings.
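The unix-timestamp and ISO-format cases can be sketched with the standard library alone. Human-readable strings (e.g. "Tuesday, 3 March 2020") generally need a third-party parser, so this hypothetical parse_date_sketch omits them and raises ValueError instead; returning timezone-aware UTC datetimes for numeric timestamps is likewise an assumption.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def parse_date_sketch(date: Any) -> Optional[datetime]:
    """Hypothetical sketch: accept None, datetimes, unix timestamps, and ISO 8601 strings."""
    if date is None:
        return None
    if isinstance(date, datetime):
        return date
    if isinstance(date, (int, float)):
        return datetime.fromtimestamp(date, tz=timezone.utc)
    if isinstance(date, str):
        text = date.strip()
        try:
            # Numeric strings such as '1600000000' or '1600000000.123'
            return datetime.fromtimestamp(float(text), tz=timezone.utc)
        except (ValueError, OverflowError, OSError):
            pass
        # ISO 8601 strings; raises ValueError for anything else
        return datetime.fromisoformat(text)
    raise TypeError(f'cannot parse a date from {date!r}')
```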
- archivebox.util.download_url(url: str, timeout: int = None) → str
  Download the contents of a remote URL and return the text.
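Downloading a URL and returning its text can be done with urllib.request alone. The sketch below (download_text is a hypothetical name) decodes the body with the charset from the Content-Type header, falling back to UTF-8. The User-Agent value and the fallback are assumptions; ArchiveBox's own helper may handle user agents, SSL checks and encoding detection differently.

```python
from urllib.request import Request, urlopen


def download_text(url: str, timeout: int = 60,
                  user_agent: str = 'Mozilla/5.0 (compatible; example-archiver)') -> str:
    """Hypothetical sketch: fetch url and return the decoded body text."""
    request = Request(url, headers={'User-Agent': user_agent})
    with urlopen(request, timeout=timeout) as response:
        # Use the declared charset if there is one, otherwise assume UTF-8
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset, errors='replace')
```

A data: URL makes a convenient offline check, since urllib's default opener handles that scheme without any network access.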
- archivebox.util.chrome_args(**options) → List[str]
  Helper to build up a Chrome shell command with arguments.
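A helper that builds up a Chrome shell command typically maps options onto Chromium command-line switches. The sketch below uses real Chromium switches (--headless, --disable-gpu, --no-sandbox, --ignore-certificate-errors, --user-agent, --user-data-dir, --window-size), but the function name build_chrome_args, its keyword names, its defaults and the binary name 'chromium' are hypothetical and do not reflect the options ArchiveBox's chrome_args() actually accepts.

```python
from typing import List, Optional


def build_chrome_args(binary: str = 'chromium', headless: bool = True, sandbox: bool = True,
                      resolution: str = '1440,2000', user_agent: Optional[str] = None,
                      user_data_dir: Optional[str] = None, check_ssl: bool = True) -> List[str]:
    """Hypothetical sketch: translate a few options into Chromium CLI switches."""
    args = [binary]
    if headless:
        args += ['--headless', '--disable-gpu']
    if not sandbox:
        args.append('--no-sandbox')  # often needed when running as root in containers
    if not check_ssl:
        args.append('--ignore-certificate-errors')
    if user_agent:
        args.append(f'--user-agent={user_agent}')
    if user_data_dir:
        args.append(f'--user-data-dir={user_data_dir}')
    args.append(f'--window-size={resolution}')
    return args
```

Returning a list rather than a single string lets the result be passed straight to subprocess.run() without shell-quoting issues.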
- archivebox.util.ansi_to_html(text)
  Convert ANSI color codes in text to HTML. Based on: https://stackoverflow.com/questions/19212665/python-converting-ansi-color-codes-to-html
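Converting ANSI color codes to HTML generally means matching SGR escape sequences (ESC [ ... m), replacing them with styled tags, and HTML-escaping the text in between. This hypothetical ansi_to_html_sketch handles only the eight basic foreground colors (codes 30-37) and the reset code 0, and ignores other codes such as bold; it is an independent sketch, not the code from the linked answer.

```python
import html
import re

# Basic ANSI foreground colors (SGR codes 30-37)
ANSI_COLORS = {
    '30': 'black', '31': 'red', '32': 'green', '33': 'yellow',
    '34': 'blue', '35': 'magenta', '36': 'cyan', '37': 'white',
}
SGR_PATTERN = re.compile(r'\x1b\[([0-9;]*)m')


def ansi_to_html_sketch(text: str) -> str:
    """Hypothetical sketch: turn basic ANSI color escapes into <span> tags."""
    out = []
    open_spans = 0
    pos = 0
    for match in SGR_PATTERN.finditer(text):
        out.append(html.escape(text[pos:match.start()]))  # escape the plain text in between
        pos = match.end()
        for code in (match.group(1) or '0').split(';'):
            if code in ('', '0'):
                # Reset: close every span opened so far
                out.append('</span>' * open_spans)
                open_spans = 0
            elif code in ANSI_COLORS:
                out.append(f'<span style="color: {ANSI_COLORS[code]}">')
                open_spans += 1
            # Other codes (bold, background colors, ...) are ignored in this sketch
    out.append(html.escape(text[pos:]))
    out.append('</span>' * open_spans)  # close anything left open at the end
    return ''.join(out)
```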
- class archivebox.util.AttributeDict(*args, **kwargs)
  Bases: dict
  Helper to allow accessing dict values via Example.key or Example['key'].
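Attribute-style access on a dict subclass usually comes down to forwarding __getattr__ and __setattr__ to item access. A minimal sketch under the hypothetical name AttrDictSketch; raising AttributeError rather than KeyError for missing keys keeps hasattr() and getattr() with a default working as expected.

```python
class AttrDictSketch(dict):
    """Hypothetical sketch: a dict whose keys can also be read and set as attributes."""

    def __getattr__(self, key):
        # Only called when normal attribute lookup fails, so dict methods still work
        try:
            return self[key]
        except KeyError:
            raise AttributeError(key) from None

    def __setattr__(self, key, value):
        self[key] = value
```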
- class archivebox.util.ExtendedEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None)
  Bases: json.encoder.JSONEncoder
  Extended JSON serializer that supports serializing several model fields and objects.
  - default(obj)
    Implement this method in a subclass such that it returns a serializable object for o, or calls the base implementation (to raise a TypeError).
    For example, to support arbitrary iterators, you could implement default like this:

        def default(self, o):
            try:
                iterable = iter(o)
            except TypeError:
                pass
            else:
                return list(iterable)
            # Let the base class default method raise the TypeError
            return JSONEncoder.default(self, o)
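Extending json.JSONEncoder follows the pattern in the default() docstring: handle the extra types, then defer to the base class so unsupported objects still raise TypeError. The class name and the types handled below (datetime, filesystem paths, sets) are assumptions chosen for illustration; the docstring only says ArchiveBox's encoder supports "several model fields and objects".

```python
import json
from datetime import datetime
from pathlib import PurePath


class ExtendedEncoderSketch(json.JSONEncoder):
    """Hypothetical sketch of a JSONEncoder that understands a few extra types."""

    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, PurePath):
            return str(obj)
        if isinstance(obj, (set, frozenset)):
            # Sorted for deterministic output; assumes the elements are comparable
            return sorted(obj)
        # Let the base class raise TypeError for anything else
        return super().default(obj)
```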