archivebox.parsers

Everything related to parsing links from input sources.

For a list of supported services, see the README.md. For examples of supported import formats see tests/.

Submodules

Package Contents

Functions

parse_links_memory

parse a list of URLs without touching the filesystem

parse_links

parse a list of URLs with their metadata from an RSS feed, bookmarks export, or text file

run_parser_functions

save_text_as_source

save_file_as_source

download a given URL’s content into output/sources/domain-.txt

Data

PARSERS

API

archivebox.parsers.PARSERS

None

archivebox.parsers.parse_links_memory

parse a list of URLs without touching the filesystem
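
A minimal sketch of the in-memory idea. It assumes the URLs are joined into an io.StringIO buffer and handed to a parser that accepts any text stream; run_parser_functions takes an IO[str], which is what makes this approach possible. The names parse_plain_text and parse_urls_in_memory, and the URL regex, are hypothetical illustrations and not ArchiveBox's API.

```python
import io
import re
from typing import IO, List

# Hypothetical, deliberately simple URL pattern: "http(s)://" followed by
# any run of characters that are not whitespace, angle brackets or quotes.
URL_RE = re.compile(r"https?://[^\s<>\"']+")


def parse_plain_text(to_parse: IO[str]) -> List[str]:
    """Hypothetical parser: collect every http(s) URL from a text stream."""
    to_parse.seek(0)  # start from the beginning in case the stream was read before
    urls: List[str] = []
    for line in to_parse:
        urls.extend(URL_RE.findall(line))
    return urls


def parse_urls_in_memory(urls: List[str]) -> List[str]:
    """Parse URLs from an in-memory buffer; nothing is written to disk."""
    buffer = io.StringIO("\n".join(urls))
    return parse_plain_text(buffer)


print(parse_urls_in_memory(["https://example.com", "see https://example.org/page for details"]))
# ['https://example.com', 'https://example.org/page']
```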

archivebox.parsers.parse_links

parse a list of URLs with their metadata from an RSS feed, bookmarks export, or text file
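
The following sketch illustrates what "with their metadata" can mean for an RSS feed. The hypothetical parse_rss_items helper uses the standard library's xml.etree.ElementTree to read the link, title and pubDate of each item in an RSS 2.0 document. It assumes a well-formed RSS 2.0 feed. It is not ArchiveBox's RSS parser, and it returns plain dicts rather than Link objects.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Optional


def parse_rss_items(xml_text: str) -> List[Dict[str, Optional[str]]]:
    """Hypothetical helper: return url/title/pub_date for each RSS 2.0 <item>."""
    root = ET.fromstring(xml_text)
    links: List[Dict[str, Optional[str]]] = []
    for item in root.iter("item"):
        url = (item.findtext("link") or "").strip()
        if not url:
            continue  # an item without a link has nothing to archive
        links.append({
            "url": url,
            # Optional metadata: missing or empty elements become None.
            "title": (item.findtext("title") or "").strip() or None,
            "pub_date": (item.findtext("pubDate") or "").strip() or None,
        })
    return links


SAMPLE_FEED = """<rss version="2.0"><channel>
  <item><title>First post</title><link>https://example.com/1</link>
        <pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate></item>
  <item><link>https://example.com/2</link></item>
</channel></rss>"""

for link in parse_rss_items(SAMPLE_FEED):
    print(link)
```

The second item has no title or date, so it still yields a URL but with None metadata.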

archivebox.parsers.run_parser_functions(to_parse: IO[str], timer, root_url: Optional[str] = None, parser: str = 'auto') → Tuple[List[archivebox.index.schema.Link], Optional[str]]
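
run_parser_functions returns a list of links together with an Optional[str]. This suggests a dispatch loop that tries parsers, reports which one succeeded, and returns None if none did. The sketch below assumes a registry that maps a key to a (name, function) pair. Parsers are tried in order, with a generic text parser as the last resort. The registry contents, the parser functions and run_parsers_sketch are all hypothetical, and the real function's timer and root_url arguments are omitted.

```python
import io
import json
import re
from typing import IO, Callable, Dict, List, Optional, Tuple

ParserFunc = Callable[[IO[str]], List[str]]


def parse_json_urls(f: IO[str]) -> List[str]:
    """Hypothetical JSON parser: expects a list of {"url": ...} objects."""
    return [entry["url"] for entry in json.load(f)]


def parse_text_urls(f: IO[str]) -> List[str]:
    """Hypothetical catch-all parser: any http(s) URL in the text."""
    return re.findall(r"https?://\S+", f.read())


# Hypothetical registry: key -> (display name, parser function), tried in order,
# with the most permissive parser last so it only runs if stricter ones fail.
PARSER_REGISTRY: Dict[str, Tuple[str, ParserFunc]] = {
    "json": ("Generic JSON", parse_json_urls),
    "txt": ("Generic TXT", parse_text_urls),
}


def run_parsers_sketch(to_parse: IO[str], parser: str = "auto") -> Tuple[List[str], Optional[str]]:
    """Return (links, name of the parser that produced them), or ([], None)."""
    if parser == "auto":
        candidates = list(PARSER_REGISTRY.values())
    else:
        candidates = [PARSER_REGISTRY[parser]]  # KeyError for an unknown key
    for name, func in candidates:
        to_parse.seek(0)  # each parser must see the input from the start
        try:
            links = func(to_parse)
        except Exception:
            continue  # input is not in this parser's format; try the next one
        if links:
            return links, name
    return [], None


print(run_parsers_sketch(io.StringIO('[{"url": "https://a.example"}]')))
# (['https://a.example'], 'Generic JSON')
print(run_parsers_sketch(io.StringIO("read https://b.example later")))
# (['https://b.example'], 'Generic TXT')
```

Catching every exception keeps one unrecognized format from aborting auto-detection, at the cost of hiding genuine parser bugs. A fuller version would probably log them.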

archivebox.parsers.save_text_as_source(raw_text: str, filename: str = '{ts}-stdin.txt', out_dir: pathlib.Path = DATA_DIR) → str
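
A sketch of what saving raw text as a source file could look like. It assumes that the {ts} placeholder is filled with a Unix timestamp and that the file goes into a sources/ subdirectory of out_dir; the save_file_as_source docstring mentions output/sources/. save_text_sketch is a hypothetical name, and out_dir defaults to the current directory here instead of DATA_DIR.

```python
import tempfile
import time
from pathlib import Path


def save_text_sketch(raw_text: str, filename: str = "{ts}-stdin.txt", out_dir: Path = Path(".")) -> str:
    """Hypothetical: write raw_text to <out_dir>/sources/<filename> and return the path."""
    ts = str(int(time.time()))  # assumption: {ts} is a Unix timestamp in seconds
    source_path = out_dir / "sources" / filename.format(ts=ts)
    source_path.parent.mkdir(parents=True, exist_ok=True)
    source_path.write_text(raw_text, encoding="utf-8")
    return str(source_path)


with tempfile.TemporaryDirectory() as tmp:
    saved = Path(save_text_sketch("https://example.com\n", out_dir=Path(tmp)))
    saved_dir = saved.parent.name
    saved_name = saved.name
    saved_text = saved.read_text(encoding="utf-8")
    print(saved_dir, saved_name)
```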

archivebox.parsers.save_file_as_source(path: str, timeout: int = ARCHIVING_CONFIG.TIMEOUT, filename: str = '{ts}-{basename}.txt', out_dir: pathlib.Path = DATA_DIR) → str

download a given URL’s content into output/sources/domain-.txt
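
A comparable sketch for save_file_as_source. It assumes that http(s) paths are downloaded with urllib.request.urlopen using the timeout argument, and that any other path is read as a local file. save_file_sketch is hypothetical. The default timeout of 60 seconds is a placeholder for ARCHIVING_CONFIG.TIMEOUT, and {basename} is assumed to be the last path component.

```python
import tempfile
import time
import urllib.request
from pathlib import Path


def save_file_sketch(path: str, timeout: int = 60, filename: str = "{ts}-{basename}.txt",
                     out_dir: Path = Path(".")) -> str:
    """Hypothetical: copy a URL's or local file's text into <out_dir>/sources/."""
    if path.startswith(("http://", "https://")):
        # Remote source: download it, giving up after `timeout` seconds.
        with urllib.request.urlopen(path, timeout=timeout) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            raw_text = resp.read().decode(charset, errors="replace")
    else:
        # Local source: read the file as text.
        raw_text = Path(path).read_text(encoding="utf-8")

    ts = str(int(time.time()))  # assumption: Unix timestamp in seconds
    basename = Path(path).name  # assumption: last path component
    source_path = out_dir / "sources" / filename.format(ts=ts, basename=basename)
    source_path.parent.mkdir(parents=True, exist_ok=True)
    source_path.write_text(raw_text, encoding="utf-8")
    return str(source_path)


with tempfile.TemporaryDirectory() as tmp:
    bookmarks = Path(tmp) / "bookmarks.html"
    bookmarks.write_text('<a href="https://example.com">Example</a>', encoding="utf-8")
    saved = Path(save_file_sketch(str(bookmarks), out_dir=Path(tmp)))
    saved_name = saved.name
    saved_text = saved.read_text(encoding="utf-8")
    print(saved_name)
```

Only the local-file branch runs in this usage example, so it does not need network access.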