archivebox.parsers package
Submodules
archivebox.parsers.generic_json module
archivebox.parsers.generic_rss module
archivebox.parsers.generic_txt module
archivebox.parsers.medium_rss module
archivebox.parsers.netscape_html module
archivebox.parsers.pinboard_rss module
archivebox.parsers.pocket_html module
archivebox.parsers.shaarli_rss module
Module contents
Everything related to parsing links from input sources.
For a list of supported services, see README.md. For examples of supported import formats, see tests/.
- archivebox.parsers.parse_links_memory(urls: List[str], root_url: str | None = None)
Parse a list of URLs without touching the filesystem.
- archivebox.parsers.parse_links(source_file: str, root_url: str | None = None, parser: str = 'auto') -> Tuple[List[Link], str]
Parse a list of URLs with their metadata from an RSS feed, bookmarks export, or text file.
- archivebox.parsers.run_parser_functions(to_parse: IO[str], timer, root_url: str | None = None, parser: str = 'auto') -> Tuple[List[Link], str | None]
- archivebox.parsers.save_text_as_source(raw_text: str, filename: str = '{ts}-stdin.txt', out_dir: Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/latest')) -> str
- archivebox.parsers.save_file_as_source(path: str, timeout: int = 60, filename: str = '{ts}-{basename}.txt', out_dir: Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/latest')) -> str
Download a given URL’s content into output/sources/domain-<timestamp>.txt.
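The signatures above suggest that a parser value of 'auto' tries several parsers and returns the parsed links together with a string, or None when nothing matched. The sketch below illustrates that general fallthrough pattern only. It is not ArchiveBox's implementation: the names run_parsers, parse_json_urls, parse_plain_text_urls, PARSERS, and URL_REGEX are hypothetical, plain URL strings stand in for Link objects, and reading the returned string as the name of the parser that succeeded is an assumption.

```python
import json
import re
from io import StringIO
from typing import IO, Callable, List, Optional, Tuple

# Hypothetical, deliberately simple URL pattern used only for this sketch.
URL_REGEX = re.compile(r"https?://[^\s<>\"']+")


def parse_json_urls(text_file: IO[str]) -> List[str]:
    """Hypothetical parser: expects a JSON array of objects with a 'url' key."""
    text_file.seek(0)
    entries = json.load(text_file)
    return [entry["url"] for entry in entries]


def parse_plain_text_urls(text_file: IO[str]) -> List[str]:
    """Hypothetical parser: pulls anything that looks like a URL out of raw text."""
    text_file.seek(0)
    return URL_REGEX.findall(text_file.read())


# Ordered from most to least specific; the plain-text parser is the fallback.
PARSERS: List[Tuple[str, Callable[[IO[str]], List[str]]]] = [
    ("json", parse_json_urls),
    ("txt", parse_plain_text_urls),
]


def run_parsers(to_parse: IO[str], parser: str = "auto") -> Tuple[List[str], Optional[str]]:
    """Try each candidate parser in order and return (urls, parser_name).

    Returns ([], None) when no candidate parser yields any URLs.
    """
    if parser == "auto":
        candidates = PARSERS
    else:
        candidates = [entry for entry in PARSERS if entry[0] == parser]

    for name, func in candidates:
        try:
            urls = func(to_parse)
        except Exception:
            # A parser that cannot read this format is skipped, not fatal.
            continue
        if urls:
            return urls, name
    return [], None
```

Because the input is any text stream, the sketch can be exercised entirely in memory: run_parsers(StringIO('[{"url": "https://example.com"}]')) returns (['https://example.com'], 'json'), while plain text that is not valid JSON falls through to the 'txt' parser.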