archivebox.parsers package

Submodules

archivebox.parsers.generic_json module

archivebox.parsers.generic_json.parse_generic_json_export(json_file: IO[str], **_kwargs) → Iterable[Link]: Parse JSON-format bookmarks export files (produced by pinboard.in/export/, or wallabag)

archivebox.parsers.generic_json.PARSER(json_file: IO[str], **_kwargs) → Iterable[Link]: Parse JSON-format bookmarks export files (produced by pinboard.in/export/, or wallabag)
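
To illustrate the general idea of reading a JSON bookmarks export, here is a minimal standalone sketch. It does not call ArchiveBox and is not its implementation. The helper name sketch_parse_json_export, the plain-dict output, and the key names ("href", "url", "description", "title", "tags") are assumptions chosen for illustration.

```python
import json
from io import StringIO
from typing import IO, Dict, Iterable


def sketch_parse_json_export(json_file: IO[str]) -> Iterable[Dict[str, str]]:
    """Yield one dict per bookmark found in a JSON array export (hypothetical helper)."""
    json_file.seek(0)
    for entry in json.load(json_file):
        # Assumed key names: try "href" first, then fall back to "url".
        url = entry.get("href") or entry.get("url")
        if not url:
            continue  # skip entries that carry no usable URL
        yield {
            "url": url,
            "title": entry.get("description") or entry.get("title") or "",
            "tags": entry.get("tags") or "",
        }


JSON_SAMPLE = StringIO(
    '[{"href": "https://example.com", "description": "Example", "tags": "demo"},'
    ' {"description": "entry without a URL"}]'
)
json_links = list(sketch_parse_json_export(JSON_SAMPLE))
```

Entries without a URL are skipped rather than raising, so one malformed bookmark does not abort the whole import in this sketch.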

archivebox.parsers.generic_rss module

archivebox.parsers.generic_rss.parse_generic_rss_export(rss_file: IO[str], **_kwargs) → Iterable[Link]: Parse RSS XML-format files into links

archivebox.parsers.generic_rss.PARSER(rss_file: IO[str], **_kwargs) → Iterable[Link]: Parse RSS XML-format files into links
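
As a rough illustration of turning RSS items into links, the following standalone sketch uses the standard library's xml.etree.ElementTree. It is not ArchiveBox's parser: the helper name sketch_parse_rss and the dict layout are hypothetical, and it assumes a plain RSS 2.0 feed with <item>, <link>, <title>, and <pubDate> elements.

```python
import xml.etree.ElementTree as ET
from io import StringIO
from typing import IO, Dict, Iterable


def sketch_parse_rss(rss_file: IO[str]) -> Iterable[Dict[str, str]]:
    """Yield url/title/published for each <item> that has a <link> (hypothetical helper)."""
    rss_file.seek(0)
    root = ET.fromstring(rss_file.read())
    for item in root.iter("item"):
        url = (item.findtext("link") or "").strip()
        if not url:
            continue  # an item without a link cannot become a bookmark
        yield {
            "url": url,
            "title": (item.findtext("title") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
        }


RSS_SAMPLE = StringIO(
    '<rss version="2.0"><channel><title>Feed</title>'
    "<item><title>First post</title><link>https://example.com/1</link>"
    "<pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate></item>"
    "<item><title>No link</title></item>"
    "</channel></rss>"
)
rss_links = list(sketch_parse_rss(RSS_SAMPLE))
```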

archivebox.parsers.generic_txt module

archivebox.parsers.generic_txt.parse_generic_txt_export(text_file: IO[str], **_kwargs) → Iterable[Link]: Parse links from a text file, ignoring other text

archivebox.parsers.generic_txt.PARSER(text_file: IO[str], **_kwargs) → Iterable[Link]: Parse links from a text file, ignoring other text
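
"Ignoring other text" can be pictured as scanning each line for URL-shaped substrings and discarding everything else. The sketch below is a standalone illustration under that assumption. The regular expression, the trailing-punctuation cleanup, and the name sketch_extract_urls are hypothetical and may differ from what ArchiveBox actually does.

```python
import re
from io import StringIO
from typing import IO, Iterable

# Assumed pattern: http(s) URLs that run until whitespace, quotes, or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def sketch_extract_urls(text_file: IO[str]) -> Iterable[str]:
    """Yield every URL-looking substring in the file, in order (hypothetical helper)."""
    text_file.seek(0)
    for line in text_file:
        for match in URL_PATTERN.finditer(line):
            # Drop sentence punctuation that happens to follow the URL.
            yield match.group(0).rstrip(".,;:!?)")


TXT_SAMPLE = StringIO(
    "Notes from today:\n"
    "see https://example.com/page, and also http://example.org/a?b=1.\n"
    "no links on this line\n"
)
txt_urls = list(sketch_extract_urls(TXT_SAMPLE))
```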

archivebox.parsers.medium_rss module

archivebox.parsers.medium_rss.parse_medium_rss_export(rss_file: IO[str], **_kwargs) → Iterable[Link]: Parse Medium RSS feed files into links

archivebox.parsers.medium_rss.PARSER(rss_file: IO[str], **_kwargs) → Iterable[Link]: Parse Medium RSS feed files into links

archivebox.parsers.netscape_html module

archivebox.parsers.netscape_html.parse_netscape_html_export(html_file: IO[str], **_kwargs) → Iterable[Link]: Parse Netscape-format bookmarks export files (produced by all browsers)

archivebox.parsers.netscape_html.PARSER(html_file: IO[str], **_kwargs) → Iterable[Link]: Parse Netscape-format bookmarks export files (produced by all browsers)
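
Netscape-format bookmark files typically list one <DT><A HREF="..." ADD_DATE="...">title</A> entry per line. The standalone sketch below shows one way such lines could be matched. It is an illustration only, not ArchiveBox's code: the regular expression, the assumption that HREF precedes ADD_DATE, and the name sketch_parse_netscape are all hypothetical.

```python
import re
from datetime import datetime, timezone
from io import StringIO
from typing import IO, Dict, Iterable

# Assumed layout: HREF comes before ADD_DATE inside the same <A> tag.
BOOKMARK_PATTERN = re.compile(
    r'<a\s+href="(?P<url>[^"]+)"[^>]*?add_date="(?P<ts>\d+)"[^>]*>(?P<title>[^<]*)</a>',
    re.IGNORECASE,
)


def sketch_parse_netscape(html_file: IO[str]) -> Iterable[Dict[str, object]]:
    """Yield url/title/added for each bookmark line (hypothetical helper)."""
    html_file.seek(0)
    for line in html_file:
        match = BOOKMARK_PATTERN.search(line)
        if match is None:
            continue  # folder headings and list markup carry no bookmark
        yield {
            "url": match.group("url"),
            "title": match.group("title").strip(),
            # ADD_DATE is treated here as seconds since the Unix epoch (UTC).
            "added": datetime.fromtimestamp(int(match.group("ts")), tz=timezone.utc),
        }


NETSCAPE_SAMPLE = StringIO(
    "<!DOCTYPE NETSCAPE-Bookmark-file-1>\n"
    "<DL><p>\n"
    '<DT><A HREF="https://example.com" ADD_DATE="1704067200" TAGS="demo">Example</A>\n'
    "<DT><H3>Folder without a link</H3>\n"
    "</DL><p>\n"
)
netscape_links = list(sketch_parse_netscape(NETSCAPE_SAMPLE))
```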

archivebox.parsers.pinboard_rss module

archivebox.parsers.pinboard_rss.parse_pinboard_rss_export(rss_file: IO[str], **_kwargs) → Iterable[Link]: Parse Pinboard RSS feed files into links

archivebox.parsers.pinboard_rss.PARSER(rss_file: IO[str], **_kwargs) → Iterable[Link]: Parse Pinboard RSS feed files into links

archivebox.parsers.pocket_html module

archivebox.parsers.pocket_html.parse_pocket_html_export(html_file: IO[str], **_kwargs) → Iterable[Link]: Parse Pocket-format bookmarks export files (produced by getpocket.com/export/)

archivebox.parsers.pocket_html.PARSER(html_file: IO[str], **_kwargs) → Iterable[Link]: Parse Pocket-format bookmarks export files (produced by getpocket.com/export/)

archivebox.parsers.shaarli_rss module

archivebox.parsers.shaarli_rss.parse_shaarli_rss_export(rss_file: IO[str], **_kwargs) → Iterable[Link]: Parse Shaarli-specific RSS XML-format files into links

archivebox.parsers.shaarli_rss.PARSER(rss_file: IO[str], **_kwargs) → Iterable[Link]: Parse Shaarli-specific RSS XML-format files into links

Module contents

Everything related to parsing links from input sources.

For a list of supported services, see the README.md. For examples of supported import formats see tests/.

archivebox.parsers.parse_links_memory(urls: List[str], root_url: str | None = None): Parse a list of URLs without touching the filesystem

archivebox.parsers.parse_links(source_file: str, root_url: str | None = None, parser: str = 'auto') → Tuple[List[Link], str]: Parse a list of URLs with their metadata from an RSS feed, bookmarks export, or text file

archivebox.parsers.run_parser_functions(to_parse: IO[str], timer, root_url: str | None = None, parser: str = 'auto') → Tuple[List[Link], str | None]
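
The parser='auto' default and the (links, parser name or None) return type suggest that a suitable parser is selected at run time. One plausible reading, sketched below as a standalone example, is to try candidate parsers in order and keep the first one that yields links. This is an assumption made for illustration, not ArchiveBox's documented selection logic. The registry TOY_PARSERS, the two toy parsers, and sketch_run_parsers are hypothetical.

```python
import json
from io import StringIO
from typing import IO, Callable, List, Optional, Tuple

ParserFunc = Callable[[IO[str]], List[str]]


def toy_json_parser(f: IO[str]) -> List[str]:
    # Raises json.JSONDecodeError when the input is not JSON.
    return [entry["href"] for entry in json.load(f)]


def toy_txt_parser(f: IO[str]) -> List[str]:
    return [line.strip() for line in f if line.startswith("http")]


# Hypothetical registry of (name, function) pairs, tried in this order.
TOY_PARSERS: List[Tuple[str, ParserFunc]] = [
    ("json", toy_json_parser),
    ("txt", toy_txt_parser),
]


def sketch_run_parsers(
    to_parse: IO[str],
    parsers: List[Tuple[str, ParserFunc]],
    parser: str = "auto",
) -> Tuple[List[str], Optional[str]]:
    """Return (links, parser_name) from the first parser that finds links."""
    if parser == "auto":
        candidates = parsers
    else:
        candidates = [(name, func) for name, func in parsers if name == parser]
    for name, func in candidates:
        try:
            to_parse.seek(0)  # every attempt starts from the top of the file
            links = list(func(to_parse))
        except Exception:
            continue  # wrong format for this parser, so try the next one
        if links:
            return links, name
    return [], None


auto_result = sketch_run_parsers(StringIO("https://example.com\n"), TOY_PARSERS)
forced_result = sketch_run_parsers(StringIO("https://example.com\n"), TOY_PARSERS, parser="json")
```

In this sketch, forcing a parser that cannot read the input yields an empty list and None instead of an exception.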

archivebox.parsers.save_text_as_source(raw_text: str, filename: str = '{ts}-stdin.txt', out_dir: Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/latest')) → str
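
The '{ts}-stdin.txt' default looks like a format template in which {ts} is replaced by a timestamp. The standalone sketch below shows how such a template could be filled in and the text written to disk. The integer Unix timestamp, the "sources" subdirectory, and the name sketch_save_text are assumptions for illustration and may not match ArchiveBox's behavior.

```python
import tempfile
import time
from pathlib import Path


def sketch_save_text(
    raw_text: str,
    filename: str = "{ts}-stdin.txt",
    out_dir: Path = Path("."),
) -> str:
    """Write raw_text under out_dir/sources and return the file path (hypothetical helper)."""
    ts = str(int(time.time()))  # assumed timestamp format: whole seconds since the epoch
    sources_dir = out_dir / "sources"  # assumed subdirectory name
    sources_dir.mkdir(parents=True, exist_ok=True)
    source_path = sources_dir / filename.format(ts=ts)
    source_path.write_text(raw_text, encoding="utf-8")
    return str(source_path)


with tempfile.TemporaryDirectory() as tmp:
    saved_path = sketch_save_text("https://example.com\n", out_dir=Path(tmp))
    saved_name = Path(saved_path).name
    saved_text = Path(saved_path).read_text(encoding="utf-8")
```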

archivebox.parsers.save_file_as_source(path: str, timeout: int = 60, filename: str = '{ts}-{basename}.txt', out_dir: Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/latest')) → str: Download a given URL’s content into output/sources/domain-<timestamp>.txt
