archivebox.parsers package¶
Submodules¶
archivebox.parsers.generic_json module¶
-
archivebox.parsers.generic_json.
parse_generic_json_export
(json_file: IO[str], **_kwargs) → Iterable[archivebox.index.schema.Link][source]¶ Parse JSON-format bookmarks export files (produced by pinboard.in/export/, or wallabag)
-
archivebox.parsers.generic_json.
PARSER
(json_file: IO[str], **_kwargs) → Iterable[archivebox.index.schema.Link]¶ Parse JSON-format bookmarks export files (produced by pinboard.in/export/, or wallabag)
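As a rough illustration of what parsing a JSON bookmarks export involves, the sketch below reads a list of entries and yields one record per bookmark. It does not use ArchiveBox itself: `sketch_parse_json_export` is a hypothetical name, the records are plain dicts rather than `Link` objects, and the field names (`href`, `url`, `description`, `title`, `time`, `tags`) are assumptions about what Pinboard-style and wallabag-style exports contain.

```python
import io
import json
from typing import IO, Dict, Iterable


def sketch_parse_json_export(json_file: IO[str]) -> Iterable[Dict[str, object]]:
    """Yield one plain dict per bookmark (hypothetical stand-in for a Link)."""
    json_file.seek(0)
    for entry in json.load(json_file):
        # Assumed field names: "href" (Pinboard-style) or "url" (wallabag-style).
        url = entry.get("href") or entry.get("url")
        if not url:
            continue  # skip entries that carry no URL at all
        yield {
            "url": url,
            "title": entry.get("description") or entry.get("title") or None,
            "timestamp": entry.get("time"),  # kept as the raw string
            "tags": (entry.get("tags") or "").split(),  # assumed space-separated
        }


SAMPLE_JSON = """[
  {"href": "https://example.com/a", "description": "Example A",
   "time": "2021-01-02T03:04:05Z", "tags": "news tech"},
  {"description": "entry without a URL"}
]"""

json_links = list(sketch_parse_json_export(io.StringIO(SAMPLE_JSON)))
```

Taking an open file object and returning an iterable mirrors the `(json_file: IO[str], **_kwargs) → Iterable[...]` shape shared by the parsers on this page.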
archivebox.parsers.generic_rss module¶
-
archivebox.parsers.generic_rss.
parse_generic_rss_export
(rss_file: IO[str], **_kwargs) → Iterable[archivebox.index.schema.Link][source]¶ Parse RSS XML-format files into links
-
archivebox.parsers.generic_rss.
PARSER
(rss_file: IO[str], **_kwargs) → Iterable[archivebox.index.schema.Link]¶ Parse RSS XML-format files into links
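A minimal sketch of turning RSS items into link records, using the standard library's `xml.etree.ElementTree`. This is only one way to do it and is not necessarily how this module is implemented; `sketch_parse_rss` is a hypothetical name, and it assumes RSS 2.0 items with `<link>`, `<title>`, and `<pubDate>` children.

```python
import io
import xml.etree.ElementTree as ET
from typing import IO, Dict, Iterable


def sketch_parse_rss(rss_file: IO[str]) -> Iterable[Dict[str, str]]:
    """Yield one plain dict per <item> that has a non-empty <link>."""
    rss_file.seek(0)
    root = ET.fromstring(rss_file.read())
    for item in root.iter("item"):
        url = (item.findtext("link") or "").strip()
        if not url:
            continue  # an item without a link cannot be archived
        yield {
            "url": url,
            "title": (item.findtext("title") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
        }


SAMPLE_RSS = """<rss version="2.0"><channel><title>Feed</title>
<item><title>First post</title><link>https://example.com/1</link>
<pubDate>Mon, 01 Mar 2021 10:00:00 GMT</pubDate></item>
<item><title>Item without a link</title></item>
</channel></rss>"""

rss_links = list(sketch_parse_rss(io.StringIO(SAMPLE_RSS)))
```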
archivebox.parsers.generic_txt module¶
-
archivebox.parsers.generic_txt.
parse_generic_txt_export
(text_file: IO[str], **_kwargs) → Iterable[archivebox.index.schema.Link][source]¶ Parse links from a text file, ignoring other text
-
archivebox.parsers.generic_txt.
PARSER
(text_file: IO[str], **_kwargs) → Iterable[archivebox.index.schema.Link]¶ Parse links from a text file, ignoring other text
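"Ignoring other text" can be approximated with a regular expression that picks out anything resembling an HTTP(S) URL. The sketch below is an assumption-laden illustration, not the module's actual pattern: `URL_PATTERN` and `sketch_find_urls` are hypothetical names, trailing punctuation is trimmed heuristically, and the de-duplication step is an addition of this example.

```python
import io
import re
from typing import IO, Iterable

# Hypothetical pattern: scheme, then everything up to whitespace, quotes or angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def sketch_find_urls(text_file: IO[str]) -> Iterable[str]:
    """Yield each distinct URL found in the text, in order of first appearance."""
    text_file.seek(0)
    seen = set()
    for line in text_file:
        for match in URL_PATTERN.findall(line):
            url = match.rstrip(".,;:!?)")  # drop punctuation that trails the URL
            if url not in seen:
                seen.add(url)
                yield url


SAMPLE_TEXT = (
    "Read https://example.com/page, then see (http://example.org/x?y=1).\n"
    "This line has no links.\n"
    "https://example.com/page appears again here.\n"
)

text_urls = list(sketch_find_urls(io.StringIO(SAMPLE_TEXT)))
```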
archivebox.parsers.medium_rss module¶
-
archivebox.parsers.medium_rss.
parse_medium_rss_export
(rss_file: IO[str], **_kwargs) → Iterable[archivebox.index.schema.Link][source]¶ Parse Medium RSS feed files into links
-
archivebox.parsers.medium_rss.
PARSER
(rss_file: IO[str], **_kwargs) → Iterable[archivebox.index.schema.Link]¶ Parse Medium RSS feed files into links
archivebox.parsers.netscape_html module¶
-
archivebox.parsers.netscape_html.
parse_netscape_html_export
(html_file: IO[str], **_kwargs) → Iterable[archivebox.index.schema.Link][source]¶ Parse Netscape-format bookmarks export files (produced by all browsers)
-
archivebox.parsers.netscape_html.
PARSER
(html_file: IO[str], **_kwargs) → Iterable[archivebox.index.schema.Link]¶ Parse Netscape-format bookmarks export files (produced by all browsers)
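Netscape-format bookmark files store each bookmark as an `<A>` tag whose attributes carry the URL and the time it was added. The following sketch extracts those with a regular expression. It is illustrative only: `ANCHOR_PATTERN` and `sketch_parse_netscape` are hypothetical names, and the pattern assumes `HREF` appears before `ADD_DATE` and that `ADD_DATE` is a Unix timestamp in seconds.

```python
import io
import re
from datetime import datetime, timezone
from typing import IO, Dict, Iterable

# Hypothetical pattern: <A HREF="..." ADD_DATE="1234567890" ...>title</A>
ANCHOR_PATTERN = re.compile(
    r'<a\s+[^>]*?href="([^"]+)"[^>]*?add_date="(\d+)"[^>]*>(.*?)</a>',
    re.IGNORECASE,
)


def sketch_parse_netscape(html_file: IO[str]) -> Iterable[Dict[str, object]]:
    """Yield one plain dict per bookmark line that matches the pattern."""
    html_file.seek(0)
    for line in html_file:
        match = ANCHOR_PATTERN.search(line)
        if not match:
            continue  # headings, <DL> wrappers and other markup are skipped
        url, added, title = match.groups()
        yield {
            "url": url,
            "added": datetime.fromtimestamp(int(added), tz=timezone.utc),
            "title": title.strip(),
        }


SAMPLE_HTML = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<TITLE>Bookmarks</TITLE>
<DL><p>
<DT><A HREF="https://example.com/" ADD_DATE="1614592800" PRIVATE="0">Example site</A>
</DL><p>
"""

html_links = list(sketch_parse_netscape(io.StringIO(SAMPLE_HTML)))
```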
archivebox.parsers.pinboard_rss module¶
-
archivebox.parsers.pinboard_rss.
parse_pinboard_rss_export
(rss_file: IO[str], **_kwargs) → Iterable[archivebox.index.schema.Link][source]¶ Parse Pinboard RSS feed files into links
-
archivebox.parsers.pinboard_rss.
PARSER
(rss_file: IO[str], **_kwargs) → Iterable[archivebox.index.schema.Link]¶ Parse Pinboard RSS feed files into links
archivebox.parsers.pocket_html module¶
-
archivebox.parsers.pocket_html.
parse_pocket_html_export
(html_file: IO[str], **_kwargs) → Iterable[archivebox.index.schema.Link][source]¶ Parse Pocket-format bookmarks export files (produced by getpocket.com/export/)
-
archivebox.parsers.pocket_html.
PARSER
(html_file: IO[str], **_kwargs) → Iterable[archivebox.index.schema.Link]¶ Parse Pocket-format bookmarks export files (produced by getpocket.com/export/)
archivebox.parsers.shaarli_rss module¶
-
archivebox.parsers.shaarli_rss.
parse_shaarli_rss_export
(rss_file: IO[str], **_kwargs) → Iterable[archivebox.index.schema.Link][source]¶ Parse Shaarli-specific RSS XML-format files into links
-
archivebox.parsers.shaarli_rss.
PARSER
(rss_file: IO[str], **_kwargs) → Iterable[archivebox.index.schema.Link]¶ Parse Shaarli-specific RSS XML-format files into links
Module contents¶
Everything related to parsing links from input sources.
For a list of supported services, see README.md. For examples of supported import formats, see tests/.
-
archivebox.parsers.
parse_links_memory
(urls: List[str], root_url: Optional[str] = None)[source]¶ Parse a list of URLs without touching the filesystem
-
archivebox.parsers.
parse_links
(source_file: str, root_url: Optional[str] = None, parser: str = 'auto') → Tuple[List[archivebox.index.schema.Link], str][source]¶ Parse a list of URLs with their metadata from an RSS feed, bookmarks export, or text file
-
archivebox.parsers.
run_parser_functions
(to_parse: IO[str], timer, root_url: Optional[str] = None, parser: str = 'auto') → Tuple[List[archivebox.index.schema.Link], Optional[str]][source]¶
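The `parser: str = 'auto'` default and the `Tuple[List[Link], Optional[str]]` return type suggest that several parsers may be tried against the same input, with the name of the one used returned alongside the links. The selection rule is not documented on this page, so the sketch below simply assumes "the first parser that yields any links wins". All names in it (`SKETCH_PARSERS`, `sketch_run_parsers`, and the two toy parsers) are hypothetical.

```python
import io
from typing import IO, Callable, Iterable, List, Optional, Tuple


def parse_always_fails(_file: IO[str]) -> Iterable[str]:
    """Toy parser that rejects every input, to exercise the fallback path."""
    raise ValueError("not my format")


def parse_url_per_line(file: IO[str]) -> Iterable[str]:
    """Toy parser that treats every line starting with 'http' as a URL."""
    for line in file:
        if line.startswith("http"):
            yield line.strip()


# Hypothetical registry of (name, function) pairs, tried in order.
SKETCH_PARSERS: List[Tuple[str, Callable[[IO[str]], Iterable[str]]]] = [
    ("always_fails", parse_always_fails),
    ("url_per_line", parse_url_per_line),
]


def sketch_run_parsers(
    to_parse: IO[str], parser: str = "auto"
) -> Tuple[List[str], Optional[str]]:
    """Return (links, parser_name); assumed rule: first parser with results wins."""
    for name, func in SKETCH_PARSERS:
        if parser != "auto" and parser != name:
            continue  # a specific parser was requested; skip the others
        try:
            to_parse.seek(0)  # every parser starts from the top of the file
            links = list(func(to_parse))
        except Exception:
            continue  # a parser that raises just means "wrong format"
        if links:
            return links, name
    return [], None


found_links, used_parser = sketch_run_parsers(io.StringIO("https://example.com/\n"))
no_links, no_parser = sketch_run_parsers(io.StringIO("plain text only\n"))
```

Rewinding the file before each attempt matters because every parser receives the same `IO[str]` object and a failed attempt may already have consumed part of it.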
-
archivebox.parsers.
save_text_as_source
(raw_text: str, filename: str = '{ts}-stdin.txt', out_dir: pathlib.Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/v0.6.0/docs')) → str[source]¶
-
archivebox.parsers.
save_file_as_source
(path: str, timeout: int = 60, filename: str = '{ts}-{basename}.txt', out_dir: pathlib.Path = PosixPath('/home/docs/checkouts/readthedocs.org/user_builds/archivebox/checkouts/v0.6.0/docs')) → str[source]¶ download a given url’s content into output/sources/domain-<timestamp>.txt