abx_plugin_title.extractor
Module Contents
Classes
Functions
get_html | Try to find wget, singlefile, and then dom files. If none is found, download the URL again.
should_save_title |
save_title | Try to guess the page’s title from its content.
Data
API
- class abx_plugin_title.extractor.TitleParser(*args, **kwargs)
Bases: html.parser.HTMLParser
- abx_plugin_title.extractor.get_html(link: archivebox.index.schema.Link, path: pathlib.Path, timeout: int = CURL_CONFIG.CURL_TIMEOUT) → str
Try to find wget, singlefile, and then dom files. If none is found, download the URL again.
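The lookup order described for get_html can be illustrated with a minimal sketch. This is not the plugin's implementation: the candidate file names, the `read_first_existing` and `get_html_sketch` helpers, and the injected `download` callable are all hypothetical stand-ins chosen for illustration.

```python
from pathlib import Path
from typing import Callable, Iterable, Optional

# Hypothetical file names standing in for the wget, singlefile, and dom outputs;
# the real extractor output paths may differ.
CANDIDATE_FILES = ("wget.html", "singlefile.html", "output.html")


def read_first_existing(
    out_dir: Path, candidates: Iterable[str] = CANDIDATE_FILES
) -> Optional[str]:
    """Return the text of the first non-empty candidate file, or None."""
    for name in candidates:
        candidate = out_dir / name
        if candidate.is_file():
            text = candidate.read_text(encoding="utf-8", errors="replace")
            if text.strip():
                return text
    return None


def get_html_sketch(url: str, out_dir: Path, download: Callable[[str], str]) -> str:
    """Prefer HTML already saved on disk; download only as a last resort."""
    local = read_first_existing(out_dir)
    if local is not None:
        return local
    # Assumption: `download` is any callable that fetches the URL and returns HTML.
    return download(url)
```

Passing the downloader in as a callable keeps the sketch free of network dependencies and makes the fallback order easy to exercise against a temporary directory.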
- abx_plugin_title.extractor.should_save_title(link: archivebox.index.schema.Link, out_dir: Optional[str] = None, overwrite: Optional[bool] = False) → bool
- abx_plugin_title.extractor.save_title(link: archivebox.index.schema.Link, out_dir: Optional[pathlib.Path] = None, timeout: int = CURL_CONFIG.CURL_TIMEOUT) → archivebox.index.schema.ArchiveResult
Try to guess the page’s title from its content.
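As a rough illustration of guessing a title from page content with an html.parser.HTMLParser subclass, the sketch below collects the text of the first `<title>` element. The `TitleParserSketch` class and `guess_title` helper are hypothetical names; this page does not document how TitleParser or save_title work internally, so this should be read as one possible approach rather than the plugin's code.

```python
from html.parser import HTMLParser
from typing import List, Optional


class TitleParserSketch(HTMLParser):
    """Collect the text inside the first <title> element (illustrative only)."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._inside_title = False
        self._done = False
        self._chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> counts; later ones (e.g. inside <svg>) are ignored.
        if tag == "title" and not self._done:
            self._inside_title = True

    def handle_data(self, data):
        if self._inside_title:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self._inside_title:
            self._inside_title = False
            self._done = True

    @property
    def title(self) -> Optional[str]:
        # Collapse runs of whitespace and return None for an empty title.
        text = " ".join("".join(self._chunks).split())
        return text or None


def guess_title(html: str) -> Optional[str]:
    """Feed the HTML to the parser and return the first title found, if any."""
    parser = TitleParserSketch()
    parser.feed(html)
    parser.close()
    return parser.title
```

For example, `guess_title("<title> Example\n Page </title>")` yields the whitespace-normalized title, and a document without a `<title>` element yields `None`.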