archivebox.extractors package
Submodules
archivebox.extractors.archive_org module
archivebox.extractors.archive_org.should_save_archive_dot_org(link: archivebox.index.schema.Link, out_dir: Optional[str] = None) → bool
archivebox.extractors.dom module
archivebox.extractors.favicon module
archivebox.extractors.git module
archivebox.extractors.media module
archivebox.extractors.pdf module
archivebox.extractors.screenshot module
archivebox.extractors.title module
archivebox.extractors.wget module
archivebox.extractors.wget.should_save_wget(link: archivebox.index.schema.Link, out_dir: Optional[str] = None) → bool
Module contents
archivebox.extractors.archive_link(link: archivebox.index.schema.Link, overwrite: bool = False, methods: Optional[Iterable[str]] = None, out_dir: Optional[str] = None, skip_index: bool = False) → archivebox.index.schema.Link
Download the DOM, PDF, and a screenshot into a folder named after the link's timestamp.
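The signatures above suggest a pattern in which each extractor has a should_save_* predicate and archive_link decides which extractors to run for a link. The following is a minimal, self-contained sketch of that pattern and does not reproduce ArchiveBox's actual implementation. The Link dataclass, the make_extractor helper, the EXTRACTORS table, the output filenames, the "archive/<timestamp>" default folder, and the rule used inside should_save are all hypothetical stand-ins chosen for illustration. Only the archive_link parameter names and the (link, out_dir) predicate signature come from the reference above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Tuple


@dataclass
class Link:
    """Hypothetical stand-in for archivebox.index.schema.Link."""
    url: str
    timestamp: str
    # Maps extractor name -> list of output paths recorded so far.
    history: Dict[str, List[str]] = field(default_factory=dict)


def make_extractor(name: str, filename: str) -> Tuple[Callable, Callable]:
    """Build an illustrative (should_save, save) pair for one method."""

    def should_save(link: Link, out_dir: Optional[str] = None) -> bool:
        # Assumed rule: run only if this method has no recorded output yet.
        return not link.history.get(name)

    def save(link: Link, out_dir: Optional[str] = None) -> str:
        # A real extractor would fetch and write a file; this only
        # returns the path it would have written (placeholder filename).
        return f"{out_dir}/{filename}"

    return should_save, save


# Hypothetical registry; the names and filenames are placeholders.
EXTRACTORS: Dict[str, Tuple[Callable, Callable]] = {
    "dom": make_extractor("dom", "output.html"),
    "pdf": make_extractor("pdf", "output.pdf"),
    "screenshot": make_extractor("screenshot", "screenshot.png"),
}


def archive_link(
    link: Link,
    overwrite: bool = False,
    methods: Optional[Iterable[str]] = None,
    out_dir: Optional[str] = None,
    skip_index: bool = False,  # accepted for signature parity; unused in this sketch
) -> Link:
    """Run the selected extractors and record their outputs on the link."""
    # Assumed default: a folder named after the link's timestamp.
    out_dir = out_dir or f"archive/{link.timestamp}"
    selected = set(methods) if methods is not None else None

    for name, (should_save, save) in EXTRACTORS.items():
        if selected is not None and name not in selected:
            continue  # caller restricted the run to other methods
        if not overwrite and not should_save(link, out_dir):
            continue  # already archived and overwrite was not requested
        link.history.setdefault(name, []).append(save(link, out_dir))

    return link


example = archive_link(
    Link(url="https://example.com", timestamp="1600000000"),
    methods=["dom", "pdf"],
)
print(sorted(example.history))
```

In this sketch, passing methods limits the run to the named extractors, and overwrite=True bypasses the should_save check so an extractor runs again even when it already has recorded output.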