archivebox.misc.legacy
Legacy archive import utilities.
These functions are used to import data from old ArchiveBox archive formats (JSON indexes, archive directory structures) into the new database.
This is separate from the hooks-based parser system which handles importing new URLs from bookmark files, RSS feeds, etc.
Module Contents
Classes
| SnapshotDict | Dictionary type representing a snapshot/link, compatible with Snapshot model fields. |
Functions
| parse_json_main_index | Parse links from the main JSON index file (archive/index.json). |
| parse_json_links_details | Parse links from individual snapshot index.jsonl/index.json files in archive directories. |
API
- class archivebox.misc.legacy.SnapshotDict
Bases: typing.TypedDict
Dictionary type representing a snapshot/link, compatible with Snapshot model fields.
Initialization
Initialize self. See help(type(self)) for accurate signature.
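A TypedDict only describes the expected keys for a static type checker; at runtime its instances are ordinary dict objects. The sketch below is hypothetical: ExampleSnapshotDict and its fields url, timestamp, title and tags are assumptions chosen for illustration, not the actual field set of SnapshotDict.

```python
from typing import TypedDict


# Hypothetical stand-in for SnapshotDict; the real fields may differ.
# total=False marks every key as optional for the type checker.
class ExampleSnapshotDict(TypedDict, total=False):
    url: str        # assumed: the archived URL
    timestamp: str  # assumed: snapshot timestamp stored as a string
    title: str      # assumed: page title, if known
    tags: str       # assumed: comma-separated tag string


# Building one produces a plain dict; TypedDict performs no runtime validation.
snap: ExampleSnapshotDict = {"url": "https://example.com", "timestamp": "1600000000.0"}
print(type(snap).__name__)  # prints "dict"
```

Because no checks happen at runtime, code that consumes these dictionaries from legacy files still has to tolerate missing or unexpected keys.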
- archivebox.misc.legacy.parse_json_main_index(out_dir: pathlib.Path) -> collections.abc.Iterator[archivebox.misc.legacy.SnapshotDict]
Parse links from the main JSON index file (archive/index.json).
This is used to recover links from old archive formats.
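A minimal sketch of how reading a legacy main index might look. It assumes, without confirmation from this page, that the file is a single JSON object whose links key holds a list of link dictionaries. iter_main_index_links is a hypothetical helper that takes the index file path directly; it is not part of archivebox.misc.legacy.

```python
import json
from collections.abc import Iterator
from pathlib import Path
from typing import Any


def iter_main_index_links(index_path: Path) -> Iterator[dict[str, Any]]:
    """Hypothetical helper: yield link dicts from a legacy main index file."""
    if not index_path.exists():
        return  # no legacy index, nothing to recover
    with index_path.open(encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        return  # unexpected top-level shape
    # Assumed layout: {"links": [{...}, {...}], ...}
    for link in data.get("links", []):
        # Skip malformed entries that are not objects or lack a URL.
        if isinstance(link, dict) and link.get("url"):
            yield link
```

Returning a generator matches the documented Iterator return type, although json.load still reads the whole file into memory before the first record is yielded.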
- archivebox.misc.legacy.parse_json_links_details(out_dir: pathlib.Path) -> collections.abc.Iterator[archivebox.misc.legacy.SnapshotDict]
Parse links from individual snapshot index.jsonl/index.json files in archive directories.
Walks through archive//index.jsonl and archive//index.json files to discover orphaned snapshots. Prefers index.jsonl (new format) over index.json (legacy format).
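The "prefer index.jsonl over index.json" rule can be sketched as below. This is an illustration under stated assumptions, not the module's code: find_snapshot_index, load_snapshot_record and iter_snapshot_records are hypothetical names; each snapshot is assumed to live in a direct subdirectory of the archive directory; and an index.jsonl file is assumed to hold one JSON object per line, of which the first object with a url key describes the snapshot.

```python
import json
from collections.abc import Iterator
from pathlib import Path
from typing import Any, Optional


def find_snapshot_index(snapshot_dir: Path) -> Optional[Path]:
    """Return index.jsonl if present, else index.json, else None."""
    for name in ("index.jsonl", "index.json"):  # new format checked first
        candidate = snapshot_dir / name
        if candidate.is_file():
            return candidate
    return None


def load_snapshot_record(index_file: Path) -> Optional[dict[str, Any]]:
    """Read one snapshot record from either index format."""
    text = index_file.read_text(encoding="utf-8")
    if index_file.suffix == ".jsonl":
        # Assumed: one JSON object per line; take the first that has a URL.
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            try:
                obj = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate a single corrupt line
            if isinstance(obj, dict) and obj.get("url"):
                return obj
        return None
    obj = json.loads(text)
    return obj if isinstance(obj, dict) and obj.get("url") else None


def iter_snapshot_records(archive_dir: Path) -> Iterator[dict[str, Any]]:
    """Walk each subdirectory of archive_dir and yield recoverable records."""
    if not archive_dir.is_dir():
        return
    for entry in sorted(archive_dir.iterdir()):
        if not entry.is_dir():
            continue  # stray files are not snapshots
        index_file = find_snapshot_index(entry)
        if index_file is None:
            continue
        try:
            record = load_snapshot_record(index_file)
        except (OSError, UnicodeDecodeError, json.JSONDecodeError):
            continue  # skip unreadable or invalid legacy files
        if record is not None:
            yield record
```

Skipping unreadable directories instead of raising suits recovery work: one damaged snapshot folder should not stop the import of every other orphaned snapshot.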