archivebox.cli.archivebox_update
Module Contents
Functions
Update snapshots: migrate old dirs, reconcile DB, and re-queue for archiving.
Drain old archive/ directories (0.8.x → 0.9.x migration).
O(n) scan over entire DB from most recent to least recent.
Process snapshots matching filters (DB query only).
Print statistics for filtered mode.
Print statistics for full mode.
Data
API
- archivebox.cli.archivebox_update.LINK_FILTERS: dict[str, collections.abc.Callable[[str], django.db.models.Q]]
None
- archivebox.cli.archivebox_update._apply_pattern_filters(snapshots: django.db.models.QuerySet[archivebox.core.models.Snapshot, archivebox.core.models.Snapshot], filter_patterns: list[str], filter_type: str) → django.db.models.QuerySet[archivebox.core.models.Snapshot, archivebox.core.models.Snapshot]
- archivebox.cli.archivebox_update._get_snapshot_crawl(snapshot: archivebox.core.models.Snapshot) → Crawl | None
- archivebox.cli.archivebox_update._build_filtered_snapshots_queryset(*, filter_patterns: collections.abc.Iterable[str], filter_type: str, before: float | None, after: float | None, resume: str | None = None)
- archivebox.cli.archivebox_update.reindex_snapshots(snapshots: django.db.models.QuerySet[archivebox.core.models.Snapshot, archivebox.core.models.Snapshot], *, search_plugins: list[str], batch_size: int) → dict[str, int]
- archivebox.cli.archivebox_update.update(filter_patterns: collections.abc.Iterable[str] = (), filter_type: str = 'exact', before: float | None = None, after: float | None = None, resume: str | None = None, batch_size: int = 100, continuous: bool = False, index_only: bool = False) → None
Update snapshots: migrate old dirs, reconcile DB, and re-queue for archiving.
Three-phase operation (without filters):
Phase 1: Drain old archive/ dirs by moving to new fs location (0.8.x → 0.9.x)
Phase 2: O(n) scan over entire DB from most recent to least recent
No orphan scans needed (trust 1:1 mapping between DB and filesystem after phase 1)
With filters: Only phase 2 (DB query), no filesystem operations. Without filters: All phases (full update).
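The choice between filtered and full mode described above can be sketched roughly as follows. This is only an illustration of the control flow, not ArchiveBox's actual code: `run_update` and its three callback parameters are hypothetical stand-ins for the real phase functions, and treating `before`/`after` as filters is an assumption.

```python
from typing import Callable, Iterable, Optional


def run_update(
    filter_patterns: Iterable[str] = (),
    before: Optional[float] = None,
    after: Optional[float] = None,
    *,
    drain: Callable[[], dict],
    scan_all: Callable[[], dict],
    scan_filtered: Callable[[list], dict],
) -> list:
    """Hypothetical dispatcher: return the names of the phases that ran, in order."""
    patterns = list(filter_patterns)
    # Assumption: a before/after bound counts as a filter, like a pattern does.
    has_filters = bool(patterns) or before is not None or after is not None

    phases = []
    if has_filters:
        # Filtered mode: DB query only, no filesystem operations.
        scan_filtered(patterns)
        phases.append("filtered")
    else:
        # Full mode: drain old directories first, then scan the whole DB.
        drain()
        phases.append("drain")
        scan_all()
        phases.append("scan_all")
    return phases
```

Passing the phases in as callables keeps the sketch self-contained; the documented `update()` takes no such arguments.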
- archivebox.cli.archivebox_update.drain_old_archive_dirs(resume_from: str | None = None, batch_size: int = 100) → dict[str, int]
Drain old archive/ directories (0.8.x → 0.9.x migration).
Only processes real directories (skips symlinks - those are already migrated). For each old dir found in archive/:
Load or create DB snapshot
Trigger fs migration on save() to move to data/users/{user}/…
Leave symlink in archive/ pointing to new location
After this drains, archive/ should only contain symlinks and we can trust 1:1 mapping between DB and filesystem.
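The filesystem side of this drain (skip symlinks, move real directories, leave a symlink behind) might look something like the sketch below. It deliberately leaves out the database step: in ArchiveBox the move is triggered by the snapshot's `save()`, whereas here a plain `shutil.move` stands in for it. The function name `drain_dirs`, the `new_root` destination, and the stats keys are all hypothetical.

```python
import shutil
from pathlib import Path


def drain_dirs(archive_dir: Path, new_root: Path) -> dict:
    """Hypothetical sketch: move real dirs out of archive_dir, leaving symlinks."""
    stats = {"migrated": 0, "skipped": 0}
    new_root.mkdir(parents=True, exist_ok=True)

    for entry in sorted(archive_dir.iterdir()):
        # Check is_symlink() first: is_dir() follows symlinks, so an already
        # migrated entry would otherwise look like a real directory.
        if entry.is_symlink() or not entry.is_dir():
            stats["skipped"] += 1
            continue

        target = (new_root / entry.name).resolve()
        if target.exists():
            # Avoid shutil.move nesting the directory inside an existing one.
            stats["skipped"] += 1
            continue

        shutil.move(str(entry), str(target))
        # Leave a symlink at the old location pointing to the new one.
        entry.symlink_to(target, target_is_directory=True)
        stats["migrated"] += 1

    return stats
```

Because migrated entries become symlinks, running the function a second time skips them, which is what makes a drain of this shape safe to repeat or resume.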
- archivebox.cli.archivebox_update.process_all_db_snapshots(batch_size: int = 100, resume: str | None = None) → dict[str, int]
O(n) scan over entire DB from most recent to least recent.
For each snapshot:
Reconcile index.json with DB (merge titles, tags, archive results)
Queue for archiving (state machine will handle it)
No orphan detection needed - we trust 1:1 mapping between DB and filesystem after Phase 1 has drained all old archive/ directories.
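A batched newest-to-oldest scan with a resume point could be sketched as below, using plain timestamp strings instead of database rows. The meaning of `resume` is an assumption here (a timestamp from which scanning continues downward); the source gives only its type. `scan_newest_first`, `handle`, and the stats keys are hypothetical, with `handle` standing in for the reconcile-and-queue step.

```python
from typing import Callable, Iterable, Optional


def scan_newest_first(
    timestamps: Iterable[str],
    handle: Callable[[str], None],
    batch_size: int = 100,
    resume: Optional[str] = None,
) -> dict:
    """Hypothetical sketch: visit items from most recent to least recent in batches."""
    ordered = sorted(timestamps, key=float, reverse=True)

    # Assumption: resume is a timestamp; skip everything newer than it.
    if resume is not None:
        cutoff = float(resume)
        ordered = [ts for ts in ordered if float(ts) <= cutoff]

    stats = {"processed": 0, "batches": 0}
    for start in range(0, len(ordered), batch_size):
        batch = ordered[start:start + batch_size]
        for ts in batch:
            handle(ts)  # stand-in for: reconcile index.json with DB, then queue
            stats["processed"] += 1
        stats["batches"] += 1
    return stats
```

Each item is handled exactly once, so the work is linear in the number of items after the initial sort; a real implementation would presumably let the database do the ordering.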
- archivebox.cli.archivebox_update.process_filtered_snapshots(filter_patterns: collections.abc.Iterable[str], filter_type: str, before: float | None, after: float | None, resume: str | None, batch_size: int) → dict[str, int]
Process snapshots matching filters (DB query only).
- archivebox.cli.archivebox_update.print_stats(stats: dict)
Print statistics for filtered mode.