archivebox.misc.db
Database utility functions for ArchiveBox.
Post-bootstrap: requires archivebox.config constants and uses Django lazily
(from django.db import ... inside functions). Not safe to import pre-bootstrap.
Module Contents
Functions
Advance one step of a batched SQLite ANALYZE sweep.
Cheaply compare migration files to django_migrations without invoking migrate.
Return migration files on disk that have not been applied yet.
Apply pending Django migrations
Data
API
- archivebox.misc.db.run_db_analyze_batch(remaining: list[str] | None, *, max_seconds_per_table: float = 120.0) → list[str]
Advance one step of a batched SQLite ANALYZE sweep.

Without periodic ANALYZE the optimizer's table stats go stale as snapshot/archiveresult tables grow, causing it to start large joins from auth_user instead of using the indexed url column and blowing snapshot detail page render time from ~50ms to ~500ms+.

The whole sweep is spread across many calls instead of running as one blocking ANALYZE: pass None to start a fresh sweep (this call enumerates user tables and runs ANALYZE on the first one); pass the returned list to advance one more table on each subsequent call. An empty return value means the sweep is complete (or has been aborted) and the next caller should pass None again. Caller is responsible for throttling new sweeps (orchestrator starts at most one per 24hr while idle) and enforcing a hard upper bound on total sweep wall time.

Safety guarantees:
  - Never raises: every database call is wrapped; on any failure the function returns [] (abandoning the rest of the sweep) so the orchestrator never crashes on maintenance errors.
  - Bounded per-call wall time: a SQLite progress handler aborts the current ANALYZE statement once max_seconds_per_table is exceeded, so a single pathological table cannot wedge the call.
  - Never leaves the db locked: each ANALYZE runs as a single statement transaction that auto-commits (or rolls back on abort/error). The cursor and progress handler are always cleaned up in finally blocks even if Python raises mid-call.
  - Silent no-op on non-SQLite backends.

WAL journal mode (set in Django settings) keeps readers fully unblocked throughout; the writer lock is only held for the brief sqlite_stat* flush after each table completes.
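The calling protocol and the progress-handler timeout described above can be sketched with the standard library alone. This is not ArchiveBox's implementation: analyze_batch_step and run_sweep are hypothetical names, and the sketch works on a raw sqlite3 connection rather than Django's, so it only illustrates the None / returned-list handshake and the per-table deadline.

```python
import sqlite3
import time


def analyze_batch_step(conn, remaining, *, max_seconds_per_table=120.0):
    """Hypothetical stand-in: ANALYZE one table, return the tables still to do."""
    try:
        if remaining is None:
            # Fresh sweep: enumerate user tables, skipping SQLite's internal ones.
            rows = conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            ).fetchall()
            remaining = [name for (name,) in rows if not name.startswith("sqlite_")]
        if not remaining:
            return []
        table, rest = remaining[0], list(remaining[1:])
        deadline = time.monotonic() + max_seconds_per_table
        # A non-zero return from the handler makes SQLite abort the statement.
        conn.set_progress_handler(
            lambda: 1 if time.monotonic() > deadline else 0, 1000
        )
        try:
            quoted = '"' + table.replace('"', '""') + '"'
            conn.execute(f"ANALYZE {quoted}")
            conn.commit()
        finally:
            # Always remove the handler, even if ANALYZE was aborted.
            conn.set_progress_handler(None, 0)
        return rest
    except Exception:
        # Never raise: abandon the rest of the sweep on any failure.
        try:
            conn.rollback()
        except Exception:
            pass
        return []


def run_sweep(conn, *, max_total_seconds=600.0):
    """Drive a full sweep with a hard bound on total wall time; return step count."""
    started = time.monotonic()
    remaining = analyze_batch_step(conn, None)
    steps = 1
    while remaining and time.monotonic() - started < max_total_seconds:
        remaining = analyze_batch_step(conn, remaining)
        steps += 1
    return steps
```

In a real orchestrator the loop in run_sweep would more likely be spread across idle ticks, one step per tick, with the returned list stored between calls; the tight loop here just keeps the sketch short.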
- archivebox.misc.db.sqlite_lock_holders(db_path: pathlib.Path = CONSTANTS.DATABASE_FILE) → list[str]
- archivebox.misc.db.log_sqlite_lock_holders(console: Any, *, db_path: pathlib.Path = CONSTANTS.DATABASE_FILE, limit: int = 8) → None
- archivebox.misc.db.retry_sqlite_locks(action: collections.abc.Callable[[], Any], *, label: str, stderr: TextIO | None = None) → Any
- archivebox.misc.db.migration_state(out_dir: pathlib.Path = CONSTANTS.DATA_DIR) → tuple[list[str], list[str], dict[str, str]]
Cheaply compare migration files to django_migrations without invoking migrate.
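One way such a cheap comparison could work is sketched below, assuming a SQLite database and Django's standard django_migrations table (columns app and name). The function name pending_migrations and the migration_dirs mapping are hypothetical, and the meaning of the three-part tuple that migration_state actually returns is not documented here, so this sketch returns only a list of unapplied migrations.

```python
import sqlite3
from pathlib import Path


def pending_migrations(conn, migration_dirs):
    """Return 'app.name' labels for migration files not recorded as applied.

    migration_dirs is a hypothetical {app_label: directory} mapping used
    only for this sketch.
    """
    try:
        applied = set(conn.execute("SELECT app, name FROM django_migrations"))
    except sqlite3.OperationalError:
        # Table does not exist yet, so nothing has been applied.
        applied = set()

    pending = []
    for app, directory in sorted(migration_dirs.items()):
        for path in sorted(Path(directory).glob("*.py")):
            if path.stem == "__init__":
                continue
            if (app, path.stem) not in applied:
                pending.append(f"{app}.{path.stem}")
    return pending
```

Because this only lists files and runs one SELECT, it avoids loading the migration graph, which is presumably what makes the real check cheap compared with invoking migrate.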