archivebox.workers.models
Module Contents
Classes
BaseStateMachine: Base class for all ArchiveBox state machines.
Data
API
- archivebox.workers.models.default_retry_at_field: django.db.models.DateTimeField[source]
‘DateTimeField(…)’
- class archivebox.workers.models.BaseModelWithStateMachine[source]
Bases: django.db.models.Model, statemachine.mixins.MachineMixin
- StatusChoices: ClassVar[type[archivebox.workers.models.DefaultStatusChoices]][source]
None
- active_state: archivebox.workers.models.ObjectState[source]
None
- static _state_to_str(state: archivebox.workers.models.ObjectState) → str[source]
Convert a statemachine.State, models.TextChoices.choices value, or Enum value to a str.
- update_and_requeue(**kwargs) → bool[source]
Atomically update fields and schedule retry_at for next worker tick. Returns True if the update was successful, False if the object was modified by another worker.
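The conditional-update idea behind this return value can be sketched with the standard-library sqlite3 module. This is a minimal illustration under stated assumptions, not ArchiveBox's implementation: the `item` table, the `seen_retry_at` and `next_retry_at` parameters, and the choice of retry_at as the guard column are all hypothetical, since this page does not say which column the real method compares.

```python
import sqlite3

def update_and_requeue(db, item_id, seen_retry_at, next_retry_at, **fields):
    """Write `fields` and a new retry_at only if the row is unchanged.

    Hypothetical guard: the row must still hold the retry_at value this
    caller last read. Column names are interpolated directly, which is
    acceptable only in a sketch with trusted keys.
    """
    fields["retry_at"] = next_retry_at
    assignments = ", ".join(f"{column} = ?" for column in fields)
    cur = db.execute(
        f"UPDATE item SET {assignments} WHERE id = ? AND retry_at = ?",
        (*fields.values(), item_id, seen_retry_at),
    )
    db.commit()
    # Exactly one modified row means no other worker changed it first.
    return cur.rowcount == 1

# Stand-in table for a model with status and retry_at (epoch seconds).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, status TEXT, retry_at INTEGER)")
db.execute("INSERT INTO item VALUES (1, 'queued', 100)")

first = update_and_requeue(db, 1, 100, 160, status="started")  # row matches
second = update_and_requeue(db, 1, 100, 190, status="failed")  # stale read
```

In this sketch the second call modifies nothing because the row no longer carries the retry_at value the caller saw, which corresponds to the False return described above.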
- classmethod get_queue()[source]
Get the sorted and filtered QuerySet of objects that are ready for processing. Objects are ready if:
  - status is not in FINAL_STATES
  - retry_at is in the past (or now)
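The two readiness conditions can be illustrated in plain Python. This is only a sketch: the `Item` dataclass, the contents of `FINAL_STATES`, and the ascending sort on retry_at are assumptions made for illustration, and the real method returns a Django QuerySet rather than a list.

```python
from dataclasses import dataclass

FINAL_STATES = {"sealed"}  # assumed value, for illustration only

@dataclass
class Item:  # hypothetical stand-in for a model row
    name: str
    status: str
    retry_at: int  # epoch seconds

def get_queue(items, now):
    """Return items that are not final and whose retry_at is due."""
    ready = [
        item for item in items
        if item.status not in FINAL_STATES and item.retry_at <= now
    ]
    # Assumed ordering: the most overdue item comes first.
    return sorted(ready, key=lambda item: item.retry_at)

items = [
    Item("a", "queued", 50),
    Item("b", "sealed", 10),   # excluded: final state
    Item("c", "queued", 200),  # excluded: retry_at is in the future
    Item("d", "started", 20),
]
queue_names = [item.name for item in get_queue(items, now=100)]
```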
- classmethod claim_for_worker(obj: archivebox.workers.models.BaseModelWithStateMachine, lock_seconds: int = 60) → bool[source]
Atomically claim a due object for processing using retry_at as the lock.
Correct lifecycle for any state-machine-driven work item:
  1. Queue the item by setting retry_at <= now.
  2. Exactly one owner claims it by moving retry_at into the future.
  3. Only that owner may call .sm.tick() and perform side effects.
  4. State-machine callbacks update retry_at again when the work completes, backs off, or is re-queued.
The critical rule is that future retry_at values are already owned. Callers must never “steal” those future timestamps and start another copy of the same work. That is what prevents duplicate installs, hook runs, and other concurrent side effects.
Returns True if successfully claimed, False if another worker got it first or the object is not currently due.
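Using retry_at as a lock amounts to a single conditional UPDATE: move the timestamp into the future only if it is currently due. The sketch below shows that pattern with the standard-library sqlite3 module. The `item` table and the explicit `now` parameter are assumptions for illustration, and the real classmethod operates on a Django model instance rather than a raw connection.

```python
import sqlite3

def claim_for_worker(db, item_id, now, lock_seconds=60):
    """Claim a due row by pushing retry_at into the future.

    The WHERE clause only matches while retry_at <= now, so once one
    caller has moved the timestamp, later callers match zero rows.
    """
    cur = db.execute(
        "UPDATE item SET retry_at = ? WHERE id = ? AND retry_at <= ?",
        (now + lock_seconds, item_id, now),
    )
    db.commit()
    return cur.rowcount == 1

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, retry_at INTEGER)")
db.execute("INSERT INTO item VALUES (1, 100)")  # due at t=100

now = 100
first_claim = claim_for_worker(db, 1, now)   # item is due, claim succeeds
second_claim = claim_for_worker(db, 1, now)  # retry_at is now 160, not due
locked_until = db.execute("SELECT retry_at FROM item WHERE id = 1").fetchone()[0]
```

Because the check and the write happen in one statement, a second caller cannot observe the row as due after the first caller has claimed it.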
- claim_processing_lock(lock_seconds: int = 60) → bool[source]
Claim this model instance immediately before executing one state-machine tick.
This helper is the safe entrypoint for any direct state-machine driver (workers, synchronous crawl dependency installers, one-off CLI helpers). Calling .sm.tick() without claiming first turns retry_at into “just a schedule” instead of the ownership lock it is meant to be.
Returns True only for the caller that successfully moved retry_at into the future. False means another process already owns the work item or it is not currently due.
- tick_claimed(lock_seconds: int = 60) → bool[source]
Claim ownership via retry_at and then execute exactly one .sm.tick().
Future maintainers should prefer this helper over calling .sm.tick() directly whenever there is any chance another process could see the same queued row. If this method returns False, someone else already owns the work and the caller must not run side effects for it.
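The claim-then-tick contract can be shown with a small in-memory stand-in. Everything here is hypothetical: the `WorkItem` class, its `tick()` counter, and the explicit `now` argument exist only for the sketch, and an in-memory check like this is not atomic across processes the way a conditional database update would be.

```python
class WorkItem:
    """Hypothetical stand-in for a model driven by a state machine."""

    def __init__(self, retry_at):
        self.retry_at = retry_at  # epoch seconds
        self.ticks = 0            # counts side-effecting ticks

    def claim_processing_lock(self, now, lock_seconds=60):
        # A future retry_at means the item is already owned or not yet due.
        if self.retry_at > now:
            return False
        self.retry_at = now + lock_seconds
        return True

    def tick(self):
        # Placeholder for the real state-machine tick and its side effects.
        self.ticks += 1

    def tick_claimed(self, now, lock_seconds=60):
        # Side effects run only for the caller that won the claim.
        if not self.claim_processing_lock(now, lock_seconds):
            return False
        self.tick()
        return True

item = WorkItem(retry_at=100)
ran_first = item.tick_claimed(now=100)   # due: claimed and ticked once
ran_second = item.tick_claimed(now=120)  # locked until 160: skipped
```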
- classmethod extend_choices(base_choices: type[django.db.models.TextChoices])[source]
Decorator to extend the base choices with extra choices, e.g.:
    class MyModel(ModelWithStateMachine):
        @ModelWithStateMachine.extend_choices(ModelWithStateMachine.StatusChoices)
        class StatusChoices(models.TextChoices):
            SUCCEEDED = 'succeeded'
            FAILED = 'failed'
            SKIPPED = 'skipped'
- classmethod StatusField(**kwargs) → django.db.models.CharField[source]
Used on subclasses to extend or modify the status field with updated kwargs, e.g.:
    class MyModel(ModelWithStateMachine):
        class StatusChoices(ModelWithStateMachine.StatusChoices):
            QUEUED = 'queued', 'Queued'
            STARTED = 'started', 'Started'
            SEALED = 'sealed', 'Sealed'
            BACKOFF = 'backoff', 'Backoff'
            FAILED = 'failed', 'Failed'
            SKIPPED = 'skipped', 'Skipped'

        status = ModelWithStateMachine.StatusField(choices=StatusChoices.choices, default=StatusChoices.QUEUED)
- class archivebox.workers.models.ModelWithStateMachine[source]
- class archivebox.workers.models.BaseStateMachine(obj, *args, **kwargs)[source]
Bases: statemachine.StateMachine
Base class for all ArchiveBox state machines.
Eliminates the boilerplate __init__, __repr__, and __str__ methods that were duplicated across all 4 state machines (Snapshot, ArchiveResult, Crawl, Binary).
Subclasses must set model_attr_name to specify the attribute name (e.g., 'snapshot', 'archiveresult', 'crawl', 'binary').
Example usage:
    class SnapshotMachine(BaseStateMachine):
        model_attr_name = 'snapshot'

        # States and transitions...
        queued = State(value=Snapshot.StatusChoices.QUEUED, initial=True)
        # ...
The model instance is accessible via self.{model_attr_name} (e.g., self.snapshot, self.archiveresult, etc.).
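How a shared base class can remove the repeated __init__, __repr__, and __str__ methods is sketched below in plain Python, without the statemachine library. The `BaseMachine` and `Snapshot` classes and the repr format are hypothetical and only illustrate the model_attr_name pattern; the real base class also inherits from statemachine.StateMachine.

```python
class Snapshot:
    """Hypothetical stand-in for a model instance."""

    def __init__(self, id):
        self.id = id

class BaseMachine:
    # Subclasses override this to name the attribute holding the model.
    model_attr_name = "obj"

    def __init__(self, obj):
        # Store the model under the subclass-chosen attribute name.
        setattr(self, self.model_attr_name, obj)

    def __repr__(self):
        obj = getattr(self, self.model_attr_name)
        return f"{type(self).__name__}[{obj.id}]"  # assumed format

    def __str__(self):
        return repr(self)

class SnapshotMachine(BaseMachine):
    model_attr_name = "snapshot"  # the only per-machine boilerplate left

machine = SnapshotMachine(Snapshot(id=7))
label = str(machine)
```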
Initialization