archivebox.workers.models

Module Contents

Classes

DefaultStatusChoices

BaseModelWithStateMachine

ModelWithStateMachine

BaseStateMachine

Base class for all ArchiveBox state machines.

Data

default_status_field

default_retry_at_field

ObjectState

ObjectStateList

API

class archivebox.workers.models.DefaultStatusChoices[source]

Bases: django.db.models.TextChoices

QUEUED[source]

(‘queued’, ‘Queued’)

STARTED[source]

(‘started’, ‘Started’)

SEALED[source]

(‘sealed’, ‘Sealed’)

archivebox.workers.models.default_status_field: django.db.models.CharField[source]

‘CharField(…)’

archivebox.workers.models.default_retry_at_field: django.db.models.DateTimeField[source]

‘DateTimeField(…)’

archivebox.workers.models.ObjectState[source]

None

archivebox.workers.models.ObjectStateList[source]

None

class archivebox.workers.models.BaseModelWithStateMachine[source]

Bases: django.db.models.Model, statemachine.mixins.MachineMixin

StatusChoices: ClassVar[type[archivebox.workers.models.DefaultStatusChoices]][source]

None

state_machine_name: str | None[source]

None

state_field_name: str[source]

None

state_machine_attr: str[source]

‘sm’

bind_events_as_methods: bool[source]

True

active_state: archivebox.workers.models.ObjectState[source]

None

retry_at_field_name: str[source]

None

class Meta[source]

Bases: django_stubs_ext.db.models.TypedModelMeta

app_label[source]

‘workers’

abstract[source]

True

classmethod check(sender=None, **kwargs)[source]

static _state_to_str(state: archivebox.workers.models.ObjectState) -> str[source]

Convert a statemachine.State, models.TextChoices.choices value, or Enum value to a str
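As a rough illustration of this kind of normalization, the sketch below converts enum members, state-like objects, and plain strings to a str. It is not the ArchiveBox implementation: `state_to_str`, `Status`, and `FakeState` are hypothetical names, and the sketch assumes state-like objects expose their raw value on a `.value` attribute.

```python
from enum import Enum


def state_to_str(state) -> str:
    """Hypothetical stand-in: normalize a state-like value to a plain str."""
    # Assumption: Enum members and state objects carry the raw value on `.value`;
    # anything without that attribute is treated as the raw value itself.
    value = getattr(state, "value", state)
    return str(value)


class Status(str, Enum):
    QUEUED = "queued"
    STARTED = "started"


class FakeState:
    """Minimal object with a `.value`, standing in for a state-machine state."""

    def __init__(self, value):
        self.value = value


from_enum = state_to_str(Status.QUEUED)
from_state = state_to_str(FakeState("started"))
from_plain = state_to_str("sealed")
```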

property RETRY_AT: datetime.datetime[source]

property STATE: str[source]

bump_retry_at(seconds: int = 10)[source]

update_and_requeue(**kwargs) -> bool[source]

Atomically update fields and schedule retry_at for next worker tick. Returns True if the update was successful, False if the object was modified by another worker.

classmethod get_queue()[source]

Get the sorted and filtered QuerySet of objects that are ready for processing. Objects are ready if:

  • status is not in FINAL_STATES

  • retry_at is in the past (or now)
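The two readiness conditions above can be sketched in plain Python without Django. This is only an in-memory illustration: the `Item` dataclass, the single-entry `FINAL_STATES` set, and the oldest-first ordering are assumptions, since the real ordering of the QuerySet is not stated here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

FINAL_STATES = {"sealed"}  # assumption: the only final state in this sketch


@dataclass
class Item:
    name: str
    status: str
    retry_at: datetime


def get_queue(items, now):
    """Return items that are due: non-final status and retry_at <= now."""
    due = [i for i in items if i.status not in FINAL_STATES and i.retry_at <= now]
    # Assumption: oldest retry_at first; the actual sort key is not documented above.
    return sorted(due, key=lambda i: i.retry_at)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
items = [
    Item("a", "queued", now - timedelta(seconds=30)),   # due
    Item("b", "started", now + timedelta(seconds=30)),  # retry_at in the future
    Item("c", "sealed", now - timedelta(seconds=60)),   # final state
    Item("d", "queued", now - timedelta(seconds=90)),   # due, oldest
]
queue_names = [i.name for i in get_queue(items, now)]
```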

classmethod claim_for_worker(obj: archivebox.workers.models.BaseModelWithStateMachine, lock_seconds: int = 60) -> bool[source]

Atomically claim a due object for processing using retry_at as the lock.

Correct lifecycle for any state-machine-driven work item:

  1. Queue the item by setting retry_at <= now

  2. Exactly one owner claims it by moving retry_at into the future

  3. Only that owner may call .sm.tick() and perform side effects

  4. State-machine callbacks update retry_at again when the work completes, backs off, or is re-queued

The critical rule is that future retry_at values are already owned. Callers must never “steal” those future timestamps and start another copy of the same work. That is what prevents duplicate installs, hook runs, and other concurrent side effects.

Returns True if successfully claimed, False if another worker got it first or the object is not currently due.
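The "retry_at as lock" idea can be illustrated with a conditional UPDATE, which the database applies atomically per row. The sketch below uses the standard-library sqlite3 module rather than the Django ORM; the `items` table, the float timestamps, and the `claim` helper are all hypothetical and only approximate the behaviour described above.

```python
import sqlite3


def claim(conn, item_id, now, lock_seconds=60):
    """Try to claim a due row by moving retry_at into the future."""
    cur = conn.execute(
        # The WHERE clause only matches rows that are currently due, so a row
        # whose retry_at is already in the future (already owned) is untouched.
        "UPDATE items SET retry_at = ? WHERE id = ? AND retry_at <= ?",
        (now + lock_seconds, item_id, now),
    )
    conn.commit()
    return cur.rowcount == 1  # exactly one row changed means we own it


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, retry_at REAL)")
conn.execute("INSERT INTO items VALUES (1, 100.0)")
conn.commit()

now = 100.0
first_claim = claim(conn, 1, now)   # the row is due, so this caller wins
second_claim = claim(conn, 1, now)  # retry_at is now in the future, so this fails
```

In this sketch the second caller gets False and must not run any side effects for the item, which mirrors the rule that future retry_at values are already owned.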

claim_processing_lock(lock_seconds: int = 60) -> bool[source]

Claim this model instance immediately before executing one state-machine tick.

This helper is the safe entrypoint for any direct state-machine driver (workers, synchronous crawl dependency installers, one-off CLI helpers). Calling .sm.tick() without claiming first turns retry_at into “just a schedule” instead of the ownership lock it is meant to be.

Returns True only for the caller that successfully moved retry_at into the future. False means another process already owns the work item or it is not currently due.

tick_claimed(lock_seconds: int = 60) -> bool[source]

Claim ownership via retry_at and then execute exactly one .sm.tick().

Future maintainers should prefer this helper over calling .sm.tick() directly whenever there is any chance another process could see the same queued row. If this method returns False, someone else already owns the work and the caller must not run side effects for it.
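The claim-then-tick pattern can be sketched with a small in-memory stand-in. `WorkItem` is hypothetical, takes an explicit `now` argument to stay deterministic, and is not atomic across processes the way a database-backed claim would be; it only shows the control flow a caller should follow.

```python
class WorkItem:
    """Hypothetical in-memory stand-in for a model that uses retry_at as a lock."""

    def __init__(self, retry_at: float):
        self.retry_at = retry_at
        self.ticks = 0  # counts how many times the side-effecting tick ran

    def claim_processing_lock(self, now: float, lock_seconds: int = 60) -> bool:
        if self.retry_at > now:
            return False  # already owned, or not due yet
        self.retry_at = now + lock_seconds  # move retry_at into the future
        return True

    def tick_claimed(self, now: float, lock_seconds: int = 60) -> bool:
        if not self.claim_processing_lock(now, lock_seconds):
            return False  # someone else owns the work: run no side effects
        self.ticks += 1   # stands in for one .sm.tick()
        return True


item = WorkItem(retry_at=100.0)
results = [item.tick_claimed(now=100.0), item.tick_claimed(now=101.0)]
```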

ACTIVE_STATE() -> str[source]

INITIAL_STATE() -> str[source]

FINAL_STATES() -> list[str][source]

FINAL_OR_ACTIVE_STATES() -> list[str][source]

classmethod extend_choices(base_choices: type[django.db.models.TextChoices])[source]

Decorator to extend the base choices with extra choices, e.g.:

    class MyModel(ModelWithStateMachine):

        @ModelWithStateMachine.extend_choices(ModelWithStateMachine.StatusChoices)
        class StatusChoices(models.TextChoices):
            SUCCEEDED = 'succeeded'
            FAILED = 'failed'
            SKIPPED = 'skipped'

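The general shape of such a decorator can be sketched with the standard-library enum module instead of Django's TextChoices. This is only an approximation under that assumption: the `extend_choices` function below and the `DefaultStatus` enum are hypothetical stand-ins, and the real classmethod may merge choices differently.

```python
from enum import Enum


def extend_choices(base):
    """Hypothetical decorator: build a new enum holding base members plus extras."""

    def wrap(extra):
        # Base members come first, then the members of the decorated class.
        members = {m.name: m.value for m in base}
        members.update({m.name: m.value for m in extra})
        # Functional Enum API: create a str-mixin enum from the merged mapping.
        return Enum(extra.__name__, members, type=str)

    return wrap


class DefaultStatus(str, Enum):
    QUEUED = "queued"
    STARTED = "started"
    SEALED = "sealed"


@extend_choices(DefaultStatus)
class StatusChoices(str, Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"


names = [m.name for m in StatusChoices]
```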
classmethod StatusField(**kwargs) -> django.db.models.CharField[source]

Used on subclasses to extend/modify the status field with updated kwargs. e.g.:

    class MyModel(ModelWithStateMachine):
        class StatusChoices(ModelWithStateMachine.StatusChoices):
            QUEUED = 'queued', 'Queued'
            STARTED = 'started', 'Started'
            SEALED = 'sealed', 'Sealed'
            BACKOFF = 'backoff', 'Backoff'
            FAILED = 'failed', 'Failed'
            SKIPPED = 'skipped', 'Skipped'

        status = ModelWithStateMachine.StatusField(choices=StatusChoices.choices, default=StatusChoices.QUEUED)

classmethod RetryAtField(**kwargs) -> django.db.models.DateTimeField[source]

Used on subclasses to extend/modify the retry_at field with updated kwargs. e.g.:

    class MyModel(ModelWithStateMachine):
        retry_at = ModelWithStateMachine.RetryAtField(editable=False)

StateMachineClass() -> type[statemachine.StateMachine][source]

Get the StateMachine class for the given django Model that inherits from MachineMixin

class archivebox.workers.models.ModelWithStateMachine[source]

Bases: archivebox.workers.models.BaseModelWithStateMachine

StatusChoices[source]

None

status: django.db.models.CharField[source]

‘StatusField(…)’

retry_at: django.db.models.DateTimeField[source]

‘RetryAtField(…)’

state_machine_name: str | None[source]

None

state_field_name: str[source]

‘status’

state_machine_attr: str[source]

‘sm’

bind_events_as_methods: bool[source]

True

active_state[source]

None

retry_at_field_name: str[source]

‘retry_at’

class Meta[source]

Bases: archivebox.workers.models.BaseModelWithStateMachine

abstract[source]

True

class archivebox.workers.models.BaseStateMachine(obj, *args, **kwargs)[source]

Bases: statemachine.StateMachine

Base class for all ArchiveBox state machines.

Eliminates the boilerplate __init__, __repr__, and __str__ methods that were duplicated across all 4 state machines (Snapshot, ArchiveResult, Crawl, Binary).

Subclasses must set model_attr_name to specify the attribute name (e.g., ‘snapshot’, ‘archiveresult’, ‘crawl’, ‘binary’).

Example usage:

    class SnapshotMachine(BaseStateMachine):
        model_attr_name = 'snapshot'

        # States and transitions...
        queued = State(value=Snapshot.StatusChoices.QUEUED, initial=True)
        # ...

The model instance is accessible via self.{model_attr_name} (e.g., self.snapshot, self.archiveresult, etc.)
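One way a base class could expose the model under a configurable attribute name is with setattr in its initializer. The sketch below is a plain-Python illustration without the statemachine library; `BaseMachineSketch`, `SnapshotMachineSketch`, and the repr format are hypothetical, not the actual BaseStateMachine code.

```python
class BaseMachineSketch:
    """Hypothetical base class: stores the model under a configurable name."""

    model_attr_name = "obj"  # subclasses override this

    def __init__(self, obj):
        # Expose the model as self.<model_attr_name>, e.g. self.snapshot
        setattr(self, self.model_attr_name, obj)

    def __repr__(self) -> str:
        # Assumed format, for illustration only
        return f"{type(self).__name__}[{getattr(self, self.model_attr_name)!r}]"


class SnapshotMachineSketch(BaseMachineSketch):
    model_attr_name = "snapshot"


machine = SnapshotMachineSketch({"url": "https://example.com"})
model = machine.snapshot  # the same object that was passed to the constructor
```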

Initialization

model_attr_name: str[source]

‘obj’

__repr__() -> str[source]

__str__() -> str[source]