Scheduled Archiving

Using Cron

To schedule regular archiving, you can use any task scheduler, such as cron, at, or systemd timers.

ArchiveBox ignores links that are imported more than once, keeping the earliest version it has seen. This means a cron job can poll the same file or URL over and over, and only links that have not been added before will be archived.
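The "keep the first copy, skip repeats" rule can be modeled in a few lines of shell. This is only a toy illustration of the behavior described above, not ArchiveBox's own code: the helper name new_links and its SEEN_FILE argument are hypothetical. It remembers every URL it has printed in a plain text file, so later runs over the same input print nothing new.

```shell
#!/bin/sh
# new_links SEEN_FILE
# Toy model of "keep the first copy, skip repeats": reads one URL per line on
# stdin, prints only URLs that are neither listed in SEEN_FILE nor repeated
# earlier in the same input, and appends the printed URLs to SEEN_FILE.
# Hypothetical helper for illustration; ArchiveBox does its own deduplication.
new_links() {
    seen_file=$1
    awk -v seen_file="$seen_file" '
        BEGIN {
            # Load every URL recorded by earlier runs (a missing file means none yet)
            while ((getline url < seen_file) > 0) seen[url] = 1
            close(seen_file)
        }
        # Non-blank line seen for the first time: remember it and pass it through
        NF && !($0 in seen) { seen[$0] = 1; print }
    ' | tee -a "$seen_file"
}
```

For example, piping https://a.example, https://b.example, https://a.example into new_links seen.txt prints only the first two URLs; piping https://b.example and https://c.example in afterwards prints only https://c.example, much like re-importing a feed only adds the links that are new.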

For some example configs, see the etc/cron.d and etc/supervisord folders.

Examples

Example: Import Firefox browser history every 24 hours

This example exports your Firefox browser history and archives any new links from it once a day:

Create /opt/ArchiveBox/bin/firefox_custom.sh:

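#!/bin/bash

# Stop if the ArchiveBox folder is missing rather than running somewhere else
cd /opt/ArchiveBox || exit 1
./bin/archivebox-export-browser-history --firefox ./output/sources/firefox_history.json
# Append both output and errors to the log. cron runs with a minimal PATH,
# so use the full path to archivebox if cron reports it as not found.
archivebox add < ./output/sources/firefox_history.json >> /var/log/ArchiveBox.log 2>&1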
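If an import ever takes longer than the interval between cron runs, two runs can overlap. One way to guard against that is a lock directory, as in the sketch below. The helper name run_locked, the exit code 75, and the lock path in the usage line are assumptions for illustration, not part of ArchiveBox. mkdir either creates the directory or fails atomically, so only one run gets past it. A run that is killed outright leaves the directory behind, and it must then be removed by hand; on Linux, util-linux's flock -n avoids that, because the lock is released when the process exits.

```shell
#!/bin/sh
# run_locked LOCKDIR COMMAND [ARGS...]
# Hypothetical helper: run COMMAND only if no earlier run still holds LOCKDIR.
# mkdir either creates the directory or fails, atomically, so two overlapping
# cron runs cannot both get past it.
run_locked() {
    lockdir=$1
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "skipping: $lockdir exists (is a previous run still going?)" >&2
        return 75                   # EX_TEMPFAIL from sysexits.h
    fi
    # Capture COMMAND's status without tripping "set -e" in the caller
    if "$@"; then status=0; else status=$?; fi
    rmdir "$lockdir"                # release the lock
    return "$status"
}
```

With the function defined at the top of firefox_custom.sh, the import line could become: run_locked /tmp/archivebox-firefox.lock archivebox add < ./output/sources/firefox_history.json >> /var/log/ArchiveBox.log 2>&1 (the redirections apply to run_locked, which passes its stdin and stdout through to archivebox).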

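Make the script executable with chmod +x /opt/ArchiveBox/bin/firefox_custom.sh. Then create a new file /etc/cron.d/ArchiveBox-Firefox that tells cron to run it as the www-data user every day at midnight (cron's hour field runs from 0 to 23, so 0 means midnight). The www-data user needs write access to /opt/ArchiveBox/output and /var/log/ArchiveBox.log:

0 0 * * * www-data /opt/ArchiveBox/bin/firefox_custom.sh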
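The five time fields at the start of a cron line are minute (0-59), hour (0-23), day of month (1-31), month (1-12) and day of week (0-7, where 0 and 7 both mean Sunday). The rough checker below is a hypothetical helper, not a cron tool. It range-checks only plain numbers, so a schedule like "0 24 * * *" is caught, while "*", steps such as */12, lists, ranges and names are passed through unchecked.

```shell
#!/bin/sh
# check_cron_fields "SCHEDULE": rough range check for the five time fields
# of a cron schedule (minute hour day-of-month month day-of-week).
# Hypothetical helper: only plain numbers are range-checked; "*", steps
# (*/12), lists (0,12), ranges (1-5) and names (mon, jan) are passed through.
# The body runs in a subshell ( ... ) so "set -f" does not leak to the caller.
check_cron_fields() (
    set -f                      # stop "*" from expanding to file names
    set -- $1                   # split the schedule on whitespace
    if [ "$#" -ne 5 ]; then
        echo "expected 5 time fields, got $#" >&2
        exit 1
    fi
    i=0
    for field in "$@"; do
        i=$((i + 1))
        case $i in
            1) name=minute;       min=0; max=59 ;;
            2) name=hour;         min=0; max=23 ;;
            3) name=day-of-month; min=1; max=31 ;;
            4) name=month;        min=1; max=12 ;;
            5) name=day-of-week;  min=0; max=7  ;;   # 0 and 7 both mean Sunday
        esac
        case $field in
            *[!0-9]*) continue ;;   # not a plain number: not checked here
        esac
        if [ "$field" -lt "$min" ] || [ "$field" -gt "$max" ]; then
            echo "$name field '$field' is outside $min-$max" >&2
            exit 1
        fi
    done
)
```

For example, check_cron_fields '0 24 * * *' fails with "hour field '24' is outside 0-23", while '0 0 * * *' and '0 */12 * * *' pass.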

Example: Import an RSS feed from Pocket every 12 hours

This example imports your Pocket bookmark feed and archives any new links every 12 hours:

First, set your Pocket RSS feed to “public” under https://getpocket.com/privacy_controls.

Create /opt/ArchiveBox/bin/pocket_custom.sh, replacing yourusernamegoeshere with your Pocket username:

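#!/bin/bash

# Stop if the ArchiveBox folder is missing rather than running somewhere else
cd /opt/ArchiveBox || exit 1
# -f: fail on HTTP errors instead of piping an error page into ArchiveBox
# -sS: hide the progress meter but still report errors
curl -fsS https://getpocket.com/users/yourusernamegoeshere/feed/all | archivebox add >> /var/log/ArchiveBox.log 2>&1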
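Because both scripts append to the same /var/log/ArchiveBox.log, it can help to stamp each logged line with the time it was written so separate runs are easy to tell apart. The helper below, stamp_lines, is a hypothetical sketch built on standard read, printf and date; it is not an ArchiveBox feature.

```shell
#!/bin/sh
# stamp_lines: copy stdin to stdout, prefixing each line with the local date
# and time, so runs appended to the same log file can be told apart.
# Hypothetical helper, not part of ArchiveBox.
stamp_lines() {
    # The "|| [ -n ... ]" test keeps a final line that has no trailing newline
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}
```

With the function defined in pocket_custom.sh, the import line could become: curl -fsS https://getpocket.com/users/yourusernamegoeshere/feed/all | archivebox add 2>&1 | stamp_lines >> /var/log/ArchiveBox.log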

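Make this script executable as well (chmod +x /opt/ArchiveBox/bin/pocket_custom.sh), then create a new file /etc/cron.d/ArchiveBox-Pocket that tells cron to run it every 12 hours, at 00:00 and 12:00:

0 */12 * * * www-data /opt/ArchiveBox/bin/pocket_custom.sh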