Publishing Your Archive

There are two ways to publish your archive: run the built-in archivebox server, or export the archive and host it as static HTML.

1. Use the built-in webserver

# set the permissions depending on how public/locked down you want it to be
archivebox config --set PUBLIC_INDEX=True
archivebox config --set PUBLIC_SNAPSHOTS=True
archivebox config --set PUBLIC_ADD_VIEW=True

# create an admin username and password for yourself
archivebox manage createsuperuser

# then start the webserver and open the web UI in your browser
archivebox server 0.0.0.0:8000
open http://127.0.0.1:8000

If you run ArchiveBox with docker-compose, this server is enabled out of the box, and the example setup also includes a commented-out nginx config with SSL.
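If you switch between setups often, a small helper can print the matching config commands for you to review before running them. This is a hypothetical sketch: the visibility_commands function and the preset names private, read-only and open are made up for illustration. The mapping assumes PUBLIC_INDEX controls whether logged-out visitors can see the index, PUBLIC_SNAPSHOTS whether they can open individual snapshots, and PUBLIC_ADD_VIEW whether they can submit new URLs.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of ArchiveBox): print the `archivebox config --set`
# commands for a visibility preset so you can review them before running them.
visibility_commands() {
  local index snapshots add_view
  case "$1" in
    private)   index=False; snapshots=False; add_view=False ;;  # login required for everything
    read-only) index=True;  snapshots=True;  add_view=False ;;  # anyone can browse, only admins add
    open)      index=True;  snapshots=True;  add_view=True  ;;  # anyone can browse and submit URLs
    *) echo "usage: visibility_commands private|read-only|open" >&2; return 1 ;;
  esac
  echo "archivebox config --set PUBLIC_INDEX=$index"
  echo "archivebox config --set PUBLIC_SNAPSHOTS=$snapshots"
  echo "archivebox config --set PUBLIC_ADD_VIEW=$add_view"
}

# Print the commands for a browse-only public archive
visibility_commands read-only
```

Printing instead of executing keeps the helper side-effect free; once the output looks right, you could apply it from inside your ArchiveBox data folder with: visibility_commands read-only | sh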

2. Export and host it as static HTML

archivebox list --html --with-headers > index.html
archivebox list --json --with-headers > index.json

# then upload the entire output folder (containing index.html and archive/) to a static host,
# e.g. GitHub Pages or another static hosting provider

# or serve it locally with Python's built-in HTTP server (--directory needs Python 3.7+)
python3 -m http.server --bind 0.0.0.0 --directory . 8000
open http://127.0.0.1:8000
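Before uploading, it can help to confirm the folder actually contains what the static host needs to serve. A minimal sketch, assuming you ran the archivebox list commands above from your ArchiveBox data folder so that index.html sits next to the archive/ directory; check_static_export is a hypothetical name, not an ArchiveBox command.

```bash
#!/usr/bin/env bash
# Hypothetical pre-upload check (not an ArchiveBox command): verify that a folder
# holds the exported index.html next to the archive/ directory of snapshots.
check_static_export() {
  local dir="${1:-.}"   # default to the current directory
  if [ ! -f "$dir/index.html" ]; then
    echo "missing $dir/index.html (run: archivebox list --html --with-headers > index.html)" >&2
    return 1
  fi
  if [ ! -d "$dir/archive" ]; then
    echo "missing $dir/archive/ (is this your ArchiveBox data folder?)" >&2
    return 1
  fi
  echo "ok: $dir looks ready to upload"
}
```

Run it as check_static_export /path/to/your/ArchiveBox/data before starting the upload; it returns a non-zero status when something is missing, so it can also gate an upload step in a script.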

Here’s a sample nginx location block for serving your static archive folder:

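location / {
    # serve files straight from the ArchiveBox data folder
    # (root rather than alias: for "location /" it maps paths the same way and avoids alias/try_files quirks)
    root        /path/to/your/ArchiveBox/data;
    index       index.html;
    autoindex   on;    # list directory contents when a folder has no index.html
    try_files   $uri $uri/ =404;
}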

Make sure none of the archived content is executed as CGI or PHP; you only want to serve static files!
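One way to double-check is to look for files in the data folder that a misconfigured server might try to execute. The sketch below lists files with common server-side script extensions; the find_executable_scripts name and the extension list are assumptions for illustration and not exhaustive. Finding such files is not a problem in itself, as long as your web server hands them out as plain static files.

```bash
#!/usr/bin/env bash
# Hypothetical audit helper: list files under a folder whose extensions a
# misconfigured web server (CGI, PHP, etc.) might execute instead of serving.
# The extension list is an assumption; extend it for your own server setup.
find_executable_scripts() {
  local dir="${1:-.}"
  find "$dir" -type f \( \
      -iname '*.php' -o -iname '*.phtml' -o -iname '*.cgi' \
      -o -iname '*.pl' -o -iname '*.py' \
    \) -print
}
```

For example, find_executable_scripts /path/to/your/ArchiveBox/data prints one path per matching file; an empty result means none of these extensions are present.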

URLs look like: https://archive.example.com/archive/1493350273/en.wikipedia.org/wiki/Dining_philosophers_problem.html
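In that URL, the segment after /archive/ is the snapshot's timestamp folder (a Unix timestamp, 1493350273 in the example), followed by the archived page's own path inside that folder. Assuming that layout, a couple of hypothetical shell helpers can build or pick apart these URLs; snapshot_url and snapshot_timestamp are illustrative names, not part of ArchiveBox.

```bash
#!/usr/bin/env bash
# Hypothetical helpers based on the URL layout shown above:
#   <base>/archive/<timestamp>/<path inside the snapshot folder>

# Build a public URL from a base URL, a snapshot timestamp, and a file path.
snapshot_url() {
  local base="${1%/}" timestamp="$2" file_path="${3#/}"   # trim stray slashes
  echo "$base/archive/$timestamp/$file_path"
}

# Extract the timestamp folder name from a public snapshot URL.
snapshot_timestamp() {
  case "$1" in
    */archive/*) ;;                # expected layout
    *) echo "not a snapshot URL: $1" >&2; return 1 ;;
  esac
  local rest="${1#*/archive/}"     # drop everything up to and including /archive/
  echo "${rest%%/*}"               # keep only the first path segment
}

# prints https://archive.example.com/archive/1493350273/en.wikipedia.org/wiki/Dining_philosophers_problem.html
snapshot_url https://archive.example.com 1493350273 en.wikipedia.org/wiki/Dining_philosophers_problem.html
# prints 1493350273
snapshot_timestamp https://archive.example.com/archive/1493350273/en.wikipedia.org/wiki/Dining_philosophers_problem.html
```

Taking the whole first segment, rather than matching only digits, keeps the helper working if a timestamp folder name carries a decimal suffix.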


Security Concerns

Re-hosting other people’s content has security implications for any other sites sharing your hosting domain. Make sure you understand the dangers of hosting untrusted archived HTML/JS/CSS on a shared domain. Because you may accidentally archive malicious JS, it’s best to serve the archive from a domain or subdomain of its own, which keeps its cookies separate and helps limit the effectiveness of CSRF attacks and other nastiness.
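As a quick sanity check when planning where to host, you can compare the hostname you intend to publish the archive on with the hostnames of your other sites. The sketch below only catches the most obvious mistake: serving the archive from the very same hostname as another site, for example under an /archive path or on a different port (ports are ignored because browsers do not isolate cookies by port). The url_host and check_archive_host helpers are hypothetical, and their simple parsing assumes ordinary scheme://host[:port]/path URLs without user info or IPv6 literals.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: warn if the archive would share a hostname with another site.
# Assumes plain scheme://host[:port]/path URLs (no user info, no IPv6 literals).

# Extract the lowercase hostname (without port) from a URL.
url_host() {
  local rest="${1#*://}"      # drop the scheme
  rest="${rest%%/*}"          # drop the path
  rest="${rest%%:*}"          # drop the port
  printf '%s\n' "$rest" | tr '[:upper:]' '[:lower:]'
}

# Usage: check_archive_host ARCHIVE_URL OTHER_SITE_URL...
check_archive_host() {
  local archive_host other status=0
  archive_host="$(url_host "$1")"
  shift
  for other in "$@"; do
    if [ "$(url_host "$other")" = "$archive_host" ]; then
      echo "WARNING: $archive_host is also used by $other; give the archive its own (sub)domain" >&2
      status=1
    fi
  done
  [ "$status" -eq 0 ] && echo "ok: $archive_host is not shared with the sites listed"
  return "$status"
}

check_archive_host https://archive.example.com/ https://example.com/ https://blog.example.com/
```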

More info:

  • https://github.com/ArchiveBox/ArchiveBox/wiki/Security-Overview

  • https://github.com/ArchiveBox/ArchiveBox/wiki/Security-Overview#publishing

  • https://github.com/ArchiveBox/ArchiveBox/wiki/Security-Overview#%EF%B8%8F-things-to-watch-out-for-%EF%B8%8F