Publishing Your Archive

There are two ways to publish your archive: using the archivebox server or by exporting and hosting it as static HTML.


1. Use the built-in web server

# set the permissions depending on how public/locked down you want it to be
archivebox config --set PUBLIC_INDEX=True
archivebox config --set PUBLIC_SNAPSHOTS=True
archivebox config --set PUBLIC_ADD_VIEW=True

# create an admin username and password for yourself
archivebox manage createsuperuser

# then start the webserver and open the web UI in your browser
archivebox server 0.0.0.0:8000
open http://127.0.0.1:8000

This server is enabled out of the box if you’re using docker-compose to run ArchiveBox, and there is also a commented-out example nginx config with SSL set up. If hosting publicly, it’s essential to place an SSL termination server in front of ArchiveBox (e.g. traefik, caddy, or cloudflared).

[!TIP] Advanced: You can use nginx to serve the static /archive/ dir directly from the filesystem to increase performance.
To protect the /admin/ dashboard, it should ideally be served from a different domain using redirects.


2. Export and host it as static HTML

archivebox list --html --with-headers > index.html
archivebox list --json --with-headers > index.json

# then upload the entire output folder containing index.html and archive/ somewhere
# e.g. github pages or another static hosting provider

# you can also serve it with the simple python HTTP server
python3 -m http.server --bind 0.0.0.0 --directory . 8000
open http://127.0.0.1:8000

Here’s a sample nginx configuration that works to serve your static archive folder:

location / {
    alias       /path/to/your/ArchiveBox/data/;
    index       index.html;
    autoindex   on;
    try_files   $uri $uri/ =404;
}

Make sure you’re not running any content as CGI or PHP; you only want to serve static files!
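
Saved pages may end up on disk with script-like extensions, which a server with PHP or CGI handlers enabled might try to execute. As a rough pre-upload check, the sketch below lists such files. The helper name find_script_like_files and the extension list are illustrative assumptions, not part of ArchiveBox.

```shell
# Hypothetical helper (not an ArchiveBox command): list files whose extension
# a server with CGI/PHP handlers enabled might execute instead of serving as-is.
# The extension list is an assumption; adjust it to match your server's handlers.
find_script_like_files() {
    dir="${1:-.}"
    find "$dir" -type f \( -iname '*.php' -o -iname '*.cgi' -o -iname '*.pl' \) | sort
}

# Usage: run against the folder you plan to upload; no output means nothing matched
# find_script_like_files ./archive
```

Any file it reports is worth checking against your server's handler configuration before you publish.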

URLs look like: https://demo.archivebox.io/archive/1493350273/en.wikipedia.org/wiki/Dining_philosophers_problem.html
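
Judging from the example URL above, a snapshot URL is the site's base URL, then /archive/, a numeric timestamp, and the saved file's path. A minimal sketch of that layout follows. The function name snapshot_url is hypothetical, and the pattern is inferred from this single example rather than from a documented guarantee.

```shell
# Hypothetical helper: join base URL, snapshot timestamp, and saved file path
# following the layout of the example URL (an inferred pattern, not a spec).
snapshot_url() {
    base="${1%/}"      # drop one trailing slash from the base URL, if present
    timestamp="$2"
    path="${3#/}"      # drop one leading slash from the file path, if present
    printf '%s/archive/%s/%s\n' "$base" "$timestamp" "$path"
}

snapshot_url "https://demo.archivebox.io" "1493350273" \
    "en.wikipedia.org/wiki/Dining_philosophers_problem.html"
```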




Security Concerns

[!CAUTION] Re-hosting untrusted archived content on a domain can potentially compromise all apps on that domain!
(including other subdomains)

Make sure you thoroughly understand the dangers of hosting untrusted HTML/JS/CSS that may be captured during archiving, and how viewing it can enable CSRF attacks across all apps on the same domain. If a logged-in user happens to visit an archived page with malicious JavaScript embedded, that JS could hijack any cookies on the domain and impersonate the user, potentially exfiltrating or modifying other Snapshots/data on your server.

(This is why we don’t support serving ArchiveBox from a subdirectory like myapps.example.com/archivebox/; it’s too dangerous to share domains.)

The industry-standard approach is to use a separate domain for untrusted content. For example, GitHub uses githubusercontent.com and Google uses googleusercontent.com for all user-uploaded files. If hosting ArchiveBox publicly, do the same and keep it on an isolated domain to limit the potential damage from leaked cookies and from CORS and CSRF attacks.

Protecting the Admin Dashboard

To protect the admin dashboard, it’s also recommended to serve all content under /archive/ on a separate domain from /admin/. We do this on our servers using a simple redirect rule in nginx/Cloudflare like so:

[Screenshot: Cloudflare redirect rule for /archive/ to another domain]

Note: This is still recommended, but less critical if your /archive/ folder does not contain any archived JS (e.g. if you set SAVE_WGET=False and SAVE_DOM=False).
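
The two settings named in the note can be applied with the same archivebox config --set form used earlier on this page. The wrapper function name below is hypothetical, and the sketch assumes the change only affects snapshots archived afterwards, so existing /archive/ contents may still include previously saved JS.

```shell
# Hypothetical wrapper around the `archivebox config --set` form shown earlier.
# Assumption: these settings affect future archiving only, not existing snapshots.
disable_js_capturing_extractors() {
    archivebox config --set SAVE_WGET=False &&
    archivebox config --set SAVE_DOM=False
}

# Usage:
# disable_js_capturing_extractors
```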

More info: