Publishing Your Archive¶
There are two ways to publish your archive: using the archivebox server
or by exporting and hosting it as static HTML.
1. Use the built-in webserver¶
# set the permissions depending on how public/locked down you want it to be
archivebox config --set PUBLIC_INDEX=True
archivebox config --set PUBLIC_SNAPSHOTS=True
archivebox config --set PUBLIC_ADD_VIEW=True
# create an admin username and password for yourself
archivebox manage createsuperuser
# then start the webserver and open the web UI in your browser
archivebox server 0.0.0.0:8000
open http://127.0.0.1:8000
This server is enabled out-of-the-box if you’re using docker-compose
to run ArchiveBox,
and there is a commented-out example nginx config with SSL set up as well.
2. Export and host it as static HTML¶
archivebox list --html --with-headers > index.html
archivebox list --json --with-headers > index.json
# then upload the entire output folder containing index.html and archive/ somewhere
# e.g. github pages or another static hosting provider
# you can also serve it with the simple python HTTP server
python3 -m http.server --bind 0.0.0.0 --directory . 8000
open http://127.0.0.1:8000
Here’s a sample nginx configuration that works to serve your static archive folder:
location / {
alias /path/to/your/ArchiveBox/data/;
index index.html;
autoindex on;
try_files $uri $uri/ =404;
}
Make sure you’re not running any content as CGI or PHP, you only want to serve static files!
Urls look like: https://archive.example.com/archive/1493350273/en.wikipedia.org/wiki/Dining_philosophers_problem.html
Security Concerns¶
Re-hosting other people’s content has security implications for any other sites sharing your hosting domain. Make sure you understand the dangers of hosting untrusted archived HTML/JS/CSS on a shared domain. Due to the security risk of serving some malicious JS you archived by accident, it’s best to put this on a domain or subdomain of its own to keep cookies separate and help limit the effectiveness of CSRF attacks and other nastiness.
Copyright Concerns¶
Be aware that some sites you archive may not allow you to rehost their content publicly for copyright reasons, it’s up to you to host responsibly and respond to takedown requests appropriately.
You may also want to blacklist your archive in /robots.txt
if you don’t want to be publicly assosciated with all the links you archive via search engine results.
Please modify the FOOTER_INFO
config variable to add your contact info to the footer of your index.