Welcome to ArchiveBox!

Just getting started?
Check out the Quickstart guide.
Need help with something?
Ping us on Twitter or GitHub.
Want to join the community?
See our Community Wiki page.

ArchiveBox

“The open-source self-hosted internet archive.”

Website | GitHub | Source | Bug Tracker

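# Create a folder to hold the archive data and enter it
mkdir my-archive && cd my-archive/
pip install archivebox

# Initialize the archive, add a URL, then print a summary of the collection
archivebox init
archivebox add https://example.com
archivebox info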

Documentation

  • Intro
    • Quickstart
    • Overview
    • Background & Motivation
    • Documentation
    • ArchiveBox Development
  • Getting Started
    • Quickstart
    • Install
    • Docker
  • General
    • Usage
    • Configuration
    • Troubleshooting
    • Security Overview
    • IMPORTANT: Don’t use ArchiveBox for private archived content right now as we’re in the middle of resolving some security issues with how JS is executed in archived content.
    • Publishing Your Archive
    • Scheduled Archiving
    • Chromium Install
  • API Reference
    • Configuration Options
    • Data Folder Layout
    • Command Line Interface
    • Web Interface
    • Python API
    • REST API
  • Meta
    • Roadmap
    • Changelog
    • Donations
    • Web Archiving Community

© Copyright 2020, Nick Sweeting. Revision 2c69b012.