From 021991ea93549ad1969dea96c5a4d127fbed2512 Mon Sep 17 00:00:00 2001
From: tom42
Date: Wed, 31 Aug 2022 23:54:52 -0400
Subject: [PATCH] Update README.md

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 3106625..3832ec1 100644
--- a/README.md
+++ b/README.md
@@ -30,6 +30,7 @@ Feel free to contribute!
 ### Web Archiving
 * [ArchiveBox](https://github.com/pirate/ArchiveBox): The open source self-hosted web archive. Takes browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
+* [Browsertrix Crawler](https://github.com/webrecorder/browsertrix-crawler): Browsertrix Crawler is a simplified (Chrome) browser-based high-fidelity crawling system, designed to run a complex, customizable browser-based crawl in a single Docker container
 * [Collect](https://github.com/xarantolus/Collect): A server to collect & archive websites that also supports video downloads
 * [grab-site](https://github.com/ludios/grab-site): The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
 * [Heritrix](https://github.com/internetarchive/heritrix3): Extensible, web-scale, archival-quality web crawler