Mirror of https://github.com/simon987/awesome-datahoarding, synced 2025-04-22 03:36:45 +00:00
Compare commits
No commits in common. "81c5306822533b29c02d618f57b4c8a5eec46b77" and "3cbf8a35845ad5d4711b9c8d7cb96892a27501fa" have entirely different histories.
81c5306822...3cbf8a3584
@@ -30,7 +30,6 @@ Feel free to contribute!
### Web Archiving
* [ArchiveBox](https://github.com/pirate/ArchiveBox): The open source self-hosted web archive. Takes browser history/bookmarks/Pocket/Pinboard/etc., saves HTML, JS, PDFs, media, and more...
* [Browsertrix Crawler](https://github.com/webrecorder/browsertrix-crawler): Browsertrix Crawler is a simplified (Chrome) browser-based high-fidelity crawling system, designed to run a complex, customizable browser-based crawl in a single Docker container
* [Collect](https://github.com/xarantolus/Collect): A server to collect & archive websites that also supports video downloads
* [grab-site](https://github.com/ludios/grab-site): The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
* [Heritrix](https://github.com/internetarchive/heritrix3): Extensible, web-scale, archival-quality web crawler