mirror of
https://github.com/simon987/awesome-datahoarding
synced 2025-04-10 14:16:48 +00:00
Note: This is only a first draft/brainstorm. I will try to organize the list with more useful sections in the future. Feel free to contribute!
Download/Scraping utilities
- Rclone: A command line program to sync files and directories to and from various cloud storage providers
- aria2: A lightweight multi-protocol & multi-source command-line download utility
- wget: Utility for non-interactive download of files from the Web (HTTP & FTP)
- curl: Tool and library for transferring data with URL syntax, supporting many protocols
- Youtube-DL: A command-line program to download videos from YouTube and a few hundred more sites
- annie: Youtube-DL alternative written in Golang
- wikiteam: set of tools for archiving wikis
- FicSave: Online fanfiction downloader
- FanFicFare: Tool for making eBooks from stories on fanfiction and other web sites
- yt-mango: Youtube metadata archiver
- Youtube-MA: Youtube metadata archiver
- CrowLeer: Powerful C++ web crawler based on libcurl
- floatplane_ripper: Script to rip all videos from https://floatplane.rip/
- grab-site: The archivist's web crawler: WARC output, dashboard for all crawls, dynamic ignore patterns
- dzi-dl: Deep Zoom Image Downloader
- iiif-dl: Command-line tile downloader/assembler for IIIF endpoints/manifests
- ChanThreadWatch: Saves threads from *chan-style boards and checks for updates until the thread dies
- Sonarr: PVR for Usenet and BitTorrent users
- Radarr: A fork of Sonarr to work with movies à la Couchpotato
- Jackett: API support for torrent trackers (works with Sonarr, Radarr and others)
- Sick-Beard: PVR for newsgroup users (with limited torrent support)
- Lidarr: Music collection manager for Usenet and BitTorrent users
- Mylar: An automated Comic Book downloader (cbr/cbz) for use with SABnzbd, NZBGet and torrents
- bazarr: Companion application to Sonarr and Radarr for downloading subtitles
- RipMe: Album ripper for various websites. Runs on your computer. Requires Java 8.
- Instagram Scraper: Command-line application written in Python that scrapes and downloads an Instagram user's photos and videos. Use responsibly.
- gallery-dl: Download image galleries and collections from pixiv, exhentai, danbooru and more
- FlexGet: Multipurpose automation tool for content like torrents, nzbs, podcasts, comics, series, movies, etc
- PyInstaLive: Instagram live stream downloader.
- RedditDownloader: Scrapes Reddit to download media of your choice
- HTTrack: Download a website from the Internet to a local directory
- Heritrix: Extensible, web-scale, archival-quality web crawler
- wail: Web Archiving Integration Layer: One-Click User Instigated Preservation
- Collect: A server to collect & archive websites that also supports video downloads
Compression
- KGB Archiver: Compression tool with an unbelievably high compression rate
- peazip: File archiver utility
Network
- NetLimiter: Internet traffic control and monitoring tool for Windows
File systems
- NTFS drivers for MacOS
- httpdirfs: A filesystem which allows you to mount HTTP directory listings
- mergerfs: a featureful union filesystem
File conversion
- AAXtoMP3: convert AAX files to common MP3, M4A, M4B, flac and ogg formats through a basic bash script frontend to FFMPEG
- html2warc: Convert web resources to a single warc file
Utility Scripts
- rclone_dirsize: Get size of http directory listing with rclone
- youtube-dl_soundcloud: snippet for using youtube-dl to download soundcloud playlists
- rm_empty_subdir: Remove empty sub-directories on Windows
- void-cat-uploader: This script automatically uploads all files inside a directory to https://void.cat.
- Backblaze B2 sync backup script: Script to sync multiple directories with Backblaze B2
- Misc download scripts: Scripts for downloading content from various websites
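
As an illustration of the kind of small housekeeping script collected in this section, here is a minimal Python sketch of the idea behind rm_empty_subdir (removing empty sub-directories). It is not the linked script's code; the function name `remove_empty_subdirs` and its return value are assumptions made for this example.

```python
import os


def remove_empty_subdirs(root: str) -> list[str]:
    """Delete empty directories below `root` and return the removed paths.

    `root` itself is never removed. Hypothetical helper for illustration only.
    """
    removed = []
    # Walk bottom-up so that a parent which becomes empty once its
    # children are deleted is also removed in the same pass.
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        if dirpath == root:
            continue
        # Re-check the directory on disk: the lists yielded by os.walk
        # may be stale after child directories were removed.
        if not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(dirpath)
    return removed
```

Calling `remove_empty_subdirs("/path/to/archive")` returns the list of deleted directories, which makes it easy to log what was cleaned up. Directories that contain any file, including hidden ones, are left untouched.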
Content sharing
- opds: Easy to use, Open & Decentralized Content Distribution
- ipfs: Protocol and network designed to create a content-addressable, peer-to-peer method of storing and sharing hypermedia in a distributed file system
- h5ai: HTTP web server index for Apache httpd, lighttpd, nginx and Cherokee
Data curation
- DeepSort: AI powered image tagger backed by DeepDetect
- diskover: File system crawler, disk space usage, file search engine and file system analytics powered by Elasticsearch
- fucking-weeb: A library manager for animu (and TV shows, and whatever).
- Everything: Locate files and folders by name instantly (Windows)
- beets: Music library manager and MusicBrainz tagger
- picard: MusicBrainz tagger
- Calibre: Ebook manager
- WinDirStat: Disk usage statistics viewer and cleanup tool for Windows
- jdupes: Powerful duplicate file finder
- Mp3tag: Powerful and easy-to-use tool to edit metadata of audio files (Windows/Mac)
- FileBot: FileBot is the ultimate tool for organizing and renaming your Movies, TV Shows and Anime
- MediaInfo: Convenient unified display of the most relevant technical and tag data for video and audio files
- tree: 'tree' command for linux
- grepWin: A powerful and fast search tool using regular expressions (Windows)
- TeraCopy: Copy your files faster and more securely
- baobab: Graphical disk usage analyzer
- phockup: Media sorting tool to organize photos and videos from your camera
- MediaElch: Media manager for Kodi
APIs & Online tools
Hardware / Monitoring
- CrystalDiskInfo: HDD/SSD utility that supports some USB, Intel RAID and NVMe drives
- smartmontools: Control and monitor storage systems using the Self-Monitoring, Analysis and Reporting Technology (SMART) system built into most modern ATA/SATA, SCSI/SAS and NVMe disks
- Hard Drive Sentinel: Multi-OS SSD and HDD monitoring and analysis software