--- title: "Automating Youtube Archival" date: 2019-05-11 draft: false tags: ["youtube-dl", "automation"] author: "simon987" --- Google has been known to terminate entire Youtube channels without notice in the hope of staying advertiser-friendly. **58 million "problematic" videos were deleted from the platform in Q3 2018** [1](#sources). This week we are exploring various Youtube archival solutions that utilizes [youtube-dl](https://github.com/ytdl-org/youtube-dl/), a command-line program that extracts and download videos from web pages. {{< figure src="/ytdl/1.png" title="Channels removed, by removal reason">}} # Installation Install youtube-dl via pip to ensure that you have the latest version. Google often pushes updates that breaks youtube-dl so you always want to stay up to date. You might be able to install it via your distibution’s package manager but it’s often several versions behind. ``` pip install --upgrade youtube-dl youtube-dl --version ``` # Basic usage You can now use the youtube-dl command to download a single video, a playlist or a channel: ``` youtube-dl https://www.youtube.com/watch?v=XXXXXXXXXX youtube-dl https://www.youtube.com/channel/XXXXXXXXXX ``` You can use the same command to download videos from various websites.Youtube-dl is also a Python library that you can use in scripts: {{}} #!/usr/bin/python import youtube_dl yt_dl = youtube_dl.YoutubeDL() yt_dl.download("https://www.youtube.com/watch?v=XXXXXXXXXXXX") {{}} ## Command line arguments & Scripting This document is not a replacement for youtube-dl’s documentation, you can find the updated list of command-line arguments on its [Github page](https://github.com/ytdl-org/youtube-dl/). Below is a bash script that will download everything specified in `list.txt`, a text file with a youtube channel or video on each line. The script will save the URLs of the videos in archive.txt as it downloads them to speedup the subsequent executions. The `--write-info-json` and `--write-thumbnail` options ensures that we also download the video metadata such as the description and the title. {{}} youtube-dl -a list.txt -o '%(uploader)s/%(title)s-%(id)s.%(ext)s' \ --write-thumbnail\ -f "bestvideo[ext=webm]+bestaudio[ext=webm]/bestvideo[ext=mp4]+bestaudio[ext=m4a]/bestvideo+bestaudio/best"\ --write-info-json\ --geo-bypass\ --ignore-errors\ --download-archive archive.txt {{}} You can decide to run the script periodically or to schedule it with cron: ``` crontab -e ``` {{}} # Download new videos every 2 hours 0 */2 * * * /mnt/Archive/Youtube/archive_yt.sh {{}} ## Live streams While a cron job downloads all videos uploaded by a channel (if the uploader does not delete the video between executions), it does not handle live streams. Youtube-dl allows you to download live streams with the same command but you obviously have to start the execution during the stream. Below is a different approach that takes advantage of the Youtube email notification feature. This simple Python script reads your last 3 emails and searches for a Youtube link in the email body. It will immediately start downloading the video using the youtube-dl Python library. You can use this method to download uploaded videos as well as live streams. If you are using a Gmail account, you will need to generate an [App Password](https://security.google.com/settings/security/apppasswords) to allow the script to login. {{}} #!/usr/bin/python import imaplib import re import youtube_dl # Initialize the youtube-dl downloader, nooverwrites param will # skip videos that are already downloaded or # currently being downloaded yt_dl = youtube_dl.YoutubeDL(params={ "nooverwrites": True, "nopart": True, "format": "bestvideo[ext=webm]+bestaudio[ext=webm]/bestvideo[ext=mp4]+bestaudio[ext=m4a]/bestvideo+bestaudio/best" }) # This regex pattern matches Youtube video links YT_LINK = re.compile("Fv%3D([^%]*)%") mail = imaplib.IMAP4_SSL("imap.gmail.com") mail.login("username@gmail.com", "password") mail.list() mail.select("INBOX") # Fetch the last 3 emails _, data = mail.search(None, "ALL") last_emails = list(reversed(data[0].split()))[:3] for num in last_emails: _, data = mail.fetch(num, "(RFC822)") body = data[0][1].decode() # Check for pattern match match = YT_LINK.search(body) if match: url = "https://youtube.com/watch?v=" + match.group(1) yt_dl.download([url, ]) # immediately download {{}} ## Automatically upload to rclone remote To take advantage of cloud storage, you can setup [ytdlrc](http://github.com/bardisty/ytdlrc) to automatically move videos to an rclone remote as they are downloaded. This simple script is completely interchangeable with youtube-dl and can be set up on a machine with low disk space. The script uses your existing youtube-dl and rclone configuration and is ideal for setting up automatic Youtube archival on a cheap VPS. ## Archiving Metadata If you wish to save a video’s metadata without downloading the actual video, there are command-line utilities dedicated to this task. * [Youtube-MA](https://github.com/CorentinB/YouTube-MA) * [yt-mango](https://github.com/terorie/yt-mango) ---- # Sources * https://transparencyreport.google.com/youtube-policy/removals