In January, I developed something I termed the “Brain-rot Bot”.
Rationale
The whole thing started when I was on YouTube Shorts and I kept seeing Minecraft parkour videos with an AI text-to-speech voice reading out a story from Reddit’s r/AmITheAsshole.[1]
I thought to myself, “These videos are so low quality, I could probably write a program that makes videos like this automatically, from start to finish.” So I set out to do just that. I also wanted to find a use for my Raspberry Pi, which I had left sitting at home with ssh exposed.
The beginning
The first thing to figure out was how to scrape the Reddit posts. I thought I might have to deal with their API, but reading simple post data can be done by appending .json to the URL: https://reddit.com/r/AmITheAsshole/.json
Great! A bit of JSON parsing later and I could access the post data. Irrelevant posts, such as updates, needed to be filtered out, and each post had to be short enough to read aloud in under a minute, so I also filtered out posts that were too long. Additionally, I expanded confusing acronyms and censored profanity and other undesirable words. Finally, I kept track of every post I had used to make sure none would be used twice.
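In sketch form, the scraping and filtering looked something like this (the length threshold, filenames, and function names are illustrative, not the exact ones I used):

```python
import json
import urllib.request

URL = "https://reddit.com/r/AmITheAsshole/.json"
MAX_CHARS = 1200  # rough proxy for "fits in a 1-minute read"

def fetch_posts():
    # Reddit rejects the default urllib user agent, so set our own.
    req = urllib.request.Request(URL, headers={"User-Agent": "brainrot-bot/0.1"})
    with urllib.request.urlopen(req) as resp:
        listing = json.load(resp)
    return [child["data"] for child in listing["data"]["children"]]

def pick_post(posts, seen):
    for post in posts:
        if post["id"] in seen:
            continue  # already used this one
        if "update" in post["title"].lower():
            continue  # irrelevant follow-up posts
        if len(post["selftext"]) > MAX_CHARS:
            continue  # too long to read out within a minute
        return post
    return None
```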
All of this lived in one basic Python script along those lines. The next step was to feed the text into a text-to-speech (TTS) program.
The middle
Finding a high-quality TTS program that could run on a Raspberry Pi 3 seemed like it was going to be very difficult. After some looking around, I came across piper. It was exactly what I needed! I could now generate pretty good speech audio from the post text. It was starting to look like the project was really going to be possible.
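Getting audio out of piper is a single pipe into its binary: it reads text on stdin and writes a WAV file. A minimal wrapper, with the voice model as just an example:

```python
import subprocess

def synthesize(text: str, wav_path: str = "speech.wav") -> None:
    # piper reads text on stdin and writes a WAV to --output_file.
    subprocess.run(
        ["piper", "--model", "en_US-lessac-medium.onnx", "--output_file", wav_path],
        input=text.encode("utf-8"),
        check=True,
    )
```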
The next step was to tackle the video. I downloaded a few random Minecraft parkour gameplay videos from YouTube and cropped them to the 9:16 ratio needed for Shorts. I then wrote an ffmpeg script to merge the video and the audio together. This was a great exercise and taught me a lot about ffmpeg and video encoding. For some reason, I decided to use bash, which made everything infinitely more painful than it probably had to be, but I got it working eventually.
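The core of that script boiled down to one ffmpeg invocation, shown here as a Python wrapper for readability (filenames and codec settings are illustrative):

```python
import subprocess

def merge(gameplay: str, narration: str, out: str = "short.mp4") -> None:
    subprocess.run([
        "ffmpeg", "-y",
        "-i", gameplay,            # raw Minecraft parkour footage
        "-i", narration,           # piper's WAV output
        "-vf", "crop=ih*9/16:ih",  # keep full height, crop width to 9:16
        "-map", "0:v", "-map", "1:a",
        "-c:v", "libx264", "-c:a", "aac",
        "-shortest",               # cut the video to the narration length
        out,
    ], check=True)
```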
It was still missing something: the subtitles. All of these videos have word-by-word subtitles that appear on screen as each word is spoken. I knew about OpenAI Whisper, but didn’t think it could run on a Pi. That was until I found whisper.cpp, a high-performance C/C++ port of OpenAI Whisper that supports the Raspberry Pi!
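The trick for per-word timing is to cap each subtitle segment at one token, so every word gets its own timestamp. With whisper.cpp’s CLI that looked something like the sketch below (binary, model path, and flags as I remember them from the version I used; note that whisper.cpp expects 16 kHz mono WAV, hence the resample):

```python
import subprocess

def word_subtitles(wav_path: str) -> None:
    # whisper.cpp expects 16 kHz mono WAV, so resample the TTS output first.
    subprocess.run(
        ["ffmpeg", "-y", "-i", wav_path, "-ar", "16000", "-ac", "1", "speech16k.wav"],
        check=True,
    )
    # -ml 1 caps each segment at one token (roughly one word per subtitle);
    # -osrt writes speech16k.wav.srt alongside the input.
    subprocess.run(
        ["./main", "-m", "models/ggml-base.en.bin",
         "-f", "speech16k.wav", "-ml", "1", "-osrt"],
        check=True,
    )
```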
Both piper and whisper.cpp really impressed me with their performance on the Pi. After optimizing the ffmpeg script to use the Pi’s hardware h264_v4l2m2m encoder, I was also able to burn the subtitles onto the video (solely using ffmpeg!) at a rapid pace.
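The final render burns the SRT into the frame with ffmpeg’s subtitles filter while the hardware encoder does the heavy lifting. Roughly (the bitrate is illustrative; h264_v4l2m2m doesn’t do CRF-style rate control, so you hand it an explicit bitrate):

```python
import subprocess

def burn_subtitles(video: str, srt: str, out: str = "final.mp4") -> None:
    subprocess.run([
        "ffmpeg", "-y", "-i", video,
        "-vf", f"subtitles={srt}",   # burn the word-level subs into the frame
        "-c:v", "h264_v4l2m2m",      # the Pi's hardware H.264 encoder
        "-b:v", "4M",                # v4l2m2m wants an explicit bitrate
        "-pix_fmt", "yuv420p",
        "-c:a", "copy",
        out,
    ], check=True)
```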
The end
Now the pipeline could produce a finished video file in roughly 6 minutes on average, depending on the length of the Reddit post, so it was time to upload it. I followed some guides on getting approval for the YouTube API and used youtube-upload to upload the videos.
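The upload itself is one more shell-out, something like this (treat the exact flag names as an assumption and check the tool’s --help; the client secrets file comes from the Google API console):

```python
import subprocess

def upload(video: str, title: str) -> None:
    subprocess.run([
        "youtube-upload",
        "--title", title,
        "--client-secrets", "client_secrets.json",
        "--privacy", "public",
        video,
    ], check=True)
```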
I implemented logging, fixed a few bugs, and set up a cron job to run the script every 4 hours before going to sleep. I awoke to find 2 videos uploaded to the YouTube channel I had created (which I have since deleted). Success.
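The crontab entry was along these lines (the paths are illustrative):

```
0 */4 * * * /home/pi/brainrot-bot/run.sh >> /home/pi/brainrot-bot/bot.log 2>&1
```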
Although the pipeline was fast enough that I could have uploaded hundreds of videos a day, I didn’t, for a few reasons, but mainly because I didn’t want to contribute to the enshittification of Shorts. I uploaded about 14 videos, 2 per day for a week, before deleting the channel and putting an end to it.
The moral of the story
Most short-form content is of such poor quality that it can be generated in just over 6 minutes by a fully automated script written by a first-year uni student.
Interested in the code for this project? Have a comment? Contact me through my socials or my contact form.
Footnotes
1. I’ve since disabled YouTube Shorts. ↩