My Podcast Workflow Pt1

Read this in 7 minutes

In May, after fiddling with some AWS SDKs, I tweeted this joke that I thought was kinda far-fetched:

But by the time I had tweeted it, I had convinced myself to actually do it. The thing is, I want ultimate control over my content, and I want it as cheaply as possible, so SoundCloud wasn’t really it.

The nice thing about this is, just by tweeting, I had already broken down the tasks into bite-sized chunks:

  1. Upload the podcast to a host (I went with S3)
  2. Transcribe the podcast (with AWS Transcribe)
  3. Create a post with the transcription (I decided on Middleman)
  4. Add to RSS feed
  5. Schedule social posts (which I didn’t do and probably won’t)

Looking at this list, the only thought I had was “why not?” It’s straightforward enough, and I’m familiar with all the tools I need, so nothing was really stopping me. So, I began.

I did first attempt to build this in Rails, but I felt it was too much for what I needed. Essentially, I just needed something I could create blog posts with, and considering how many blog-generating tools are out there, Rails was overkill. ActionText, while an exciting addition to Rails, isn’t where I needed it to be just yet.

Next, I tried Gatsby in an effort to give React a chance and learn it in my own time, on my own terms. I was trying to be fair, you know? I don’t like saying “never”, but never again. I ended up fighting with a dependency that refused to compile consistently across environments; sometimes it was fine and other times it wasn’t, and even after speaking to the maintainers, I couldn’t figure out what was up. To be fair, this isn’t React’s fault at all. It’s just a shame that whenever I try to pick up React to do something really simple, I get a headache.

So, I landed on Middleman, a static site generator built in Ruby. Since it’s a static site generator, it doesn’t have a server, so I can’t do anything that needs one, e.g. show dynamic content, or collect and save form data. I can just create static content that doesn’t change, and honestly, that’s all I really need. Middleman has scope for an XML/RSS feed builder too.
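For a sense of what that feed builder looks like: Middleman templates named like source/feed.xml.builder render with Ruby’s Builder library, so an RSS feed is just another template. This is a minimal sketch only; the site title, URLs and the audio_url frontmatter field are placeholders, not my actual feed:

# source/feed.xml.builder — a minimal sketch; titles, URLs and the
# `audio_url` frontmatter field are illustrative placeholders
site_url = "https://example.com/"

xml.instruct!
xml.rss version: "2.0" do
  xml.channel do
    xml.title "My Podcast"
    xml.link site_url
    xml.description "Episodes and transcriptions"

    # `blog.articles` comes from the middleman-blog extension
    blog.articles.each do |article|
      xml.item do
        xml.title article.title
        xml.link URI.join(site_url, article.url)
        # points the feed reader at the episode audio
        xml.enclosure url: article.data.audio_url, type: "audio/mpeg"
      end
    end
  end
end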

I knew I was going to host the episodes on S3, but before this project, I had never really thought of it as a data store in the way Redis or Postgres are data stores. I thought of it more as a bucket of stuff that’s hosted on Amazon’s servers. But S3 is a data store: you upload objects as key-value pairs and retrieve an object’s value with its key. Essentially, on the most basic of basic levels, S3 buckets are similar to Ruby hashes:

# A bucket is created with objects in it
podcast = {
  "ep_1" => "ep_1.mp3",
  "ep_2" => "ep_2.mp3",
  "ep_3" => "ep_3.mp3"
}

# An object's value is retrieved by accessing its key
podcast["ep_1"]
# => "ep_1.mp3"

Not only that, but the AWS S3 Ruby SDK also lets me upload metadata with each object, so I can include things like duration, content_type, date, etc.
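As a rough sketch of what that upload looks like (the bucket name and metadata values are made up for illustration; S3 stores user metadata as plain strings):

require "aws-sdk-s3"

s3 = Aws::S3::Client.new(region: "us-east-1")

File.open("ep_1.mp3", "rb") do |file|
  s3.put_object(
    bucket: "my-podcast-bucket",  # placeholder bucket name
    key: "ep_1.mp3",
    body: file,
    content_type: "audio/mpeg",
    metadata: {                   # user metadata: string keys and values
      "duration" => "1820",
      "date"     => "2019-08-01"
    }
  )
end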

Choosing what service to use for transcription was probably the easiest part of this. I had fiddled with both Google’s and AWS’s transcription SDKs for another project, so I had ample hands-on testing time. I prefer Google’s, literally only because it looks and feels better to use, but their SDK is still in beta and it’s ill-advised to use beta software in a production setting. Also, AWS’s Transcribe SDK already works really well with S3 out of the box: I can give it an S3 object, it’ll transcribe it and put the transcription in the same or another S3 bucket. Having things in the same ecosystem is ideal; it means I don’t have to wonder too much about where any objects are, and retrieving/sending objects is more of a streamlined process.
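That hand-off is pretty much a one-call affair. Here’s a minimal sketch using the aws-sdk-transcribeservice gem; the job name, media URI and output bucket are placeholders:

require "aws-sdk-transcribeservice"

transcribe = Aws::TranscribeService::Client.new(region: "us-east-1")

transcribe.start_transcription_job(
  transcription_job_name: "ep_1_transcription", # placeholder job name
  language_code: "en-US",
  media_format: "mp3",
  media: {
    # points at the episode already sitting in S3
    media_file_uri: "https://my-podcast-bucket.s3.amazonaws.com/ep_1.mp3"
  },
  output_bucket_name: "my-podcast-transcripts"  # transcript JSON lands here
)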

The last part of this process, which I hadn’t really considered in my initial scoping, was analytics. SoundCloud provides a really nice aggregation of plays and downloads over a specified time range. Analytics is a tricky one because getting accurate readings is difficult. Google Analytics relies on JavaScript, tracking pixels and cookies, and is pretty unreliable for what I want. A lot of my readers/listeners seem to disable JavaScript, and for people that listen anywhere other than my site, their plays would be lost.

I feed my site through Cloudflare to take advantage of CNAME flattening for the domain, and Cloudflare gives me decent enough server analytics for free. I can’t see exactly which pages are hit, but the numbers are the same as what Netlify analytics gave when I was paying for that. However, not being able to see which pages are hit is a big deal when you’re trying to see individual episode play counts. I considered AWS’s CloudFront, as that gives a little more detail; however, I couldn’t create a distribution with them without emailing. The issue with both Cloudflare and CloudFront is that their analytics don’t go back far enough; at most I could go back 60 days.

After some desperate searching, I found Podtrac, which is a simple way to get accurate download/play counts for free and isn’t restricted to a few months of analytics. I didn’t need a server and the code I added was minimal, so perfect!
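For context, Podtrac measures plays by wrapping each episode URL in a redirect through its own domain, so “minimal code” really does mean a URL prefix. A small sketch, assuming my understanding of the prefix format is right and using a placeholder bucket URL:

# Podtrac counts a play whenever its redirect URL is requested.
# The original URL is appended without its scheme.
PODTRAC_PREFIX = "https://dts.podtrac.com/redirect.mp3/"

def podtrac_url(episode_url)
  PODTRAC_PREFIX + episode_url.sub(%r{\Ahttps?://}, "")
end

podtrac_url("https://my-podcast-bucket.s3.amazonaws.com/ep_1.mp3")
# => "https://dts.podtrac.com/redirect.mp3/my-podcast-bucket.s3.amazonaws.com/ep_1.mp3"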

The only part of this process I won’t be doing is the social media scheduling. In order for it to be worth my time, I need to be able to batch schedule posts at a self-defined interval which none of the APIs I looked at can really do. I may as well just go into Hootsuite or Tweetdeck myself and schedule individual posts.

Also, shout out to Pry (for debugging) and dotenv (for reading environment variables).
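If you haven’t used dotenv: it loads key-value pairs from a .env file into ENV at boot, which is a tidy way to hand the AWS SDKs their credentials without hard-coding them. A tiny sketch of how it typically slots in (the variable names are just the AWS SDK’s standard ones):

require "dotenv/load" # reads .env into ENV

# .env (never committed) contains lines like:
#   AWS_ACCESS_KEY_ID=...
#   AWS_SECRET_ACCESS_KEY=...
ENV["AWS_ACCESS_KEY_ID"] # now available to the AWS SDKs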

In part 2, I’ll go into detail about the actual code that does this. Because I’m not creating my own server or database as I would with a Rails project, I’ve had to think creatively about how things are stored, retrieved and passed on. Instead of having controllers and models, this code lives in a script that I manually run. It’s as simple as running ruby my_podcast_script.rb <my_audio_file> <my_audio_name> from my terminal and everything kicks off.
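Roughly speaking, the skeleton of that entry point looks like this; the helper names are hypothetical stand-ins for the real steps, which part 2 covers properly:

# my_podcast_script.rb — skeleton only; part 2 has the real thing
audio_file, audio_name = ARGV

if audio_file.nil? || audio_name.nil?
  abort "Usage: ruby my_podcast_script.rb <my_audio_file> <my_audio_name>"
end

# Placeholder helpers for the steps covered above
upload_episode(audio_file, audio_name) # S3 upload
start_transcription(audio_name)        # AWS Transcribe job
create_post(audio_name)                # Middleman blog post + RSS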