Using Markov Chains to Generate Podcast Titles

One of my favourite podcasts is TOFOP, a weekly Australian comedy podcast hosted by Wil Anderson and Charlie Clausen. According to Wikipedia, its "common features are humorous personal anecdotes, and detailed discussions on bizarre hypothetical situations".

In the 200th episode, Charlie remarks that they could get a mathematician to create an algorithm to generate an "A.I. TOFOP".

I couldn’t find any TOFOP transcripts, but I did find the episode names and descriptions on Wikipedia for the first ~130 episodes.

I decided to program a Markov chain algorithm to generate random TOFOP episodes.

Markov chains are a type of model that describes a sequence of events in which the probability of each event depends only on the current state. By feeding in a sequential vector of the words used in the podcast descriptions, the model learns word-to-word transition probabilities; then, given an initial state, it can stochastically generate the next word, and the next, ultimately constructing a simulated podcast description.
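As a toy sketch of the idea (using a made-up word sequence, not the podcast data), you can fit a chain and simulate from it in a few lines:

```r
library(markovchain)

# A tiny word sequence, arranged so every word has at least one
# observed outgoing transition
words <- c("the", "cat", "sat", "on", "the", "mat", "on", "the")

# Estimate the transition matrix from the observed word pairs
fit <- markovchainFit(words)

# From "the", the chain moves to "cat" or "mat" with equal probability
set.seed(1)
markovchainSequence(n = 3, markovchain = fit$estimate, t0 = "the")
```
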

Setup

I use the markovchain package in R and the tidytext package for text manipulation.
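The corresponding setup, assuming the packages are installed. (The cleaning code below also relies on dplyr and stringr, so I load the tidyverse as well.)

```r
library(markovchain)  # fitting and simulating Markov chains
library(tidytext)     # text manipulation
library(tidyverse)    # dplyr (select, mutate) and stringr (str_replace_all)
```
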

The Data

The data was manually scraped from https://en.wikipedia.org/wiki/TOFOP. If you want to reproduce the analysis, I have saved the data on my GitHub.

Here is a preview of the data:

glimpse(tofop)
## Observations: 131
## Variables: 2
## $ title       <chr> "Super-piss", "Donuts & Clutch", "Richard Greico U...
## $ description <chr> "Wil and Charlie discuss the vagaries of fan mail,...

And here is an example of a typical episode title and description:

paste(tofop[1,1], tofop[1,2], sep = ": ")
## [1] "Super-piss: Wil and Charlie discuss the vagaries of fan mail, the scariness of hillbillies and the redundancy of The Flash."

I decided to separate the titles from the descriptions and model them separately.

# Handling Titles
all_titles <- tofop %>% 
  select(title) %>% 
  mutate(title = str_replace_all(title, "[[:punct:]]", "")) 

title_vector <- unlist(strsplit(all_titles$title, ' '))

# Handling Descriptions
all_descriptions <- tofop %>% 
  select(description) %>% 
  mutate(description = str_replace_all(description, "[[:punct:]]", ""),
         description = str_replace_all(description, "Wil and Charlie", "")) 

description_vector <- unlist(strsplit(all_descriptions$description, ' '))

Model Fitting

The generator function works as follows:

  • Read in title and description vectors
  • Specify the length of the title and the description to be generated by the Markov chain
  • Provide the initial state of the description to start building on
  • Paste it together with some formatting

random_tofop <- function(titles, descriptions, title_length, descr_length, initial){
  
  # fit title model
  title_fit <- markovchainFit(titles)
  title_gen <- markovchainSequence(n = title_length, markovchain = title_fit$estimate)
  
  # fit the description model
  descriptions_fit <- markovchainFit(descriptions)
  desc_gen <- markovchainSequence(n = descr_length,
                                  markovchain = descriptions_fit$estimate,
                                  t0 = initial,
                                  include.t0 = FALSE)
  
  # random episode number  
  ep_num <- sample.int(1000, 1)
  
  # output  
  cat("Episode", ep_num, "- ", title_gen, ":", "Wil and Charlie discuss", desc_gen)
}

Random TOFOP

Here are some results:

random_tofop(titles = title_vector, descriptions = description_vector, title_length = 3, descr_length = 15, initial = "discuss")
## Episode 615 -  Superpod 2 This : Wil and Charlie discuss the strange appeal of hot jam donuts question Supermans alone time  discuss Wils Terminatorlike  
## Episode 368 -  Had A Bartman : Wil and Charlie discuss how everything is better than plot of Drovers Run, chart the supernatural properties of Robbo The Traipse and the Wil  
## Episode 831 -  Damme Love : Wil and Charlie discuss the brand of being hid under the same crap as a global fist fight and Greg Behrendt

References

The method I used, along with some other good posts on the topic, is linked below:

https://en.wikipedia.org/wiki/Markov_chain

https://en.wikipedia.org/wiki/TOFOP

https://www.kaggle.com/rtatman/markov-chain-romance-title-generator-in-r

https://makingnoiseandhearingthings.com/2016/10/05/can-a-computer-write-my-blog-posts/

https://itsalocke.com/blog/how-to-maraaverickfy-a-blog-post-without-even-reading-it/

http://techeffigytutorials.blogspot.com.au/2015/01/markov-chains-explained.html