BlueySearch

How to build and deploy a basic data-science-backed web app in R

Dean Marchiori

deanmarchiori.rbind.io

2022-11-24

The Problem

“Dad, put on the Bluey where Bingo is in hospital and they talk on the video thing” - my 3-year-old

Solution

A website where you type in a small child's vague description and get back a mathematically ranked list of the closest-matching Bluey episodes.

BlueySearch App

What this talk is really about

  • Doing something basic but interesting in R
  • Turning it into a web app
  • Deploying to a cloud service
  • BONUS ROUND: Deploying with Docker

What is R?

  • R is a free software environment for statistical computing and graphics.
  • R ranks 12th in the TIOBE index (October 2022)
  • https://cran.r-project.org/

Comparing text / documents

statement  text
1          Wollongong is a cool place to live
2          Wollongong has some cool beaches
3          I like the beaches in Sydney

            Terms
Docs         cool live wollongong beaches sydney
  statement1    1    1          1       0      0
  statement2    1    0          1       1      0
  statement3    0    0          0       1      1

Cosine similarity

\[ \frac{\mathbf{A} \cdot \mathbf{B}}{\lVert \mathbf{A} \rVert \lVert \mathbf{B} \rVert} = \frac{\sum_{i=1}^{n}A_iB_i}{\sqrt{\sum_{i=1}^{n}A_i^2} \sqrt{\sum_{i=1}^{n}B_i^2}} \]

source: https://deepai.org/machine-learning-glossary-and-terms/cosine-similarity

Comparing text / documents

Document-Term matrix

            Terms
Docs         cool live wollongong beaches sydney
  statement1    1    1          1       0      0
  statement2    1    0          1       1      0
  statement3    0    0          0       1      1

\[ \mathbf{A} = [1, 1, 1, 0, 0] \\ \mathbf{B} = [1, 0, 1, 1, 0] \]

\[ \frac{\mathbf{A} \cdot \mathbf{B}}{\lVert \mathbf{A} \rVert \lVert \mathbf{B} \rVert} = \frac{\sum_{i=1}^{n}A_iB_i}{\sqrt{\sum_{i=1}^{n}A_i^2} \sqrt{\sum_{i=1}^{n}B_i^2}} \]
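
As a worked check, plugging statement 1 (A) and statement 2 (B) into the formula gives:

\[ \mathbf{A} \cdot \mathbf{B} = 1 \cdot 1 + 1 \cdot 0 + 1 \cdot 1 + 0 \cdot 1 + 0 \cdot 0 = 2, \qquad \lVert \mathbf{A} \rVert = \lVert \mathbf{B} \rVert = \sqrt{3} \]

\[ \frac{\mathbf{A} \cdot \mathbf{B}}{\lVert \mathbf{A} \rVert \lVert \mathbf{B} \rVert} = \frac{2}{\sqrt{3}\,\sqrt{3}} = \frac{2}{3} \approx 0.667 \]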

Similarity matrix (cosine)

           statement1 statement2 statement3
statement1         NA                      
statement2  0.6666667         NA           
statement3  0.0000000  0.4082483         NA
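
The values in the matrix above can be reproduced with a few lines of base R. This is a minimal sketch, not necessarily how the slides computed them; the helper name `cosine_sim` is made up for illustration.

```r
# Document-term matrix for the three example statements
dtm <- matrix(
  c(1, 1, 1, 0, 0,
    1, 0, 1, 1, 0,
    0, 0, 0, 1, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    Docs  = c("statement1", "statement2", "statement3"),
    Terms = c("cool", "live", "wollongong", "beaches", "sydney")
  )
)

# Cosine similarity between two numeric vectors (hypothetical helper)
cosine_sim <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# All pairwise similarities at once: row dot products scaled by the row norms
norms <- sqrt(rowSums(dtm^2))
sim   <- (dtm %*% t(dtm)) / (norms %o% norms)

round(sim, 4)
```

The diagonal of `sim` is 1 because every document is identical to itself; the printed matrix above shows `NA` there instead.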

The data

  • IMDB
title            description
Magic Xylophone  As Bluey and Bingo begin to squabble over their magic xylophone (that has the power to freeze their Dad in space and time) Dad seizes control and freezes Bluey, leaving Bingo as her only hope.
Hospital         Doctor Bluey is needed for assistance when Dad gets a very curious x-ray from Nurse Bingo. It seems that he has a cat in his tummy, leaving Bluey no choice but for her to operate immediately.
[1] 141   2

Wrangling text data

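library(dplyr)  # provides %>%, mutate() and anti_join()

# Prepend the title to each description, then split into one word per row
episodes %>% 
  mutate(description = paste(title, description)) %>% 
  tidytext::unnest_tokens(word, description)  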
# A tibble: 4,783 × 2
   title           word     
   <chr>           <chr>    
 1 Magic Xylophone magic    
 2 Magic Xylophone xylophone
 3 Magic Xylophone as       
 4 Magic Xylophone bluey    
 5 Magic Xylophone and      
 6 Magic Xylophone bingo    
 7 Magic Xylophone begin    
 8 Magic Xylophone to       
 9 Magic Xylophone squabble 
10 Magic Xylophone over     
# … with 4,773 more rows

Remove Stop Words (as, to, a, the, …)

episodes %>% 
  mutate(description = paste(title, description)) %>% 
  tidytext::unnest_tokens(word, description) %>% 
  anti_join(tidytext::stop_words, by = "word")  # drop common stop words
# A tibble: 2,131 × 2
   title           word     
   <chr>           <chr>    
 1 Magic Xylophone magic    
 2 Magic Xylophone xylophone
 3 Magic Xylophone bluey    
 4 Magic Xylophone bingo    
 5 Magic Xylophone begin    
 6 Magic Xylophone squabble 
 7 Magic Xylophone magic    
 8 Magic Xylophone xylophone
 9 Magic Xylophone power    
10 Magic Xylophone freeze   
# … with 2,121 more rows
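
For anyone who wants to see what the two steps do without installing packages, here is a rough base R equivalent on the opening words of the first description. The short `stop_list` is a made-up stand-in; `tidytext::stop_words` is much longer.

```r
# First few words of the "Magic Xylophone" description, as a one-row data frame
episodes <- data.frame(
  title = "Magic Xylophone",
  description = "As Bluey and Bingo begin to squabble over their magic xylophone",
  stringsAsFactors = FALSE
)

# A tiny hypothetical stop list for illustration only
stop_list <- c("as", "and", "to", "over", "their", "a", "the")

# Lower-case the text and split it on anything that is not a letter, digit or apostrophe
words <- strsplit(tolower(paste(episodes$title, episodes$description)), "[^a-z0-9']+")

# One row per (title, word)
tokens <- data.frame(
  title = rep(episodes$title, lengths(words)),
  word  = unlist(words),
  stringsAsFactors = FALSE
)
tokens <- tokens[nzchar(tokens$word), ]

# Keep only the words that are not in the stop list (the job anti_join() does above)
kept <- tokens[!tokens$word %in% stop_list, ]
kept
```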

Document-Term Matrix

                                  Terms
Docs                               dog army stop journey hide hides bored ups
  Army                               1    1    0       0    0     0     0   0
  Bumpy and the Wise Old Wolfhound   1    0    0       0    0     0     0   0
  Sheep Dog                          1    0    1       0    0     0     0   0
  Shops                              0    0    0       0    0     0     0   0
  Sleepytime                         0    0    1       0    0     0     0   0
  Smoochy Kiss                       0    0    0       0    1     0     0   0
  Space                              0    0    0       1    0     0     0   0
  Spy Game                           0    0    0       0    0     0     0   1
[1]  141 1023
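
One way to get from a tidy token table to a matrix like this is a plain cross-tabulation. The sketch below uses base R and a small invented token table; the slides do not show which function produced the real 141 × 1023 matrix.

```r
# Toy token table, one row per (title, word); invented for illustration
tokens <- data.frame(
  title = c("Army", "Army", "Sheep Dog", "Sheep Dog", "Sleepytime"),
  word  = c("dog", "army", "dog", "stop", "stop"),
  stringsAsFactors = FALSE
)

# Cross-tabulate titles against words to count each term per document
dtm <- unclass(table(Docs = tokens$title, Terms = tokens$word))

dtm
dim(dtm)  # documents x distinct terms
```

Each row is then a term-count vector that can be fed straight into the cosine similarity formula.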

End to End
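
Putting the pieces together, one way the search could work end to end is sketched below in base R. The helper names (`tokenize`, `term_vector`, `search_episodes`) and the two abridged toy descriptions are assumptions for illustration, not the app's actual code, and stop-word removal is left out for brevity.

```r
# Toy episode data with abridged descriptions (illustration only)
episodes <- data.frame(
  title = c("Magic Xylophone", "Hospital"),
  description = c(
    "Bluey and Bingo squabble over their magic xylophone that freezes Dad",
    "Doctor Bluey operates when Dad gets a curious x-ray from Nurse Bingo"
  ),
  stringsAsFactors = FALSE
)

# Split text into lower-case word tokens
tokenize <- function(x) {
  words <- unlist(strsplit(tolower(x), "[^a-z0-9]+"))
  words[nzchar(words)]
}

# Term counts for one text over a fixed vocabulary (unknown words are ignored)
term_vector <- function(x, vocab) {
  as.numeric(table(factor(tokenize(x), levels = vocab)))
}

# Rank episodes by cosine similarity between the query and each description
search_episodes <- function(query, episodes) {
  docs  <- paste(episodes$title, episodes$description)
  vocab <- sort(unique(unlist(lapply(docs, tokenize))))
  dtm   <- t(sapply(docs, term_vector, vocab = vocab, USE.NAMES = FALSE))
  q     <- term_vector(query, vocab)
  score <- as.vector(dtm %*% q) / (sqrt(rowSums(dtm^2)) * sqrt(sum(q^2)))
  score[is.nan(score)] <- 0  # query shares no words with the vocabulary
  out <- data.frame(title = episodes$title, score = score)
  out[order(-out$score), ]
}

search_episodes("bingo is the nurse and dad has an x-ray", episodes)
```

The query is treated as just another document: it is projected onto the same vocabulary, and the episodes are sorted by how closely their vectors point in the same direction.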

What is Shiny?

“Shiny is an R package that makes it easy to build interactive web apps straight from R. You can host standalone apps on a webpage or embed them in R Markdown documents or build dashboards. You can also extend your Shiny apps with CSS themes, htmlwidgets, and JavaScript actions.”

source: https://shiny.rstudio.com/

Inside Shiny

library(shiny)

# Define UI for application that draws a histogram
ui <- fluidPage(

    # Application title
    titlePanel("Old Faithful Geyser Data"),

    # Sidebar with a slider input for number of bins 
    sidebarLayout(
        sidebarPanel(
            sliderInput("bins",
                        "Number of bins:",
                        min = 1,
                        max = 50,
                        value = 30)
        ),

        # Show a plot of the generated distribution
        mainPanel(
           plotOutput("distPlot")
        )
    )
)

# Define server logic required to draw a histogram
server <- function(input, output) {

    output$distPlot <- renderPlot({
        # generate bins based on input$bins from ui.R
        x    <- faithful[, 2]
        bins <- seq(min(x), max(x), length.out = input$bins + 1)

        # draw the histogram with the specified number of bins
        hist(x, breaks = bins, col = 'darkgray', border = 'white',
             xlab = 'Waiting time to next eruption (in mins)',
             main = 'Histogram of waiting times')
    })
}

# Run the application 
shinyApp(ui = ui, server = server)
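
The same template could be adapted to a search box and a results table. The following is a guess at the shape of such an app, not the BlueySearch source; `rank_episodes` is a hypothetical placeholder that simply counts title words found in the query and would be replaced by a real cosine-similarity ranking.

```r
library(shiny)

# Hypothetical placeholder ranking: counts how many title words appear in the query
rank_episodes <- function(query) {
  titles <- c("Magic Xylophone", "Hospital")
  words  <- strsplit(tolower(query), "[^a-z0-9]+")[[1]]
  score  <- vapply(
    strsplit(tolower(titles), " ", fixed = TRUE),
    function(title_words) sum(title_words %in% words),
    numeric(1)
  )
  out <- data.frame(title = titles, score = score)
  out[order(-out$score), ]
}

# UI: a free-text input and a table of ranked episodes
ui <- fluidPage(
  titlePanel("BlueySearch"),
  textInput("query", "Describe the episode:"),
  tableOutput("results")
)

# Server: re-rank whenever the query text changes
server <- function(input, output) {
  output$results <- renderTable({
    req(input$query)  # wait until something has been typed
    rank_episodes(input$query)
  })
}

# Only launch the app when run interactively
if (interactive()) shinyApp(ui = ui, server = server)
```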

Deploying to shinyapps.io
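
A deployment from the R console might look roughly like this, using the rsconnect package. The wrapper name `deploy_bluey`, the app name, and the environment variable names are assumptions; the account name, token and secret come from the shinyapps.io dashboard.

```r
# Hypothetical wrapper around the two rsconnect calls needed to publish an app
deploy_bluey <- function(app_dir = "siligong-app", app_name = "blueysearch") {
  # Register the shinyapps.io account (credentials read from assumed env vars)
  rsconnect::setAccountInfo(
    name   = Sys.getenv("SHINYAPPS_NAME"),
    token  = Sys.getenv("SHINYAPPS_TOKEN"),
    secret = Sys.getenv("SHINYAPPS_SECRET")
  )
  # Bundle the app directory and upload it
  rsconnect::deployApp(appDir = app_dir, appName = app_name)
}

# deploy_bluey()  # uncomment once the credentials are set
```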

Live demo?

Dockerizing it

.
├── Dockerfile
└── siligong-app
    ├── app.R
    └── data.csv

FROM rocker/shiny:latest

RUN R -q -e 'install.packages("glue")'

COPY siligong-app /srv/shiny-server/siligong-app

EXPOSE 3838

CMD ["/usr/bin/shiny-server"]

Dockerizing it

docker build -t siligong .
docker run --rm -d -p 3838:3838 siligong

http://localhost:3838/siligong-app/

Cheers, any questions?