Latest Posts

2019

2018

Installing RStudio Server on AWS

2 minute read

Having RStudio running in the browser, wherever you go, with completely scalable performance, is an attractive option.

Relationships in Data Analysis

1 minute read

Recently, Roger Peng wrote about Relationships in Data Analysis on Simply Statistics. It was a good read and focused on the human side of data analysis. Critical to the success of any analysis project is understanding the nature of the relationships involved. Peng goes on to explore the implications of combining roles, and how this can affect the outcome of an analysis. Consider a case where the person sponsoring an analysis is also its audience: how would the communication of this analysis differ if the audience were not the patron, but instead deep subject matter experts?

Using Markov Chains to Generate Podcast Titles

3 minute read

One of my favourite podcasts is TOFOP. It’s a weekly Australian comedy podcast hosted by Wil Anderson and Charlie Clausen. According to Wikipedia, common features are humorous personal anecdotes and detailed discussions on bizarre hypothetical situations.

Analysing The Simpsons using Text Analysis

12 minute read

How do the characters of The Simpsons interact? What type of language do they use? Has this changed much since the first episode in 1989?

How predictive is my predictive model?

4 minute read

Binary classification models are very common. The aim is typically to classify records into two groups, e.g. Sick/Healthy, Will Buy/Won’t Buy, Hot dog/Not hot dog.

Iterative Data Visualisation of the Global Terrorism Database

14 minute read

Many data visualisations are presented as a final polished version that gives little insight into how it was created. In this post I will try to articulate my thought process, mistakes and notes along the way to convey a realistic, iterative workflow for producing a plot.

Creating Heatmap Tiles in R

3 minute read

This demonstrates a basic data visualisation technique in ggplot2 that I only use occasionally, but that always gets a good response in business-facing presentations. I think it’s the intuitive calendar feel and the use of great palettes like viridis. It’s really flexible and beats a line graph or similar, but it can be tricky to remember how to do.