Latest Posts

2019

2018

Surf Check - Automating Weather Forecast Emails in R

5 minute read

Published:

A non-R pursuit of mine is surf photography, which is highly weather-dependent. Existing weather forecasts often lack the required detail or are too hard to track down.

Installing RStudio Server on AWS

2 minute read

Published:

Having RStudio running in-browser, wherever you go, with completely scalable performance is an attractive option.

LOESS (Local Polynomial Regression)

3 minute read

Published:

When conducting exploratory data analysis, scatter plots are commonly used tools to visualise a relationship between two variables.

Relationships in Data Analysis

1 minute read

Published:

Recently, Roger Peng wrote about Relationships in Data Analysis on Simply Statistics. It was a good read and focused on the human side of data analysis. Critical to the success of any analysis project is understanding the nature of the relationships involved. Peng goes on to explore the implications of combining roles, and how this can affect the outcome of an analysis. Consider the case where the person sponsoring an analysis is also its audience: how would the communication of this analysis differ if the audience were not the patrons, but were instead deep subject matter experts?

Random Podcast Episode Generator

3 minute read

Published:

One of my favourite podcasts is TOFOP. It’s a weekly Australian comedy podcast hosted by Wil Anderson and Charlie Clausen. According to Wikipedia: common features are humorous personal anecdotes, and detailed discussions on bizarre hypothetical situations.

Simpsons Character Text Analysis

12 minute read

Published:

How do the characters of The Simpsons interact? What type of language do they use? Has this changed much since the first episode in 1989?

How good is my model?

4 minute read

Published:

Binary classification models are very common. The goal is typically to classify records into two groups, e.g. Sick/Healthy, Will Buy/Won’t Buy, Hot dog/Not hot dog.

Global Terrorism Database

14 minute read

Published:

Many data visualisations are presented as a final polished version that gives little insight into how they were created. In this post I will try to articulate my thought process, mistakes and notes along the way, to convey a realistic and iterative workflow for producing a plot.

Data Science Teams

3 minute read

Published:

Structuring a data science team for success is a highly personal matter. Huge differences exist in the type of business, level of analytical maturity, size, existing structures, etc. Despite this, there are a number of good articles that try to provide generalizable advice on structuring a data science team for success.

Creating Heatmap Tiles in R

3 minute read

Published:

This demonstrates a basic data visualisation technique in ggplot2 that gets used a bit, but always gets a good response in business-facing presentations. I think it’s the intuitive calendar feel and the use of great palettes like viridis. It’s really flexible and beats doing a line graph or similar, but it can be tricky to remember how to do.