This page lists some projects I’ve enjoyed working on in the past few years.

I’m a data scientist, and in my own time I enjoy working on data science projects in football (soccer). I mostly use R, and sometimes D3js and Python, along with some other tools.

I’d love to take up some projects in professional football! If you’re interested in collaborating, using some of this code, or have an idea or project in mind that you’d like my help on, feel free to find me on Twitter or e-mail me at mail dot thecomeonman at gmail dot com. Interesting non-football data science projects are also welcome!

CodaBonito

Plot sonars, performance lanes, pitch control, xG build-up, football pitch, radars

An R library that has some football analysis and visualisation helper functions.

POV - a ggplot friendly 3D manipulation library

An R library with low-level code that can be used to generate visuals like these:

Another video of Thiago’s passing

Aguero’s EPL career

Making friends with tracking data

A demo of some tracking data related helper functions. Most of the underlying code is available in my CodaBonito R library.

Done in R. Methodology and example code included.

Football models and visualisations

Quantifying defensive contribution from tracking data

A model to quantify the impact of a player on preventing goals, passes, and carries from happening.

Stats Perform Pro Forum 2021 Talk

Article in SpaceSpaceSpace

Article in Analytics FC Blog

xG Infographics

Countering the over-simplified xG based interpretations with a detailed infographic.

Player profile visualisations

xPo: a framework to value football actions

An action value measurement framework to evaluate the contributions of players who aren’t the ones registering assists or scoring goals. The same concept can also be expanded to fit to defensive actions.

This concept and the dashboard won an honourable mention at the Seattle Sounders Analytics Competition 2020.

Two of the observations from the dashboard are Barcelona’s preference to attack from the right and Dani Alves’ strong contributions to their attack.

Done in R, D3js. Methodology and sample code included.

Player similarity / replacement based on aggregated match level data

Built a model to identify players with similar or better profiles. Built an interactive visualisation around it to understand various aspects of the player better.

In the linked post above, I analyse multiple players who were among the transfer rumours at the time and anecdotally validate the model with some of the results that show up. For example, Fernandinho’s list has pretty much every player whom Guardiola has either played in the same position earlier or was rumoured to be interested in buying (Alcantara, Jorginho, Busquets, and Rodri), amongst others who have similar playing styles. I keep updating the model, and some of the most recent runs are on this thread for some transfers during the summer transfer window of 2020.

Done in R, D3js. D3js code included.

Playing style similarity based on spatial data from passes

Built a model to identify similar player or team styles depending on the way passes are made and received.

Amongst other things, I used it to evaluate whether Liverpool choosing to practice against Benfica for their UCL 2019 final against Spurs made sense. The same method can be extended to compare and identify similar playing styles for any other team.

Some other applications of this model:

Done in R. Methodology included.

What Changed at Manchester City From 2016-17 to 2017-18

Compared some things that changed at Manchester City between their ho-hum 2016-17 season and their record-breaking 2017-18 season.

Done in Python. Methodology included.

Ad hoc analysis of the 2013 FPL season

A very unorganised set of posts from way back where I analyse fantasy football data.

Done in R. Methodology included.


Clustering using recursive division

We consider the problem of clustering categorical datasets, with a view to arrive at simple, easily interpretable clusters. We propose CURD, a recursive partitioning algorithm that expresses clusters as leaf nodes of a decision tree.

Link to paper

Done in R. Methodology and code included.
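To give a feel for how recursive division can yield clusters that read as simple rules, here is a minimal Python sketch. It is not the CURD algorithm from the paper or the R code from the project: the splitting criterion (summed per-column entropy), the stopping rules, and all the names (`divide`, `impurity`, the toy player data) are assumptions made up for this illustration.

```python
from collections import Counter
from math import log2


def column_entropy(rows, col):
    """Shannon entropy of one categorical column within a group of rows."""
    counts = Counter(row[col] for row in rows)
    n = len(rows)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def impurity(rows, cols):
    """Sum of per-column entropies: 0 when every row in the group is identical."""
    return sum(column_entropy(rows, c) for c in cols)


def divide(rows, cols, max_depth=2, min_size=2, path=()):
    """Recursively split rows; return leaves as (rule, rows) pairs.

    Each rule is a tuple of (column, operator, value) conditions, so every
    cluster is described by the path from the root of the tree to its leaf.
    """
    best = None
    if max_depth > 0 and len(rows) >= 2 * min_size:
        for col in cols:
            for value in sorted({row[col] for row in rows}):
                inside = [r for r in rows if r[col] == value]
                outside = [r for r in rows if r[col] != value]
                if len(inside) < min_size or len(outside) < min_size:
                    continue
                # Size-weighted impurity of the two candidate children.
                score = (len(inside) * impurity(inside, cols)
                         + len(outside) * impurity(outside, cols)) / len(rows)
                if best is None or score < best[0]:
                    best = (score, col, value, inside, outside)
    # Stop when no valid split exists or the best split does not help.
    if best is None or best[0] >= impurity(rows, cols):
        return [(path, rows)]
    _, col, value, inside, outside = best
    return (divide(inside, cols, max_depth - 1, min_size, path + ((col, "==", value),))
            + divide(outside, cols, max_depth - 1, min_size, path + ((col, "!=", value),)))


# Toy data, invented for this example.
players = [
    {"role": "defender", "foot": "right", "aerial": "strong"},
    {"role": "defender", "foot": "right", "aerial": "strong"},
    {"role": "defender", "foot": "left", "aerial": "strong"},
    {"role": "forward", "foot": "left", "aerial": "weak"},
    {"role": "forward", "foot": "left", "aerial": "weak"},
    {"role": "forward", "foot": "right", "aerial": "weak"},
]
leaves = divide(players, ["role", "foot", "aerial"])
for rule, group in leaves:
    print(rule, len(group))
```

Each leaf comes with the rule that defines it, which is what makes this style of clustering easy to interpret.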

Scraping rental websites

A script to scrape data from multiple house rental websites and compile it in one place on a Google sheet. It saves you the time spent hunting across multiple websites.

Done in R. Methodology and code included.

ggTimeSeries, a ggplot library for time series visualisation in R

Some interesting and useful visualisation helper functions for time series data.

Done in R. Methodology and code included.

An R-Shiny based tool for Monte Carlo simulations

A very flexible tool which can be used in a browser to create and run your own Monte Carlo simulations.

Done in R. Methodology and code included.
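For readers new to the idea, a Monte Carlo simulation repeats a random trial many times and summarises the outcomes. The Python sketch below is only an illustration of that loop, not the R-Shiny tool's code: the function names, the shot xG values, and the assumption that each shot is an independent coin flip with its xG as the scoring probability are all mine.

```python
import random


def simulate_match(rng, home_xg, away_xg):
    """One trial: treat every shot as an independent chance with probability = xG."""
    home_goals = sum(rng.random() < p for p in home_xg)
    away_goals = sum(rng.random() < p for p in away_xg)
    if home_goals > away_goals:
        return "home"
    if home_goals < away_goals:
        return "away"
    return "draw"


def monte_carlo(trial, n_runs=10_000, seed=0):
    """Run `trial` n_runs times and return the share of runs per outcome."""
    rng = random.Random(seed)  # seeded so that repeated runs give the same answer
    counts = {}
    for _ in range(n_runs):
        outcome = trial(rng)
        counts[outcome] = counts.get(outcome, 0) + 1
    return {outcome: count / n_runs for outcome, count in counts.items()}


# Hypothetical shot lists: xG of each shot taken by the home and away teams.
probs = monte_carlo(lambda rng: simulate_match(rng, [0.4, 0.3, 0.1], [0.2, 0.05]))
print(probs)
```

Swapping in a different `trial` function is all it takes to simulate a different process, which is roughly the flexibility a general-purpose simulation tool is after.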

An R-Shiny based tool for solving the warehouse problem

A tool which can be used in a browser to maximise the accessibility of charging stations with the minimum number of charging stations over a given road network. You can use it to maximise the accessibility of anything else too; it doesn’t have to be just charging stations.

Done in R. Methodology and code included.
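One common way to approach this kind of coverage problem is a greedy heuristic: keep placing a station at the site that brings the most still-unserved locations within reach. The Python sketch below shows that heuristic on a toy road network. It is an assumption on my part for illustration, not necessarily the method the tool uses, and the graph, the distance limit, and the function names are made up.

```python
import heapq


def reachable_within(graph, source, max_dist):
    """Dijkstra: nodes whose shortest road distance from source is <= max_dist."""
    dist = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist[node]:
            continue  # stale queue entry
        for neighbour, length in graph[node].items():
            nd = d + length
            if nd <= max_dist and nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                heapq.heappush(queue, (nd, neighbour))
    return set(dist)


def greedy_sites(graph, max_dist):
    """Pick sites one at a time, each covering the most uncovered nodes."""
    cover = {n: reachable_within(graph, n, max_dist) for n in graph}
    uncovered = set(graph)
    chosen = []
    while uncovered:
        # Ties are broken alphabetically because max() keeps the first maximum.
        best = max(sorted(graph), key=lambda n: len(cover[n] & uncovered))
        chosen.append(best)
        uncovered -= cover[best]
    return chosen


# Toy road network: five towns in a line, each road 1 unit long.
roads = {
    "A": {"B": 1},
    "B": {"A": 1, "C": 1},
    "C": {"B": 1, "D": 1},
    "D": {"C": 1, "E": 1},
    "E": {"D": 1},
}
sites = greedy_sites(roads, max_dist=1)
print(sites)  # ['B', 'D']
```

The greedy choice is not guaranteed to give the smallest possible number of sites, but it is simple and usually a reasonable starting point.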