Methodology by example

Methodology

The model establishes a baseline for a player’s style of play and also gets an idea of the variation the player exhibits around that baseline. The baseline value and variation are calculated for each variable. So, for instance, passing accuracy, shots per 90, and successful dribbles will each have their own baseline value and variation, as will every other variable.

Here is a classy 2x2 matrix to explain the utility of the baseline value and the variation.

                  High baseline value           Low baseline value
High variation    Strengths of this player;     Something the player can do;
                  not critical to his output    not critical to his output
Low variation     Strengths of this player;     Something the player doesn’t do
                  critical to his output

Comparison

We now compare each variable between the player of interest (PoI) and all other players who play in similar roles. The comparison is made against the PoI’s baseline and weighted by the variation: high baseline value, low variation variables dictate most of the comparison, while low baseline value, high variation variables have almost no effect on it.
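A minimal sketch of this weighting in Python, assuming the comparison is a standardised difference (the Z-score approach described later in this post). The function name and the per-match dictionary layout are hypothetical. Dividing by the PoI’s own standard deviation is what lets the variables the PoI is consistent on dominate the comparison; how the original model also factored in the size of the baseline isn’t spelled out here, so this sketch only captures the variation side.

```python
from statistics import mean, stdev

def compare_to_poi(poi_matches, candidate_means):
    """Standardised difference of a candidate from the PoI on each variable.

    poi_matches: {variable: [PoI's per-match values]}  (hypothetical layout)
    candidate_means: {variable: candidate's average value}
    Each difference is divided by the PoI's own std dev, so variables the PoI
    is consistent on (small std dev) produce large scores, and variables the
    PoI varies a lot on produce small ones.
    """
    scores = {}
    for variable, values in poi_matches.items():
        baseline, variation = mean(values), stdev(values)
        scores[variable] = (candidate_means[variable] - baseline) / variation
    return scores

# Hypothetical data: the PoI is very consistent at passing, erratic at shooting.
poi_matches = {
    "passing_accuracy": [0.88, 0.90, 0.89, 0.91],
    "shots_per_90": [1.0, 4.0, 2.0, 5.0],
}
candidate = {"passing_accuracy": 0.85, "shots_per_90": 2.0}
scores = compare_to_poi(poi_matches, candidate)
# A 0.045 drop in passing accuracy outweighs a full shot per 90 fewer.
```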

Shortlisting

A shortlist is drawn based on the worst difference from the PoI amongst the variables. This is so that we find players who don’t impose a significant compromise on any individual variable. For example, take a player who has lower passing accuracy, lower shots per 90, and lower successful dribbles than the PoI, but higher goals per 90. If, given the baseline and the variation, lower passing accuracy is the worst difference compared to the PoI, then the other three variables don’t affect the shortlisting.
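The shortlisting rule can be sketched as follows, assuming each candidate has already been reduced to one standardised difference per variable (negative meaning worse than the PoI). The shortlist size and all names are hypothetical; the post doesn’t say how long the shortlists were.

```python
def worst_difference(scores):
    """Most negative standardised difference, i.e. the biggest compromise."""
    return min(scores.values())

def shortlist(candidates, size):
    """Keep the `size` candidates whose worst variable is closest to the PoI.

    candidates: {player: {variable: standardised difference vs the PoI}}
    """
    ranked = sorted(candidates, key=lambda p: worst_difference(candidates[p]), reverse=True)
    return ranked[:size]

# Hypothetical candidates. A scores far more goals than the PoI, but its
# passing-accuracy gap is the worst single compromise, so A misses out.
candidates = {
    "A": {"passing_accuracy": -2.0, "shots_per_90": -0.5, "dribbles": -0.3, "goals_per_90": 1.5},
    "B": {"passing_accuracy": -0.4, "shots_per_90": -0.9, "dribbles": 0.2, "goals_per_90": -0.1},
    "C": {"passing_accuracy": 0.1, "shots_per_90": -1.2, "dribbles": -1.1, "goals_per_90": 0.0},
}
print(shortlist(candidates, size=2))  # ['B', 'C']
```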

Order of suitability

The order of suitability is then decided based on the total shortcomings of the shortlisted players. In this step, the lower shots per 90 and lower successful dribbles also affect the ranking, while the higher goals per 90 is still not incorporated. Again, this is because we want to find players who offer the least compromise on the skills of the PoI. Any variable where the comparison with the PoI favours the other player is a bonus, but the expectation from the player being compared is just to meet the PoI’s baseline.
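A hypothetical sketch of the ordering step: only the shortfalls are summed, so positives such as higher goals per 90 don’t offset anything. This is one reading of "total shortcomings"; the original model may have aggregated them differently.

```python
def total_shortcoming(scores):
    """Sum of the negative differences only; beating the PoI earns no credit."""
    return sum(min(z, 0.0) for z in scores.values())

def order_by_suitability(shortlisted, candidates):
    """Order shortlisted players from least to most total compromise."""
    return sorted(shortlisted, key=lambda p: total_shortcoming(candidates[p]), reverse=True)

# Hypothetical shortlisted players. D's worst gap (-0.8) is smaller than E's
# (-1.0), but D falls short on every variable while E falls short on only one.
candidates = {
    "D": {"passing_accuracy": -0.8, "shots_per_90": -0.8, "dribbles": -0.8, "goals_per_90": -0.8},
    "E": {"passing_accuracy": -1.0, "shots_per_90": 0.5, "dribbles": 0.5, "goals_per_90": 2.0},
}
print(order_by_suitability(["D", "E"], candidates))  # ['E', 'D']
```

The toy data shows why the two steps differ: the worst-difference rule alone would prefer D, while summing shortfalls prefers E.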

FAQ

Why are other players ranked above the PoI? They are better.

More details about the methodology

It's June 2022 and I get paid to do this stuff now, so in case it helps, here are some more details about the model I had. It's been a long time, and this is as much as I can recollect without going through the code.

Split the data by player and by any additional qualifiers like season, role, etc., so long as you don't spread your data too thin. Each split is one of the entities you want to compare to other entities to find similarities. Let's say you want to find players similar to Messi's 2019 performance: you might split your data by player-season combinations, one of which would be Messi-2019.
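A small sketch of the splitting step in plain Python, assuming the data is a list of per-match records. The field names (player, season, npg_per_90) and the min_matches guard are hypothetical illustrations, not the original model's.

```python
from collections import defaultdict

def split_matches(match_rows, keys=("player", "season"), min_matches=1):
    """Group per-match rows into entities such as ("Messi", 2019).

    min_matches is a hypothetical guard against spreading the data too thin:
    splits with fewer matches than this are dropped.
    """
    splits = defaultdict(list)
    for row in match_rows:
        splits[tuple(row[k] for k in keys)].append(row)
    return {entity: rows for entity, rows in splits.items() if len(rows) >= min_matches}

# Hypothetical per-match rows; real data would carry many more stats.
rows = [
    {"player": "Messi", "season": 2019, "npg_per_90": 1.0},
    {"player": "Messi", "season": 2019, "npg_per_90": 0.0},
    {"player": "Messi", "season": 2018, "npg_per_90": 2.0},
]
splits = split_matches(rows, min_matches=2)
print(list(splits))  # [('Messi', 2019)]
```

Adding "role" to `keys` would split further, at the cost of fewer matches per entity.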

From each split, take random samples of, say, 5 matches with replacement. This gives you a distribution from which you can calculate a mean or median and a variance for each stat. Think of these values as what you could expect to see if you randomly watched 5 of their matches, sort of like a scout might. We are in central limit theorem territory, so we can retain the mean, median, and variance and discard the rest of the data.
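A sketch of the resampling step using Python's standard library. Summarising each 5-match sample by its plain average assumes equal minutes per match (a true per-90 over unequal minutes would weight matches differently), and n_samples and seed are arbitrary choices not given in the post.

```python
import random
from statistics import mean, median, pvariance

def bootstrap_summary(values, sample_size=5, n_samples=1000, seed=0):
    """Resample `sample_size` matches with replacement, `n_samples` times.

    Each resample is summarised by its average, i.e. roughly what someone
    watching `sample_size` random matches would see. The returned mean,
    median and variance describe the distribution of those averages.
    """
    rng = random.Random(seed)
    averages = [mean(rng.choices(values, k=sample_size)) for _ in range(n_samples)]
    return {
        "mean": mean(averages),
        "median": median(averages),
        "variance": pvariance(averages),
    }

# Hypothetical per-match non-penalty goals for one player-season.
npg = [0.0, 1.0, 0.0, 2.0, 1.0, 0.0]
summary = bootstrap_summary(npg)
```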

To explain the philosophy behind the model, let us group the values we see for a particular feature into four categories by mean and coefficient of variation (CV, the std dev divided by the mean): high mean with high CV (HM-HCV), low mean with high CV (LM-HCV), high mean with low CV (HM-LCV), and low mean with low CV (LM-LCV). If we divided all players amongst these four groups using their mean and std dev for non-penalty goals per 90, you could interpret the groups as follows. Players in HM-LCV do well at this feature, such as strikers with high, consistent output. Players in HM-HCV are probably good at it but are either inconsistent or aren't needed by their team to be consistent at it; maybe a midfielder who sometimes has more attacking responsibilities and scores often at those times but not at others. Players in LM-LCV and LM-HCV may or may not have the skill but don't need to perform on that feature at all; maybe centre backs.
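A sketch of how the four groups could be assigned for one feature. Splitting at the median mean and the median CV across all players is a hypothetical choice, since the post doesn't define what counts as "high", and the player labels and numbers are made up.

```python
from statistics import median

def cv(mean_value, sd):
    """Coefficient of variation: std dev relative to the mean."""
    return sd / mean_value if mean_value else float("inf")

def quadrants(summaries):
    """Label each player HM-LCV, HM-HCV, LM-LCV or LM-HCV for one feature.

    summaries: {player: (mean, std dev)}
    Thresholds are the medians across all players (a hypothetical choice).
    """
    mean_cut = median(m for m, _ in summaries.values())
    cvs = {p: cv(m, s) for p, (m, s) in summaries.items()}
    cv_cut = median(cvs.values())
    return {
        p: ("HM" if m > mean_cut else "LM") + "-" + ("HCV" if cvs[p] > cv_cut else "LCV")
        for p, (m, _) in summaries.items()
    }

# Hypothetical (mean, std dev) of non-penalty goals per 90.
summaries = {
    "striker": (0.6, 0.15),
    "attacking_mid": (0.4, 0.5),
    "centre_back": (0.05, 0.02),
    "holding_mid": (0.1, 0.2),
}
labels = quadrants(summaries)
```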

For each feature, we now compute the Z score for every other entity based on Messi-2019's feature means and std devs. So if Messi in 2019 scored 0.6 goals per 90 with a standard deviation of 0.25, and another player-season scored 0.55 goals per 90, the second player would have a Z score on this feature of (0.55 - 0.6) / 0.25 = -0.2. We can similarly calculate a Z score on every other feature for every other player-season.
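Written out, the Z score divides the other player-season's difference from Messi-2019's mean μ by Messi-2019's standard deviation σ, where x is the other player-season's value. With the numbers above:

```latex
z = \frac{x - \mu}{\sigma} = \frac{0.55 - 0.6}{0.25} = -0.2
```

The divisor must be the standard deviation: if 0.25 were the variance, σ would be √0.25 = 0.5 and the score would be -0.1.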

We now have a numerical representation of each player with respect to Messi-2019, and we can rank and filter them in a number of ways. We could rank them by the number of features on which a player has a negative Z score, by the average Z score across all features, or by something more nuanced, such as the average Z score over only the subset of features on which Messi-2019 had a relatively low CV.
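The three ranking options above, sketched in Python with hypothetical Z scores; max_cv is an arbitrary threshold standing in for "relatively low CV", and all names are illustrative. The toy example shows how the choice of ranking logic alone can reorder the same two candidates.

```python
from statistics import mean

def count_negative(scores):
    """Number of features on which the candidate falls short of the baseline."""
    return sum(1 for z in scores.values() if z < 0)

def average_z(scores, features=None):
    """Average Z score, optionally over a subset of features."""
    return mean(scores[f] for f in (features if features is not None else scores))

def low_cv_features(baseline, max_cv):
    """Features on which the baseline player is relatively consistent."""
    return [f for f, (m, sd) in baseline.items() if m and sd / m <= max_cv]

def rank(candidates, key, higher_is_better=True):
    return sorted(candidates, key=lambda p: key(candidates[p]), reverse=higher_is_better)

# Hypothetical baseline (mean, std dev) and Z scores vs that baseline.
baseline = {"npg_per_90": (0.6, 0.25), "dribbles_per_90": (5.0, 1.0), "aerials_won_per_90": (0.5, 0.6)}
candidates = {
    "P1": {"npg_per_90": 0.3, "dribbles_per_90": 0.5, "aerials_won_per_90": -1.5},
    "P2": {"npg_per_90": -0.4, "dribbles_per_90": -0.1, "aerials_won_per_90": 2.0},
}
by_count = rank(candidates, count_negative, higher_is_better=False)
by_average = rank(candidates, average_z)
consistent = low_cv_features(baseline, max_cv=0.5)
by_consistent = rank(candidates, lambda s: average_z(s, consistent))
```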

As with most algorithms, your choice of splits, your choice of features, your choice of ranking logic, and many other choices that you're making will eventually decide the quality of your results. What works for one player may not necessarily work for another. Given how long it has been since I used this model, you're probably better off experimenting yourself and finding what works for you. Also don't get too hung up on the ranks - you'll have more suitable players towards the top of the rankings and less suitable players towards the bottom but I wouldn't fuss too much about rank 1 being much better than rank 5.

For an idea of the results you can expect, here are some other places I've used this:

  • A Twitter thread with a bunch of results from back in summer 2020 - https://twitter.com/thecomeonman/status/1306654311661760512 with, I think, some default settings.
  • A post about finding replacements for Messi and Suarez from summer 2020 - https://github.com/thecomeonman/Replacing-Suarez-and-Messi. This is a more aggressive way of using this model, where I run it multiple times over different seasons and pick the players who appear towards the top most often.
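A rough sketch of the "run it over several seasons and favour players who keep appearing near the top" idea. Representing each run as a ranked list and the top-k cut-off are assumptions about how the runs might be combined, not a description of the linked post's code.

```python
from collections import Counter

def top_k_frequency(rankings, k):
    """Count how often each player lands in the top k across several runs.

    rankings: one ranked list of players per run (e.g. per baseline season).
    k is a hypothetical cut-off.
    """
    counts = Counter()
    for ranking in rankings:
        counts.update(ranking[:k])
    return counts.most_common()

# Hypothetical rankings from three runs.
runs = [["A", "B", "C", "D"], ["B", "A", "D", "C"], ["B", "C", "A", "D"]]
print(top_k_frequency(runs, k=2))  # [('B', 3), ('A', 2), ('C', 1)]
```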

Results

For every player, I’ve calculated two shortlists. The first is only amongst players in the top 5 leagues; it validates that the results make sense, i.e. that the players are actually similar to the baseline player. The second, more interesting shortlist spans many more leagues but considers only players aged 25 and below. I don’t yet have a way to adjust for league difficulty, so right now all players are treated as having faced equally skilled opposition.

How to use the interactive visualisation

The visualisation has three charts: two parallel coordinates charts (the ones with lots of lines) and a horizontal stacked bar chart.

The two parallel coordinates charts are structured similarly, with a thick, almost vertical black line marking the baseline player’s numbers and all the other players arranged around it. You can hover over or click on a particular player’s lines and points to get an idea of their difference from the baseline. One of the parallel coordinates charts shows the player’s values for various raw metrics. The other shows a higher-level score for the player for various player roles. The values in the second chart have no absolute meaning; they only compare the various players with the baseline player.

The stacked bar chart shows the positives and negatives of each player relative to the baseline. In this chart, the baseline player himself also has positives and negatives compared to the baseline, which reflects the variation you see between his own games. Each part of a stacked bar relates to one of the variables used to assess this player. The red parts are where the player is lacking compared to the PoI; these dictate the player’s presence and position in the list. The green parts are where the player is better than the PoI. You can hover over or click on any of the parts to get more details.

There are four modes in which you can operate the visualisation. These primarily change how the axes of the parallel coordinates charts are scaled.

  • Playing style: The axes are scaled to reflect the variation of that variable in the player’s performances.

  • Rank within all: The axes linearly go from the lowest rank to the highest rank, with the values of the metric itself not necessarily being uniformly spaced.

  • Scale within all: The axes linearly go from the lowest to the highest value amongst all the players that were compared to the baseline player.

  • Scale within subset: The axes linearly go from the lowest to the highest value amongst all the players that made it to the shortlist.
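Three of the four axis modes can be sketched as simple rescalings of a metric's values onto a 0-to-1 axis (the "Playing style" mode, which scales by the player's own variation, is not sketched here). The function names and example values are hypothetical, and how the original visualisation handled ties or a zero range is not stated.

```python
from bisect import bisect_left

def scale_min_max(values, reference):
    """Map values linearly so the reference minimum is 0 and maximum is 1."""
    lo, hi = min(reference), max(reference)
    span = hi - lo
    return [(v - lo) / span if span else 0.5 for v in values]

def scale_rank(values, reference):
    """Map values to their rank position among the reference values (0 to 1)."""
    ordered = sorted(reference)
    last = max(len(ordered) - 1, 1)
    return [bisect_left(ordered, v) / last for v in values]

all_candidates = [0.1, 0.2, 0.4, 1.6]  # hypothetical metric values
shortlisted = [0.2, 0.4]

within_all = scale_min_max(shortlisted, all_candidates)  # "Scale within all"
within_subset = scale_min_max(shortlisted, shortlisted)  # "Scale within subset"
by_rank = scale_rank(shortlisted, all_candidates)        # "Rank within all"
```

The outlier at 1.6 squeezes the shortlisted players together under "Scale within all", while the rank mode spaces them evenly regardless of the raw values.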

Here are the results for everyone’s favourite find-the-next candidate, Virgil Van Dijk:

Shortlist 1


I ascribe the following tags to this player - aerial_defensive, aerial_offensive, defender, passer_deep

Players who’ve played at least 600 minutes across the following positions - rcb, lcb, rcb3, lcb3, cb are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) considered.

Shortlist 2


I ascribe the following tags to this player - aerial_defensive, aerial_offensive, defender, passer_deep

Players who’ve played at least 600 minutes across the following positions - rcb, lcb, rcb3, lcb3, cb are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

Data Used

I got Wyscout data from Oct 2018 to Sept 2019 to use in this exercise.

More examples

D. Henderson

Top 5 league matches


I ascribe the following tags to this player - goalkeeper

Players who’ve played at least 600 minutes across the following positions - gk are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) considered.

Young matches


I ascribe the following tags to this player - goalkeeper

Players who’ve played at least 600 minutes across the following positions - gk are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

A. Onana

Top 5 league matches


I ascribe the following tags to this player - goalkeeper

Players who’ve played at least 600 minutes across the following positions - gk are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) considered.

Young matches


I ascribe the following tags to this player - goalkeeper

Players who’ve played at least 600 minutes across the following positions - gk are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

H. Maguire

Top 5 league matches


I ascribe the following tags to this player - aerial_defensive, aerial_offensive, defender, passer_deep

Players who’ve played at least 600 minutes across the following positions - rcb, lcb, rcb3, lcb3, cb are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) considered.

Young matches


I ascribe the following tags to this player - aerial_defensive, aerial_offensive, defender, passer_deep

Players who’ve played at least 600 minutes across the following positions - rcb, lcb, rcb3, lcb3, cb are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

A. Wan-Bissaka

Top 5 league matches


I ascribe the following tags to this player - defender, runner, aerial_defensive

Players who’ve played at least 600 minutes across the following positions - rb, rwb, rb5 are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) considered.

Young matches


I ascribe the following tags to this player - defender, runner, aerial_defensive

Players who’ve played at least 600 minutes across the following positions - rb, rwb, rb5 are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

B. Chilwell

Top 5 league matches


I ascribe the following tags to this player - defender, runner, crosser

Players who’ve played at least 600 minutes across the following positions - lb, lwb, lb5 are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) considered.

Young matches


I ascribe the following tags to this player - defender, runner, crosser

Players who’ve played at least 600 minutes across the following positions - lb, lwb, lb5 are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

E. Can

Top 5 league matches


I ascribe the following tags to this player - defender, passer, passer_deep

Players who’ve played at least 600 minutes across the following positions - rdmf, ldmf, dmf, rcmf3, rcmf, lcmf, lcmf3 are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) considered.

Young matches


I ascribe the following tags to this player - defender, passer, passer_deep

Players who’ve played at least 600 minutes across the following positions - rdmf, ldmf, dmf, rcmf3, rcmf, lcmf, lcmf3 are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

Fernandinho

Top 5 league matches


I ascribe the following tags to this player - defender, passer, passer_deep

Players who’ve played at least 600 minutes across the following positions - rdmf, ldmf, dmf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) considered.

Young matches


I ascribe the following tags to this player - defender, passer, passer_deep

Players who’ve played at least 600 minutes across the following positions - rdmf, ldmf, dmf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

J. Grealish

Top 5 league matches


I ascribe the following tags to this player - playmaker_10, passer_deep, dribbler, scorer

Players who’ve played at least 600 minutes across the following positions - lcmf, rcmf, lcmf3, rcmf3 are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) considered.

Young matches


I ascribe the following tags to this player - playmaker_10, passer_deep, dribbler, scorer

Players who’ve played at least 600 minutes across the following positions - lcmf, rcmf, lcmf3, rcmf3 are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

Thiago Alcântara

Top 5 league matches


I ascribe the following tags to this player - playmaker_10, passer_deep, defender, dribbler

Players who’ve played at least 600 minutes across the following positions - ldmf, dmf, rdmf, lcmf, rcmf, lcmf3, rcmf3 are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) considered.

Young matches


I ascribe the following tags to this player - playmaker_10, passer_deep, defender, dribbler

Players who’ve played at least 600 minutes across the following positions - ldmf, dmf, rdmf, lcmf, rcmf, lcmf3, rcmf3 are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

David Silva

Top 5 league matches


I ascribe the following tags to this player - scorer, playmaker_10, passer_deep

Players who’ve played at least 600 minutes across the following positions - ramf, amf, lamf, lcmf3, rcmf3 are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) considered.

Young matches


I ascribe the following tags to this player - scorer, playmaker_10, passer_deep

Players who’ve played at least 600 minutes across the following positions - ramf, amf, lamf, lcmf3, rcmf3 are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

C. Eriksen

Top 5 league matches


I ascribe the following tags to this player - scorer, crosser, passer_deep, playmaker_10

Players who’ve played at least 600 minutes across the following positions - rw, lw, ramf, amf, lamf, rwf, lwf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) considered.

Young matches


I ascribe the following tags to this player - scorer, crosser, passer_deep, playmaker_10

Players who’ve played at least 600 minutes across the following positions - rw, lw, ramf, amf, lamf, rwf, lwf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

Philippe Coutinho

Top 5 league matches


I ascribe the following tags to this player - runner, dribbler, scorer, crosser, passer_deep, playmaker_10

Players who’ve played at least 600 minutes across the following positions - rw, lw, ramf, amf, lamf, rwf, lwf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) considered.

Young matches


I ascribe the following tags to this player - runner, dribbler, scorer, crosser, passer_deep, playmaker_10

Players who’ve played at least 600 minutes across the following positions - rw, lw, ramf, amf, lamf, rwf, lwf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

H. Ziyech

Top 5 league matches


I ascribe the following tags to this player - runner, dribbler, scorer, crosser, passer_deep, playmaker_10

Players who’ve played at least 600 minutes across the following positions - rw, lw, ramf, amf, lamf, rwf, lwf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) considered.

Young matches


I ascribe the following tags to this player - runner, dribbler, scorer, crosser, passer_deep, playmaker_10

Players who’ve played at least 600 minutes across the following positions - rw, lw, ramf, amf, lamf, rwf, lwf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

L. Messi

Top 5 league matches


I ascribe the following tags to this player - runner, dribbler, scorer, passer_deep, playmaker_10

Players who’ve played at least 600 minutes across the following positions - rw, lw, ramf, amf, lamf, rwf, lwf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) considered.

Young matches


I ascribe the following tags to this player - runner, dribbler, scorer, passer_deep, playmaker_10

Players who’ve played at least 600 minutes across the following positions - rw, lw, ramf, amf, lamf, rwf, lwf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

C. Pulišić

Top 5 league matches


I ascribe the following tags to this player - runner, dribbler, scorer, crosser, passer_deep, playmaker_10

Players who’ve played at least 600 minutes across the following positions - rw, lw, ramf, amf, lamf, rwf, lwf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) are considered.

Young matches


I ascribe the following tags to this player - runner, dribbler, scorer, crosser, passer_deep, playmaker_10

Players who’ve played at least 600 minutes across the following positions - rw, lw, ramf, amf, lamf, rwf, lwf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

E. Hazard

Top 5 league matches


I ascribe the following tags to this player - runner, dribbler, scorer, crosser, passer_deep, playmaker_10

Players who’ve played at least 600 minutes across the following positions - rw, lw, ramf, amf, lamf, rwf, lwf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) are considered.

Young matches


I ascribe the following tags to this player - runner, dribbler, scorer, crosser, passer_deep, playmaker_10

Players who’ve played at least 600 minutes across the following positions - rw, lw, ramf, amf, lamf, rwf, lwf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

H. Kane

Top 5 league matches


I ascribe the following tags to this player - scorer, aerial_offensive, playmaker_10

Players who’ve played at least 600 minutes across the following positions - cf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) are considered.

Young matches


I ascribe the following tags to this player - scorer, aerial_offensive, playmaker_10

Players who’ve played at least 600 minutes across the following positions - cf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

L. Suárez

Top 5 league matches


I ascribe the following tags to this player - scorer, dribbler

Players who’ve played at least 600 minutes across the following positions - cf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) are considered.

Young matches


I ascribe the following tags to this player - scorer, dribbler

Players who’ve played at least 600 minutes across the following positions - cf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

S. Agüero

Top 5 league matches


I ascribe the following tags to this player - scorer, runner, dribbler

Players who’ve played at least 600 minutes across the following positions - cf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) are considered.

Young matches


I ascribe the following tags to this player - scorer, runner, dribbler

Players who’ve played at least 600 minutes across the following positions - cf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

R. Lewandowski

Top 5 league matches


I ascribe the following tags to this player - scorer, runner, dribbler, aerial_offensive, playmaker_10

Players who’ve played at least 600 minutes across the following positions - cf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players from the top 5 leagues (EPL, Ligue 1, La Liga, Bundesliga, and Serie A) are considered.

Young matches


I ascribe the following tags to this player - scorer, runner, dribbler, aerial_offensive, playmaker_10

Players who’ve played at least 600 minutes across the following positions - cf are considered. Only those games where they played in any of the aforementioned positions were included in the data.

Only players of age 25 and below considered.

Players from any team in the following tournaments were considered - Argentina-Superliga, Austria-Bundesliga, Belgium-FirstDivision, Brazil-SerieA, Croatia-1HNL, England-Championship, England-EPL, Europe-ChampionsLeague, Europe-EuropaLeague, France-Ligue1, France-Ligue2, France-National1, Germany-Bundesliga, Italy-SerieA, Netherlands-Eredivisie, Portugal-PrimeiraLiga, Russia-PremierLeague, Scotland-PremierLeague, SouthAmerica-CopaLibertadores, Spain-PrimeiraDivision, Turkey-SuperLig, USA-MLS

Anecdotal Validation

(Apologies for the EPL-heavy examples, but those are the players I’m most familiar with.)

(Thank god for Guardiola and Man City’s money. It makes this anecdotal validation much easier.)

I was particularly happy to see some of the lists:

  • Van Dijk, with Laporte at the top of his list. VVD was a transfer target for Guardiola, but City settled for Laporte instead.
  • Fernandinho, with pretty much every player whom Guardiola has either played in that position before or been rumoured to be interested in buying.
  • David Silva, with Bernardo Silva and Ilkay Gundogan, who rotate with him in the same position for the same team.
  • Pulisic, with Hazard and, interestingly, Hudson Odoi on his list. Both have played for Chelsea in a position similar to Pulisic’s, and Pulisic himself eventually joined Chelsea in January. W. Zaha, another player Chelsea are rumoured to be interested in, is also on the list.
  • Hazard, whose list also has Willian, who plays on the other side of Chelsea’s attack, and Hudson Odoi, who plays in the same position.
  • Aguero, with Jesus. Both of them play in the same position for Guardiola’s team.

Comments

Some things need more work:

  • Goalkeeper similarities. Right now there is just one category of keeper.
  • Better understanding of the variables. I’ve chosen variables for player roles based on my partial understanding of the data, and I won’t be surprised if I have misunderstood some of them.
  • More nuanced categories. I have overarching categories like playmaker, dribbler, etc., but I feel there could be finer categories than these. This will go hand in hand with a better understanding of what the variables mean.
  • More data. This is the cop-out. There is a lot more data, even in aggregated form, that could be made available and would allow better comparisons. Physicality-related data such as top speed, kilometres run, etc. could be an easy addition.

Get in touch

Have comments or other player requests? Find me on Twitter - @thecomeonman!