xPo, A Way to Summarise Attacking Strategy

Update on 11th April 2020: There was a calculation error which caused passes and moves to not get equal weightage as shots which diminised their contribution. It's been fixed now.

What is xPo?

xPo expands to expected potential.

Why Do We Need xPo?

I wanted to figure out a way to measure the value of various attacking actions on the pitch. xPo is just an estimate of the potential xG an action could eventually generate.

I've listed some differences with the most commonly used method, xG Chain, and a recently popular method, xT in the Comparisons With Other Models section.

What is xG?

For those not familiar with xG, it's just a probability assigned to a shot for it ending up in a goal. There are multiple ways to calculate it and you'll find a whole host of material about it online.

I need xG to calculate xPo and I employ probably the most simplest of methods to calculate xG which I describe later.

How is xPo calculated?

If you have the ball, then there are three things you can do with it:

Pass it to some place
Run with it to some place
Shoot

Each of those actions may or may not eventually lead to a goal but we can estimate how often they are likely to lead to one. To do that, I evaluate three sets of numbers -

xPo for shots: This is the same as the xG of those shots.
xPo for passes: In the same spirit as xG, I evaluate the eventual xG a play is likely to generate after a pass moving the ball from one area of the pitch to another.
xPo for runs: Calculated in the same way as xPo for passes

The xPo for a shot is the same as the xG for a shot so first we calculate xG for all the shots we have in the database. This is straightforward - we take all shots from a certain area of the pitch, see how many of them were goals, and the xG for that area of the pitch is therefore the number of goals / number of shots for that area. We could use more sophisticated measures of xG if we could calculate it, but we'll keep it simple for now.

Once we've calculated the xG for all areas of the pitch, we can attribute these xG values back to the shots. Now onwards we care about the xG for a shot, and we don't care about the actual result of the shot being a goal or not anymore.

Now we start looking at calculating xPo for passes and runs. Let's say that in the first play that we look at there were 5 passes and 3 runs and that play eventually resulted in a shot with an xG of 0.3. All 5 passes and all 3 runs would get an equal and complete credit of an xPo of 0.3. Let's say the next play were looking at didn't result in a shot then all the actions in that play get credited with an xPo of 0.

It so happens that both the plays described above had a pass that originated in the same area as each other and ended in the same area as each other, i.e. we have two instances of a similar pass being played, one in each play. In one case the eventual xG is 0.3, and in the other the eventual xG is 0. The average eventual xG for such a pass therefore is 0.15 so we update the xPo for that pass as 0.15.

We continue this process across all plays, considering all runs and all passes and keep aggregating the eventual xG generated by similar passes and runs. When we've finally gone through all the plays, we arrive at the final xPo value for every particular action a team has performed. Every pass between a set of coordinates, every run between a set of coordinates, and every shot from any coordinate now has an xPo value. Ta da!

However, given the limited quantity of football that gets played, it's not feasible to get enough samples of a pass or a run or a shot at exactly the same set of coordinates. To get around this, I include events starting and ending in areas around the specific set of coordinates being looked at as well. The farther these other events are from the exact pair of coordinates, the lesser weightage they get in the final xPo calculation for those coordinates.

Data Processing

I have had to infer plays and runs from events data. I marked plays based on criteria like whether the team making the passes changed, or if the ball went out of play, etc. I extracted runs as all movement between the end coordinate of a pass and the start coordinate of the next pass. There are likely some edge cases that will sneak though and look funny but I'm working on making this better too. A large proportion of the data is still correct though so I'm still letting this out in the wild anyway.

I also considered the movement between a pass as a run only if the displacement met a length threshold. In this case the cut off was a length of 20 units where the pitch has been stretched to fit a 120 X 80 unit size.

How To View This

Select a Team

I have used data for some EPL games from the 2019/20 season until March 2020. I've used only teams which haven't had any managerial changes during the season because a consistent strategy over the whole season is easier to capture than a mix of two or more stategies.

There are extra comments for Liverpool which may help you understand what's happening more easily so if you're seeing this for the first time then you may want to choose Liverpool.

Viewing Pointers

Different colours mean different scales. The brightest red is not the same as the brightest blue is not the same as the brightest yellow, etc. Comparing the brightness of two different colours is meaningless.

You can opt to compare different selections on a common colour scale by leaving "Scale color relative only to selection" unchecked. By checking it you shrink the colour scale to span only across the selection's numbers rather than the common colour scale which makes it easier to see patterns for players which don't have very strong numbers for either probabilities or overall contribution.

I'd also recommend a high-ish brightness setting and disabling your red tinting software if you have it on.

Planning for an Opponent

An interesting use of xPo could be to identify overlaps of a team's strengths with an opponent's weaknesses. A simple implementation of this is demonstrated below with Liverpool and Aston Villa respectively.

For every action at or between a set of coordinates, I look at the xPo from Liverpool's actions and the xPo from all of Aston Villa's oppositions's actions and calculate their geometric mean, i.e. xPo combined = square root of xPo of Liverpool X xPo of Aston Villa's opponents. Actions where Liverpool's actions and Aston Villa's opponents' actions both lead to high xPo values, the combined xPo should also be a high value. In all other situation, the combined xPo should get suppressed to lower values.

This method highlights three areas in red that Liverpool can expect better returns than what they get against their typical opponent. These are - 1, the area near Aston Villa's left corner, 2, the area to the left of Aston Villa's goal, and 3, the area around the right edge of Aston Villa's box. It also suggests that the blue area to the right of Aston Villa's goal and some other spots in the nearby area may not be as rewarding as usual.

Liverpool also scored more proficiently than Aston Villa conceded so the above observation should only be considered in a relative manner. It is possible that Liverpool would score with their regular probabilities from the area to the right of Aston Villa's goal but in that case they would be scoring with much higher probabilities from the other three areas mentioned above.

The models that I'm familiar with that people use to perform this sort of analysis are listed below along with my opinion on the advatanges or disadvantages of each of them compared to xPo.

xG chain

The motive behind this model is to split credit for goals / shots amongst all the participants involved in the play leading up to it, so in a way it's not doing exactly the same thing as xPo. However, adding these up across actions or matches, etc. has been an oft used metric to establish the overall credit players should get towards their team's goals.

This takes the xG of the shot and splits it up equally between all actions in that play. It's quick and simple and easy to understand. This does not, however, assign a value to every action. It only assigns a value to an action that led to a shot.

xThreat

xT tries to assign a value to having the ball in a certain part of the pitch. The value of an action is therefore the difference between the value of the ball being in its start position, before the action, and in its end position, after the action.

Some things that I consider pros for xPo over xT -

xPo assigns a value to an action executed from and, when relevant, to a set of coordinates. xT assigns a value to the ball being in a certain part of the pitch.
xPo distinguishes a pass and a run and assigns a separate value to both. xT counts both as ball movement and clubs them together.
xPo does not draw a grid on the pitch, it is calculated for any area or pair of areas on the pitch and can be calculated for any resolution. xT considers a grid on the pitch and is evaluated for a block in the grid. A small movement on the pitch may move you across blocks and change your xT value unrealistically but your xPo value should not get affected so much. Karun mentioned that this can be worked around by interpolating the xT values for coordinates that fall between the centers of the grid.

Some things that I consider pros for xT over xPo -

xPo only cares for the action and the end outcome of that play and assumes that by averaging over all the plays the ensuing actions get incorporated somehow in the xPo value. xT explicitly considers possible sequences of play going through various parts of the pitch.
xPo only gives credit when a shot happened at the end of the play whereas xT also gives credit even for situations where they got into a possible shooting position even if they didn't actually shoot.

And some trade offs -

xPo may spread the data too thin since it tries to capture an event between any pair of coordinates. xT models a block in the grid so there is sufficient data but it might mix too many situations together.

VAEP and G+

This uses the last three actions at any point in time, and various attributes of those three actions as inputs to a function that estimates the probability that a goal will be scored or conceded in the next 10 moves.

VAEP has two main advantages over xPo. Firstly, VAEP is a far more robust formulation given that it considers all these extra attributes. Second, it incorporates the probability of conceding so you also get an idea of the downside of an action.

The main disadvantage of it though is that given it's such a rich formulation, it will need quite a lot of data to calibrate. xPo is just a slightly complex aggregation of whatever data is made available to it and it doesn't need to be calibrated. It can therefore be run on small sets of data too and return a more specific set of values for, say, a team or a player which can be important if their playing styles are distinct from others. This also makes it easier to consume, since you don't need context of the previous two events.

Opta's Possession Value

I don't think they have revealed their methodology but they estimate a value based on the probability of scoring or conceding after an action. The only thing they have revealed is that they use 5 previous moves to estimate the PV. The pros and cons, I imagine, would thus be similar to VAEP.

I had originally done this as a tool to help a team prepare for a match and I have a demo of what an xPo based dashboard could look like - here. An earlier version of this was my submission for the Seattle Sounders' 2020 Soccer Analytics Conference and got an honourable mention.

There is also code available to generate the raw data needed for the dashboard here which you can repurpose to use on other datasets.

Do you have any feedback, comments, questions, or interesting observations? Would you like to use xPo at your club? You can find me on Twitter or send me an e-mail at mail dot thecomeonman at g mail dot com

Non TL;DR Section

The demos here and on the dashboard are restricted to only three events but the same logic can be extended to any other event too. For eg. you could evaluate an xPo for, say, tackles, in different parts of the pitch by looking at how much xG the possessions with tackles in that area eventually led to.

A shortcut that some value estimation frameworks apply is to just use the aggregated xPo from runs, passes, and shots at the location that the ball was won, add the value of the opposition's xPo at that point, since the tackle robbed them of that much xPo, and attribute that as the total xPo of that action.

An idea I got from Jan Van Haaren of SciSports is to also extend this to calculate the chances of a team conceding from a particular action. The way this can be modelled is similar to how xPo is calculated, except you use the xG generated by the next possession of the opponent. The net xPo of an action would then be the difference of the scoring xPo and conceding xPo.

This can be trivially extended to calculate the total xPo generated of players, the xPo contribution per action, the percent of xPo WRT to the team's overall xPo generationm etc. and get a measure of the contribution of that player.