Tag: ipl

  • Relative Runs and Logistic Regression Models

    Recently, I undertook a certificate in sports data analytics through the Irish university ATU. The certificate involved two modules – one focused on the use of statistics in an academic context and the other on machine learning models and AI. Both modules were, in different ways, equally challenging and interesting.

    I made a real effort to apply some of the ideas behind Relative Runs to our assignments, most notably in the course on machine learning models. I’ve had an interest in how things like regression models can be applied to cricket since seeing them mentioned in a couple of great books (Cricket 2.0 and Hitting Against the Spin, to my recollection). When we got onto how to build and analyse machine learning models, I couldn’t wait to play with them using Relative Runs.

    I focused mostly on what are called logistic regression models, which, perhaps counterintuitively given the name, are used for classification rather than for predicting a continuous number. What that means is that these models use input data to estimate the probability of a binary outcome, rather than a continuous numerical output (which is broadly what other regression models do). In short, these models use input data to predict one of two classes, hence the name ‘classification model’. These two classes can be any variety of binary feature, from wins and losses to an injury occurrence or non-occurrence.
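    To make the idea concrete, here is a minimal from-scratch sketch of how a logistic model turns inputs into one of two classes. The coefficients are hand-picked and purely hypothetical; a real model would learn them from data, most likely through a library rather than code like this.

```python
import math


def predict_probability(features, weights, bias):
    """Squash a weighted sum of the inputs into a probability between 0 and 1."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(features, weights, bias, threshold=0.5):
    """Turn the probability into one of two classes (1 or 0)."""
    return 1 if predict_probability(features, weights, bias) >= threshold else 0


# Hypothetical coefficients for a single input feature (not fitted to any data).
weights, bias = [0.2], -1.0

print(predict_probability([10.0], weights, bias))  # above 0.5, so class 1
print(predict_class([10.0], weights, bias))
print(predict_class([0.0], weights, bias))
```

    The 0.5 threshold is the usual default, but it is a choice: moving it trades one kind of error for the other.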

    In the case of my model, I wanted to look at how Relative Runs could be used to predict overall performance success. What I had was my analysis of the 2025 IPL season, including Relative Runs scores for all the top batsmen across that season. What I wanted to pair that with was some kind of binary output feature that I could double as an indicator of overall tournament success. This is tricky conceptually because we are starting with a player metric, and we want to know whether this input feature bears some relationship with an output team metric.

    At first, I was not optimistic that this route would be anything other than an interesting academic exercise. It proved more than that, however!

    I chose as my output variable the binary feature of whether a player’s team made the playoffs or not. That is, whether or not that player’s team finished in the top 4 or bottom 6 at the end of the regular season. In essence, the idea here was to say, rather than splitting the table in half, let’s split it into post-season entry or not. Most IPL franchises would deem making the playoffs as a key indicator of success, and in many ways, that is what they are buying when they build teams – they are paying to make the playoffs (and then hopefully win it all). If there is some underlying relationship between a stat like Relative Runs and team performance, making the playoffs is a good metric to start with in terms of testing that relationship out.
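    The labelling step described here could be sketched as follows. The team names, positions and player-team pairings are placeholders, not the real 2025 table; the only thing taken from the text is the top 4 cut-off.

```python
PLAYOFF_PLACES = 4  # the top 4 of the 10 teams reach the playoffs


def made_playoffs(final_position):
    """Binary target: 1 if the team finished in a playoff place, else 0."""
    return 1 if final_position <= PLAYOFF_PLACES else 0


# Hypothetical league positions and player-team pairings (illustration only).
team_position = {"Team A": 1, "Team B": 4, "Team C": 5, "Team D": 10}
player_team = {
    "Batter 1": "Team A",
    "Batter 2": "Team B",
    "Batter 3": "Team C",
    "Batter 4": "Team D",
}

# Each player inherits the label of the team he played for.
labels = {
    player: made_playoffs(team_position[team])
    for player, team in player_team.items()
}
print(labels)
```

    Note that every player in the same side gets the same label, which is part of why pairing a player metric with a team outcome is conceptually tricky.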

    So I had my output variable: making the playoffs or not. What I wanted to know next was whether there is a meaningful predictive relationship between certain input player metrics and that output variable. As input variables, I chose to focus on runs, strike rate, and Relative Runs per Innings. I ran logistic regression models to ascertain which of those inputs generated the best models in terms of predicting teams’ success, as measured by making the playoffs. And what’s more, I also tested combinations of those input variables with the same output target. The results were more interesting than I expected!

    First, regarding the tests with one player metric as input, the model with runs as the only input variable performed worst in terms of predictive accuracy, followed by strike rate, then by Relative Runs per Innings. What this means, in short, is that according to the dataset used (which, granted, covered only the top 50 batters in the 2025 IPL), Relative Runs per Innings was the best predictor of team success of those three individual metrics, when using logistic regression. The model accuracy was only 0.67, and the precision was 0.5, which isn’t great, but it was the best of the three, which pleasantly surprised me.
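    For readers unfamiliar with the two scores, accuracy is the share of all predictions that were right, while precision is the share of predicted playoff players who really were in playoff teams. The sketch below uses made-up predictions, chosen only so that the arithmetic lands on figures of the same size as those quoted; they are not the model’s actual outputs.

```python
def accuracy(y_true, y_pred):
    """Share of all predictions that match the true label."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def precision(y_true, y_pred):
    """Share of positive predictions that were actually positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    predicted_pos = sum(1 for p in y_pred if p == 1)
    return true_pos / predicted_pos if predicted_pos else 0.0


# Hypothetical test set: 1 = team made the playoffs, 0 = it did not.
y_true = [1, 0, 0, 1, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0]

print(round(accuracy(y_true, y_pred), 2))  # 0.67
print(precision(y_true, y_pred))           # 0.5
```

    A model can therefore be right two times in three overall while only half of its ‘playoff’ calls are correct.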

    Combining the input variables was even more fascinating. You’d think more input variables mean more accuracy in the models, and that’s broadly what I found to be true.

    I ran models that used Relative Runs per Innings + strike rate, Relative Runs per Innings + runs, and lastly, all three together. The worst of those was Relative Runs per Innings + strike rate, while the other two generated the same key evaluation scores, so I put them both through what’s called a ‘k-fold cross validation’, which trains and tests the model ‘k times’, each time holding out a different slice of the data for testing. That extra step showed that the model using runs and Relative Runs per Innings was more accurate than the model which used all three input variables, curiously. This could be a sign that strike rate actually created noise in the model, as including it hindered accuracy.
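    A rough sketch of the k-fold idea, written from scratch for illustration. The labels are hypothetical, and the ‘model’ is only a stand-in that always guesses the most common class in the training slice; in the real analysis a logistic regression would sit in its place, and the data would normally be shuffled before splitting.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists; each sample is tested exactly once."""
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


def majority_baseline_accuracy(labels, train, test):
    """Stand-in 'model': predict the most common class seen in training."""
    positives = sum(labels[i] for i in train)
    guess = 1 if positives * 2 > len(train) else 0
    return sum(1 for i in test if labels[i] == guess) / len(test)


# Hypothetical labels for 50 players (1 = team made the playoffs).
labels = [1, 0, 0, 1, 0] * 10

scores = [
    majority_baseline_accuracy(labels, train, test)
    for train, test in k_fold_indices(len(labels), k=5)
]
mean_accuracy = sum(scores) / len(scores)
print(scores, mean_accuracy)
```

    The mean of the per-fold scores is the figure usually reported, and a baseline like this one is a useful yardstick for judging whether a fitted model adds anything.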

    The mean accuracy of the best model was 0.7. What does this mean? In short, it means that the model, which used Relative Runs per Innings and runs as inputs, correctly predicted the output of making the playoffs or not 70% of the time. That’s not an astronomical result, but what is really encouraging is that these models give some evidence that Relative Runs improved predictive accuracy, and that it bore a stronger relationship with team success than either players’ runs or strike rate did on their own.

    Of course, it should be remembered that this is all based on one rather small dataset, but still, that is a fascinating result and a good indication of how Relative Runs could be used going forward. If the stat bears a strong relationship with team success, it could be a very useful tool for talent identification. Going big picture, we often scan the run charts and strike rates of batsmen in tournaments to find ‘the best’ players… But perhaps Relative Runs is a better starting point for these conversations than either of those traditional stats. Bigger datasets and more nuanced models will add depth to that conversation.

    I also dipped into regression models using the same three inputs, but with the output variable of the teams’ final rankings in the season, from 1 to 10. These regressions were interesting, but pretty much every combination of inputs resulted in fairly inaccurate models. That was probably down to the fact that using rank as a target output was not a great choice, and I’d try the whole process again with a finer-grained metric, such as win percentage. That is, I’d like to discover whether there is a good model to be made out of the relationship between Relative Runs and runs as inputs, and players’ win percentages as outputs.

    Broadly speaking, the logistic regression analyses worked a lot better, but there’s room to do a lot more regression analyses. Indeed, there’s room to do a lot more research with both types of models using much larger datasets, and utilising more varieties of the Relative Runs universe: that is, Relative Strike Rate, Relative Economy.

    If you’re interested in diving deeper into my machine learning model analyses using Relative Runs, here is a PDF copy of my submitted report.

    Another cool component of the certificate I completed was learning how to use Power BI to create reports and dashboards. If you have a Power BI account, you can take a look via this link at a report I built to display the batting stats of the top 49 batters from the 2025 IPL season, including Relative Runs and Relative Runs per Innings.

  • 2025 IPL Batting Analysis

    The 2025 IPL season is behind us and so it’s time to take a look at an analysis of the best batters at the tournament using Relative Runs (RR).

    But why? Well, what RR allows us to do is find hidden or overlooked value that traditional stats don’t otherwise reveal. In the case of this tournament, or any long competition, RR is very useful. That’s because traditional (let’s say non-relative or ‘absolute’) statistics face a philosophical issue. Namely, the value of runs across matches is not consistent. The longer the tournament and the more diverse the conditions, the more this is a factor.

    Let’s flesh that last point out a bit. It’s trivially true that scoring 50 runs in a T20 innings in which the total is 250 means less than scoring 50 in a total of 150. This is where the power of RR lies; it gives us a measure of the contribution of a batting score relative to the innings that it exists within. Not only that, RR provides a neat and tidy numerical reading that is easy to digest.

    Because RR is zero-sum – that is, the combined RR scores of an innings add to zero – the stat has an intuitive resonance. An RR score of 0 is exactly par; anything above or below shows how many runs a player scored over or under, respectively, the expected score or mean (in our case, the ‘Par’) of an innings.
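    As a rough illustration, the sketch below assumes Par is simply the mean score of the batters who batted in the innings; the full formulation lives in the post linked further down, so treat this as a simplification. The two scorecards are hypothetical and show the same 50 counting for less in a high-scoring innings than in a low-scoring one.

```python
def relative_runs(scores):
    """RR per batter: runs minus Par (assumed here to be the mean batter score)."""
    par = sum(scores.values()) / len(scores)
    return {batter: runs - par for batter, runs in scores.items()}


# Two hypothetical innings in which batter A makes 50 both times.
high_scoring = {"A": 50, "B": 90, "C": 70, "D": 30}  # Par = 60
low_scoring = {"A": 50, "B": 20, "C": 10, "D": 40}   # Par = 30

print(relative_runs(high_scoring)["A"])            # -10.0
print(relative_runs(low_scoring)["A"])             # 20.0
print(sum(relative_runs(high_scoring).values()))   # 0.0, the zero-sum property
```

    Under this assumption the zero-sum property falls straight out of the definition, since deviations from a mean always cancel.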

    This brief rationale for RR holds for all cricket matches but in the case of the IPL, a long tournament in which innings totals range from 120 to 250, RR is particularly useful for analysing the contribution of players across the entire season.

    As with any stat, RR is not the perfect measure of absolutely everything, but in the following discussion, we will point out its strengths and weaknesses in terms of providing pertinent analysis.

    For example, RR is a stat that looks at runs and not strike rates (more on our related Relative Strike Rate (RSR) another time). In the case of lower order ‘finishers’ in T20 cricket, RR might be less interesting than RSR, in the same way that we tend not to talk about averages with finishers in favour of looking at their strike rates.

    That’s probably enough preamble and justification, so let’s get into the findings – if you’re curious or need a refresher, you can read more about the formulation of Relative Runs here.


    The best batters in the 2025 IPL 

    Let’s start with a bit of context for the forthcoming analysis: We are going to be mainly looking at the best batters in the tournament. 

    It’s a huge tournament of just over 70 games with more than 200 players taking part, so this analysis will not be exhaustively looking at every single innings batted, but rather homing in on the top performing batters and using RR to evaluate their contributions.

    But who were the top batsmen? Well, we’re going to focus on the top 50-60 run scorers in what follows. Without detailing who they all are here (here’s a full list that you can peruse), below are the top 15 in order of runs scored at the tournament with runs, averages and strike rates listed. These are, fairly uncontroversially, the main batting stats used in everyday parlance. Hopefully, RR (or RR/Inns) will be added to that list one day.

    The top 15 run scorers

    Sai Sudharsan (759; 54.21; 156.17), Suryakumar Yadav (717; 65.18; 167.91), Virat Kohli (657; 54.74; 144.71), Shubman Gill (650; 50.00; 155.87), Mitchell Marsh (627; 48.23; 163.70), Shreyas Iyer (604; 50.33; 175.07), Yashasvi Jaiswal (559; 43.00; 159.71), Prabhsimran Singh (549; 32.29; 160.52), KL Rahul (539; 53.90; 149.72), Jos Buttler (538; 59.77; 163.03), Nicholas Pooran (524; 43.66; 196.25), Heinrich Klaasen (487; 44.27; 172.69), Priyansh Arya (475; 27.94; 179.24), Aiden Markram (445; 34.23; 148.82), Abhishek Sharma (439; 33.76; 193.39).

    The top 15 Relative Runs scorers

    In terms of total RR scored, the top 15 looked like this:

    Suryakumar Yadav (284.45), Sai Sudharsan (268.38), Mitchell Marsh (267.05), Virat Kohli (256.23), KL Rahul (227.01), Yashasvi Jaiswal (219.34), Shreyas Iyer (187.64), Jos Buttler (173.13), Shubman Gill (159.38), Heinrich Klaasen (153.23), Ajinkya Rahane (140.16), Nicholas Pooran (134.55), Prabhsimran Singh (132.64), Abhishek Sharma (105.23), Aiden Markram (98.15).

    This is an interesting list for sure, but the top of the chart is naturally going to be weighted towards those who batted more. That is, those who didn’t get injured and/or went deeper in the tournament. Total runs (and by extension the ‘Orange Cap’) faces the same fairly obvious objection as a measure of the best batters.

    Really, our key measure of value should be RR per innings (RR/Inns), which answers the question of how much each player contributed relatively per outing. So, let’s have a look at that list.
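    The calculation itself is just total RR divided by innings batted. In the sketch below the RR totals are the ones quoted above, while the innings counts are an assumption, inferred from the ratio of the published totals to the published per-innings figures.

```python
def rr_per_innings(total_rr, innings):
    """Average Relative Runs contributed per innings batted."""
    return total_rr / innings


# (total RR from the list above, innings batted); innings counts are assumed.
season = {
    "Suryakumar Yadav": (284.45, 16),
    "Sai Sudharsan": (268.38, 15),
    "Mitchell Marsh": (267.05, 13),
}

ranked = sorted(
    ((name, round(rr_per_innings(rr, inns), 2)) for name, (rr, inns) in season.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, value in ranked:
    print(f"{name}: {value}")
```

    Dividing by innings is what lets a batter with fewer outings overtake those with bigger season totals.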

    The top 15 RR/Inns scorers

    In terms of RR/Inns, the top 15 looked like this:

    Mitchell Marsh (20.54), Sai Sudharsan (17.89), Suryakumar Yadav (17.78), KL Rahul (17.46), Virat Kohli (17.08), Yashasvi Jaiswal (15.67), Dewald Brevis (15.50), Jos Buttler (13.32), Ayush Mhatre (12.02), Heinrich Klaasen (11.79), Ajinkya Rahane (11.68), Shreyas Iyer (11.04), Shubman Gill (10.62), Vaibhav Suryavanshi (10.59), Nicholas Pooran (9.61).

    As you can see, the 15th player is the first to drop below an RR/Inns score of 10. That means the top 14 all contributed, on average, at least 10 runs more than the mean of the innings they batted in.

    That feels like not just a nice round number to cordon off a top group, but a fair measure of an elite contribution. So, let’s consider this top 14 the elite batters according to RR in the IPL. These were the guys who did significantly better than their own teammates, game in, game out, over the season; granted, some on this list only played half the matches of the group stage.

    The next group would be those who notched 5-10 RR/Inns, and then 0-5. Batters who are in the negative in terms of RR/Inns have, as intuition would suggest, scored less than Par, or less than expected.

    In some cases, such as the case of finishers, this isn’t necessarily problematic (as mentioned above, other stats are arguably better to evaluate finishers) but for top-order (even most middle-order) batters, being in the negative in terms of RR/Inns marks a batter as ‘below par’.
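    These bands are easy to express as a small helper. The band names and the handling of exact boundaries (10 counted as elite, 5 in the 5-10 band, 0 as par or better) are assumptions made for the sketch, as is the negative example value.

```python
def rr_tier(rr_per_inns):
    """Place an RR/Inns figure into one of the bands described above."""
    if rr_per_inns >= 10:
        return "elite"
    if rr_per_inns >= 5:
        return "5-10"
    if rr_per_inns >= 0:
        return "0-5"
    return "below par"


print(rr_tier(20.54))  # Mitchell Marsh -> elite
print(rr_tier(10.59))  # Vaibhav Suryavanshi -> elite
print(rr_tier(9.61))   # Nicholas Pooran -> 5-10
print(rr_tier(-3.0))   # hypothetical negative figure -> below par
```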

    Lucknow captain Rishabh Pant is a good example of a below-par batter in the top 50 total scorers. He had a pretty poor season, aside from one terrific ton in his last game. Pant scored 269 runs but his RR/Inns was -5.80, meaning that he averaged almost 6 runs less than his side’s Par score in each innings.

    If we exclude his last innings (the third best RR score in an innings in the entire IPL season), Pant’s RR/Inns was way down at -12.57. This is a really nice indication of just how poor his output was and a figure that is, arguably, more instructive than his 269 runs at an average of 24.45 and strike rate of 133.16, although those numbers don’t make for pretty reading, either.
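    The arithmetic behind that adjusted figure can be reconstructed roughly as follows. The innings count of 13 is an assumption, implied by the quoted numbers rather than stated anywhere in this post; the 75.4 is the RR for his final-game century listed in the awards at the end.

```python
innings = 13                 # assumption: implied by the quoted figures
season_rr_per_inns = -5.80   # Pant's RR/Inns over the whole season
last_innings_rr = 75.4       # RR for his century in the final game

# Recover the season total, strip out the last innings, then re-average.
total_rr = season_rr_per_inns * innings
rr_per_inns_without_last = (total_rr - last_innings_rr) / (innings - 1)

print(round(total_rr, 1))                  # -75.4
print(round(rr_per_inns_without_last, 2))  # -12.57
```

    On these numbers, that single innings was worth about as much RR as the rest of his season cost, which is why removing it moves the average so far.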

    Marsh in a league of his own

    Putting Pant aside, what can we learn from this data at a glance about the best batters?

    Well, what immediately stands out is how Mitch Marsh comes to the top of the pile in terms of RR per innings. What this means is that his relative contribution to his team was the greatest of any batter in the tournament. He didn’t top any charts or win any of the official awards or even make many notable Teams of the Tournament but, by this metric, he was the best batter in the 2025 IPL.

    Other standout players include the top run scorers (Sudharsan, SKY & Kohli) and, more interestingly, KL Rahul. These five (including Marsh) were the only batters to score over 17 RR/Inns. Marsh is in a league of his own, though, at 20.54.

    Many of the leading run scorers get into this top group (the top 15 of RR/Inns), which is expected as it follows that high run scorers are going to have high RR scores, but it’s not a 1:1 correlation.

    Look at the difference between Shubman Gill and his top-scoring peers: his RR per innings is comparatively low (10.62) despite his being the fourth highest run scorer in the league. Most likely, he was hampered by Sudharsan’s relatively greater success. That is, Sudharsan’s incredible season drags Gill’s numbers down a bit, in terms of RR.

    The rising stars

    Coming in seventh in terms of RR/Inns is one of the more interesting players in this analysis, and that’s Dewald Brevis. He came into the CSK lineup only for the latter half of the tournament and really impressed. So much so that his RR/Inns is one of the highest across the board, albeit derived from fewer innings than much of his competition.

    The same can be said of Brevis’ teammate Ayush Mhatre (12.02 RR/Inns) and 14-year-old Rajasthan sensation Vaibhav Suryavanshi (10.59 RR/Inns). These three rising stars exploded in the second half of the tournament and one can only wonder what their stats would look like had they played the full league phase. Presumably, we’ll find out next season.

    The top 60 run scorers

    Growing it out to the top 60 run scorers, there are some key trends. As expected, RR tracks with runs scored largely but not entirely. If they correlated exactly, it wouldn’t be a particularly interesting stat.

    As you can see in the chart above (RR/Inns vs runs), many batters loosely follow the trend line but some exist well above or below that line. These are the players that become of interest. Clearly, being well above the line suggests significant over-performance, and vice versa for being below it.

    You can see Marsh, Rahul et al. in the top right quadrant, mostly following the trend line. On the left of the chart, there is a cluster of positive outliers – Brevis, Rahane, Suryavanshi and Mhatre. These guys are the lower-scoring over-performers, you could say.
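    For anyone wanting to reproduce the ‘above or below the line’ reading, here is a sketch that fits an ordinary least-squares line through runs and RR/Inns and prints each batter’s gap to it. It uses only the players for whom both figures are quoted in this post, not the full top 60, so the line will differ from the one in the chart; the least-squares choice is also an assumption about how the trend line was drawn.

```python
from statistics import mean

# (runs, RR/Inns) pairs quoted in this post; a subset, not the full top 60.
players = {
    "Sai Sudharsan": (759, 17.89),
    "Suryakumar Yadav": (717, 17.78),
    "Virat Kohli": (657, 17.08),
    "Shubman Gill": (650, 10.62),
    "Mitchell Marsh": (627, 20.54),
    "Shreyas Iyer": (604, 11.04),
    "Yashasvi Jaiswal": (559, 15.67),
    "KL Rahul": (539, 17.46),
    "Jos Buttler": (538, 13.32),
    "Nicholas Pooran": (524, 9.61),
    "Heinrich Klaasen": (487, 11.79),
    "Rishabh Pant": (269, -5.80),
    "Rachin Ravindra": (191, 8.43),
}

xs = [runs for runs, _ in players.values()]
ys = [rr for _, rr in players.values()]
x_bar, y_bar = mean(xs), mean(ys)

# Ordinary least-squares slope and intercept for RR/Inns against runs.
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
intercept = y_bar - slope * x_bar

# Positive gap = above the line (over-performance), negative = below it.
residuals = {
    name: rr - (intercept + slope * runs) for name, (runs, rr) in players.items()
}

for name, gap in sorted(residuals.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:18s} {gap:+.2f}")
```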

    One player that is also interesting in this sense is CSK’s Rachin Ravindra. The Kiwi scored an uninspiring 191 runs (average 27.28, strike rate 128.18) but his RR/Inns score was 8.43. That is the 16th best RR/Inns in the season. However, his core stats tell a fuller story.

    The Kiwi was dropped midway through the season – essentially replaced by Mhatre/Devon Conway – due to his poor strike rate. So, while his RR/Inns was pretty impressive, there were other factors for his exclusion from the side. Also, being an overseas player, he is more prone to being dropped for such under-performance. Or rather, once he was out, it was impossible for him to get back in.

    This is an interesting case study of when RR does not tell the whole story, or might tell the wrong story. Another way of looking at this could be to say that perhaps Ravindra was a little unlucky to be completely excluded from the side and might be a good pickup for another franchise in the next auction, assuming CSK don’t retain him.

    Good, great and amazing

    Looking at the chart, the top 60 batters cluster into groups – those who scored over 500 runs, between 300 and 500, and under 300. Think of these groups as your exceptional, above-average and decent run scorers, respectively. 300 runs is about the point where batters all go into the positive in terms of RR/Inns, hence the use of ‘above average’.

    In the elite group, it’s worth noting, having a slightly lower RR/Inns than the trend isn’t necessarily the worst thing. It can be a product of the specifics of the team in which the player exists.

    For example, the Gujarat top three (Sudharsan, Gill, Buttler) were a pretty special case this season. They all scored very heavily, remarkably so. Rarely, if ever, has an IPL side relied so much on the sustained output of a top three. What their incredible form did was lower the RR potential for each of them as none could become a huge outlier in the side. All of this adds depth to how we should read the results above.

    Sudharsan’s season was worthy of his accolades; it’s just that, according to RR, Marsh was more valuable. It’s a moot point, but it could be argued that RR shows that Marsh would have scored more for GT than Sudharsan did (if the players swapped sides), but we’ll never know. I’m sure real runs are more important than theoretical ones to many readers.

    Another point related to GT is that, just as Sudharsan was less relatively impressive than Marsh in virtue of being in a better team, Gill and Buttler were also significantly affected in terms of their RR potential by Sudharsan’s incredible season.

    Just as we could argue Marsh would score more than Sudharsan if he were at GT, one could equally argue that Gill’s objectively impressive 650 runs would have generated a higher RR score if he were in a poorer side (for example, in Marsh’s Lucknow). Those runs would have counted for more RR in a lower-scoring side but again, we’ll never know how he’d have performed in a different team context and in different match conditions.

    What we do know, though, is who wins our batting awards based on Relative Runs!

    Relative Runs batting awards for the 2025 IPL

    Most Relative Runs:

    1st: Suryakumar Yadav (284.45)
    2nd: Sai Sudharsan (268.38)
    3rd: Mitchell Marsh (267.05)

    Most RR/Inns:

    1st: Mitchell Marsh (20.54)
    2nd: Sai Sudharsan (17.89)
    3rd: Suryakumar Yadav (17.78)

    Most RR in an innings:

    1st: Abhishek Sharma (81.75 for his 141 vs Punjab Kings)
    2nd: Priyansh Arya (76.5 for his 103 vs CSK)
    3rd: Rishabh Pant (75.4 for his 118 vs RCB)