You can NOT be serious! Dr Tim Paulden, EARL 2014, London, 16 September 2014

Presentation Transcript

You can NOT be serious!
How to build a tennis model in 30 minutes
Dr Tim Paulden (Innovation & Development Manager, Atass Sports)
EARL 2014, London, 16 September 2014

Introduction
- ATASS Sports
  - Sports forecasting
  - Hardcore statistical research
  - Fusion of ‘academic’ and ‘pragmatic’
- Today…
  - Building a very basic tennis model
  - Highlighting some key ideas

Tennis modelling
- Data obtained from tennis-data.co.uk
  - Spreadsheet for each year
  - You can easily get the data yourself!
- Ultimate goal of modelling is to determine the probability of different outcomes
- Can we forecast the probability of victory in a match from the players’ world rankings?
- How do we identify a "good" model?

Concept 1: Model calibration
- An effective model must be well-calibrated: the probabilities produced by the model must be consistent with the available data
- Think in terms of “bins” – if we gather together all the cases where our generated win probability lies between 0.6 and 0.7 (say), the observed proportion of wins should match the mean win probability for the bin (roughly 0.65)
- Here’s an extract from Nate Silver’s recent bestseller, “The Signal and the Noise”…

Concept 2: Model score
- Suppose we use a model to produce probabilities for a large number of sporting events (e.g. a collection of tennis matches)
- We can assess the model's quality by summing log(p) over all predictions, where p is the probability we assigned to the outcome that occurred – this is the model score
- The closer we match the "true" probabilities, the higher the model score (closer to zero)
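The model score defined above takes only a few lines to compute. A minimal sketch (Python used here for illustration – the talk's own code is R – and the function name is my own):

```python
import math

def model_score(probs):
    """Sum log(p) over the probability p assigned to each outcome
    that actually occurred; higher (closer to zero) is better."""
    return sum(math.log(p) for p in probs)

# Toy example: three matches where the model gave the eventual
# winner probabilities 0.7, 0.9 and 0.5.
score = model_score([0.7, 0.9, 0.5])
```

A perfect prediction (p = 1) contributes log(1) = 0, while confident wrong predictions (small p) are punished heavily.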

The data set…
tennis has 68,972 rows of data, with each match appearing twice (A vs B and B vs A)

> dim(tennis)
[1] 68972    11
> head(tennis)
  matchid     date  day  ago surf bestof       aname arank       bname brank res
1       1 20010101 3747 4503 hard      3   clement a    18  gaudenzi a   101   1
2       1 20010101 3747 4503 hard      3  gaudenzi a   101   clement a    18   0
3       2 20010101 3747 4503 hard      3 goldstein p    81     jones a   442   1
4       2 20010101 3747 4503 hard      3     jones a   442 goldstein p    81   0
5       3 20010101 3747 4503 hard      3      haas t    23     smith l   485   1
6       3 20010101 3747 4503 hard      3     smith l   485      haas t    23   0
> tail(tennis)
      matchid     date  day ago surf bestof          aname arank          bname brank res
68967   34484 20130427 8246   4 clay      3        rosol l    48        simon g    16   1
68968   34484 20130427 8246   4 clay      3        simon g    16        rosol l    48   0
68969   34485 20130427 8246   4 clay      3 garcia lopez g    87        mayer f    29   1
68970   34485 20130427 8246   4 clay      3        mayer f    29 garcia lopez g    87   0
68971   34486 20130428 8247   3 clay      3        rosol l    48 garcia lopez g    87   1
68972   34486 20130428 8247   3 clay      3 garcia lopez g    87        rosol l    48   0

From ranks to probabilities
- How might we map the players' rankings onto a win probability?
- We’ll look at an extremely rudimentary approach in a moment as a worked example
- But first, consider for a moment how you might mathematically combine the players’ rankings to get a win probability for each player
- What are the important properties?

A "first stab" – Model 1
Suppose our first guess is that if the two players' rankings are A and B, the probability of A winning the match is B/(A+B)

   matchid       aname arank       bname brank res     aprob1
1        1   clement a    18  gaudenzi a   101   1 0.84873950
2        1  gaudenzi a   101   clement a    18   0 0.15126050
3        2 goldstein p    81     jones a   442   1 0.84512428
4        2     jones a   442 goldstein p    81   0 0.15487572
5        3      haas t    23     smith l   485   1 0.95472441
6        3     smith l   485      haas t    23   0 0.04527559
7        4    henman t    10  rusedski g    69   1 0.87341772
8        4  rusedski g    69    henman t    10   0 0.12658228
9        5    hewitt l     7   arthurs w    83   1 0.92222222
10       5   arthurs w    83    hewitt l     7   0 0.07777778
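Model 1 is one line of arithmetic; a quick Python check (function name is my own) reproduces the aprob1 column above:

```python
def model1_prob(a_rank, b_rank):
    """Model 1: P(A wins) = B/(A+B), where A and B are the two rankings."""
    return b_rank / (a_rank + b_rank)

# clement a (rank 18) vs gaudenzi a (rank 101): 101/119
p = model1_prob(18, 101)
```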

A "first stab" – Model 1
- In this case, the model score is -22194
- The "null" model in which each player is always assigned a probability of 0.5 gets a model score of -23904
- So Model 1 gives an improvement of 1710 over the null model (closer to zero is better)

How about the calibration?
- Let's generate a calibration plot for Model 1
- We'll use bins of width 0.1 (0 to 0.1, 0.1 to 0.2, etc.), closed at the left-hand side (e.g. 0.6 ≤ x < 0.7)
- For each bin, we consider all instances where our model probability lies inside the bin, and plot a point whose x-coordinate is the mean of the model probabilities and whose y-coordinate is the observed proportion of wins for these instances
- Example: in this case, for the bin 0.6 ≤ x < 0.7, the point plotted is (0.648, 0.588)
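The binning procedure above can be sketched as follows (Python for illustration, left-closed bins as described; the helper name is my own):

```python
def calibration_points(probs, results, width=0.1):
    """Group (probability, outcome) pairs into left-closed bins of the
    given width; return one (mean probability, win proportion) point
    per non-empty bin, in bin order."""
    nbins = round(1 / width)
    bins = {}
    for p, r in zip(probs, results):
        b = min(int(p / width), nbins - 1)   # p = 1.0 falls in the top bin
        bins.setdefault(b, []).append((p, r))
    points = []
    for b in sorted(bins):
        ps, rs = zip(*bins[b])
        points.append((sum(ps) / len(ps), sum(rs) / len(rs)))
    return points
```

A well-calibrated model's points lie close to the diagonal y = x.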

Systematic bias of Model 1

A quick fix...
- The probabilities are systematically too extreme, so we could try blending Model 1 with 0.5
- What weighting on Model 1 maximises the model score?
- A weighting of 0.71 on Model 1 is best – the model score improves from -22194 to -21333
- Obtaining the best weighting can be done as a one-liner in R…
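The "squeeze" is just a convex blend of the Model 1 probability with the coin-flip value 0.5. A sketch (Python; the function name is mine, and the default weight is the fitted value from the slides):

```python
def blend(p, w=0.7112):
    """Shrink a probability towards 0.5: weight w on the model, 1-w on 0.5.
    Note w*p + (1-w)*0.5 has intercept (1-w)/2 = 0.1444, which is why the
    glm identity-link fit recovers the optimal weight directly."""
    return w * p + (1 - w) * 0.5
```

For example, the first match's Model 1 probability 0.849 shrinks to roughly 0.748, matching the Model 2 table.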

Quick fix (one-liner in R)

> glm(tennis$res ~ tennis$aprob1, family = binomial(link = "identity"))

Call:  glm(formula = tennis$res ~ tennis$aprob1, family = binomial(link = "identity"))

Coefficients:
  (Intercept)  tennis$aprob1
       0.1444         0.7112

Degrees of Freedom: 68971 Total (i.e. Null);  68970 Residual
Null Deviance:     95620
Residual Deviance: 85330   AIC: 85330

Bias reduced, but still apparent

A substantial improvement

Model 1 (score -22194)
  matchid arank brank res aprob1
        1    18   101   1  0.849
        7    91    94   1  0.508
       34   181     4   1  0.022
      141   118    18   1  0.132
     7897     1   314   1  0.997

Model 2 (score -21333)
  matchid arank brank res aprob2
        1    18   101   1  0.748
        7    91    94   1  0.506
       34   181     4   1  0.160
      141   118    18   1  0.239
     7897     1   314   1  0.853

Stepping up a gear
Invlogit function – widely used to predict binary sports outcomes (logistic regression)

Logistic regression
- Invlogit function – widely used to predict binary sports outcomes (logistic regression)
- Let's do a logistic regression of the result on the difference in rank, (B – A)
- This is equivalent to player A's win probability being: invlogit( k*(B – A) )
- The optimal value of k can be found using glm
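The invlogit map and the rank-difference model can be written out directly (a Python sketch; the talk fits k with R's glm, and the value below is the coefficient reported on the next slides):

```python
import math

def invlogit(x):
    """exp(x)/(1+exp(x)) -- maps any real number into (0, 1)."""
    return math.exp(x) / (1.0 + math.exp(x))

def rankdiff_prob(a_rank, b_rank, k=0.0061067):
    """P(A wins) = invlogit(k*(B - A)), with k taken from the glm fit."""
    return invlogit(k * (b_rank - a_rank))
```

Note the desirable symmetry: the two rows for a match always produce probabilities summing to 1.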

Logistic regression

rankdiff = tennis$brank - tennis$arank
g1 = glm(tennis$res ~ rankdiff - 1, family = binomial(link = "logit"))

Logistic regression

> summary(g1)

Call:
glm(formula = tennis$res ~ rankdiff - 1, family = binomial(link = "logit"))

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-4.229  -1.123   0.000   1.123   4.229

Coefficients:
          Estimate Std. Error z value Pr(>|z|)
rankdiff 0.0061067  0.0000987   61.87   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 95615  on 68972  degrees of freedom
Residual deviance: 89920  on 68971  degrees of freedom
AIC: 89922

Number of Fisher Scoring iterations: 4

This has terrible calibration!

Logistic regression
- The model score comes out as -22480 – worse than Model 1!
  Model 0  -23904  (probabilities all 0.5)
  Model 1  -22194  (simple B/(A+B) model)
  Model 2  -21333  (Model 1 squeezed)
- We need a better way of capturing the curvature…

Our paper...
Developing an improved tennis ranking system
David Irons, Stephen Buckley and Tim Paulden
MathSport International, June 2013
Updated version to appear this year in JQAS (Journal of Quantitative Analysis in Sports)

Our paper...
A decent model for generating probabilities is
  invlogit( 0.58*(log(B) - log(A)) )
where invlogit is the function exp(x)/(exp(x)+1)

  matchid arank brank res aprob3
        1    18   101   1  0.731
        7    91    94   1  0.505
       34   181     4   1  0.099
      141   118    18   1  0.252
     7897     1   314   1  0.966

Model 3 score: -21285

Our paper…

logterm = log(tennis$brank) - log(tennis$arank)
g1 = glm(tennis$res ~ logterm - 1, family = binomial(link = "logit"))

Our paper…

> summary(g1)

Call:
glm(formula = tennis$res ~ logterm - 1, family = binomial(link = "logit"))

Deviance Residuals:
   Min      1Q  Median      3Q     Max
-2.614  -1.055   0.000   1.055   2.614

Coefficients:
        Estimate Std. Error z value Pr(>|z|)
logterm 0.578851   0.006371   90.86 <0.0000000000000002 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 95615  on 68972  degrees of freedom
Residual deviance: 85139  on 68971  degrees of freedom
AIC: 85141

Number of Fisher Scoring iterations: 3

Our paper…

> sum(log(g1$fitted.values[which(tennis$res == 1)]))
[1] -21284.7


Almost perfect calibration

Some comparisons

Model 2 (score -21333)
  matchid arank brank res aprob2
        1    18   101   1  0.748
        7    91    94   1  0.506
       34   181     4   1  0.160
      141   118    18   1  0.239
     7897     1   314   1  0.853

Model 3 (score -21285)
  matchid arank brank res aprob3
        1    18   101   1  0.731
        7    91    94   1  0.505
       34   181     4   1  0.099
      141   118    18   1  0.252
     7897     1   314   1  0.966

Some comparisons

Model 1
  matchid arank brank res aprob1
     7897     1   314   1  0.997

Model 2
  matchid arank brank res aprob2
     7897     1   314   1  0.853

Model 3
  matchid arank brank res aprob3
     7897     1   314   1  0.966

Coming full circle
- In fact, a bit of algebra shows that invlogit( 0.58*(log(B) - log(A)) ) is exactly the same as B^0.58 / (A^0.58 + B^0.58)
- And invlogit( B - A ) is the same as exp(B)/(exp(A) + exp(B))
- Try the simplest thing that could possibly work!
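The algebraic identity is easy to verify numerically: invlogit(k*(log B - log A)) = (B/A)^k / (1 + (B/A)^k) = B^k / (A^k + B^k). A Python check over a few of the rank pairs from the tables:

```python
import math

def invlogit(x):
    return math.exp(x) / (1.0 + math.exp(x))

# invlogit(k*(log B - log A)) == B^k / (A^k + B^k) for any ranks A, B
k = 0.58
for a, b in [(18, 101), (91, 94), (1, 314)]:
    lhs = invlogit(k * (math.log(b) - math.log(a)))
    rhs = b**k / (a**k + b**k)
    assert abs(lhs - rhs) < 1e-12
```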

Graphically…

A final extension
- What about the effect of the number of sets?
- Let's take the best model (Model 3) and look at the calibration plots…

Model 3 - All matches

Model 3 - Best of 3 sets

Model 3 - Best of 5 sets

Model 4
This suggests we should have a combined model – "Model 4" – based on the rules that are in operation
For "best of 3 sets": invlogit( 0.54*(log(B) - log(A)) )
For "best of 5 sets": invlogit( 0.72*(log(B) - log(A)) )

Model 4 – All matches

Model 4 – Best of 3 sets

Model 4 – Best of 5 sets

The best model score so far
- For Model 4, the model score is -21252
- A final comparison:
  Model 0   -23904  (probabilities all 0.5)
  Model 1   -22194  (simple B/(A+B) model)
  Model 2   -21333  (Model 1 squeezed)
  Logistic  -22480  (based on B-A)
  Model 3   -21285  (logistic with logs)
  Combined  -21252  (split version of Model 3)

Some further questions
- How can we incorporate some of the other data available into the model?
  - Surface
  - Individual players
- Mapping rankings to probabilities is only one component of the modelling process…
- …you could use your own rankings or ratings!

Final thoughts
Try it yourself!  www.tennis-data.co.uk
Modelling principles:
- Start Simple
- Generalise Gradually
- Capture Curvature
- Banish Bias

Thank you for listening!
Dr Tim Paulden
tim.paulden@atass-sports.co.uk