UPR Clone / Weighting
|I am trying to clone my UPR to make it UserFactor1 so that I can use it to create a new UPR.|
Is there any way to simply clone my UPR to the new Groupname or do I have to go through each record of my existing UPR and individually clone each record?
Also, I am interested in determining how to better weight the factors in the UPR. Specifically, I am concerned with secondary factors such as Rail Position, Age of Horse, etc. If my UPR would give a horse a .20 (20%) chance of winning and I feel that his post 12 will reduce his chances by 10%, I would like to weight it so the horse comes up with a .18 (18%) chance. I am trying to figure out how best to do that in the UPR.
"I am trying to clone my UPR to make it UserFactor1 so that I can use it to create a new UPR. Is there any way to simply clone my UPR to the new Groupname or do I have to go through each record of my existing UPR and individually clone each record?"--end quote
The UPR Tools Interface has a Clone button that can be used to clone an individual record. (But it does not have a function to clone all of the records that make up a GroupName.)
So if you want to create a new GroupName that starts out as a clone of an existing GroupName and make edits/mods to it from there - and if you are working within the UPR Tools Interface:
You would cycle through the records of the initial GroupName and, working one record at a time, clone the current record, edit the GroupName field (replacing the old GroupName with the new GroupName), and hit the Save button.
If you have a copy of Access 2003 installed on your machine, an alternate (easier) way might be to:
1. Make a backup copy of the c:\2004\JCapper.mdb file first before getting started.
2. Open the c:\2004\JCapper.mdb file in Access.
3. Open the ImpactValues table sorted on the GroupName column.
4. Highlight all records for the old GroupName.
5. Hit CTRL-C to copy all highlighted records to the Clipboard.
6. Scroll to the very bottom of the interface and click on the * character to the left of the very bottom row.
7. Hit CTRL-V to paste/append all records from the Clipboard to the bottom of the interface.
8. Edit the GroupName field (from old GroupName to new GroupName) for each of the new rows added in step 7 above.
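If you're comfortable with SQL, the same clone can be done with a single append query instead of copy/paste. The sketch below uses Python's sqlite3 standard library purely to illustrate the idea (Access uses Jet SQL, not SQLite, but the INSERT ... SELECT pattern is the same). The ImpactValues table and GroupName column names come from the steps above; the FactorName and Weight columns are hypothetical stand-ins for whatever the real table contains.

```python
import sqlite3

# Illustration only: a tiny stand-in for the ImpactValues table. The
# FactorName and Weight columns are hypothetical; only the table name
# ImpactValues and the GroupName column are from the steps above.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE ImpactValues (GroupName TEXT, FactorName TEXT, Weight REAL)")
con.executemany(
    "INSERT INTO ImpactValues VALUES (?, ?, ?)",
    [("UPR", "F12", 1.0), ("UPR", "F06", 0.5), ("UPR", "F15", 0.25)],
)

# One append query clones every record of the old GroupName under the
# new GroupName - no row-by-row copy/paste needed:
con.execute(
    "INSERT INTO ImpactValues (GroupName, FactorName, Weight) "
    "SELECT 'UserFactor1', FactorName, Weight FROM ImpactValues "
    "WHERE GroupName = 'UPR'"
)

rows = con.execute(
    "SELECT FactorName, Weight FROM ImpactValues "
    "WHERE GroupName = 'UserFactor1'"
).fetchall()
print(rows)  # the three cloned records, now under the new GroupName
```

In Access the equivalent would be one append query run against the real ImpactValues table - either way, back up JCapper.mdb first.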
"Also, I am interested in determining how to better weight the factors in the UPR. Specifically, I am concerned with secondary factors such as Rail Position, Age of Horse, etc. If my UPR would give a horse a .20 (20%) chance of winning and I feel that his post 12 will reduce his chances by 10%, I would like to weight it so the horse comes up with a .18 (18%) chance. I am trying to figure out how best to do that in the UPR."--end quote
With enough experience you can make a pretty good ballpark estimate about the proper weights - and tune from there.
That said, you won't know what you have until you've rebuilt a quarterly data folder or two (which has the drawback of being time consuming.)
One thing that I practice - and it really helps with getting a good ballpark estimate for the initial weights - is to maintain a folder that contains a 'smattering' of 175-250 randomly selected racecards (a racecard being a matching .jcp and .xrd file) from the past 9-12 months for tracks that I play.
When I make the type of changes to a UPR GroupName that you are talking about I start out with a ballpark estimate as to the weights - but my first build will be on the 'smattering' folder mentioned above rather than my much larger quarterly folders.
Then, once I see what I have based on that first build I'll go through a process where I make small incremental changes to the weights followed by a rebuild (repeating the process as needed) using the same 'smattering' folder until I am satisfied with the results.
The advantage here is one of being able to dial in the weights quickly because I am using a folder that doesn't take all that long to rebuild.
Once I think I have the weights close to dialed in - THEN I'll start building larger data folders to see what I have.
Do you know of the proper way to look at the R regression summary to determine how to weight the factors in your UPR?
For example, let's say that after doing the regression summary I felt I had the best model with 5 factors. I tried weighting the UPR based on the t-values but that is not correct. I'm trying to determine the best way to weight the factors using the results of the regression analysis as the basis.
"Do you know of the proper way to look at the R regression summary to determine how to weight the factors in your UPR? For example, let's say that after doing the regression summary I felt I had the best model with 5 factors. I tried weighting the UPR based on the t-values but that is not correct. I'm trying to determine the best way to weight the factors using the results of the regression analysis as the basis."--end quote
The best way for me is to pick a starting point with the factor weights - build a small database - note the win rate and roi for the top ranked UPR horses - make a small incremental change to the weights - rebuild the same small database - note the impact the change in weights has on the top ranked UPR horses - and repeat the process a few times - with the objective being to maximize win rate and roi for the top ranked UPR horses.
From there, validate the model by observing the performance (roi, win pct) of the top ranked UPR horses going forward in time on new races.
Then, if and only if the model 'validates' acceptably - integrate the model into my process for daily live play.
The math behind that is similar (but does have some differences) compared to what R does when logistic regression is run in the mlogit module on a development sample.
And certainly the validation process is similar to what needs to be done before integrating any new model (including one created in R) for use in live play.
But the UPR scoring algorithm in JCapper is done a little differently than the formula Wong used in Precision.
The following may or may not help you get a better starting point for your UPR weights...
In Precision, author CX Wong plugged the beta coefficients generated by R for each factor in his model into the following formula to generate a "score" for each horse:
[Score] = (EulerNumber) ^ [(Beta1 x NumVal_Factor1) + (Beta2 x NumVal_Factor2) + (Beta3 x NumVal_Factor3)]
Assume you've created a model based on 3 factors.
Further assume the beta coefficients generated in R for your factors are as follows: Beta1 = -0.0750, Beta2 = 0.0175, Beta3 = 0.0025.
Further assume the numeric values of the three factors in your model for the 1 horse in a given race are: NumVal_Factor1 = 2.60, NumVal_Factor2 = 82.78, NumVal_Factor3 = 92.50.
Plugging the above info into the formula, we get a "score" for the example horse of 4.41, calculated as follows:
[Score] = (2.718) ^ [(-0.0750 x 2.60) + (0.0175 x 82.78) + (0.0025 x 92.50)]
[Score] = (2.718) ^ [(-0.195) + (1.44865) + (0.23125)]
[Score] = (2.718) ^ [(1.4849)]
[Score] = 4.41
Once the "scores" for each horse have been calculated, the win prob estimate for any given horse is based on the following formula:
[Prob] = [Score]/(Sum of "scores" in the race)
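To make the two-step math above concrete, here is a minimal sketch in Python. The betas and the first horse's factor values are the example numbers from this thread; the second horse's values are made up only so the probability step has a field to normalize against.

```python
import math

# Sketch of the scoring-and-probability math described above. The betas
# and horse_a values are the example numbers from the post; horse_b is a
# hypothetical second starter added so the field sums to something.
def score(betas, values):
    # (EulerNumber) ^ [(Beta1 x F1) + (Beta2 x F2) + (Beta3 x F3)]
    return math.exp(sum(b * v for b, v in zip(betas, values)))

betas = [-0.0750, 0.0175, 0.0025]
horse_a = [2.60, 82.78, 92.50]  # the example horse
horse_b = [3.10, 75.00, 88.00]  # hypothetical second starter

scores = [score(betas, h) for h in (horse_a, horse_b)]
probs = [s / sum(scores) for s in scores]  # [Prob] = [Score] / (sum of scores)

print(round(scores[0], 2))   # ~4.41 for the example horse
print(round(sum(probs), 2))  # the probs in a race always sum to 1.0
```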
This is great, let me tinker with it a bit.
Also - it hit me that since I am only using one userfactor, I have cloned the UPR I am working on into UF2, 3, 4, and 5. Now I can try the UPR 5 ways with each database build to test whether my changes are headed in the right direction.
|I'm missing something. I tried using the beta weights for the UPR weights and it fell apart. I'm missing something.|
|What I was trying to do was show you how Wong (and others) use beta coefficients generated in logistic regression in a scoring formula - and from there, how they use scores to generate a prob estimate.|
I never suggested using beta coefficients generated in R as your UPR weights. Obviously, you can't do that. (It doesn't work.)
If you study Wong's formula it should (eventually) become clear that the lower a beta coefficient the greater the degree of importance for that factor.
Weights in JCapper UPR behave in an opposite manner: The higher the weight the greater the degree of importance.
UPR as it exists in JCapper is a score that is nothing more than a simple weighted avg among the factors in your model.
Assume a 3 factor model where F1, F2, and F3 are the numeric values of your factors and W1, W2, and W3 are the weights for your factors.
Your scoring math looks something like this:
UPR = [Score]
UPR = [(F1 x W1) + (F2 x W2) + (F3 x W3)] / (W1 + W2 + W3)
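As a quick sanity check, the weighted average above can be sketched in a few lines of Python. The factor values and weights below are made-up numbers for illustration only.

```python
# Sketch of the UPR weighted-average score above; F and W values are
# made-up numbers for illustration.
def upr(values, weights):
    # [(F1 x W1) + (F2 x W2) + (F3 x W3)] / (W1 + W2 + W3)
    return sum(f * w for f, w in zip(values, weights)) / sum(weights)

factors = [82.0, 75.0, 90.0]  # F1, F2, F3 (hypothetical)
weights = [3.0, 1.0, 1.0]     # W1, W2, W3 (hypothetical)
print(round(upr(factors, weights), 2))  # prints 82.2
```

Notice how the heavier W1 pulls the score toward F1: the higher the weight, the greater the degree of importance.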
Hope that helps.
I'm not asking the question correctly. I am trying to determine how the model tells me to choose and weight the factors in my UPR. Let's assume my model boiled things down to 3 factors (to keep things simple). (FigConsensus F12, FormConsensus F06, and RaceStrength F15) All meet the p value criteria as meaningful.
I thought the t value would be the best way to start to estimate how to accurately weight the factors:
lm(formula = winpayoff ~ valF12 + valF06 + valF15)

              Estimate Std. Error t value Pr(>|t|)
(Intercept) -2.19e-01   6.87e-03  -31.93  < 2e-16 ***
valF12       4.50e-03   5.88e-05   76.55  < 2e-16 ***
valF06       5.71e-04   9.25e-05    6.17  6.8e-10 ***
valF15       6.26e-04   5.64e-05   11.09  < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.324 on 138016 degrees of freedom
Multiple R-squared: 0.0605, Adjusted R-squared: 0.0605
F-statistic: 2.96e+03 on 3 and 138016 DF, p-value: <2e-16
I tried to use the t values of 76.55, 6.17, and 11.09. Based on the t values I set the percentages and used those numbers as the weight for each factor. I realize there is overlap and collinearity but I was hoping this would be a good starting point for weighting my factors in the UPR. Am I reading the summary incorrectly?
|Pr(>|t|) values are used to indicate significance. They aren't intended to be used the way you are attempting to use them (as weights.)|
Take a look at Wong's scoring formula (posted above) and compare it to the UPR Tools weighted average formula I posted (also above.)
Both will get you to a similar place.
But the formulas for each are designed in such a way that a different type of significance value is required when it's time to plug numbers into the formulas.
Wong is using the mlogit module in R to generate beta coefficients and Pr(>|t|) values. From there he is checking to see if Pr(>|t|) values are low enough to suggest enough significance to make it worthwhile to introduce a new factor into the model.
If the decision is made to introduce a new factor into the model he plugs the beta coefficient for the new factor into his algorithm's scoring formula.
Contrast that to UPR Tools where I'm plugging an initial estimate for degree of importance (Weight) into a weighted average formula when I create entries for a new GroupName.
From there, after creating the initial GroupName - I'm performing multiple steps en route to maximizing win rate and roi for the GroupName's top ranked horses.
My own under the hood process involves the following steps:
1. Creating a folder that contains 200 or so recent race cards (there is no exact right number) for tracks I play, to be used as a development sample.
2. Creating a slightly larger folder that contains 300 or so recent race cards (there is no exact right number) for tracks I play, to be used as a validation sample.
It's important to note that I prefer the races in my validation samples to be run during a time period immediately following the races in my development sample.
Experience has taught me to validate models by confronting them with fresh races. Doing it that way tends to make the models I develop reflect current public betting trends as opposed to public betting trends as they existed at some point in the past. (I find this last part to be far more important than the number of races in the samples themselves.)
3. Building a database on my development sample folder and noting win rate and roi for the top ranked UPR horses.
4. Making an incremental change to the weights - for one factor in the model - and in one direction - and repeating step 3 above.
5. If my incremental changes to the weights have moved win rate and roi in the right direction: Make a further incremental change in the same direction and repeat steps 3-4-5-6.
6. If my incremental changes to the weights have moved win rate and roi in the wrong direction: Make a further incremental change in the opposite direction and repeat steps 3-4-5-6.
7. Repeat steps 3-4-5-6 for each factor in the model until win rate and roi for the GroupName's top horses approach maximized values.
Hint: When incremental changes to factor weights result in minimal or statistically insignificant changes to win rate and roi - that's when I assume the Weight for each factor is close to optimal. From there it's time to move on to the next step: Validation.
8. Running a build database routine on the validation folder and noting win rate and roi for the GroupName's top horses.
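The tuning loop described above - nudge one factor's weight in one direction, rebuild, keep the change if win rate and roi improve, reverse direction if they don't, and stop when changes stop helping - can be sketched as a simple hill climb. Everything below is illustrative: build_and_score is a made-up stand-in for the database build on the development folder, given a known optimum so the loop has something to converge toward.

```python
# Sketch of the one-factor-at-a-time tuning loop described above.
# build_and_score is a hypothetical stand-in for "build a database on the
# development folder and note win rate / roi for the top ranked UPR
# horses" - here it just rewards weights near a made-up optimum.
def build_and_score(weights):
    target = [3.0, 1.0, 1.5]  # pretend these weights maximize win rate / roi
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

def tune(weights, step=0.25, rounds=20):
    best = build_and_score(weights)
    for _ in range(rounds):
        improved = False
        for i in range(len(weights)):        # one factor at a time
            for direction in (step, -step):  # one direction, then the reverse
                trial = weights[:]
                trial[i] += direction
                result = build_and_score(trial)
                if result > best:            # moved in the right direction: keep it
                    weights, best = trial, result
                    improved = True
        if not improved:  # minimal change in results: weights near optimal
            break
    return weights

print(tune([1.0, 1.0, 1.0]))  # climbs to [3.0, 1.0, 1.5]
```

In practice each build_and_score call is a full database build, which is exactly why a small development folder (and automation, where possible) matters.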
If validation sample win rate and roi are not statistically or materially different from those noted for the model in the final development sample - at that point I'm willing to accept the idea that the new GroupName passes validation.
It's at this point (and not before) that I'll consider using the new GroupName for live play.
On the other hand, if validation sample win rate and roi are statistically or materially worse than those noted in the final development sample: I tend to think of the new GroupName as having failed validation.
If that's the case I'll either scrap it or dive back in, adjust the factor mix, create new development and validation samples, and repeat the whole process.
There really aren't any shortcuts (other than automation) to the above process.
Believe it or not these steps that I perform manually will work no matter what values you use as your initial factor weights - PROVIDED you follow them through to completion - to where win rate and roi become maximized and weights are approaching optimal values.
Of course the closer your initial weights are to optimal the shorter the time it takes to complete the process.
One final note: Although it hasn't been mentioned in this thread, rest assured Wong and others like him use development and validation samples just like I do. Or is it the other way around and I'm the one who uses them like they do?
"I'm not asking the question correctly. I am trying to determine how the model tells me to choose and weight the factors in my UPR."--end quote
The thing is, the values generated in R don't tell you how to arrive at factor weights for use in a weighted average formula like the one I am using in UPR Tools. For that you have to work through a process similar to what I posted above.
My takeaway is that the two most important values generated by the mlogit module in R are Estimate and Pr(>|t|).
When Pr(>|t|) is low enough it can indicate significance for an individual factor.
If and when you decide to add a new factor to a model based on a Pr(>|t|) value low enough to indicate significance - you can do so by plugging the Estimate or beta coefficient for the new factor into a scoring algorithm formula similar to that used by Wong in Precision.
After adding a new factor to a model and running the revised model through the mlogit module in R, it's probably a good idea to compare the Log Likelihood value for the new model against the Log Likelihood value for the old model. In theory, the model with the higher Log Likelihood is the better model.
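The Log Likelihood comparison itself is simple enough arithmetic to sketch by hand. The win/lose outcomes and the two models' probability estimates below are made-up numbers; the point is only that the model assigning higher probability to what actually happened earns the higher (less negative) Log Likelihood.

```python
import math

# Sketch of the Log Likelihood comparison described above. Outcomes and
# probability estimates are made-up numbers for illustration.
def log_likelihood(outcomes, probs):
    # Sum of log(p) for winners and log(1 - p) for losers.
    return sum(math.log(p) if won else math.log(1 - p)
               for won, p in zip(outcomes, probs))

outcomes = [1, 0, 0, 1, 0]  # 1 = the horse won (hypothetical results)
model_a = [0.40, 0.20, 0.10, 0.35, 0.15]  # hypothetical prob estimates
model_b = [0.25, 0.30, 0.20, 0.20, 0.25]  # hypothetical prob estimates

ll_a = log_likelihood(outcomes, model_a)
ll_b = log_likelihood(outcomes, model_b)
print(ll_a > ll_b)  # model A fits what actually happened better
```

As noted below, though, a higher Log Likelihood on past races is no substitute for watching both models perform on fresh races going forward in time.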
However, experience has taught me that the better way to compare two models is to observe how each model performs when confronted with fresh races going forward in time.
If it's obvious that one model is outperforming the other... I tend not to care about the Log Likelihood values.
To me, performance going forward in time is the acid test.