Database Handicapping Software - JCapper

JCapper Message Board

          General Discussion
                      -- Statistical programs

SILVER01HDW
2/28/2015
12:01:15 PM
Is anyone using R or another statistical program to optimize patterns and results?

Reply
jeff
2/28/2015
1:55:47 PM
I am.

I started an outside of JCapper data project about a year ago.

The following bullet points describe the "How did I go about it?" part:

• About a year ago, after reading Precision by CX Wong, I became interested in doing an outside of JCapper pricing model.

• I read everything I could find on the topic of Logistic Regression.

Hint: I was able to find several titles on Google Books where I could read 60%-70% or so of the complete book (for free).

Hint: My efforts in this area led me to YouTube, where I was able to watch free video of college-level courses being taught (including a class at Stanford) where the course topic (and the keyword emphasized in my search) was Logistic Regression.

• I downloaded "R" from http://www.r-project.org/

• I created a version of the JCX File Exports Module that enables the player to use custom sql expressions to drive export of JCapper tables to a .csv file.

Hint: Once you have data sitting in a .csv file you can open the .csv file in Excel 2010 - and from there: clean the data up/make transformations, etc.

• From there, once you understand the basics - you can connect the mlogit package in "R" to your .csv files - and let it perform statistical analysis (Logistic Regression) for you.

• From there, you are in a position (certainly a much better position than you otherwise would be without performing statistical analysis on your data) to build a pricing model - or UPR, UserFactors, and UDMs for that matter.

The above process entails a LOT of work. Along the way I think I've gained a much deeper understanding than I had before of not just racing data and model building in general... but an understanding of crowd behavior and what it actually takes to build models that perform reasonably well going forward in time.
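For anyone curious what the "let R perform Logistic Regression for you" step actually computes: below is a minimal, self-contained Python sketch of a logistic regression fit by gradient descent. To be clear, the author used R's mlogit package; this is only an illustration, and the speed-fig training data is made up, not real JCapper output.

```python
import math
import random

def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit P(win) = sigmoid(b0 + b1*z) by batch gradient descent, where z is
    the standardized feature (a plain-Python stand-in for R's glm/mlogit)."""
    m = sum(xs) / len(xs)
    s = (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    zs = [(x - m) / s for x in xs]
    b0 = b1 = 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for z, y in zip(zs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * z)))
            g0 += p - y
            g1 += (p - y) * z
        b0 -= lr * g0 / len(zs)
        b1 -= lr * g1 / len(zs)
    return (b0, b1, m, s)

def win_prob(model, x):
    """Model's win probability estimate for a raw feature value x."""
    b0, b1, m, s = model
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * (x - m) / s)))

# Toy training data: a hypothetical speed-fig column where higher figs
# win more often. Made-up data for illustration only.
random.seed(1)
figs = [f for f in range(60, 101, 5) for _ in range(80)]
wins = [1 if random.random() < (f - 55) / 90.0 else 0 for f in figs]
model = fit_logistic(figs, wins)
```

The fitted coefficient on the fig column comes out positive, and the model maps any raw fig to a win probability between 0 and 1 - which is exactly the kind of output a pricing model needs.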


-jp


.


Reply
SILVER01HDW
2/28/2015
3:16:01 PM
Thanks Jeff, that is helpful. I haven't read Wong's book but do intend to read it at some point in the near future. I was looking for the right package to install in R, so this is helpful. One more question: is it possible to convert the playlist file that is generated in Notepad to a CSV file?

~Edited by: SILVER01HDW  on:  2/28/2015  at:  3:16:01 PM~

Reply
jeff
2/28/2015
4:59:39 PM
PL_Profile.txt files have (if I recall correctly) 500 plus data fields (or columns) per row.

They can be opened in Excel 2010 - which is designed to handle that many columns and more.

However, Excel 2003 has a limit of 255 columns per row. For that reason - Excel 2003 is not a good choice for handling PL_Profile.txt files.



BASIC OPERATING INSTRUCTIONS for getting a PL_Profile.txt file into Excel 2010:

1. Working from inside of Windows Explorer (or My Computer) find the desired PL_Profile.txt file, right-click it, and select COPY.

2. Right-click (not on a file, but in the 'white space' inside the folder where the PL_Profile.txt file you are working with is located) and select PASTE.

This will cause Windows to create a copy of your PL_Profile.txt file in the folder where you are working.

Hint: Creating a copy leaves the original intact - which prevents you from 'breaking' the integrity of JCapper Build Database routines run on the folder where you are working.

3. Right-click the copy created in step 2 above and rename the file. While renaming it, change the file extension from .txt to .csv.

4. Double click the renamed copy (which is now a .csv file) from steps 2 and 3 above - and provided you have Excel 2010 installed on your machine - you should find that the file now opens in Excel 2010.

That's it!
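For players comfortable with a little scripting, the copy-and-rename steps above can be automated. A minimal Python sketch (the JCapper folder path in the comment is hypothetical):

```python
import shutil
from pathlib import Path

def make_csv_copy(txt_path):
    """Copy a PL_Profile.txt-style file and give the copy a .csv extension,
    leaving the original .txt file intact (so Build Database routines that
    depend on it keep working)."""
    src = Path(txt_path)
    dst = src.with_suffix(".csv")
    shutil.copyfile(src, dst)   # copy, not rename: the original stays put
    return dst

# Example (hypothetical path):
# make_csv_copy(r"C:\JCapper\Exe\PL_Profile.txt")
```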


-jp

.


Reply
NYMike
3/12/2015
11:35:03 AM
Jeff,
Your answer explains how. Can you shed a little light on why? What are you looking at and how would that information be used? I am interested in Logistic Regression but I'm not quite connecting the dots.

Thanks,

Mike

Reply
jeff
3/13/2015
3:28:08 PM
Why do it?

Short answer: Better decisions during live play.

How and related insights?

I don't have that kind of free time right now.

In order to cover the subject matter adequately, I'd end up writing the equivalent of several chapters from a book.

And if I were to do that - I would not be surprised one bit if what I ended up writing looked an awful lot like the book I've already recommended in this thread:
Precision by CX Wong




-jp

.

~Edited by: jeff  on:  3/13/2015  at:  3:28:08 PM~

Reply
jeff
3/13/2015
3:20:43 PM
One of the situations I face daily is calculating a strike price. Or, more specifically - deciding whether or not the odds offered on a horse I am about to bet are high enough that the odds, combined with the horse's probability of winning, represent a +EV (positive expected value) situation.

Mathematically, the only situations I should be betting are those offering +EV. It goes without saying (obviously) that situations offering -EV (negative expected value) are what I need to avoid.

Bet only +EV over time - and the result is exponential bankroll growth.

On the flip side of things - sprinkle enough -EV in with the bets - and the expected result (eventually) is complete loss of bankroll.

That said, all horseplayers are human beings and subject to mistakes. I'm convinced some of us are capable of playing a near perfect game in fits and spurts. But none of us are capable of playing a near perfect game perpetually.

Speaking strictly for myself, the goal I am striving for - and the reason I recommend statistical tools to analyze data - is improved accuracy when it comes to identifying +EV situations.
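To make "+EV" concrete, here is a minimal Python sketch of the expected value of a $1 win bet, using the standard definition (an illustration, not code from JCapper):

```python
def expected_value(win_prob, odds):
    """Expected profit per $1 win bet at odds-to-1: win_prob of the time we
    collect `odds` in profit; the rest of the time we lose the $1 stake."""
    return win_prob * odds - (1.0 - win_prob)

# A 30% winner at 3-1 is +EV (+0.20 per dollar); the same horse at 2-1 is -EV.
ev_play = expected_value(0.30, 3.0)   # +0.20
ev_pass = expected_value(0.30, 2.0)   # -0.10
```

Note that the same horse flips from playable to unplayable purely on price - which is why an accurate probability estimate matters so much.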






Let's try some SIMPLIFIED examples where I create individual parts of a pricing model (certainly not the entire thing - I'm not looking to write a book here) based on data analysis of a few basic areas of the game: early, late, class, ability from speed figs, form, human connections, breeding, and track profile.

For purposes of these examples I'll be breaking out specific areas of the game in terms of the following JCapper factors:

EARLY:
• EarlyConsensus

LATE:
• LateConsensus

CLASS:
• ClassConsensus

FIGS:
• FigConsensus (primary)
• CFA (secondary)

FORM:
• FormConsensus

HUMAN CONNECTIONS:
• Situational Data Window samples for trainer.
• Situational Data Window samples for rider.

TRACK PROFILE:
• Situational Data Window samples for early, late, and perhaps railposition/gate draw.

EDIT: After posting that and re-reading it I'm struck with the thought that none of this is going to be simple. (But let's see where it leads.)


More to come...



-jp

.


~Edited by: jeff  on:  3/13/2015  at:  3:20:43 PM~

Reply
jeff
3/14/2015
9:12:00 PM
Big Picture Data Sample:

To start things off, let's get something that represents (the most basic glimpse of) the big picture. The following data sample is driven by a sql expression that gets us every starter that raced on an outer (Main) dirt surface during calendar year 2014:


query start: 3/14/2015 2:21:52 PM
query end: 3/14/2015 2:24:27 PM
elapsed time: 155 seconds
`
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
`
SQL: SELECT * FROM STARTERHISTORY
WHERE [DATE] >= #01-01-2014#
AND [DATE] <= #12-31-2014#
AND INTSURFACE=1
`
`
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 411151.80 416241.60 415691.90
Bet -550914.00 -550914.00 -550914.00
-----------------------------------------------------
P/L -139762.20 -134672.40 -135222.10
`
Wins 37208 74017 107303
Plays 275457 275457 275457
PCT .1351 .2687 .3895
`
ROI 0.7463 0.7555 0.7545
Avg Mut 11.05 5.62 3.87
`
`
By: SQL-F19 Rank (EarlyConsensus)
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -8429.50 77660.00 0.8915 9324 38830 .2401 1.7777
2 -13419.90 74672.00 0.8203 6801 37336 .1822 1.3485
3 -17328.70 73492.00 0.7642 5529 36746 .1505 1.1139
4 -18752.30 72766.00 0.7423 4716 36383 .1296 0.9596
5 -21409.10 70824.00 0.6977 3815 35412 .1077 0.7976
6 -21603.40 63748.00 0.6611 2894 31874 .0908 0.6722
7 -17157.10 48782.00 0.6483 1902 24391 .0780 0.5773
8 -11051.10 32196.00 0.6568 1114 16098 .0692 0.5123
9 -5398.10 19248.00 0.7196 623 9624 .0647 0.4792
10 -3211.60 10742.00 0.7010 316 5371 .0588 0.4356
11 -1223.50 4420.00 0.7232 119 2210 .0538 0.3986
12 -518.10 1906.00 0.7282 43 953 .0451 0.3340
13 -135.80 320.00 0.5756 10 160 .0625 0.4627
14 -112.00 126.00 0.1111 2 63 .0317 0.2350
15 -4.00 4.00 0.0000 0 2 .0000 0.0000
16 -2.00 2.00 0.0000 0 1 .0000 0.0000
17 0.00 0.00 0.0000 0 0 .0000 0.0000
18 -2.00 2.00 0.0000 0 1 .0000 0.0000
19 -4.00 4.00 0.0000 0 2 .0000 0.0000
`
`
`
By: SQL-F22 Rank (LateConsensus)
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -15036.30 81924.00 0.8165 9455 40962 .2308 1.7088
2 -16422.10 76866.00 0.7864 6960 38433 .1811 1.3407
3 -16921.90 76006.00 0.7774 5840 38003 .1537 1.1377
4 -16872.80 74376.00 0.7731 4821 37188 .1296 0.9597
5 -18289.20 70802.00 0.7417 3828 35401 .1081 0.8005
6 -18132.60 61690.00 0.7061 2834 30845 .0919 0.6802
7 -16562.80 45402.00 0.6352 1667 22701 .0734 0.5436
8 -10005.00 30040.00 0.6669 933 15020 .0621 0.4599
9 -4984.80 17876.00 0.7211 527 8938 .0590 0.4365
10 -3882.10 9712.00 0.6003 228 4856 .0470 0.3476
11 -1574.20 3980.00 0.6045 81 1990 .0407 0.3013
12 -867.10 1810.00 0.5209 25 905 .0276 0.2045
13 -97.10 292.00 0.6675 8 146 .0548 0.4057
14 -102.20 126.00 0.1889 1 63 .0159 0.1175
15 -4.00 4.00 0.0000 0 2 .0000 0.0000
16 -2.00 2.00 0.0000 0 1 .0000 0.0000
17 -2.00 2.00 0.0000 0 1 .0000 0.0000
18 -2.00 2.00 0.0000 0 1 .0000 0.0000
19 -2.00 2.00 0.0000 0 1 .0000 0.0000
`
`
`
By: SQL-F27 Rank (ClassConsensus)
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -10978.40 79238.00 0.8615 11899 39619 .3003 2.2234
2 -11989.40 75720.00 0.8417 7630 37860 .2015 1.4920
3 -13130.50 73586.00 0.8216 5621 36793 .1528 1.1310
4 -17487.10 72714.00 0.7595 4160 36357 .1144 0.8471
5 -19707.60 70364.00 0.7199 3071 35182 .0873 0.6462
6 -20478.80 63034.00 0.6751 2131 31517 .0676 0.5006
7 -16791.60 48112.00 0.6510 1332 24056 .0554 0.4099
8 -13278.10 31530.00 0.5789 722 15765 .0458 0.3390
9 -7510.40 19222.00 0.6093 376 9611 .0391 0.2896
10 -5513.70 10604.00 0.4800 168 5302 .0317 0.2346
11 -1899.00 4404.00 0.5688 67 2202 .0304 0.2253
12 -649.80 1924.00 0.6623 27 962 .0281 0.2078
13 -237.80 332.00 0.2837 3 166 .0181 0.1338
14 -98.00 118.00 0.1695 1 59 .0169 0.1255
15 -4.00 4.00 0.0000 0 2 .0000 0.0000
16 -2.00 2.00 0.0000 0 1 .0000 0.0000
17 -2.00 2.00 0.0000 0 1 .0000 0.0000
18 -4.00 4.00 0.0000 0 2 .0000 0.0000
19 0.00 0.00 0.0000 0 0 .0000 0.0000
`
`
`
By: SQL-F13 Rank (FigConsensus)
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -10769.50 77654.00 0.8613 11796 38827 .3038 2.2491
2 -11723.50 75606.00 0.8449 7720 37803 .2042 1.5118
3 -14431.60 75016.00 0.8076 5645 37508 .1505 1.1142
4 -17156.50 74436.00 0.7695 4243 37218 .1140 0.8440
5 -21554.50 71898.00 0.7002 3123 35949 .0869 0.6431
6 -19991.60 64064.00 0.6879 2208 32032 .0689 0.5103
7 -19577.30 47826.00 0.5907 1206 23913 .0504 0.3734
8 -11148.90 30848.00 0.6386 707 15424 .0458 0.3393
9 -6984.60 18140.00 0.6150 334 9070 .0368 0.2726
10 -4610.80 9778.00 0.5285 143 4889 .0292 0.2165
11 -1204.20 3700.00 0.6745 57 1850 .0308 0.2281
12 -384.70 1338.00 0.7125 18 669 .0269 0.1992
13 -91.00 258.00 0.6473 5 129 .0388 0.2869
14 -5.50 224.00 0.9754 3 112 .0268 0.1983
15 -76.00 76.00 0.0000 0 38 .0000 0.0000
16 -44.00 44.00 0.0000 0 22 .0000 0.0000
17 -2.00 2.00 0.0000 0 1 .0000 0.0000
18 -2.00 2.00 0.0000 0 1 .0000 0.0000
19 -4.00 4.00 0.0000 0 2 .0000 0.0000
`
`
`
By: SQL-F08 Rank (CFA)
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -11012.90 74122.00 0.8514 9064 37061 .2446 1.8106
2 -10746.10 74908.00 0.8565 7711 37454 .2059 1.5242
3 -15229.80 75368.00 0.7979 6125 37684 .1625 1.2033
4 -19231.90 75788.00 0.7462 4872 37894 .1286 0.9518
5 -20626.10 73304.00 0.7186 3723 36652 .1016 0.7520
6 -21485.30 65360.00 0.6713 2595 32680 .0794 0.5879
7 -16782.40 48194.00 0.6518 1576 24097 .0654 0.4842
8 -11337.10 31064.00 0.6350 838 15532 .0540 0.3994
9 -7423.70 17878.00 0.5848 417 8939 .0466 0.3454
10 -3487.50 9524.00 0.6338 193 4762 .0405 0.3000
11 -1927.10 4010.00 0.5194 70 2005 .0349 0.2585
12 -349.60 1166.00 0.7002 20 583 .0343 0.2540
13 -72.70 178.00 0.5916 4 89 .0449 0.3327
14 -40.00 40.00 0.0000 0 20 .0000 0.0000
15 -10.00 10.00 0.0000 0 5 .0000 0.0000
16 0.00 0.00 0.0000 0 0 .0000 0.0000
17 0.00 0.00 0.0000 0 0 .0000 0.0000
18 0.00 0.00 0.0000 0 0 .0000 0.0000
19 0.00 0.00 0.0000 0 0 .0000 0.0000
`
`
`
By: SQL-F07 Rank (FormConsensus)
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -12404.90 78976.00 0.8429 7657 39488 .1939 1.4355
2 -14105.80 75812.00 0.8139 6466 37906 .1706 1.2628
3 -16876.20 74216.00 0.7726 5675 37108 .1529 1.1322
4 -18508.80 73120.00 0.7469 5213 36560 .1426 1.0556
5 -19375.50 70562.00 0.7254 4491 35281 .1273 0.9424
6 -19069.90 63056.00 0.6976 3454 31528 .1096 0.8110
7 -16480.00 47434.00 0.6526 2090 23717 .0881 0.6524
8 -9940.90 31622.00 0.6856 1178 15811 .0745 0.5516
9 -5365.00 19072.00 0.7187 594 9536 .0623 0.4611
10 -4432.60 10330.00 0.5709 262 5165 .0507 0.3755
11 -1852.50 4390.00 0.5780 97 2195 .0442 0.3272
12 -959.10 1874.00 0.4882 28 937 .0299 0.2212
13 -266.30 308.00 0.1354 2 154 .0130 0.0961
14 -114.70 132.00 0.1311 1 66 .0152 0.1122
15 -2.00 2.00 0.0000 0 1 .0000 0.0000
16 -2.00 2.00 0.0000 0 1 .0000 0.0000
17 -4.00 4.00 0.0000 0 2 .0000 0.0000
18 0.00 0.00 0.0000 0 0 .0000 0.0000
19 -2.00 2.00 0.0000 0 1 .0000 0.0000
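A note on reading these breakouts: the Pct and Impact columns can be reproduced from the Wins and Plays columns, assuming Impact is simply the group's win rate divided by the overall win rate. A quick Python check against the rank=1 EarlyConsensus row above:

```python
def impact(group_wins, group_plays, overall_wins, overall_plays):
    """Impact value: group win rate relative to the overall win rate.
    1.0 = wins at the average rate; 2.0 = twice the average rate."""
    return (group_wins / group_plays) / (overall_wins / overall_plays)

# Rank=1 EarlyConsensus row vs. the overall 2014 dirt sample above:
iv = impact(9324, 38830, 37208, 275457)   # ~1.7777, matching the table
```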




More to come....


-jp

.




~Edited by: jeff  on:  3/14/2015  at:  9:12:00 PM~

Reply
jeff
3/15/2015
5:31:29 PM
Simplistic Big Picture Probability Estimation:

Suppose for the sake of argument, we are evaluating a horse that is ranked 1st in FigConsensus.

If nothing else is known about the horse: A peek at the above data sample suggests a win probability of approximately 30 percent.

That said, you and I both know that this 30 percent number is certainly not an accurate prob estimate.

For example, by breaking the data in the above sample out by field size - it becomes easy to see that win prob for rank=1 FigConsensus horses on the dirt in a 4 horse race is one thing - while win prob for the same rank=1 FigConsensus horse on the dirt in a 14 horse race is something else entirely:


query start: 3/14/2015 4:56:50 PM
query end: 3/14/2015 4:57:08 PM
elapsed time: 18 seconds
'
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
'
SQL: SELECT * FROM STARTERHISTORY
WHERE RANKF13=1
AND [DATE] >= #01-01-2014#
AND [DATE] <= #12-31-2014#
AND INTSURFACE=1
'
'
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 66884.50 67798.70 67029.20
Bet -77654.00 -77654.00 -77654.00
-----------------------------------------------------
P/L -10769.50 -9855.30 -10624.80
'
Wins 11796 19630 24380
Plays 38827 38827 38827
PCT .3038 .5056 .6279
'
ROI 0.8613 0.8731 0.8632
Avg Mut 5.67 3.45 2.75
'
'
By: Field Size
'
Value P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 0.00 0.00 0.0000 0 0 .0000 0.0000
2 0.00 8.00 1.0000 3 4 .7500 2.4687
3 -14.40 108.00 0.8667 28 54 .5185 1.7067
4 -159.60 1300.00 0.8772 281 650 .4323 1.4230
5 -1052.40 7060.00 0.8509 1314 3530 .3722 1.2252
6 -2502.90 16380.00 0.8472 2724 8190 .3326 1.0948
7 -2093.30 17916.00 0.8832 2809 8958 .3136 1.0321
8 -1916.60 13842.00 0.8615 1942 6921 .2806 0.9236
9 -1334.00 9308.00 0.8567 1249 4654 .2684 0.8834
10 -989.60 6910.00 0.8568 888 3455 .2570 0.8460
11 -460.80 2680.00 0.8281 311 1340 .2321 0.7639
12 -186.00 1800.00 0.8967 219 900 .2433 0.8009
13 -66.70 208.00 0.6793 16 104 .1538 0.5064
14 -4.40 130.00 0.9662 10 65 .1538 0.5064
15 6.20 2.00 4.1000 1 1 1.0000 3.2915
16 0.00 0.00 0.0000 0 0 .0000 0.0000
17 0.00 0.00 0.0000 0 0 .0000 0.0000
18 0.00 0.00 0.0000 0 0 .0000 0.0000
19 5.00 2.00 3.5000 1 1 1.0000 3.2915



So, knowing that field size changes things... What happens if we control for field size by narrowing things to 7 and 8 horse fields only?

And from there break the data out by FigConsensus numeric value?:


query start: 3/14/2015 5:05:19 PM
query end: 3/14/2015 5:05:26 PM
elapsed time: 7 seconds
`
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
`
SQL: SELECT * FROM STARTERHISTORY
WHERE RANKF13=1
AND FIELDSIZE >= 7
AND FIELDSIZE <= 8
AND [DATE] >= #01-01-2014#
AND [DATE] <= #12-31-2014#
AND INTSURFACE=1
`
`
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 27748.10 27867.30 28128.50
Bet -31758.00 -31758.00 -31758.00
-----------------------------------------------------
P/L -4009.90 -3890.70 -3629.50
`
Wins 4751 7895 10206
Plays 15879 15879 15879
PCT .2992 .4972 .6427
`
ROI 0.8737 0.8775 0.8857
Avg Mut 5.84 3.53 2.76
`
`
By: SQL-F13 (FigConsensus) Numeric Value:
`
>=Min < Max P/L Bet Roi Wins Plays Pct Impact
--------------------------------------------------------------------------------------
-999.00 65.00 0.00 0.00 0.0000 0 0 .0000 0.0000
65.00 67.50 32.00 16.00 3.0000 2 8 .2500 0.8356
67.50 70.00 -47.10 62.00 0.2403 2 31 .0645 0.2156
70.00 72.50 -194.70 644.00 0.6977 56 322 .1739 0.5813
72.50 75.00 -142.10 1102.00 0.8711 110 551 .1996 0.6672
75.00 77.50 -570.70 3458.00 0.8350 388 1729 .2244 0.7500
77.50 80.00 -458.90 3406.00 0.8653 435 1703 .2554 0.8537
80.00 82.50 -714.50 6478.00 0.8897 887 3239 .2738 0.9153
82.50 85.00 -475.30 4672.00 0.8983 739 2336 .3164 1.0573
85.00 87.50 -864.90 6184.00 0.8601 1017 3092 .3289 1.0993
87.50 90.00 -305.70 2840.00 0.8924 549 1420 .3866 1.2922
90.00 92.50 -268.00 2896.00 0.9075 566 1448 .3909 1.3064
92.50 95.00 0.00 0.00 0.0000 0 0 .0000 0.0000
95.00 97.50 0.00 0.00 0.0000 0 0 .0000 0.0000
97.50 100.00 0.00 0.00 0.0000 0 0 .0000 0.0000
100.00 102.50 0.00 0.00 0.0000 0 0 .0000 0.0000
102.50 105.00 0.00 0.00 0.0000 0 0 .0000 0.0000
105.00 107.50 0.00 0.00 0.0000 0 0 .0000 0.0000
107.50 110.00 0.00 0.00 0.0000 0 0 .0000 0.0000
110.00 999999.00 0.00 0.00 0.0000 0 0 .0000 0.0000


The above data sample makes it easy to see that even though we are controlling for field size...

Win prob for a low FigConsensus numeric value (at say 67.5 to 70) is one thing - while win prob for a high FigConsensus numeric value (at say 90 plus) is much higher.


What if we take the same rank=1 FigConsensus data on Main outer dirt courses while controlling for field size (7 & 8 horse fields only) and break things out by odds range?:


query start: 3/14/2015 5:54:01 PM
query end: 3/14/2015 5:54:09 PM
elapsed time: 8 seconds
`
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
`
SQL: SELECT * FROM STARTERHISTORY
WHERE RANKF13=1
AND FIELDSIZE >= 7
AND FIELDSIZE <= 8
AND [DATE] >= #01-01-2014#
AND [DATE] <= #12-31-2014#
AND INTSURFACE=1
`
`
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 27748.10 27867.30 28128.50
Bet -31758.00 -31758.00 -31758.00
-----------------------------------------------------
P/L -4009.90 -3890.70 -3629.50
`
Wins 4751 7895 10206
Plays 15879 15879 15879
PCT .2992 .4972 .6427
`
ROI 0.8737 0.8775 0.8857
Avg Mut 5.84 3.53 2.76
`
`
By: Odds Range
`
>=Min < Max P/L Bet Roi Wins Plays Pct Impact
--------------------------------------------------------------------------------------
-999.00 0.00 0.00 0.00 0.0000 0 0 .0000 0.0000
0.00 0.50 -168.50 1266.00 0.8669 419 633 .6619 2.2123
0.50 1.00 -440.40 4332.00 0.8983 1139 2166 .5259 1.7575
1.00 1.50 -670.90 4734.00 0.8583 927 2367 .3916 1.3089
1.50 2.00 -514.60 4184.00 0.8770 683 2092 .3265 1.0912
2.00 2.50 -468.90 3444.00 0.8639 467 1722 .2712 0.9064
2.50 3.00 -342.00 2740.00 0.8752 325 1370 .2372 0.7929
3.00 3.50 -200.40 2036.00 0.9016 220 1018 .2161 0.7223
3.50 4.00 -273.70 1474.00 0.8143 128 737 .1737 0.5805
4.00 4.50 -45.90 1214.00 0.9622 113 607 .1862 0.6222
4.50 5.00 -194.80 970.00 0.7992 68 485 .1402 0.4686
5.00 5.50 -237.80 756.00 0.6854 42 378 .1111 0.3714
5.50 6.00 -4.60 646.00 0.9929 48 323 .1486 0.4967
6.00 6.50 28.30 488.00 1.0580 36 244 .1475 0.4931
6.50 7.00 -130.70 400.00 0.6733 18 200 .0900 0.3008
7.00 7.50 -84.20 372.00 0.7737 18 186 .0968 0.3234
7.50 8.00 -82.20 308.00 0.7331 13 154 .0844 0.2821
8.00 8.50 -42.80 264.00 0.8379 12 132 .0909 0.3038
8.50 9.00 2.50 250.00 1.0100 13 125 .1040 0.3476
9.00 999999.00 -138.30 1880.00 0.9264 62 940 .0660 0.2204


Obviously, even though we are controlling for field size...

When the odds for rank=1 FigConsensus are low, win prob is high - and when the odds are high win prob is low.



One more data query - and then I'll get to the point.

What if we take the same rank=1 FigConsensus on the dirt while controlling for field size and break the data out by ClassConsensus rank?:


query start: 3/14/2015 5:58:42 PM
query end: 3/14/2015 5:58:50 PM
elapsed time: 8 seconds
`
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
`
SQL: SELECT * FROM STARTERHISTORY
WHERE RANKF13=1
AND FIELDSIZE >= 7
AND FIELDSIZE <= 8
AND [DATE] >= #01-01-2014#
AND [DATE] <= #12-31-2014#
AND INTSURFACE=1
`
`
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 27748.10 27867.30 28128.50
Bet -31758.00 -31758.00 -31758.00
-----------------------------------------------------
P/L -4009.90 -3890.70 -3629.50
`
Wins 4751 7895 10206
Plays 15879 15879 15879
PCT .2992 .4972 .6427
`
ROI 0.8737 0.8775 0.8857
Avg Mut 5.84 3.53 2.76
`
`
By: SQL-F27 Rank (ClassConsensus)
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -1796.60 15292.00 0.8825 2788 7646 .3646 1.2187
2 -615.80 7186.00 0.9143 1010 3593 .2811 0.9395
3 -794.00 4224.00 0.8120 491 2112 .2325 0.7770
4 -285.30 2392.00 0.8807 234 1196 .1957 0.6539
5 -377.00 1424.00 0.7353 122 712 .1713 0.5727
6 -87.90 740.00 0.8812 61 370 .1649 0.5510
7 -68.10 406.00 0.8323 37 203 .1823 0.6092
8 14.80 94.00 1.1574 8 47 .1702 0.5689
9 0.00 0.00 0.0000 0 0 .0000 0.0000
10 0.00 0.00 0.0000 0 0 .0000 0.0000
11 0.00 0.00 0.0000 0 0 .0000 0.0000
12 0.00 0.00 0.0000 0 0 .0000 0.0000
13 0.00 0.00 0.0000 0 0 .0000 0.0000
14 0.00 0.00 0.0000 0 0 .0000 0.0000
15 0.00 0.00 0.0000 0 0 .0000 0.0000
16 0.00 0.00 0.0000 0 0 .0000 0.0000
17 0.00 0.00 0.0000 0 0 .0000 0.0000
18 0.00 0.00 0.0000 0 0 .0000 0.0000
19 0.00 0.00 0.0000 0 0 .0000 0.0000



Again, even though we are controlling for field size...

When ClassConsensus rank=1, win prob is one thing - but when ClassConsensus rank is 5-6-7-8: win prob morphs into something much lower.




The point I'm trying to make here is this:

All by itself (if no other information is known) rank=1 FigConsensus on the dirt has a win prob of approximately 30 percent.

But each time some new piece of information is added: The picture changes.

That in itself provides a valuable clue.

The key, in my humble opinion, is finding the right piece(s) of information - or better yet - creating your own custom data points that no one else has - and/or combining your data points in a unique way.

Obviously, one of the many things a model has to be capable of is generating accurate probabilities.

But it goes a little deeper than that:

The model has to be robust. Not only should it be grounded in sound mathematical probability theory, it should also be able to handle most of the situations faced by the player during live play each day.




I'll stop here (for now) and come back as free time permits.



-jp

.


~Edited by: jeff  on:  3/15/2015  at:  5:31:29 PM~

Reply
jeff
3/15/2015
5:54:22 PM
Big Picture Prob Estimation - moving beyond the simplistic:

Focusing on the last data sample presented above: one way of moving beyond simplistic prob estimation - at least a way that seems intuitive to me - is to create a Decision Forest, or series of Decision Trees, that maps out the data points describing a probability distribution for something like the FigConsensus rank=1 by ClassConsensus rank matrix presented above.

In such a Forest, each Tree in the Forest can take a very simple form: That of the rows from a matrix like the one presented above.

For example:

The first tree in the forest describing the intersection of FigConsensus Rank=1 with ClassConsensus Rank is the row from the matrix above describing rank=1 for ClassConsensus:
     Rank       P/L        Bet        Roi    Wins   Plays     Pct     Impact
-----------------------------------------------------------------------
1 -1796.60 15292.00 0.8825 2788 7646 .3646 1.2187


The second tree in the forest describing the intersection of FigConsensus Rank=1 with ClassConsensus Rank is the row from the matrix above describing rank=2 for ClassConsensus:
     Rank       P/L        Bet        Roi    Wins   Plays     Pct     Impact
-----------------------------------------------------------------------
2 -615.80 7186.00 0.9143 1010 3593 .2811 0.9395


The third tree in the forest describing the intersection of FigConsensus Rank=1 with ClassConsensus Rank is the row from the matrix above describing rank=3 for ClassConsensus:
     Rank       P/L        Bet        Roi    Wins   Plays     Pct     Impact
-----------------------------------------------------------------------
3 -794.00 4224.00 0.8120 491 2112 .2325 0.7770


And, of course, the 4th through Xth trees in the forest would be the remaining rows in the matrix.
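A minimal Python sketch of the lookup idea, hard-coding the three "trees" (matrix rows) above, with the Pct column serving as the prob estimate. Any case not yet covered returns None, signaling a pass. This is an illustration of the approach, not JCapper code:

```python
# Prob estimates taken from the FigConsensus rank=1 x ClassConsensus rank
# matrix above (7-8 horse dirt fields). Keys are ClassConsensus rank.
TREES = {1: 0.3646, 2: 0.2811, 3: 0.2325}

def prob_estimate(fig_rank, class_rank):
    """Return a win prob estimate for a FigConsensus rank=1 horse, or None
    when no tree covers the case (the signal to pass the race)."""
    if fig_rank != 1:
        return None          # only rank=1 FigConsensus trees coded so far
    return TREES.get(class_rank)
```

In a real forest the dictionary keys would be tuples of every input (field size, surface, odds range, etc.) - which is exactly where the tree-count explosion described below comes from.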

Speaking from personal experience I have been able to make the decision tree approach work.

The advantages are:

  1. In its simplest form it's highly intuitive.


  2. It's fairly easy to implement.

    Several individual rows from a matrix (having desirable win rate and roi) are observed in the Data Window.

    From there a series of decision trees are created in the mind's eye (or whiteboarded).

    This approach lends itself well to UDMs.

    If you want to go beyond UDMs and create an actual Model:

    The decision trees can be coded out as a 'model' where the inputs (odds, rankF0X, ValF0Y, GapF0Z, etc.) are passed to a 'function' programmed to essentially recreate each row from the matrix and return a prob estimate based on the inputs received.

    Hint: In the absence of matrix rows that have been individually coded out: The prob estimate generated by the function can be the result of mathematical calculation. (This is what stat packages such as R, SAS, and SPSS do.)

    That said, in its simplest form, with no requirement whatsoever that you know higher math: In a Decision Tree Model a prob estimate generated by the function based on the inputs fed into it can be the same as the prob estimates taken from individual rows displayed in a Data Window factor breakout matrix.

    From there the prob estimate is converted to a strike price (or min required odds) using the following formula:

    RequiredOdds = (1/ProbEstimate) -1

    From there, during live play, RequiredOdds for horses the player is thinking about betting are compared to actual odds as the field is loading at the gate (the later into the loading process you can wait, the better) - and (hopefully) intelligent play or pass decisions are made.
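The strike price formula and the play-or-pass comparison can be sketched in a few lines of Python (the edge `margin` knob is my own addition, not from the post):

```python
def required_odds(prob_estimate):
    """Strike price: minimum fair odds for a given win prob estimate,
    via RequiredOdds = (1/ProbEstimate) - 1."""
    return (1.0 / prob_estimate) - 1.0

def play_or_pass(prob_estimate, actual_odds, margin=0.0):
    """Play only when the board odds beat the strike price, optionally by
    an extra edge margin (hypothetical knob)."""
    return "play" if actual_odds >= required_odds(prob_estimate) * (1.0 + margin) else "pass"

# A 25% horse needs 3-1 or better; a .3646 horse needs about 1.74-1.
```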



The disadvantages are:

  1. You have to whiteboard and code out a LOT of trees (if you hope to cover most of the situations you will encounter during live play).


  2. Even when you put a LOT of effort into creating a LOT of trees, you always encounter cases during live play that you hadn't considered before - and therefore have no trees that cover the individual case in front of you.

    Examples:

    • A race is taken off the turf and run at some oddball distance not seen before in the data set you used to create your trees.

    • A new track code appears out of the blue or becomes available at an ADW where you have an account. BOI-BTP-FON-HPX-LRC-MVR might be examples of this that many of you encountered in the past year.

    With each new track code, you have to acquire enough data to have a relevant sample - and from there - perform Data Window R&D, see something in the Data Window, and create new trees from scratch - possibly one set for each surface-distance configuration run at that track code.

    • A track makes a surface switch - making your existing trees useless in the process. In recent years, APX-DMR-GGX-KEE-SAX-TPX-WOX removing their dirt surfaces and putting in synthetic would be examples of this. KEE-SAX scrapping synthetic and going back to dirt would also be examples. DMR removing their synthetic surface and going back to dirt, and WOX replacing their Polytrack surface with Tapeta, are both soon to be examples of this. Handling these types of changes basically requires you to redo your trees from scratch.

    • Your datasets were based on races from the last 6 months of the prior year - a large enough sample that you (mistakenly) believed it would contain at least a few races representing every possible case you were likely to encounter.

    However, because your data was from the second half of the previous year, that dataset failed to contain a single race where every starter in the field was a 2 yr old first time starter. Fast forward to April of the following year: you are using your trees as the basis for live play decision making, and you are faced with a field full of 2 yr old first time starters at KEE. Then it hits you: there is 8 mtp, the race in front of you is part of a potentially large paying pick3 or pick4 sequence that you very much want to play - and you have no possible basis for using your trees to make an informed decision in that race.


  3. The game itself is something that is slowly yet constantly evolving.

    Your Trees have a shelf life of some unknown duration - and you have no way of knowing that duration when you create them.

    The game's evolution process means other players will (eventually) catch on to the same things you observed when you created your trees. If and when that happens you have to scrap your Trees and create new ones from scratch.


  4. Did I mention you have to whiteboard and code out a LOT of trees?

    Keep in mind that the matrices presented in the above samples are very simplistic. Now suppose for the sake of argument (to keep things simple) you want your Model to use rank only and you want it to cover field sizes of 5-14 horses. Based on that you have 10 rows per matrix. Let's further suppose that we want the Model to include just 5 factors: EarlyConsensus, LateConsensus, ClassConsensus, FigConsensus, and FormConsensus.

    Sounds pretty simple, right?

    Believe it or not such a Model expressed as a Decision Forest could include up to 10 x 10 x 10 x 10 x 10 (or 100,000) individual trees.

    That's a LOT of Trees. Far too many to code out by hand... And that's if none of your trees are track-surface-distance specific!

    Which makes a nice lead in as to why you might want to employ a stat package such as R, SAS, or SPSS, etc.




I'll stop here (for now) and add to this as free time permits.


-jp

.






~Edited by: jeff  on:  3/15/2015  at:  5:54:22 PM~

Reply
NYMike
3/16/2015
12:51:22 PM
Jeff,
This is terrific. Keep writing as you have time. Also, everything you are saying makes sense. You don't mention anything outside of JCapper yet. Without causing you to spend too much more time, can you give a snapshot of what you can't see in the Data Window?


NY Mike

~Edited by: NYMike  on:  3/16/2015  at:  12:51:22 PM~

Reply
jeff
3/17/2015
1:17:44 PM
I used a custom sql expression in the JCapper JCX File Exports Module to create a .csv file named "tam7f2014.csv" in my c:\JCapper\Exe folder.

The file contains id, horsename, [date], track, race, fieldsize, winpayoff, odds, valf19 or EarlyConsensus, valf22 or LateConsensus, valf27 or ClassConsensus, valf13 or FigConsensus, and valf07 or FormConsensus data for 7 furlong dirt races (with zero first time starters) that were run at Tampa Bay Downs during calendar year 2014.

The sql expression is as follows:

SELECT id, horsename, [date], track, race, fieldsize, winpayoff, odds, valf19, valf22, valf27, valf13, valf07 FROM starterhistory

WHERE track='tam'
and dist = 1540
and intsurface = 1
and ftscount = 0

and [DATE] >= #01-01-2014#
AND [DATE] <= #12-31-2014#
ORDER BY [DATE], TRACK, RACE, officialfinishposition


From there I opened the resulting .csv file in Excel 2010, and performed the following steps to "clean up" the data:

  1. I converted the $2.00 win mutuels as reported in the Equibase Charts from the actual payoff to an integer value of 1 for winning horses, and converted the empty (or zero length string) values for losing horses to an integer value of 0.

    Note: I did this because in Logistic Regression you are evaluating the likelihood of two possible outcomes. A horse can either win a race (1) or lose a race (0.)


  2. I converted the text names of all horses in the .csv file to a unique number. To accomplish this I simply gave the first horse in the file a name of 1, the second horse in the file a name of 2, and the third horse in the file a name of 3. I kept incrementing the names of each horse in the file by 1 until every horse in the file had a unique number instead of a name.

    I did this because the MLogit package in R (at least the way I am using it) requires that each row in the dataset have both a primary and a secondary unique identifier. In my case the id field is the primary unique identifier and the name field (with the names replaced by sequential numbers) is the secondary unique identifier.
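For anyone who would rather script the cleanup than do it in Excel, the two steps above can be sketched in Python (a sketch only - `clean_rows` is a hypothetical helper, not part of JCapper, and the column names assume the export described in the post):

```python
import csv

def clean_rows(rows):
    """winpayoff -> 1/0 outcome; horsename -> sequential unique number."""
    out = []
    for seq, row in enumerate(rows, start=1):
        row = dict(row)
        # Step 1: the $2 win mutuel is non-empty for winners and an empty
        # string for losers, so convert it to a 1/0 outcome.
        row["winpayoff"] = 1 if row["winpayoff"].strip() else 0
        # Step 2: replace the horse's text name with a sequential number
        # so each row has a secondary unique identifier.
        row["horsename"] = seq
        out.append(row)
    return out

# Usage (path as in the post):
# with open(r"c:\JCapper\Exe\tam7f2014.csv", newline="") as f:
#     cleaned = clean_rows(csv.DictReader(f))
```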



I then launched R and, after the interface came up, keyed in the following commands, which loaded R's csvread and mlogit packages into memory:

library(csvread)

library(mlogit)


Next I keyed the following commands into the R interface which caused my .csv file to be read into memory and the fields in the file to be mapped:

y <- read.csv("c:/jcapper/exe/tam7f2014.csv")

map.coltypes("c:/jcapper/exe/tam7f2014.csv", header = TRUE, nrows = 100, delimiter = ",")


Next, I keyed the following commands into the R interface which caused the MLogit package to use Logistic Regression to generate beta coefficients (or curve fit the data) in the Odds - FigConsensusNumericValue - ClassConsensusNumericValue matrices as they exist in my tam7f2014.csv file:

x <- mlogit.data(y,choice="winpayoff",shape="long",id.var="id",alt.var="horsename")

summary(mlogit(winpayoff ~ odds + valf13 + valf27 -1, data = x))


It took a few minutes, but once the task completed, the generated output looked like this:


nr method
7 iterations, 0h:0m:3s
g'(-H)^-1g = 2.52E+06
last step couldn't find higher value
`
Coefficients :
           Estimate  Std. Error  t-value  Pr(>|t|)
odds       0.067211  0.051214    1.3123   0.1894
valf13     0.025159  0.118567    0.2122   0.8320
valf27     0.259931  0.208039    1.2494   0.2115
`
Log-Likelihood: -1.9903



Generally, the lower the value in the Pr(>|t|) column for a given factor the better. Or more specifically, the lower the value in the Pr(>|t|) column for a given factor, the more significance that factor has in your model.
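As a sanity check, the t-value and Pr(>|t|) columns can be reproduced by hand from the Estimate and Std. Error columns: the t-value is Estimate divided by Std. Error, and Pr(>|t|) is the two-sided tail probability treating that ratio as a standard normal z (a common large-sample approximation). Checking the odds row in Python:

```python
from math import erf, sqrt

# Reproduce the "odds" row of the mlogit output above.
estimate, std_error = 0.067211, 0.051214
t_value = estimate / std_error          # ~1.312, matching the 1.3123 shown

def two_sided_p(z):
    """Two-sided tail probability under the standard normal."""
    phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))   # standard normal CDF
    return 2.0 * (1.0 - phi)

p_value = two_sided_p(t_value)          # ~0.189, matching the 0.1894 shown
```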

Here, I basically have a model for 7f dirt races at TAM based on 3 factors only: Odds, FigConsensus, and ClassConsensus.

In that model, Odds has the greatest degree of significance, followed closely by ClassConsensus, with FigConsensus lagging well behind as a distant last.

FWIW, my interpretation of the above report is that of the two factors in the model other than the odds, FigConsensus is of little value because it is so strongly reflected in the odds (at least at 7f on the dirt at TAM.)


-jp

.



~Edited by: jeff  on:  3/17/2015  at:  1:17:44 PM~

Reply
jeff
3/17/2015
1:11:07 PM
In an above post I said the following:


--quote:
"In the absence of matrix rows that have been individually coded out: The prob estimate generated by the function can be the result of mathematical calculation."
--end quote


I also said the following about a rank only 5 factor model:


--quote:
"such a Model expressed as a Decision Forest could include up to 10 x 10 x 10 x 10 x 10 (or 100,000) individual trees.

That's a LOT of Trees. Far too many to code out by hand... And that's if none of your trees are track-surface-distance specific!"
--end quote


If you don't want to create 100k trees manually - and keep in mind that 100k trees is a conservative estimate if your intent is to create a robust model...

Then this is where a stat package like R, SAS, or SPSS, etc. can help you.

For example, in R, the mlogit logistic regression package is designed to read data into memory from a properly formatted external data source (database table or .csv file.)

From there, the MLogit package in R uses a maximum likelihood function to mathematically calculate (or map out) the data points between two or more matrices like those found in the FigConsensusRank - ClassConsensusRank example presented above.

Once the data points are mapped out, the MLogit package in R will generate a report showing the Beta Coefficients for the factors in your model that you told it to evaluate.

The math in a logistic regression max likelihood algorithm is quite involved. But try not to let that (or the terminology) throw you.

In layman's terms, the algorithm is doing nothing more than plotting a curve describing the data points in your matrices.

Once you have Beta Coefficients for the factors in your model - and provided you are using significant factors in your model - the next step is to plug them into a formula similar to the one used by Wong in chapter 5 (titled "Winning Probability and Fair Odds") of Precision and start generating probability estimates for the horses in a given race.
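To make that last step concrete: in a conditional-logit model each horse's factor values are turned into a "strength" via exp() of the beta-weighted sum, and its win probability is that strength divided by the field total - which is essentially the shape of the formula Wong presents. A sketch using the three coefficients from the TAM 7f output above (the per-horse factor values below are invented purely for illustration):

```python
from math import exp

# Beta coefficients from the TAM 7f mlogit output above.
betas = {"odds": 0.067211, "valf13": 0.025159, "valf27": 0.259931}

def win_probs(field):
    """field: list of dicts mapping factor name -> value, one per horse.
    Returns conditional-logit win probabilities that sum to 1 over the field."""
    strengths = [exp(sum(betas[k] * h[k] for k in betas)) for h in field]
    total = sum(strengths)
    return [s / total for s in strengths]

# Hypothetical 3-horse field (factor values made up for illustration):
field = [
    {"odds": 2.5, "valf13": 105.0, "valf27": 3.0},
    {"odds": 5.0, "valf13": 101.0, "valf27": 2.0},
    {"odds": 9.0, "valf13": 98.0, "valf27": 1.0},
]
probs = win_probs(field)
assert abs(sum(probs) - 1.0) < 1e-9   # probabilities sum to 1
```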



-jp

.


Reply
edcondon
3/18/2015
5:50:34 AM
"From there, during live play, RequiredOdds for horses the player is thinking about betting are compared to actual odds as the field is facing up to the gate (the later into the loading process you can wait the better) - and (hopefully) intelligent play or pass decisions are made."

For me, this is the problem. +EV quickly turns negative with a flash of the tote (and vice versa). Deciding to bet or not while your EV teeters from + to - and while two other races are loading. Conditional bets @ 0 to post create a lot of "whipsaw" (usually working against me).




Reply
NYMike
3/18/2015
4:30:09 PM
Jeff,
This is great stuff. Thanks.

Mike

Reply
jeff
3/20/2015
11:47:55 AM
In an above post I mentioned "Situational Data Window samples for early, late, and perhaps railposition/gate draw" as an area of interest when it comes to prob estimates.

I wanted to post some examples of what I mean by that in this thread.

Looking at today's races - Fri March 20th, 2015 - I see that I have a number of UDMs converging on the #1 SCONSET EXPRESS in LRL R3. (A little more than 50 mtp as I start to type this.)

My UDMs aren't track specific and are mostly emphasizing speed-pace-form. Because of that, the situational queries I am going to run will look at areas not covered by the 'trees' in my UDMs.

The first thing that concerns me here is RailPosition. The LRL dirt surface has had a dead rail lately and SCONSET EXPRESS drew the 1 hole.

Looking at the most recent 600 starters in my StarterHistory table (which begins flagging horses sometime in mid December 2014, going forward to yesterday) at the 6f distance on LRL dirt, with the data broken out by RailPosition, I show the following:

query start: 3/20/2015 9:09:36 AM
query end: 3/20/2015 9:09:37 AM
elapsed time: 1 seconds
`
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
`
SQL: SELECT TOP 600 * FROM STARTERHISTORY
WHERE TRACK='LRL'
AND INTSURFACE = 1
AND DIST = 1320
AND [DATE] >= #01-01-2014#
AND [DATE] <= #03-19-2015#
ORDER BY [DATE] DESC
`
`
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 1073.60 953.90 988.90
Bet -1224.00 -1224.00 -1224.00
-----------------------------------------------------
P/L -150.40 -270.10 -235.10
`
Wins 83 163 242
Plays 612 612 612
PCT .1356 .2663 .3954
`
ROI 0.8771 0.7793 0.8079
Avg Mut 12.93 5.85 4.09
`
`
By: Rail Position
`
Rail Pos P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -89.20 160.00 0.4425 8 80 .1000 0.7373 <---
`
2 -5.60 160.00 0.9650 11 80 .1375 1.0139
3 -50.80 160.00 0.6825 9 80 .1125 0.8295
4 15.80 160.00 1.0988 12 80 .1500 1.1060
5 26.60 158.00 1.1684 16 79 .2025 1.4934
6 -41.60 154.00 0.7299 7 77 .0909 0.6703
7 21.40 132.00 1.1621 11 66 .1667 1.2289
8 8.60 76.00 1.1132 7 38 .1842 1.3583
9 -3.60 32.00 0.8875 2 16 .1250 0.9217
10 -22.00 22.00 0.0000 0 11 .0000 0.0000
11 -6.00 6.00 0.0000 0 3 .0000 0.0000
12 -2.00 2.00 0.0000 0 1 .0000 0.0000
13 -2.00 2.00 0.0000 0 1 .0000 0.0000



I can't help but notice the terrible roi for the 1 hole at 6f on the LRL dirt surface. The 10% win rate and 0.73 impact value aren't that terrible.

That said, my interpretation of the above sample is that the public hasn't yet caught on that the rail is bad and therefore has been betting horses from the 1 hole at LRL as if it were a good thing.
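For readers new to Data Window reports: the Impact column is just the subgroup's win rate divided by the overall sample's win rate (1.0 = neutral). Reproducing the rail position 1 row from the sample above:

```python
# Impact value = subgroup win rate / overall sample win rate.
overall_wins, overall_plays = 83, 612    # full LRL 6f dirt sample above
rail1_wins, rail1_plays = 8, 80          # rail position 1 row

impact = (rail1_wins / rail1_plays) / (overall_wins / overall_plays)
assert round(impact, 4) == 0.7373        # matches the report
```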

Based on that I'm not expecting to get a serious deviation upward from the 5-1 MLine.


-jp

.


Reply
jeff
3/20/2015
12:17:22 PM
None of the business UDMs that I have converging on #1 SCONSET EXPRESS in LRL R3 contain factor constraints (trees) designed to handle rider and trainer.

I generally use layering UDMs for that.

That said, the game changes over time - and staying on top of who's riding and training well (and where) takes WORK.

Lately, rather than continually updating UDMs like I used to I've taken up the practice of running situational queries in the Data Window to get the layering info I want for horses I am thinking about betting.

Widening the query parameters a bit (removing the track, surface, and dist restrictions) here's what I get for today's trainer:

query start: 3/20/2015 9:59:26 AM
query end: 3/20/2015 9:59:26 AM
elapsed time: 0 seconds
`
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
`
SQL: SELECT TOP 600 * FROM STARTERHISTORY
WHERE TRAINER = 'MCCARTHY KEVIN'
AND [DATE] >= #01-01-2014#
AND [DATE] <= #03-19-2015#
ORDER BY [DATE] DESC
`
`
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 120.00 120.50 133.60
Bet -192.00 -192.00 -192.00
-----------------------------------------------------
P/L -72.00 -71.50 -58.40
`
Wins 17 31 42
Plays 96 96 96
PCT .1771 .3229 .4375
`
ROI 0.6250 0.6276 0.6958
Avg Mut 7.06 3.89 3.18
`
`
****************************************************************************************
BY TRACK sorted by Track Code Run Date: 3/20/2015 9:59:26 AM
****************************************************************************************
TRACK   PLAYS  WINS  WIN PCT  WIN IMPACT  WIN ROI  PLACES  PLACE PCT  PLACE ROI
****************************************************************************************
ATL 1 1 1 5.6471 4.1 1 1 2.4
CTX 5 0 0 0 0 2 0.4 1.14
LRL 38 1 0.0263 0.1485 0.1789 2 0.0526 0.1868
PEN 1 0 0 0 0 0 0 0
PHA 1 1 1 5.6471 4.7 1 1 1.6
SAR 1 0 0 0 0 0 0 0
SUF 39 14 0.359 2.0273 1.1385 25 0.641 1.1141
TAM 10 0 0 0 0 0 0 0
****************************************************************************************
8 Track Codes from file: StarterHistory Table
****************************************************************************************



Using the same query, here's what I get for today's rider:

query start: 3/20/2015 10:01:43 AM
query end: 3/20/2015 10:01:44 AM
elapsed time: 1 seconds
`
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
`
SQL: SELECT TOP 600 * FROM STARTERHISTORY
WHERE RIDER = 'ALMODOVAR GERALD'
AND [DATE] >= #01-01-2014#
AND [DATE] <= #03-19-2015#
ORDER BY [DATE] DESC
`
`
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 980.00 865.00 899.40
Bet -1202.00 -1202.00 -1202.00
-----------------------------------------------------
P/L -222.00 -337.00 -302.60
`
Wins 132 206 284
Plays 601 601 601
PCT .2196 .3428 .4725
`
ROI 0.8153 0.7196 0.7483
Avg Mut 7.42 4.20 3.17
`
`
****************************************************************************************
BY TRACK sorted by Track Code Run Date: 3/20/2015 10:01:44 AM
****************************************************************************************
TRACK   PLAYS  WINS  WIN PCT  WIN IMPACT  WIN ROI  PLACES  PLACE PCT  PLACE ROI
****************************************************************************************
CTX 559 126 0.2254 1.0263 0.8496 199 0.356 0.7497
LRL 10 0 0 0 0 1 0.1 0.4
MNR 10 4 0.4 1.8212 1.14 4 0.4 0.67
PEN 2 1 0.5 2.2765 0.9 1 0.5 0.65
PHA 2 1 0.5 2.2765 0.95 1 0.5 0.7
TIM 18 0 0 0 0 0 0 0
****************************************************************************************
6 Track Codes from file: StarterHistory Table
****************************************************************************************




My interpretation of the above samples would be that while today's trainer and rider have been able to enjoy a measure of success at smaller circuit tracks such as CTX-MNR-PEN-PHA-SUF, etc...

They have not, to date, been able to crack the bigger/tougher Maryland circuit tracks such as LRL-PIM-TIM.

My reaction to this is as follows:

The prob models (that do not consider any of the situational info posted above) inherent in my UDMs would suggest a strike price in the 7/2 to 4/1 range.

After running the above queries I am going to need 6/1 to 7/1 before pulling the trigger on this horse.
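The relationship between a win prob estimate and a strike price is worth spelling out: fair odds for win probability p are (1 - p) / p, and demanding an edge pushes the required price higher. A sketch (the `edge` parameter and both helper functions are illustrative assumptions, not JCapper formulas):

```python
# Fair odds and a simple "strike price" rule of thumb.
def fair_odds(p):
    """Break-even odds-to-1 for win probability p."""
    return (1.0 - p) / p

def required_odds(p, edge=0.20):
    """Minimum odds-to-1 so that p * (odds + 1) >= 1 + edge,
    i.e. the bet carries at least `edge` expected profit per $1 staked."""
    return (1.0 + edge) / p - 1.0

# A 22% win estimate is fair at roughly 7/2...
assert abs(fair_odds(0.22) - 3.5455) < 1e-3
# ...but demanding a 20% edge pushes the strike price toward 9/2:
assert abs(required_odds(0.22) - 4.4545) < 1e-3
```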



-jp

.






Reply
jeff
3/20/2015
12:26:32 PM
With 1 mtp and odds for the #1 horse at LRL in R3 hovering at about 3/1... my decision is to pass.


-jp

.

Reply
NYMike
3/20/2015
12:27:29 PM
I'm assuming at 5/2 with a minute to go you will watch this one!

Mike

Reply
jeff
3/20/2015
12:43:40 PM
Result?

In LRL R3 #1 SCONSET EXPRESS set the pace while the rider kept her glued to a dead rail. Not surprisingly, she began shortening stride as they turned for home and all but stopped in the stretch.

The win prob estimates inherent in the situational queries were a lot lower than those suggested by my speed-pace-form UDMs.

Merging the two together caused me to revise my strike price upward - and allowed me to avoid making an otherwise bad bet.


-jp

.

Reply
NYMike
3/20/2015
12:48:46 PM
Jeff,
Having watched that race, how do you look at it? Do you see a horse with a reasonable chance to win, good Fig, Early, Form but poor late? At decision time he was overbet and the actual odds were still too low. For the race itself, he led, was not pushed by a contender, and still the poor late ability stopped him. Do you see this as one of the expected random walk outcomes that over time will closely resemble the percentages you felt he had to win?

And to judge as to whether this was a good bet or a bad bet, is result irrelevant to you? At 3/1 it was a bad bet and at 7/1 it was a good bet assuming the same outcome?

NY


Reply
jeff
3/20/2015
1:44:28 PM

--quote:
"Having watched that race, how do you look at it? Do you see a horse with a reasonable chance to win, good Fig, Early, Form but poor late?"
--end quote


Having just watched the race I see a horse with a reasonable chance of showing a complete form reversal if one or more of the following were to take place:

• Change to a surface where the rail is good. For example, a night at CTX when the track is dry.

• Change in RailPosition/Gate Draw. Especially if she races again at LRL. Maybe next time out draws a middle post - better if outside whatever other speed is in the race - and because of the way she stopped in today's race - likely goes off at higher odds.

• Rider Change - In my opinion, Almodovar didn't do the horse, the connections (or the bettors for that matter) any favors by going to the lead while keeping her glued to a dead rail. Her "S" HDW Runstyle indicates she's not a need-the-lead type. Another rider - one more adept at avoiding LRL's dead rail - might well have been able to get a very different result out of today's race by sitting off the pace and/or making an effort to tip off the rail and get to a better part of the track surface.



--quote:
"Do you see this as one of the expected random walk outcomes that over time will closely resemble the percentages you felt he had to win?"
--end quote


I see today's result as anything but random - especially in light of the situational queries that I ran.

That said, had I not run the situational queries - then I'd have to base my interpretation on the info built into my UDMs - and that's all I would have been able to go on at post time.


--EDIT:

So far as a random walk type of thing...

Results from betting are almost always a random walk type of thing. The more accurate your prob estimates the lower the degree of variance. But short of getting to 100% accuracy in your prob estimates - there is always going to be at least some degree of variance in your results.

--end Edit
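The random-walk point can be illustrated with a quick simulation (my sketch, nothing to do with JCapper internals): even when the probability estimate is exactly right and the bet is priced exactly fair, individual betting sequences still wander well away from break-even.

```python
import random

random.seed(7)

def bankroll_after(n_bets, p=0.20, odds=4.0, stake=2.0):
    """P/L from flat win bets on a true 20% horse at fair 4-1 odds.
    Expected profit per bet is exactly zero."""
    pl = 0.0
    for _ in range(n_bets):
        pl += stake * odds if random.random() < p else -stake
    return pl

results = [bankroll_after(500) for _ in range(200)]
mean = sum(results) / len(results)    # clusters near 0 (the bet is fair)
spread = max(results) - min(results)  # but individual runs vary widely
```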


Remember what I said (or tried to say) earlier in this thread:

Each new (relevant) prob estimate (or piece of info) changes (or should change) your overall prob estimate.




--quote:
"And to judge as to whether this was a good bet or a bad bet, is result irrelevant to you? At 3/1 it was a bad bet and at 7/1 it was a good bet assuming the same outcome?"
--end quote


No. The result is not relevant to me.

However, whether it was a good bet or a bad bet IS relevant to me. Based on my situational queries I set my strike price in the mid 6/1 range.

Had the odds been say 8/1 on my horse facing up to the gate - knowing what I know based on my situational queries - I would have pulled the trigger.




-jp

.


~Edited by: jeff  on:  3/20/2015  at:  1:44:28 PM~

Reply
Charlie James
3/23/2015
10:45:27 PM
Jeff, MUCHAS GRACIAS for the situational query post.

Maybe you recall before we became automated I used to keep tallies for sires and damssires in a spiral notebook. Simpler days back then because I only followed one circuit at a time: SOCAL.

No SOCAL. It's Monday. But with Mnr not cancelling tonight [Ha!] I thought why not.

Maybe you know where this is going already---

Mnr race 1 horse #3 --
#'s from the html report: a sprinkling of 1's, 2's and 3's for this and that plus 20/1 morning line odds.

Right there maybe enough for a smallish bet?? -- especially at a place like Mnr where anything can happen and does.

Normally I'd stop there. But after reading your post about the situational I decided to dig deeper.

So I copy and paste your top 600 railposition sql query into the data window -- I change track to 'mnr' and dist to 1210 and break the data out by railposition:

15 percent winners 0.71 roi. [nothing to see here]

Next I rerun the same query but this time with the data broken out by early consensus rank:

30 percent winners 0.97 roi

And the words speed bias flash before me.

Instantly followed by the words yeah right.

You see in the entire history of my betting life I have proven myself to be expertly adept at seeing speed biases after the fact only.

So let's forget that I mentioned 30 percent winners and 0.97 roi for early consensus -- for the time being.

Now HERE'S the kicker. The reason I'm posting--

When I run a query for the damssire of the #3 horse:

SELECT TOP 600 * FROM STARTERHISTORY
WHERE DAMSSIRE='BOUNDARY'
AND INTSURFACE=1
AND DIST=1210
AND [DATE] >= #01-01-2014#
AND [DATE] <= #12-31-2015#
ORDER BY [DATE] DESC

13 percent winners and 1.48 roi.

And from my days of breeding stats in a spiral notebook I ask the age old relevant question:

Q. What is this horse bred to do?

A. Apparently run 5.5 furlongs on the dirt.

But HERE'S where it gets interesting --

When I run a query for the damssire of the favorite:

SELECT TOP 600 * FROM STARTERHISTORY
WHERE DAMSSIRE='FORESTRY'
AND INTSURFACE=1
AND DIST=1210
AND [DATE] >= #01-01-2014#
AND [DATE] <= #12-31-2015#
ORDER BY [DATE] DESC

10 percent winners 0.56 roi.

And from my days of breeding stats in a spiral notebook I ask the flipside of the age old relevant question:

Q. What is the favorite not bred to do?

A. Apparently run 5.5 furlongs on the dirt.

At this point I connect the dots.

Now what I need to know is this:

Did she wire her field because of early consensus and a speed bias?

Or did she do what I think she was bred to do?

Again THANK YOU.

No way do I dig beyond the numbers on the html report without reading your post about the situational.


Reply
jeff
3/24/2015
2:36:33 PM

--quote:
"Now what I need to know is this:

Did she wire her field because of early consensus and a speed bias?

Or did she do what I think she was bred to do? "
--end quote


The short answer:

Sometimes they do a little of both. Admit it. There is such a thing as a speed bias.


Little bit longer answer:

A close look at CPace rank stats (not counting last night's results) suggests the dirt surface for the 5.5f distance at MNR has been playing to early speed so far this meet.

I say that because CPace rank 1 and 2 are both well above historical norms vs. stats for CPace rank 1 and 2 in large samples:


query start: 3/24/2015 9:57:39 AM
query end: 3/24/2015 9:57:39 AM
elapsed time: 0 seconds
`
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
Betting Instructions: Testing Purposes Only
`
UDM: _TrackDateExpression
`
SQL: SELECT * FROM STARTERHISTORY
WHERE TRACK='MNR'
AND INTSURFACE <= 3
AND DIST = 1210
AND [DATE] >= #03-01-2015#
AND [DATE] <= #03-22-2015#
`
`
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 279.00 267.00 188.00
Bet -354.00 -354.00 -354.00
-----------------------------------------------------
P/L -75.00 -87.00 -166.00
`
Wins 34 66 49
Plays 177 177 177
PCT .1921 .3729 .2768
`
ROI 0.7881 0.7542 0.5311
Avg Mut 8.21 4.05 3.84
`
`
By: SQL-F20 Rank (CPace)
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 6.60 62.00 1.1065 12 31 .3871 2.0152
2 13.20 62.00 1.2129 9 31 .2903 1.5114
`
3 -24.20 62.00 0.6097 5 31 .1613 0.8397
4 -54.60 62.00 0.1194 2 31 .0645 0.3359
5 25.60 52.00 1.4923 4 26 .1538 0.8009
6 -21.60 34.00 0.3647 2 17 .1176 0.6125
7 -14.00 14.00 0.0000 0 7 .0000 0.0000
8 -4.00 4.00 0.0000 0 2 .0000 0.0000
9 -2.00 2.00 0.0000 0 1 .0000 0.0000


Whether or not this trend continues is anyone's guess.

However, if I run the same query through the Data Window with the data broken out as a Track Weight Report I get the following:


query start: 3/24/2015 10:07:22 AM
query end: 3/24/2015 10:07:22 AM
elapsed time: 0 seconds
`
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
Betting Instructions: Testing Purposes Only
`
UDM: _TrackDateExpression
`
SQL: SELECT * FROM STARTERHISTORY
WHERE TRACK='MNR'
AND INTSURFACE <= 3
AND DIST = 1210
AND [DATE] >= #03-01-2015#
AND [DATE] <= #03-22-2015#
`
`
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 279.00 267.00 188.00
Bet -354.00 -354.00 -354.00
-----------------------------------------------------
P/L -75.00 -87.00 -166.00
`
Wins 34 66 49
Plays 177 177 177
PCT .1921 .3729 .2768
`
ROI 0.7881 0.7542 0.5311
Avg Mut 8.21 4.05 3.84
`
`
Actual Expected Actual/Expected
Track Weight Analysis Winners Winners Ratio
CPace 123 24.00 29.84 0.8043
CompoundLate 123 17.00 17.41 0.9767
`
`
Win
Roi
1.1065 (CPace 1)
0.7258 (CompoundLate 1)
`
` Actual Expected Actual/Expected
Run Style Analysis Winners Winners Ratio
E 5.00 8.08 0.6187
EP 2.00 2.09 0.9579
P 12.00 14.73 0.8147
S 11.00 5.45 2.0200
NA 1.00 0.66 1.5259
`
Approximate Track Weight: 5.00 (Speed Tiring)


From the above sample I draw your attention to the following:

Approximate Track Weight: 5.00 (Speed Tiring)

Q. How can this be? After all, CPace rank 1 and 2 are off the charts, right?

A. This is what happens when you work with small samples and look at one thing only.
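For reference, the Actual/Expected Ratio column in the Track Weight report is simply actual winners divided by expected winners - e.g. the CPace row above:

```python
# Actual/Expected Ratio from the Track Weight report.
actual_winners, expected_winners = 24.00, 29.84   # CPace row above
ratio = actual_winners / expected_winners
assert round(ratio, 4) == 0.8043                  # matches the report
```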



Let's look at the same sample broken out by a few of the program's early and late factors:


query start: 3/24/2015 10:42:34 AM
query end: 3/24/2015 10:42:34 AM
elapsed time: 0 seconds
`
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
Betting Instructions: Testing Purposes Only
`
UDM: _TrackDateExpression
`
SQL: SELECT * FROM STARTERHISTORY
WHERE TRACK='MNR'
AND INTSURFACE <= 3
AND DIST = 1210
AND [DATE] >= #03-01-2015#
AND [DATE] <= #03-22-2015#
`
`
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 279.00 267.00 188.00
Bet -354.00 -354.00 -354.00
-----------------------------------------------------
P/L -75.00 -87.00 -166.00
`
Wins 34 66 49
Plays 177 177 177
PCT .1921 .3729 .2768
`
ROI 0.7881 0.7542 0.5311
Avg Mut 8.21 4.05 3.84
`
`
`
`
EARLY:
`
By: 73 rankForPaceFig_2F_InLast
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -14.40 64.00 0.7750 7 32 .2188 1.1388
2 45.40 60.00 1.7567 11 30 .3667 1.9088
3 3.40 62.00 1.0548 7 31 .2258 1.1755

4 -57.60 66.00 0.1273 2 33 .0606 0.3155
5 -21.20 48.00 0.5583 4 24 .1667 0.8676
6 -34.00 34.00 0.0000 0 17 .0000 0.0000
7 7.40 16.00 1.4625 3 8 .3750 1.9522
8 -2.00 2.00 0.0000 0 1 .0000 0.0000
9 -2.00 2.00 0.0000 0 1 .0000 0.0000
`
`
`
By: 74 rankForPaceFig_4F_InLast
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -18.00 64.00 0.7188 8 32 .2500 1.3015
2 -25.00 68.00 0.6324 9 34 .2647 1.3780
3 79.60 56.00 2.4214 9 28 .3214 1.6733
4 -45.60 60.00 0.2400 2 30 .0667 0.3471
5 -34.40 56.00 0.3857 4 28 .1429 0.7437
6 -32.00 32.00 0.0000 0 16 .0000 0.0000
7 -8.20 12.00 0.3167 1 6 .1667 0.8676
8 10.60 4.00 3.6500 1 2 .5000 2.6029
9 -2.00 2.00 0.0000 0 1 .0000 0.0000
`
`
`
By: CompoundAP Rank
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 16.00 62.00 1.2581 14 31 .4516 2.3510
2 -1.80 62.00 0.9710 7 31 .2258 1.1755

3 -30.20 62.00 0.5129 4 31 .1290 0.6717
4 7.00 62.00 1.1129 3 31 .0968 0.5038
5 -30.40 52.00 0.4154 4 26 .1538 0.8009
6 -34.00 34.00 0.0000 0 17 .0000 0.0000
7 -10.20 14.00 0.2714 1 7 .1429 0.7437
8 10.60 4.00 3.6500 1 2 .5000 2.6029
9 -2.00 2.00 0.0000 0 1 .0000 0.0000
`
`
`
By: CompoundE1 Rank
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -16.20 62.00 0.7387 7 31 .2258 1.1755
2 -0.60 62.00 0.9903 11 31 .3548 1.8472
3 -13.60 62.00 0.7806 5 31 .1613 0.8397
4 -37.00 62.00 0.4032 4 31 .1290 0.6717
5 18.00 52.00 1.3462 4 26 .1538 0.8009
6 -24.00 34.00 0.2941 1 17 .0588 0.3062
7 -10.20 14.00 0.2714 1 7 .1429 0.7437
8 10.60 4.00 3.6500 1 2 .5000 2.6029
9 -2.00 2.00 0.0000 0 1 .0000 0.0000
`
`
`
By: CompoundE2 Rank
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -13.20 62.00 0.7871 9 31 .2903 1.5114
2 10.00 62.00 1.1613 10 31 .3226 1.6793

3 -9.20 62.00 0.8516 6 31 .1935 1.0076
4 -54.00 62.00 0.1290 2 31 .0645 0.3359
5 27.00 52.00 1.5192 5 26 .1923 1.0011
6 -34.00 34.00 0.0000 0 17 .0000 0.0000
7 -10.20 14.00 0.2714 1 7 .1429 0.7437
8 10.60 4.00 3.6500 1 2 .5000 2.6029
9 -2.00 2.00 0.0000 0 1 .0000 0.0000
`
`
`
By: CPace Rank
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 6.60 62.00 1.1065 12 31 .3871 2.0152
2 13.20 62.00 1.2129 9 31 .2903 1.5114

3 -24.20 62.00 0.6097 5 31 .1613 0.8397
4 -54.60 62.00 0.1194 2 31 .0645 0.3359
5 25.60 52.00 1.4923 4 26 .1538 0.8009
6 -21.60 34.00 0.3647 2 17 .1176 0.6125
7 -14.00 14.00 0.0000 0 7 .0000 0.0000
8 -4.00 4.00 0.0000 0 2 .0000 0.0000
9 -2.00 2.00 0.0000 0 1 .0000 0.0000
`
`
`
By: PMI Rank
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 18.80 62.00 1.3032 15 31 .4839 2.5190
2 -22.00 62.00 0.6452 5 31 .1613 0.8397
3 -6.80 62.00 0.8903 7 31 .2258 1.1755
4 14.00 62.00 1.2258 4 31 .1290 0.6717
5 -25.00 52.00 0.5192 3 26 .1154 0.6007
6 -34.00 34.00 0.0000 0 17 .0000 0.0000
7 -20.00 20.00 0.0000 0 10 .0000 0.0000
8 0.00 0.00 0.0000 0 0 .0000 0.0000
9 0.00 0.00 0.0000 0 0 .0000 0.0000
`
`
`
By: Avg E1 Rank
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -23.00 62.00 0.6290 8 31 .2581 1.3435
2 -12.20 62.00 0.8032 9 31 .2903 1.5114
3 -2.40 62.00 0.9613 6 31 .1935 1.0076
4 36.60 62.00 1.5903 5 31 .1613 0.8397

5 -32.40 52.00 0.3769 3 26 .1154 0.6007
6 -26.60 34.00 0.2176 2 17 .1176 0.6125
7 -14.00 14.00 0.0000 0 7 .0000 0.0000
8 1.00 4.00 1.2500 1 2 .5000 2.6029
9 -2.00 2.00 0.0000 0 1 .0000 0.0000
`
`
`
By: EarlyConsensus Rank
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 0.40 64.00 1.0063 12 32 .3750 1.9522
2 -13.60 62.00 0.7806 6 31 .1935 1.0076
3 -9.80 62.00 0.8419 6 31 .1935 1.0076
4 -41.20 66.00 0.3758 5 33 .1515 0.7888
5 24.80 46.00 1.5391 3 23 .1304 0.6790
6 -15.60 34.00 0.5412 2 17 .1176 0.6125
7 -14.00 14.00 0.0000 0 7 .0000 0.0000
8 -4.00 4.00 0.0000 0 2 .0000 0.0000
9 -2.00 2.00 0.0000 0 1 .0000 0.0000
`
`
`
By: Q Speed Points Rank
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -3.80 78.00 0.9513 12 39 .3077 1.6018
2 22.20 68.00 1.3265 8 34 .2353 1.2249

3 -42.60 74.00 0.4243 4 37 .1081 0.5628
4 -41.40 62.00 0.3323 3 31 .0968 0.5038
5 -0.40 36.00 0.9889 3 18 .1667 0.8676
6 -3.80 22.00 0.8273 2 11 .1818 0.9465
7 -4.20 8.00 0.4750 1 4 .2500 1.3015
8 1.00 4.00 1.2500 1 2 .5000 2.6029
9 -2.00 2.00 0.0000 0 1 .0000 0.0000
`
`
`
By: Q Speed Points Number
`
Q SpdPts P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
0 -33.40 76.00 0.5605 6 38 .1579 0.8220
1 -5.80 32.00 0.8188 3 16 .1875 0.9761
2 -26.80 56.00 0.5214 3 28 .1071 0.5578
3 25.80 66.00 1.3909 8 33 .2424 1.2620
4 -9.20 22.00 0.5818 2 11 .1818 0.9465
5 -31.00 48.00 0.3542 3 24 .1250 0.6507
6 -7.20 20.00 0.6400 3 10 .3000 1.5618
7 4.00 16.00 1.2500 2 8 .2500 1.3015
8 8.60 18.00 1.4778 4 9 .4444 2.3137


`
`
`
`
`
LATE:
`
By: SQL-F23 Rank (CompoundLate)
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -17.00 62.00 0.7258 7 31 .2258 1.1755
2 -29.40 62.00 0.5258 5 31 .1613 0.8397
3 -24.00 62.00 0.6129 6 31 .1935 1.0076
4 53.60 54.00 1.9926 8 27 .2963 1.5425
5 -40.00 50.00 0.2000 2 25 .0800 0.4165
6 -22.60 38.00 0.4053 3 19 .1579 0.8220
7 7.40 18.00 1.4111 2 9 .2222 1.1569
8 3.00 2.00 2.5000 1 1 1.0000 5.2059
9 -6.00 6.00 0.0000 0 3 .0000 0.0000
`
`
`
By: LateConsensus Rank
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -9.80 72.00 0.8639 10 36 .2778 1.4461
2 -37.60 62.00 0.3935 4 31 .1290 0.6717
3 -17.40 58.00 0.7000 6 29 .2069 1.0771
4 23.60 66.00 1.3576 8 33 .2424 1.2620
5 -11.80 44.00 0.7318 2 22 .0909 0.4733
6 -13.00 38.00 0.6579 3 19 .1579 0.8220
7 -8.00 8.00 0.0000 0 4 .0000 0.0000
8 -4.00 4.00 0.0000 0 2 .0000 0.0000
9 3.00 2.00 2.5000 1 1 1.0000 5.2059




The above sample shows 5.5f races on the dirt at MNR from opening day March 01, 2015 through March 22, 2015. I purposely did not include data from last night's card (the 23rd) because I did not want the sample to include the horse Chuck was discussing in R1.

I've broken the above Data Window sample out into two sections. The first contains most of the program's early-based factors. The second contains the program's primary late factors, CompoundLate and LateConsensus.

I've also highlighted rows in the Early section using red text wherever the row for a given rank had significantly higher win rate and/or roi than what you'd normally expect to see for that factor and rank if you were looking at a large sample.

I did the same for the matrices in the Late section too.

Each of the early factor matrices shows something that suggests (at least in the sample presented) that the individual factor has been outperforming its historical norm.

Both of the late factor matrices show the row for rank=4 outperforming one or more of the higher-ranked rows.

Taken individually, and viewed within the context that we are looking at a matrix for a single factor in a small sample only, my conclusion tends to be that I am looking at small sample noise.

However, taken collectively, and viewed within the context that the matrices for many different factors are behaving in a similar manner that is different from the norm - even though we are looking at a sample spanning only a few weeks - my conclusion changes.

In this case, if only one or two early factors were outperforming the norm, and the others weren't, my conclusion would tend to be that I was looking at small sample noise.

But when most, if not all, of the factor matrices are displaying a similar pattern, I start to see something along the lines of a preponderance of the evidence in the data.

And based on that I have to start asking myself if the surface is biased.
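One quick way to put a number on the "small sample noise" question is a binomial tail probability: given a factor rank's historical win rate, how likely is it to see this many winners (or more) in this many plays by chance alone? Here's a rough sketch in Python - the 0.19 base rate below is a stand-in assumption for illustration, not an actual JCapper historical figure:

```python
from math import comb

def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing k or more
    winners in n plays if the true win rate were p."""
    return sum(comb(n, i) * (p ** i) * ((1 - p) ** (n - i))
               for i in range(k, n + 1))

# Rank 4 of the CompoundLate matrix above: 8 wins from 27 plays.
# 0.19 is a hypothetical historical win rate for that rank.
print(round(binom_tail(27, 0.19, 8), 4))
```

A single row clearing (or failing) a test like this proves little by itself. But running it across every factor matrix turns "most of them look high" into something you can actually measure.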

When analyzing data:

It often pays to look at a sample many different ways. Try to apply some critical thinking to what you are seeing.

If you look at something simplistic, or look at one thing only, you increase your chances of being misled.

But if you look at many different data points (think lots of trees if you will) and apply some critical thinking to what you are seeing - I think your chances of being misled go down exponentially.


-jp

.

Reply
NYMike
4/17/2015
2:16:22 PM
Jeff,
Back to R.

You wrote:
summary(mlogit(winpayoff ~ odds + valf13 + valf27 -1, data = x))

How would you write this same equation in R if you wanted to add gapF18 but only those above -10?

Mike

Reply
jeff
4/19/2015
11:42:56 AM
This falls within an area called Data Transformations.

FYI, this is an area where you get your chance to shine as a modeler. Imo, the degree of creativity, ingenuity, and critical thinking that goes into transforming the data is often where you as a player separate yourself from the crowd.

Put another way: Data Transformations is often the area where you as a player generate your edge.




To answer your question:

I would add an additional column to my .csv file and give it a unique name that describes the data in it. Something like F18MinGap would probably do the trick.

After populating the .csv file in the normal manner, I would write - and then run - a special routine to populate the new F18MinGap column.

There are probably several options here, but speaking from experience - I've had success handling similar situations as follows:

• Populate the column with a numeric 0 (to indicate False) whenever GAPF18 fails to meet the >= -10 min value constraint.

• Populate the column with a numeric 1 (to indicate True) whenever GAPF18 meets the >= -10 min value constraint.
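The two bullet points above amount to a simple 0/1 encoding, which can also be sketched in a few lines of code instead of Excel. A hypothetical Python version (the GAPF18 and F18MinGap column names and the -10 cutoff follow the post; the row layout is an assumption):

```python
# Hypothetical sketch of the 0/1 encoding step described above.
# Each row is a dict keyed by .csv column name.
def add_f18mingap(rows, cutoff=-10.0):
    out = []
    for row in rows:
        row = dict(row)  # copy so the original rows are left untouched
        row["F18MinGap"] = 1 if float(row["GAPF18"]) >= cutoff else 0
        out.append(row)
    return out

rows = [{"horsename": "Alpha", "GAPF18": "-4.5"},
        {"horsename": "Bravo", "GAPF18": "-12.0"}]
print([r["F18MinGap"] for r in add_f18mingap(rows)])  # → [1, 0]
```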

From there, once the .csv file has been populated and the data verified, I'd launch R and turn it loose on the file...

library(csvread)

library(mlogit)

# read the exported .csv file
y <- read.csv("c:/jcapper/exe/tam7f2014.csv")

# inspect the column types detected in the first 100 rows
map.coltypes("c:/jcapper/exe/tam7f2014.csv", header = TRUE, nrows = 100, delimiter = ",")

# reshape to the long format mlogit expects: one row per horse per race
x <- mlogit.data(y, choice = "winpayoff", shape = "long", id.var = "id", alt.var = "horsename")

# note: R is case sensitive - the factor name in the formula must match
# the .csv column header exactly (F18MinGap here)
summary(mlogit(winpayoff ~ odds + valf13 + valf27 + F18MinGap - 1, data = x))


Keep in mind that if the report generated by R (or whatever stat package you are using) indicates that the 0's and 1's in the F18MinGap column are significant enough that you decide to add F18MinGap as another factor to your model:

It should be obvious that F18MinGap as described above (or any new data point you create for that matter) doesn't exist in JCapper.

Because of that, if you decide to use it, you'll need to write a routine you can run on race day that evaluates the GAPF18 field in the StartersToday table and writes the 0 or 1 F18MinGap indicator to a custom file or table - so that it can be read and, from there, fed into your custom pricing model as an input.
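The race-day routine described above could look something like the following Python sketch. The input layout here (a .csv dump with horsename and GAPF18 columns) is an assumption for illustration; in practice you'd read whatever file or table your own setup exports from StartersToday:

```python
import csv
import io

# Hypothetical race-day routine: evaluate GAPF18 against the -10 cutoff
# for each of today's starters and emit the 0/1 F18MinGap indicator.
def write_indicators(src_text, cutoff=-10.0):
    reader = csv.DictReader(io.StringIO(src_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["horsename", "F18MinGap"])
    writer.writeheader()
    for row in reader:
        flag = 1 if float(row["GAPF18"]) >= cutoff else 0
        writer.writerow({"horsename": row["horsename"], "F18MinGap": flag})
    return out.getvalue()

today = "horsename,GAPF18\nAlpha,-3.2\nBravo,-15.0\n"
print(write_indicators(today))
```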

Hope I managed to explain most of that in a way that makes sense.


-jp

.


~Edited by: jeff  on:  4/19/2015  at:  11:42:56 AM~

Reply

Copyright © 2018 JCapper Software              www.JCapper.com