Database Handicapping Software - JCapper

JCapper Message Board

          General Discussion
                      -- Statistical programs

SILVER01HDW
2/28/2015
12:01:15 PM
Is anyone using R or another statistical program to optimize patterns and results?

Reply
jeff
2/28/2015
1:55:47 PM
I am.

I started an outside of JCapper data project about a year ago.

The following bullet points describe the "How did I go about it?" part:

• About a year ago, after reading Precision by CX Wong, I became interested in doing an outside of JCapper pricing model.

• I read everything I could find on the topic of Logistic Regression.

Hint: I was able to find several titles on Google Books where I could read 60%-70% or so of the complete book (for free).

Hint: My efforts in this area led me to YouTube, where I was able to watch free video of college-level courses being taught (including a class at Stanford) where the course topic (and the keyword emphasized in my search) was Logistic Regression.

• I downloaded "R" from http://www.r-project.org/

• I created a version of the JCX File Exports Module that enables the player to use custom sql expressions to drive export of JCapper tables to a .csv file.

Hint: Once you have data sitting in a .csv file you can open the .csv file in Excel 2010 - and from there: clean the data up/make transformations, etc.

• From there, once you understand the basics - you can connect the mlogit package in "R" to your .csv files - and let it perform statistical analysis (Logistic Regression) for you.

• From there, you are in a position (certainly a much better position than you otherwise would be without performing statistical analysis on your data) to build a pricing model - or UPR, UserFactors, and UDMs for that matter.

The above process entails a LOT of work. Along the way I think I've gained a much deeper understanding than I had before of not just racing data and model building in general... but an understanding of crowd behavior and what it actually takes to build models that perform reasonably well going forward in time.
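For anyone curious what the "let R perform Logistic Regression for you" step actually computes: below is a minimal, self-contained Python sketch of a logistic regression fit by gradient descent. To be clear, the author used R's mlogit package; this is only an illustration, and the speed-fig training data is made up, not real JCapper output.

```python
import math
import random

def fit_logistic(xs, ys, lr=0.5, epochs=2000):
    """Fit P(win) = sigmoid(b0 + b1*z) by batch gradient descent, where z is
    the standardized feature (a plain-Python stand-in for R's glm/mlogit)."""
    m = sum(xs) / len(xs)
    s = (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5
    zs = [(x - m) / s for x in xs]
    b0 = b1 = 0.0
    for _ in range(epochs):
        g0 = g1 = 0.0
        for z, y in zip(zs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * z)))
            g0 += p - y
            g1 += (p - y) * z
        b0 -= lr * g0 / len(zs)
        b1 -= lr * g1 / len(zs)
    return (b0, b1, m, s)

def win_prob(model, x):
    """Model's win probability estimate for a raw feature value x."""
    b0, b1, m, s = model
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * (x - m) / s)))

# Toy training data: a hypothetical speed-fig column where higher figs
# win more often. Made-up data for illustration only.
random.seed(1)
figs = [f for f in range(60, 101, 5) for _ in range(80)]
wins = [1 if random.random() < (f - 55) / 90.0 else 0 for f in figs]
model = fit_logistic(figs, wins)
```

The fitted coefficient on the fig column comes out positive, and the model maps any raw fig to a win probability between 0 and 1 - which is exactly the kind of output a pricing model needs.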


-jp


.


Reply
SILVER01HDW
2/28/2015
3:16:01 PM
Thanks Jeff, that is helpful. I haven't read Wong's book but do intend to read it at some point in the near future. I was looking for the right package to install in R, so this is helpful. One more question: is it possible to convert the playlist file that is generated in Notepad to a CSV file?

~Edited by: SILVER01HDW  on:  2/28/2015  at:  3:16:01 PM~

Reply
jeff
2/28/2015
4:59:39 PM
PL_Profile.txt files have (if I recall correctly) 500 plus data fields (or columns) per row.

They can be opened in Excel 2010 - which is designed to handle that many columns and more.

However, Excel 2003 has a limit of 255 columns per row. For that reason - Excel 2003 is not a good choice for handling PL_Profile.txt files.



BASIC OPERATING INSTRUCTIONS for getting a PL_Profile.txt file into Excel 2010:

1. Working from inside of Windows Explorer (or My Computer) find the desired PL_Profile.txt file, right-click it, and select COPY.

2. Right-click (not on a file, but in the 'white space' inside the folder where the PL_Profile.txt file you are working with is located) and select PASTE.

This will cause Windows to create a copy of your PL_Profile.txt file in the folder where you are working.

Hint: Creating a copy leaves the original intact - which prevents you from 'breaking' the integrity of JCapper Build Database routines run on the folder where you are working.

3. Right-click the copy created in step 2 above and rename the file. While renaming it, change the file extension from .txt to .csv.

4. Double click the renamed copy (which is now a .csv file) from steps 2 and 3 above - and provided you have Excel 2010 installed on your machine - you should find that the file now opens in Excel 2010.

That's it!
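For players comfortable with a little scripting, the copy-and-rename steps above can be automated. A minimal Python sketch (the JCapper folder path in the comment is hypothetical):

```python
import shutil
from pathlib import Path

def make_csv_copy(txt_path):
    """Copy a PL_Profile.txt-style file and give the copy a .csv extension,
    leaving the original .txt file intact (so Build Database routines that
    depend on it keep working)."""
    src = Path(txt_path)
    dst = src.with_suffix(".csv")
    shutil.copyfile(src, dst)   # copy, not rename: the original stays put
    return dst

# Example (hypothetical path):
# make_csv_copy(r"C:\JCapper\Exe\PL_Profile.txt")
```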


-jp

.


Reply
NYMike
3/12/2015
11:35:03 AM
Jeff,
Your answer explains how. Can you shed a little light on why? What are you looking at and how would that information be used? I am interested in Logistic Regression but I'm not quite connecting the dots.

Thanks,

Mike

Reply
jeff
3/13/2015
3:28:08 PM
Why do it?

Short answer: Better decisions during live play.

How and related insights?

I don't have that kind of free time right now.

In order to cover the subject matter adequately, I'd end up writing the equivalent of several chapters from a book.

And if I were to do that - I would not be surprised one bit if what I ended up writing looked an awful lot like the book I've already recommended in this thread:
Precision by CX Wong




-jp

.

~Edited by: jeff  on:  3/13/2015  at:  3:28:08 PM~

Reply
jeff
3/13/2015
3:20:43 PM
One of the situations I face daily is calculating a strike price. Or, more specifically - deciding whether or not the odds offered on a horse I am about to bet are high enough that the odds, combined with the horse's probability of winning, represent a +EV (positive expected value) situation.

Mathematically, the only situations I should be betting are those offering +EV. It goes without saying (obviously) that situations offering -EV (negative expected value) are what I need to avoid.

Bet only +EV over time - and the result is exponential bankroll growth.

On the flip side of things - sprinkle enough -EV in with the bets - and the expected result (eventually) is complete loss of bankroll.

That said, all horseplayers are human beings and subject to mistakes. I'm convinced some of us are capable of playing a near perfect game in fits and spurts. But none of us are capable of playing a near perfect game perpetually.

Speaking strictly for myself, the goal I am striving for - and the reason I recommend statistical tools to analyze data - is improved accuracy when it comes to identifying +EV situations.
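To make "+EV" concrete, here is a minimal Python sketch of the expected value of a $1 win bet, using the standard definition (an illustration, not code from JCapper):

```python
def expected_value(win_prob, odds):
    """Expected profit per $1 win bet at odds-to-1: win_prob of the time we
    collect `odds` in profit; the rest of the time we lose the $1 stake."""
    return win_prob * odds - (1.0 - win_prob)

# A 30% winner at 3-1 is +EV (+0.20 per dollar); the same horse at 2-1 is -EV.
ev_play = expected_value(0.30, 3.0)   # +0.20
ev_pass = expected_value(0.30, 2.0)   # -0.10
```

Note that the same horse flips from playable to unplayable purely on price - which is why an accurate probability estimate matters so much.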






Let's try some SIMPLIFIED examples where I create individual parts of a pricing model (certainly not the entire thing - I'm not looking to write a book here) based on data analysis of a few basic areas of the game: early, late, class, ability from speed figs, form, human connections, breeding, and track profile.

For purposes of these examples I'll be breaking out specific areas of the game in terms of the following JCapper factors:

EARLY:
• EarlyConsensus

LATE:
• LateConsensus

CLASS:
• ClassConsensus

FIGS:
• FigConsensus (primary)
• CFA (secondary)

FORM:
• FormConsensus

HUMAN CONNECTIONS:
• Situational Data Window samples for trainer.
• Situational Data Window samples for rider.

TRACK PROFILE:
• Situational Data Window samples for early, late, and perhaps railposition/gate draw.

EDIT: After posting that and re-reading it I'm struck with the thought that none of this is going to be simple. (But let's see where it leads.)


More to come...



-jp

.


~Edited by: jeff  on:  3/13/2015  at:  3:20:43 PM~

Reply
jeff
3/14/2015
9:12:00 PM
Big Picture Data Sample:

To start things off, let's get something that represents (the most basic glimpse of) the big picture. The following data sample is driven by a sql expression that gets us every starter that raced on an outer (Main) dirt surface during calendar year 2014:


query start: 3/14/2015 2:21:52 PM
query end: 3/14/2015 2:24:27 PM
elapsed time: 155 seconds
`
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
`
SQL: SELECT * FROM STARTERHISTORY
WHERE [DATE] >= #01-01-2014#
AND [DATE] <= #12-31-2014#
AND INTSURFACE=1
`
`
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 411151.80 416241.60 415691.90
Bet -550914.00 -550914.00 -550914.00
-----------------------------------------------------
P/L -139762.20 -134672.40 -135222.10
`
Wins 37208 74017 107303
Plays 275457 275457 275457
PCT .1351 .2687 .3895
`
ROI 0.7463 0.7555 0.7545
Avg Mut 11.05 5.62 3.87
`
`
By: SQL-F19 Rank (EarlyConsensus)
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -8429.50 77660.00 0.8915 9324 38830 .2401 1.7777
2 -13419.90 74672.00 0.8203 6801 37336 .1822 1.3485
3 -17328.70 73492.00 0.7642 5529 36746 .1505 1.1139
4 -18752.30 72766.00 0.7423 4716 36383 .1296 0.9596
5 -21409.10 70824.00 0.6977 3815 35412 .1077 0.7976
6 -21603.40 63748.00 0.6611 2894 31874 .0908 0.6722
7 -17157.10 48782.00 0.6483 1902 24391 .0780 0.5773
8 -11051.10 32196.00 0.6568 1114 16098 .0692 0.5123
9 -5398.10 19248.00 0.7196 623 9624 .0647 0.4792
10 -3211.60 10742.00 0.7010 316 5371 .0588 0.4356
11 -1223.50 4420.00 0.7232 119 2210 .0538 0.3986
12 -518.10 1906.00 0.7282 43 953 .0451 0.3340
13 -135.80 320.00 0.5756 10 160 .0625 0.4627
14 -112.00 126.00 0.1111 2 63 .0317 0.2350
15 -4.00 4.00 0.0000 0 2 .0000 0.0000
16 -2.00 2.00 0.0000 0 1 .0000 0.0000
17 0.00 0.00 0.0000 0 0 .0000 0.0000
18 -2.00 2.00 0.0000 0 1 .0000 0.0000
19 -4.00 4.00 0.0000 0 2 .0000 0.0000
`
`
`
By: SQL-F22 Rank (LateConsensus)
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -15036.30 81924.00 0.8165 9455 40962 .2308 1.7088
2 -16422.10 76866.00 0.7864 6960 38433 .1811 1.3407
3 -16921.90 76006.00 0.7774 5840 38003 .1537 1.1377
4 -16872.80 74376.00 0.7731 4821 37188 .1296 0.9597
5 -18289.20 70802.00 0.7417 3828 35401 .1081 0.8005
6 -18132.60 61690.00 0.7061 2834 30845 .0919 0.6802
7 -16562.80 45402.00 0.6352 1667 22701 .0734 0.5436
8 -10005.00 30040.00 0.6669 933 15020 .0621 0.4599
9 -4984.80 17876.00 0.7211 527 8938 .0590 0.4365
10 -3882.10 9712.00 0.6003 228 4856 .0470 0.3476
11 -1574.20 3980.00 0.6045 81 1990 .0407 0.3013
12 -867.10 1810.00 0.5209 25 905 .0276 0.2045
13 -97.10 292.00 0.6675 8 146 .0548 0.4057
14 -102.20 126.00 0.1889 1 63 .0159 0.1175
15 -4.00 4.00 0.0000 0 2 .0000 0.0000
16 -2.00 2.00 0.0000 0 1 .0000 0.0000
17 -2.00 2.00 0.0000 0 1 .0000 0.0000
18 -2.00 2.00 0.0000 0 1 .0000 0.0000
19 -2.00 2.00 0.0000 0 1 .0000 0.0000
`
`
`
By: SQL-F27 Rank (ClassConsensus)
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -10978.40 79238.00 0.8615 11899 39619 .3003 2.2234
2 -11989.40 75720.00 0.8417 7630 37860 .2015 1.4920
3 -13130.50 73586.00 0.8216 5621 36793 .1528 1.1310
4 -17487.10 72714.00 0.7595 4160 36357 .1144 0.8471
5 -19707.60 70364.00 0.7199 3071 35182 .0873 0.6462
6 -20478.80 63034.00 0.6751 2131 31517 .0676 0.5006
7 -16791.60 48112.00 0.6510 1332 24056 .0554 0.4099
8 -13278.10 31530.00 0.5789 722 15765 .0458 0.3390
9 -7510.40 19222.00 0.6093 376 9611 .0391 0.2896
10 -5513.70 10604.00 0.4800 168 5302 .0317 0.2346
11 -1899.00 4404.00 0.5688 67 2202 .0304 0.2253
12 -649.80 1924.00 0.6623 27 962 .0281 0.2078
13 -237.80 332.00 0.2837 3 166 .0181 0.1338
14 -98.00 118.00 0.1695 1 59 .0169 0.1255
15 -4.00 4.00 0.0000 0 2 .0000 0.0000
16 -2.00 2.00 0.0000 0 1 .0000 0.0000
17 -2.00 2.00 0.0000 0 1 .0000 0.0000
18 -4.00 4.00 0.0000 0 2 .0000 0.0000
19 0.00 0.00 0.0000 0 0 .0000 0.0000
`
`
`
By: SQL-F13 Rank (FigConsensus)
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -10769.50 77654.00 0.8613 11796 38827 .3038 2.2491
2 -11723.50 75606.00 0.8449 7720 37803 .2042 1.5118
3 -14431.60 75016.00 0.8076 5645 37508 .1505 1.1142
4 -17156.50 74436.00 0.7695 4243 37218 .1140 0.8440
5 -21554.50 71898.00 0.7002 3123 35949 .0869 0.6431
6 -19991.60 64064.00 0.6879 2208 32032 .0689 0.5103
7 -19577.30 47826.00 0.5907 1206 23913 .0504 0.3734
8 -11148.90 30848.00 0.6386 707 15424 .0458 0.3393
9 -6984.60 18140.00 0.6150 334 9070 .0368 0.2726
10 -4610.80 9778.00 0.5285 143 4889 .0292 0.2165
11 -1204.20 3700.00 0.6745 57 1850 .0308 0.2281
12 -384.70 1338.00 0.7125 18 669 .0269 0.1992
13 -91.00 258.00 0.6473 5 129 .0388 0.2869
14 -5.50 224.00 0.9754 3 112 .0268 0.1983
15 -76.00 76.00 0.0000 0 38 .0000 0.0000
16 -44.00 44.00 0.0000 0 22 .0000 0.0000
17 -2.00 2.00 0.0000 0 1 .0000 0.0000
18 -2.00 2.00 0.0000 0 1 .0000 0.0000
19 -4.00 4.00 0.0000 0 2 .0000 0.0000
`
`
`
By: SQL-F08 Rank (CFA)
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -11012.90 74122.00 0.8514 9064 37061 .2446 1.8106
2 -10746.10 74908.00 0.8565 7711 37454 .2059 1.5242
3 -15229.80 75368.00 0.7979 6125 37684 .1625 1.2033
4 -19231.90 75788.00 0.7462 4872 37894 .1286 0.9518
5 -20626.10 73304.00 0.7186 3723 36652 .1016 0.7520
6 -21485.30 65360.00 0.6713 2595 32680 .0794 0.5879
7 -16782.40 48194.00 0.6518 1576 24097 .0654 0.4842
8 -11337.10 31064.00 0.6350 838 15532 .0540 0.3994
9 -7423.70 17878.00 0.5848 417 8939 .0466 0.3454
10 -3487.50 9524.00 0.6338 193 4762 .0405 0.3000
11 -1927.10 4010.00 0.5194 70 2005 .0349 0.2585
12 -349.60 1166.00 0.7002 20 583 .0343 0.2540
13 -72.70 178.00 0.5916 4 89 .0449 0.3327
14 -40.00 40.00 0.0000 0 20 .0000 0.0000
15 -10.00 10.00 0.0000 0 5 .0000 0.0000
16 0.00 0.00 0.0000 0 0 .0000 0.0000
17 0.00 0.00 0.0000 0 0 .0000 0.0000
18 0.00 0.00 0.0000 0 0 .0000 0.0000
19 0.00 0.00 0.0000 0 0 .0000 0.0000
`
`
`
By: SQL-F07 Rank (FormConsensus)
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -12404.90 78976.00 0.8429 7657 39488 .1939 1.4355
2 -14105.80 75812.00 0.8139 6466 37906 .1706 1.2628
3 -16876.20 74216.00 0.7726 5675 37108 .1529 1.1322
4 -18508.80 73120.00 0.7469 5213 36560 .1426 1.0556
5 -19375.50 70562.00 0.7254 4491 35281 .1273 0.9424
6 -19069.90 63056.00 0.6976 3454 31528 .1096 0.8110
7 -16480.00 47434.00 0.6526 2090 23717 .0881 0.6524
8 -9940.90 31622.00 0.6856 1178 15811 .0745 0.5516
9 -5365.00 19072.00 0.7187 594 9536 .0623 0.4611
10 -4432.60 10330.00 0.5709 262 5165 .0507 0.3755
11 -1852.50 4390.00 0.5780 97 2195 .0442 0.3272
12 -959.10 1874.00 0.4882 28 937 .0299 0.2212
13 -266.30 308.00 0.1354 2 154 .0130 0.0961
14 -114.70 132.00 0.1311 1 66 .0152 0.1122
15 -2.00 2.00 0.0000 0 1 .0000 0.0000
16 -2.00 2.00 0.0000 0 1 .0000 0.0000
17 -4.00 4.00 0.0000 0 2 .0000 0.0000
18 0.00 0.00 0.0000 0 0 .0000 0.0000
19 -2.00 2.00 0.0000 0 1 .0000 0.0000
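A note on reading these breakouts: the Pct and Impact columns can be reproduced from the Wins and Plays columns, assuming Impact is simply the group's win rate divided by the overall win rate. A quick Python check against the rank=1 EarlyConsensus row above:

```python
def impact(group_wins, group_plays, overall_wins, overall_plays):
    """Impact value: group win rate relative to the overall win rate.
    1.0 = wins at the average rate; 2.0 = twice the average rate."""
    return (group_wins / group_plays) / (overall_wins / overall_plays)

# Rank=1 EarlyConsensus row vs. the overall 2014 dirt sample above:
iv = impact(9324, 38830, 37208, 275457)   # ~1.7777, matching the table
```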




More to come....


-jp

.




~Edited by: jeff  on:  3/14/2015  at:  9:12:00 PM~

Reply
jeff
3/15/2015
5:31:29 PM
Simplistic Big Picture Probability Estimation:

Suppose for the sake of argument, we are evaluating a horse that is ranked 1st in FigConsensus.

If nothing else is known about the horse: A peek at the above data sample suggests a win probability of approximately 30 percent.

That said, you and I both know that this 30 percent number is certainly not an accurate prob estimate.

For example, by breaking the data in the above sample out by field size - it becomes easy to see that win prob for rank=1 FigConsensus horses on the dirt in a 4 horse race is one thing - while win prob for the same rank=1 FigConsensus horse on the dirt in a 14 horse race is something else entirely:


query start: 3/14/2015 4:56:50 PM
query end: 3/14/2015 4:57:08 PM
elapsed time: 18 seconds
'
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
'
SQL: SELECT * FROM STARTERHISTORY
WHERE RANKF13=1
AND [DATE] >= #01-01-2014#
AND [DATE] <= #12-31-2014#
AND INTSURFACE=1
'
'
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 66884.50 67798.70 67029.20
Bet -77654.00 -77654.00 -77654.00
-----------------------------------------------------
P/L -10769.50 -9855.30 -10624.80
'
Wins 11796 19630 24380
Plays 38827 38827 38827
PCT .3038 .5056 .6279
'
ROI 0.8613 0.8731 0.8632
Avg Mut 5.67 3.45 2.75
'
'
By: Field Size
'
Value P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 0.00 0.00 0.0000 0 0 .0000 0.0000
2 0.00 8.00 1.0000 3 4 .7500 2.4687
3 -14.40 108.00 0.8667 28 54 .5185 1.7067
4 -159.60 1300.00 0.8772 281 650 .4323 1.4230
5 -1052.40 7060.00 0.8509 1314 3530 .3722 1.2252
6 -2502.90 16380.00 0.8472 2724 8190 .3326 1.0948
7 -2093.30 17916.00 0.8832 2809 8958 .3136 1.0321
8 -1916.60 13842.00 0.8615 1942 6921 .2806 0.9236
9 -1334.00 9308.00 0.8567 1249 4654 .2684 0.8834
10 -989.60 6910.00 0.8568 888 3455 .2570 0.8460
11 -460.80 2680.00 0.8281 311 1340 .2321 0.7639
12 -186.00 1800.00 0.8967 219 900 .2433 0.8009
13 -66.70 208.00 0.6793 16 104 .1538 0.5064
14 -4.40 130.00 0.9662 10 65 .1538 0.5064
15 6.20 2.00 4.1000 1 1 1.0000 3.2915
16 0.00 0.00 0.0000 0 0 .0000 0.0000
17 0.00 0.00 0.0000 0 0 .0000 0.0000
18 0.00 0.00 0.0000 0 0 .0000 0.0000
19 5.00 2.00 3.5000 1 1 1.0000 3.2915



So, knowing that field size changes things... What happens if we control for field size by narrowing things to 7 and 8 horse fields only?

And from there break the data out by FigConsensus numeric value?:


query start: 3/14/2015 5:05:19 PM
query end: 3/14/2015 5:05:26 PM
elapsed time: 7 seconds
`
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
`
SQL: SELECT * FROM STARTERHISTORY
WHERE RANKF13=1
AND FIELDSIZE >= 7
AND FIELDSIZE <= 8
AND [DATE] >= #01-01-2014#
AND [DATE] <= #12-31-2014#
AND INTSURFACE=1
`
`
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 27748.10 27867.30 28128.50
Bet -31758.00 -31758.00 -31758.00
-----------------------------------------------------
P/L -4009.90 -3890.70 -3629.50
`
Wins 4751 7895 10206
Plays 15879 15879 15879
PCT .2992 .4972 .6427
`
ROI 0.8737 0.8775 0.8857
Avg Mut 5.84 3.53 2.76
`
`
By: SQL-F13 (FigConsensus) Numeric Value:
`
>=Min < Max P/L Bet Roi Wins Plays Pct Impact
--------------------------------------------------------------------------------------
-999.00 65.00 0.00 0.00 0.0000 0 0 .0000 0.0000
65.00 67.50 32.00 16.00 3.0000 2 8 .2500 0.8356
67.50 70.00 -47.10 62.00 0.2403 2 31 .0645 0.2156
70.00 72.50 -194.70 644.00 0.6977 56 322 .1739 0.5813
72.50 75.00 -142.10 1102.00 0.8711 110 551 .1996 0.6672
75.00 77.50 -570.70 3458.00 0.8350 388 1729 .2244 0.7500
77.50 80.00 -458.90 3406.00 0.8653 435 1703 .2554 0.8537
80.00 82.50 -714.50 6478.00 0.8897 887 3239 .2738 0.9153
82.50 85.00 -475.30 4672.00 0.8983 739 2336 .3164 1.0573
85.00 87.50 -864.90 6184.00 0.8601 1017 3092 .3289 1.0993
87.50 90.00 -305.70 2840.00 0.8924 549 1420 .3866 1.2922
90.00 92.50 -268.00 2896.00 0.9075 566 1448 .3909 1.3064
92.50 95.00 0.00 0.00 0.0000 0 0 .0000 0.0000
95.00 97.50 0.00 0.00 0.0000 0 0 .0000 0.0000
97.50 100.00 0.00 0.00 0.0000 0 0 .0000 0.0000
100.00 102.50 0.00 0.00 0.0000 0 0 .0000 0.0000
102.50 105.00 0.00 0.00 0.0000 0 0 .0000 0.0000
105.00 107.50 0.00 0.00 0.0000 0 0 .0000 0.0000
107.50 110.00 0.00 0.00 0.0000 0 0 .0000 0.0000
110.00 999999.00 0.00 0.00 0.0000 0 0 .0000 0.0000


The above data sample makes it easy to see that even though we are controlling for field size...

Win prob for a low FigConsensus numeric value (at say 67.5 to 70) is one thing - while win prob for a high FigConsensus numeric value (at say 90 plus) is much higher.


What if we take the same rank=1 FigConsensus data on Main outer dirt courses while controlling for field size (7 & 8 horse fields only) and break things out by odds range?:


query start: 3/14/2015 5:54:01 PM
query end: 3/14/2015 5:54:09 PM
elapsed time: 8 seconds
`
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
`
SQL: SELECT * FROM STARTERHISTORY
WHERE RANKF13=1
AND FIELDSIZE >= 7
AND FIELDSIZE <= 8
AND [DATE] >= #01-01-2014#
AND [DATE] <= #12-31-2014#
AND INTSURFACE=1
`
`
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 27748.10 27867.30 28128.50
Bet -31758.00 -31758.00 -31758.00
-----------------------------------------------------
P/L -4009.90 -3890.70 -3629.50
`
Wins 4751 7895 10206
Plays 15879 15879 15879
PCT .2992 .4972 .6427
`
ROI 0.8737 0.8775 0.8857
Avg Mut 5.84 3.53 2.76
`
`
By: Odds Range
`
>=Min < Max P/L Bet Roi Wins Plays Pct Impact
--------------------------------------------------------------------------------------
-999.00 0.00 0.00 0.00 0.0000 0 0 .0000 0.0000
0.00 0.50 -168.50 1266.00 0.8669 419 633 .6619 2.2123
0.50 1.00 -440.40 4332.00 0.8983 1139 2166 .5259 1.7575
1.00 1.50 -670.90 4734.00 0.8583 927 2367 .3916 1.3089
1.50 2.00 -514.60 4184.00 0.8770 683 2092 .3265 1.0912
2.00 2.50 -468.90 3444.00 0.8639 467 1722 .2712 0.9064
2.50 3.00 -342.00 2740.00 0.8752 325 1370 .2372 0.7929
3.00 3.50 -200.40 2036.00 0.9016 220 1018 .2161 0.7223
3.50 4.00 -273.70 1474.00 0.8143 128 737 .1737 0.5805
4.00 4.50 -45.90 1214.00 0.9622 113 607 .1862 0.6222
4.50 5.00 -194.80 970.00 0.7992 68 485 .1402 0.4686
5.00 5.50 -237.80 756.00 0.6854 42 378 .1111 0.3714
5.50 6.00 -4.60 646.00 0.9929 48 323 .1486 0.4967
6.00 6.50 28.30 488.00 1.0580 36 244 .1475 0.4931
6.50 7.00 -130.70 400.00 0.6733 18 200 .0900 0.3008
7.00 7.50 -84.20 372.00 0.7737 18 186 .0968 0.3234
7.50 8.00 -82.20 308.00 0.7331 13 154 .0844 0.2821
8.00 8.50 -42.80 264.00 0.8379 12 132 .0909 0.3038
8.50 9.00 2.50 250.00 1.0100 13 125 .1040 0.3476
9.00 999999.00 -138.30 1880.00 0.9264 62 940 .0660 0.2204


Obviously, even though we are controlling for field size...

When the odds for rank=1 FigConsensus are low, win prob is high - and when the odds are high win prob is low.



One more data query - and then I'll get to the point.

What if we take the same rank=1 FigConsensus on the dirt while controlling for field size and break the data out by ClassConsensus rank?:


query start: 3/14/2015 5:58:42 PM
query end: 3/14/2015 5:58:50 PM
elapsed time: 8 seconds
`
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
`
SQL: SELECT * FROM STARTERHISTORY
WHERE RANKF13=1
AND FIELDSIZE >= 7
AND FIELDSIZE <= 8
AND [DATE] >= #01-01-2014#
AND [DATE] <= #12-31-2014#
AND INTSURFACE=1
`
`
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 27748.10 27867.30 28128.50
Bet -31758.00 -31758.00 -31758.00
-----------------------------------------------------
P/L -4009.90 -3890.70 -3629.50
`
Wins 4751 7895 10206
Plays 15879 15879 15879
PCT .2992 .4972 .6427
`
ROI 0.8737 0.8775 0.8857
Avg Mut 5.84 3.53 2.76
`
`
By: SQL-F27 Rank (ClassConsensus)
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -1796.60 15292.00 0.8825 2788 7646 .3646 1.2187
2 -615.80 7186.00 0.9143 1010 3593 .2811 0.9395
3 -794.00 4224.00 0.8120 491 2112 .2325 0.7770
4 -285.30 2392.00 0.8807 234 1196 .1957 0.6539
5 -377.00 1424.00 0.7353 122 712 .1713 0.5727
6 -87.90 740.00 0.8812 61 370 .1649 0.5510
7 -68.10 406.00 0.8323 37 203 .1823 0.6092
8 14.80 94.00 1.1574 8 47 .1702 0.5689
9 0.00 0.00 0.0000 0 0 .0000 0.0000
10 0.00 0.00 0.0000 0 0 .0000 0.0000
11 0.00 0.00 0.0000 0 0 .0000 0.0000
12 0.00 0.00 0.0000 0 0 .0000 0.0000
13 0.00 0.00 0.0000 0 0 .0000 0.0000
14 0.00 0.00 0.0000 0 0 .0000 0.0000
15 0.00 0.00 0.0000 0 0 .0000 0.0000
16 0.00 0.00 0.0000 0 0 .0000 0.0000
17 0.00 0.00 0.0000 0 0 .0000 0.0000
18 0.00 0.00 0.0000 0 0 .0000 0.0000
19 0.00 0.00 0.0000 0 0 .0000 0.0000



Again, even though we are controlling for field size...

When ClassConsensus rank=1, win prob is one thing - but when ClassConsensus rank is 5-6-7-8: win prob morphs into something much lower.




The point I'm trying to make here is this:

All by itself (if no other information is known) rank=1 FigConsensus on the dirt has a win prob of approximately 30 percent.

But each time some new piece of information is added: The picture changes.

That in itself provides a valuable clue.

The key, in my humble opinion, is finding the right piece(s) of information - or better yet - creating your own custom data points that no one else has - and/or combining your data points in a unique way.

Obviously, one of the many things a model has to be capable of is generating accurate probabilities.

But it goes a little deeper than that:

The model has to be robust. Not only should it be grounded in sound mathematical probability theory, it should also be able to handle most of the situations faced by the player during live play each day.




I'll stop here (for now) and come back as free time permits.



-jp

.


~Edited by: jeff  on:  3/15/2015  at:  5:31:29 PM~

Reply
jeff
3/15/2015
5:54:22 PM
Big Picture Prob Estimation - moving beyond the simplistic:

Focusing on the last data sample presented above: one way of moving beyond simplistic prob estimation - at least a way that seems intuitive to me - is to create a Decision Forest, or series of Decision Trees, that maps out the data points describing a probability distribution for something like the FigConsensus rank=1 by ClassConsensus rank matrix presented above.

In such a Forest, each Tree in the Forest can take a very simple form: That of the rows from a matrix like the one presented above.

For example:

The first tree in the forest describing the intersection of FigConsensus Rank=1 with ClassConsensus Rank is the row from the matrix above describing rank=1 for ClassConsensus:
     Rank       P/L        Bet        Roi    Wins   Plays     Pct     Impact
-----------------------------------------------------------------------
1 -1796.60 15292.00 0.8825 2788 7646 .3646 1.2187


The second tree in the forest describing the intersection of FigConsensus Rank=1 with ClassConsensus Rank is the row from the matrix above describing rank=2 for ClassConsensus:
     Rank       P/L        Bet        Roi    Wins   Plays     Pct     Impact
-----------------------------------------------------------------------
2 -615.80 7186.00 0.9143 1010 3593 .2811 0.9395


The third tree in the forest describing the intersection of FigConsensus Rank=1 with ClassConsensus Rank is the row from the matrix above describing rank=3 for ClassConsensus:
     Rank       P/L        Bet        Roi    Wins   Plays     Pct     Impact
-----------------------------------------------------------------------
3 -794.00 4224.00 0.8120 491 2112 .2325 0.7770


And, of course, the 4th through Xth trees in the forest would be the remaining rows in the matrix.
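A minimal Python sketch of the lookup idea, hard-coding the three "trees" (matrix rows) above, with the Pct column serving as the prob estimate. Any case not yet covered returns None, signaling a pass. This is an illustration of the approach, not JCapper code:

```python
# Prob estimates taken from the FigConsensus rank=1 x ClassConsensus rank
# matrix above (7-8 horse dirt fields). Keys are ClassConsensus rank.
TREES = {1: 0.3646, 2: 0.2811, 3: 0.2325}

def prob_estimate(fig_rank, class_rank):
    """Return a win prob estimate for a FigConsensus rank=1 horse, or None
    when no tree covers the case (the signal to pass the race)."""
    if fig_rank != 1:
        return None          # only rank=1 FigConsensus trees coded so far
    return TREES.get(class_rank)
```

In a real forest the dictionary keys would be tuples of every input (field size, surface, odds range, etc.) - which is exactly where the tree-count explosion described below comes from.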

Speaking from personal experience I have been able to make the decision tree approach work.

The advantages are:

  1. In its simplest form it's highly intuitive.


  2. It's fairly easy to implement.

    Several individual rows from a matrix (having desirable win rate and roi) are observed in the Data Window.

    From there a series of decision trees are created in the mind's eye (or whiteboarded).

    This approach lends itself well to UDMs.

    If you want to go beyond UDMs and create an actual Model:

    The decision trees can be coded out as a 'model' where the inputs (odds, rankF0X, ValF0Y, GapF0Z, etc.) are passed to a 'function' programmed to essentially recreate each row from the matrix and return a prob estimate based on the inputs received.

    Hint: In the absence of matrix rows that have been individually coded out: The prob estimate generated by the function can be the result of mathematical calculation. (This is what stat packages such as R, SAS, and SPSS do.)

    That said, in its simplest form, with no requirement whatsoever that you know higher math: In a Decision Tree Model a prob estimate generated by the function based on the inputs fed into it can be the same as the prob estimates taken from individual rows displayed in a Data Window factor breakout matrix.

    From there the prob estimate is converted to a strike price (or min required odds) using the following formula:

    RequiredOdds = (1/ProbEstimate) -1

    From there, during live play, RequiredOdds for horses the player is thinking about betting are compared to actual odds as the field is loading at the gate (the later into the loading process you can wait, the better) - and (hopefully) intelligent play or pass decisions are made.
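The strike price formula and the play-or-pass comparison can be sketched in a few lines of Python (the edge `margin` knob is my own addition, not from the post):

```python
def required_odds(prob_estimate):
    """Strike price: minimum fair odds for a given win prob estimate,
    via RequiredOdds = (1/ProbEstimate) - 1."""
    return (1.0 / prob_estimate) - 1.0

def play_or_pass(prob_estimate, actual_odds, margin=0.0):
    """Play only when the board odds beat the strike price, optionally by
    an extra edge margin (hypothetical knob)."""
    return "play" if actual_odds >= required_odds(prob_estimate) * (1.0 + margin) else "pass"

# A 25% horse needs 3-1 or better; a .3646 horse needs about 1.74-1.
```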



The disadvantages are:

  1. You have to whiteboard and code out a LOT of trees (if you hope to cover most of the situations you will encounter during live play).


  2. Even when you put a LOT of effort into creating a LOT of trees, you always encounter cases during live play that you hadn't considered before - and therefore have no trees that cover the individual case in front of you.

    Examples:

    • A race is taken off the turf and run at some oddball distance not seen before in the data set you used to create your trees.

    • A new track code appears out of the blue or becomes available at an ADW where you have an account. BOI-BTP-FON-HPX-LRC-MVR might be examples of this that many of you encountered in the past year.

    With each new track code, you have to acquire enough data to have a relevant sample - and from there - perform Data Window R&D, see something in the Data Window, and create new trees from scratch - possibly one set for each surface-distance configuration run at that track code.

    • A track makes a surface switch - making your existing trees useless in the process. In recent years, APX-DMR-GGX-KEE-SAX-TPX-WOX removing their dirt surfaces and putting in synthetic would be examples of this. KEE-SAX scrapping synthetic and going back to dirt would also be examples. DMR removing their synthetic surface and going back to dirt, and WOX replacing their Polytrack surface with Tapeta, are both soon to be examples of this. Handling these types of changes basically requires you to redo your trees from scratch.

    • Your datasets were based on races from the last 6 months of the prior year - a large enough sample that you (mistakenly) believed it would contain at least a few races representing every possible case you were likely to encounter.

    However, because your data was from the second half of the previous year, that dataset failed to contain a single race where every starter in the field was a 2 yr old first time starter. Fast forward to April of the following year: you are using your trees as the basis for live play decision making, and you are faced with a field full of 2 yr old first time starters at KEE. Then it hits you: there is 8 mtp, the race in front of you is part of a potentially large paying pick3 or pick4 sequence that you very much want to play - and you have no possible basis for using your trees to make an informed decision in that race.


  3. The game itself is something that is slowly yet constantly evolving.

    Your Trees have a shelf life of some unknown duration - and you have no way of knowing that duration when you create them.

    The game's evolution process means other players will (eventually) catch on to the same things you observed when you created your trees. If and when that happens you have to scrap your Trees and create new ones from scratch.


  4. Did I mention you have to whiteboard and code out a LOT of trees?

    Keep in mind that the matrices presented in the above samples are very simplistic. Now suppose for the sake of argument (to keep things simple) you want your Model to use rank only and you want it to cover field sizes of 5-14 horses. Based on that you have 10 rows per matrix. Let's further suppose that we want the Model to include just 5 factors: EarlyConsensus, LateConsensus, ClassConsensus, FigConsensus, and FormConsensus.

    Sounds pretty simple, right?

    Believe it or not such a Model expressed as a Decision Forest could include up to 10 x 10 x 10 x 10 x 10 (or 100,000) individual trees.

    That's a LOT of Trees. Far too many to code out by hand... And that's if none of your trees are track-surface-distance specific!

    Which makes a nice lead in as to why you might want to employ a stat package such as R, SAS, or SPSS, etc.




I'll stop here (for now) and add to this as free time permits.


-jp

.






~Edited by: jeff  on:  3/15/2015  at:  5:54:22 PM~

Reply
NYMike
3/16/2015
12:51:22 PM
Jeff,
This is terrific. Keep writing as you have time. Also, everything you are saying makes sense. You don't mention anything outside of JCapper yet. Without causing you to spend too much more time, can you give a snapshot of what you can't see in the Data Window?


NY Mike

~Edited by: NYMike  on:  3/16/2015  at:  12:51:22 PM~

Reply
jeff
3/17/2015
1:17:44 PM
I used a custom sql expression in the JCapper JCX File Exports Module to create a .csv file named "tam7f2014.csv" in my c:\JCapper\Exe folder.

The file contains id, horsename, [date], track, race, fieldsize, winpayoff, odds, valf19 or EarlyConsensus, valf22 or LateConsensus, valf27 or ClassConsensus, valf13 or FigConsensus, and valf07 or FormConsensus data for 7 furlong dirt races (with zero first time starters) that were run at Tampa Bay Downs during calendar year 2014.

The sql expression is as follows:

SELECT id, horsename, [date], track, race, fieldsize, winpayoff, odds, valf19, valf22, valf27, valf13, valf07 FROM starterhistory

WHERE track='tam'
and dist = 1540
and intsurface = 1
and ftscount = 0

and [DATE] >= #01-01-2014#
AND [DATE] <= #12-31-2014#
ORDER BY [DATE], TRACK, RACE, officialfinishposition


From there I opened the resulting .csv file in Excel 2010, and performed the following steps to "clean up" the data:

  1. I converted the $2.00 win mutuels as reported in the Equibase Charts from the actual payoff to an integer value of 1 for winning horses, and converted the empty (or zero length string) values for losing horses to an integer value of 0.

    Note: I did this because in Logistic Regression you are evaluating the likelihood of two possible outcomes. A horse can either win a race (1) or lose a race (0.)


  2. I converted the text names of all horses in the .csv file to a unique number. To accomplish this I simply gave the first horse in the file a name of 1, the second horse in the file a name of 2, and the third horse in the file a name of 3. I kept incrementing the names of each horse in the file by 1 until every horse in the file had a unique number instead of a name.

    I did this because the MLogit package in R (at least the way I am using it) requires that each row in the dataset have both a primary and a secondary unique identifier. In my case the id field is the primary unique identifier and the name field (with the names replaced by sequential numbers) is the secondary unique identifier.
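For anyone who would rather script the cleanup than do it in Excel, the two steps above can be sketched in Python (a sketch only - `clean_rows` is a hypothetical helper, not part of JCapper, and the column names assume the export described in the post):

```python
import csv

def clean_rows(rows):
    """winpayoff -> 1/0 outcome; horsename -> sequential unique number."""
    out = []
    for seq, row in enumerate(rows, start=1):
        row = dict(row)
        # Step 1: the $2 win mutuel is non-empty for winners and an empty
        # string for losers, so convert it to a 1/0 outcome.
        row["winpayoff"] = 1 if row["winpayoff"].strip() else 0
        # Step 2: replace the horse's text name with a sequential number
        # so each row has a secondary unique identifier.
        row["horsename"] = seq
        out.append(row)
    return out

# Usage (path as in the post):
# with open(r"c:\JCapper\Exe\tam7f2014.csv", newline="") as f:
#     cleaned = clean_rows(csv.DictReader(f))
```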



I then launched R and, after the interface came up, keyed in the following commands, which loaded R's csvread and mlogit packages into memory:

library(csvread)

library(mlogit)


Next I keyed the following commands into the R interface which caused my .csv file to be read into memory and the fields in the file to be mapped:

y <- read.csv("c:/jcapper/exe/tam7f2014.csv")

map.coltypes("c:/jcapper/exe/tam7f2014.csv", header = TRUE, nrows = 100, delimiter = ",")


Next, I keyed the following commands into the R interface which caused the MLogit package to use Logistic Regression to generate beta coefficients (or curve fit the data) in the Odds - FigConsensusNumericValue - ClassConsensusNumericValue matrices as they exist in my tam7f2014.csv file:

x <- mlogit.data(y,choice="winpayoff",shape="long",id.var="id",alt.var="horsename")

summary(mlogit(winpayoff ~ odds + valf13 + valf27 -1, data = x))


It took a few minutes, but once the task completed, the generated output looked like this:


nr method
7 iterations, 0h:0m:3s
g'(-H)^-1g = 2.52E+06
last step couldn't find higher value
`
Coefficients :
           Estimate  Std. Error  t-value  Pr(>|t|)
odds       0.067211  0.051214    1.3123   0.1894
valf13     0.025159  0.118567    0.2122   0.8320
valf27     0.259931  0.208039    1.2494   0.2115
`
Log-Likelihood: -1.9903



Generally, the lower the value in the Pr(>|t|) column for a given factor the better. Or more specifically, the lower the value in the Pr(>|t|) column for a given factor, the more significance that factor has in your model.
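As a sanity check, the t-value and Pr(>|t|) columns can be reproduced by hand from the Estimate and Std. Error columns: the t-value is Estimate divided by Std. Error, and Pr(>|t|) is the two-sided tail probability treating that ratio as a standard normal z (a common large-sample approximation). Checking the odds row in Python:

```python
from math import erf, sqrt

# Reproduce the "odds" row of the mlogit output above.
estimate, std_error = 0.067211, 0.051214
t_value = estimate / std_error          # ~1.312, matching the 1.3123 shown

def two_sided_p(z):
    """Two-sided tail probability under the standard normal."""
    phi = 0.5 * (1.0 + erf(z / sqrt(2.0)))   # standard normal CDF
    return 2.0 * (1.0 - phi)

p_value = two_sided_p(t_value)          # ~0.189, matching the 0.1894 shown
```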

Here, I basically have a model for 7f dirt races at TAM based on 3 factors only: Odds, FigConsensus, and ClassConsensus.

In that model, Odds has the greatest degree of significance, followed closely by ClassConsensus, with FigConsensus lagging well behind as a distant last.

FWIW, my interpretation of the above report is that of the two factors in the model other than the odds, FigConsensus is of little value because it is so strongly reflected in the odds (at least at 7f on the dirt at TAM.)


-jp

.



~Edited by: jeff  on:  3/17/2015  at:  1:17:44 PM~

Reply
jeff
3/17/2015
1:11:07 PM
In an above post I said the following:


--quote:
"In the absence of matrix rows that have been individually coded out: The prob estimate generated by the function can be the result of mathematical calculation."
--end quote


I also said the following about a rank only 5 factor model:


--quote:
"such a Model expressed as a Decision Forest could include up to 10 x 10 x 10 x 10 x 10 (or 100,000) individual trees.

That's a LOT of Trees. Far too many to code out by hand... And that's if none of your trees are track-surface-distance specific!"
--end quote


If you don't want to create 100k trees manually - and keep in mind that 100k trees is a conservative estimate if your intent is to create a robust model...

Then this is where a stat package like R, SAS, or SPSS, etc. can help you.

For example, in R, the mlogit logistic regression package is designed to read data into memory from a properly formatted external data source (database table or .csv file.)

From there, the MLogit package in R uses a maximum likelihood function to mathematically calculate (or map out) the data points between two or more matrices like those found in the FigConsensusRank - ClassConsensusRank example presented above.

Once the data points are mapped out, the MLogit package in R will generate a report showing the Beta Coefficients for the factors in your model that you told it to evaluate.

The math in a logistic regression max likelihood algorithm is quite involved. But try not to let that (or the terminology) throw you.

In layman's terms, the algorithm is doing nothing more than plotting a curve describing the data points in your matrices.

Once you have Beta Coefficients for the factors in your model - and provided you are using significant factors in your model - the next step is to plug them into a formula similar to the one used by Wong in chapter 5 (titled "Winning Probability and Fair Odds") of Precision and start generating probability estimates for the horses in a given race.
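To make that last step concrete: in a conditional-logit model each horse's factor values are turned into a "strength" via exp() of the beta-weighted sum, and its win probability is that strength divided by the field total - which is essentially the shape of the formula Wong presents. A sketch using the three coefficients from the TAM 7f output above (the per-horse factor values below are invented purely for illustration):

```python
from math import exp

# Beta coefficients from the TAM 7f mlogit output above.
betas = {"odds": 0.067211, "valf13": 0.025159, "valf27": 0.259931}

def win_probs(field):
    """field: list of dicts mapping factor name -> value, one per horse.
    Returns conditional-logit win probabilities that sum to 1 over the field."""
    strengths = [exp(sum(betas[k] * h[k] for k in betas)) for h in field]
    total = sum(strengths)
    return [s / total for s in strengths]

# Hypothetical 3-horse field (factor values made up for illustration):
field = [
    {"odds": 2.5, "valf13": 105.0, "valf27": 3.0},
    {"odds": 5.0, "valf13": 101.0, "valf27": 2.0},
    {"odds": 9.0, "valf13": 98.0, "valf27": 1.0},
]
probs = win_probs(field)
assert abs(sum(probs) - 1.0) < 1e-9   # probabilities sum to 1
```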



-jp

.


Reply
edcondon
3/18/2015
5:50:34 AM
"From there, during live play, RequiredOdds for horses the player is thinking about betting are compared to actual odds as the field is facing up to the gate (the later into the loading process you can wait the better) - and (hopefully) intelligent play or pass decisions are made."

For me, this is the problem. +EV quickly turns negative with a flash of the tote (and vice versa). Deciding to bet or not while your EV teeters from + to - and while two other races are loading. Conditional bets @ 0 to post create a lot of "whipsaw" (usually working against me).




Reply
NYMike
3/18/2015
4:30:09 PM
Jeff,
This is great stuff. Thanks.

Mike

Reply
jeff
3/20/2015
11:47:55 AM
In an above post I mentioned "Situational Data Window samples for early, late, and perhaps railposition/gate draw" as an area of interest when it comes to prob estimates.

I wanted to post some examples of what I mean by that in this thread.

Looking at today's races - Fri March 20th, 2015 - I see that I have a number of UDMs converging on the #1 SCONSET EXPRESS in LRL R3. (A little more than 50 mtp as I start to type this.)

My UDMs aren't track specific and are mostly emphasizing speed-pace-form. Because of that, the situational queries I am going to run will look at areas not covered by the 'trees' in my UDMs.

The first thing that concerns me here is RailPosition. The LRL dirt surface has had a dead rail lately and SCONSET EXPRESS drew the 1 hole.

Looking at the most recent 600 starters in my StarterHistory table (which begins flagging horses sometime in mid December 2014, going forward to yesterday) at the 6f distance on LRL dirt, with the data broken out by RailPosition, I show the following:

query start: 3/20/2015 9:09:36 AM
query end: 3/20/2015 9:09:37 AM
elapsed time: 1 seconds
`
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
`
SQL: SELECT TOP 600 * FROM STARTERHISTORY
WHERE TRACK='LRL'
AND INTSURFACE = 1
AND DIST = 1320
AND [DATE] >= #01-01-2014#
AND [DATE] <= #03-19-2015#
ORDER BY [DATE] DESC
`
`
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 1073.60 953.90 988.90
Bet -1224.00 -1224.00 -1224.00
-----------------------------------------------------
P/L -150.40 -270.10 -235.10
`
Wins 83 163 242
Plays 612 612 612
PCT .1356 .2663 .3954
`
ROI 0.8771 0.7793 0.8079
Avg Mut 12.93 5.85 4.09
`
`
By: Rail Position
`
Rail Pos P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -89.20 160.00 0.4425 8 80 .1000 0.7373 <---
`
2 -5.60 160.00 0.9650 11 80 .1375 1.0139
3 -50.80 160.00 0.6825 9 80 .1125 0.8295
4 15.80 160.00 1.0988 12 80 .1500 1.1060
5 26.60 158.00 1.1684 16 79 .2025 1.4934
6 -41.60 154.00 0.7299 7 77 .0909 0.6703
7 21.40 132.00 1.1621 11 66 .1667 1.2289
8 8.60 76.00 1.1132 7 38 .1842 1.3583
9 -3.60 32.00 0.8875 2 16 .1250 0.9217
10 -22.00 22.00 0.0000 0 11 .0000 0.0000
11 -6.00 6.00 0.0000 0 3 .0000 0.0000
12 -2.00 2.00 0.0000 0 1 .0000 0.0000
13 -2.00 2.00 0.0000 0 1 .0000 0.0000



I can't help but notice the terrible roi for the 1 hole at 6f on the LRL dirt surface. The 10% win rate and 0.73 impact value aren't that terrible.

That said, my interpretation of the above sample is that the public hasn't yet caught on that the rail is bad and therefore has been betting horses from the 1 hole at LRL as if it were a good thing.
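For readers new to Data Window reports: the Impact column is just the subgroup's win rate divided by the overall sample's win rate (1.0 = neutral). Reproducing the rail position 1 row from the sample above:

```python
# Impact value = subgroup win rate / overall sample win rate.
overall_wins, overall_plays = 83, 612    # full LRL 6f dirt sample above
rail1_wins, rail1_plays = 8, 80          # rail position 1 row

impact = (rail1_wins / rail1_plays) / (overall_wins / overall_plays)
assert round(impact, 4) == 0.7373        # matches the report
```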

Based on that I'm not expecting to get a serious deviation upward from the 5-1 MLine.


-jp

.


Reply
jeff
3/20/2015
12:17:22 PM
None of the business UDMs that I have converging on #1 SCONSET EXPRESS in LRL R3 contain factor constraints (trees) designed to handle rider and trainer.

I generally use layering UDMs for that.

That said, the game changes over time - and staying on top of who's riding and training well (and where) takes WORK.

Lately, rather than continually updating UDMs like I used to I've taken up the practice of running situational queries in the Data Window to get the layering info I want for horses I am thinking about betting.

Widening the query parameters a bit (removing the track, surface, and dist restrictions) here's what I get for today's trainer:

query start: 3/20/2015 9:59:26 AM
query end: 3/20/2015 9:59:26 AM
elapsed time: 0 seconds
`
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
`
SQL: SELECT TOP 600 * FROM STARTERHISTORY
WHERE TRAINER = 'MCCARTHY KEVIN'
AND [DATE] >= #01-01-2014#
AND [DATE] <= #03-19-2015#
ORDER BY [DATE] DESC
`
`
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 120.00 120.50 133.60
Bet -192.00 -192.00 -192.00
-----------------------------------------------------
P/L -72.00 -71.50 -58.40
`
Wins 17 31 42
Plays 96 96 96
PCT .1771 .3229 .4375
`
ROI 0.6250 0.6276 0.6958
Avg Mut 7.06 3.89 3.18
`
`
****************************************************************************************
BY TRACK sorted by Track Code Run Date: 3/20/2015 9:59:26 AM
****************************************************************************************
TRACK   PLAYS  WINS  WIN PCT  WIN IMPACT  WIN ROI  PLACES  PLACE PCT  PLACE ROI
****************************************************************************************
ATL 1 1 1 5.6471 4.1 1 1 2.4
CTX 5 0 0 0 0 2 0.4 1.14
LRL 38 1 0.0263 0.1485 0.1789 2 0.0526 0.1868
PEN 1 0 0 0 0 0 0 0
PHA 1 1 1 5.6471 4.7 1 1 1.6
SAR 1 0 0 0 0 0 0 0
SUF 39 14 0.359 2.0273 1.1385 25 0.641 1.1141
TAM 10 0 0 0 0 0 0 0
****************************************************************************************
8 Track Codes from file: StarterHistory Table
****************************************************************************************



Using the same query, here's what I get for today's rider:

query start: 3/20/2015 10:01:43 AM
query end: 3/20/2015 10:01:44 AM
elapsed time: 1 seconds
`
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
`
SQL: SELECT TOP 600 * FROM STARTERHISTORY
WHERE RIDER = 'ALMODOVAR GERALD'
AND [DATE] >= #01-01-2014#
AND [DATE] <= #03-19-2015#
ORDER BY [DATE] DESC
`
`
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 980.00 865.00 899.40
Bet -1202.00 -1202.00 -1202.00
-----------------------------------------------------
P/L -222.00 -337.00 -302.60
`
Wins 132 206 284
Plays 601 601 601
PCT .2196 .3428 .4725
`
ROI 0.8153 0.7196 0.7483
Avg Mut 7.42 4.20 3.17
`
`
****************************************************************************************
BY TRACK sorted by Track Code Run Date: 3/20/2015 10:01:44 AM
****************************************************************************************
TRACK   PLAYS  WINS  WIN PCT  WIN IMPACT  WIN ROI  PLACES  PLACE PCT  PLACE ROI
****************************************************************************************
CTX 559 126 0.2254 1.0263 0.8496 199 0.356 0.7497
LRL 10 0 0 0 0 1 0.1 0.4
MNR 10 4 0.4 1.8212 1.14 4 0.4 0.67
PEN 2 1 0.5 2.2765 0.9 1 0.5 0.65
PHA 2 1 0.5 2.2765 0.95 1 0.5 0.7
TIM 18 0 0 0 0 0 0 0
****************************************************************************************
6 Track Codes from file: StarterHistory Table
****************************************************************************************




My interpretation of the above samples would be that while today's trainer and rider have been able to enjoy a measure of success at smaller circuit tracks such as CTX-MNR-PEN-PHA-SUF, etc...

They have not, to date, been able to crack the bigger/tougher Maryland circuit tracks such as LRL-PIM-TIM.

My reaction to this is as follows:

The prob models (that do not consider any of the situational info posted above) inherent in my UDMs would suggest a strike price in the 7/2 to 4/1 range.

After running the above queries I am going to need 6/1 to 7/1 before pulling the trigger on this horse.
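The relationship between a win prob estimate and a strike price is worth spelling out: fair odds for win probability p are (1 - p) / p, and demanding an edge pushes the required price higher. A sketch (the `edge` parameter and both helper functions are illustrative assumptions, not JCapper formulas):

```python
# Fair odds and a simple "strike price" rule of thumb.
def fair_odds(p):
    """Break-even odds-to-1 for win probability p."""
    return (1.0 - p) / p

def required_odds(p, edge=0.20):
    """Minimum odds-to-1 so that p * (odds + 1) >= 1 + edge,
    i.e. the bet carries at least `edge` expected profit per $1 staked."""
    return (1.0 + edge) / p - 1.0

# A 22% win estimate is fair at roughly 7/2...
assert abs(fair_odds(0.22) - 3.5455) < 1e-3
# ...but demanding a 20% edge pushes the strike price toward 9/2:
assert abs(required_odds(0.22) - 4.4545) < 1e-3
```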



-jp

.






Reply
jeff
3/20/2015
12:26:32 PM
With 1 mtp and odds for the #1 horse at LRL in R3 hovering at about 3/1... my decision is to pass.


-jp

.

Reply
NYMike
3/20/2015
12:27:29 PM
I'm assuming at 5/2 with a minute to go you will watch this one!

Mike

Reply
jeff
3/20/2015
12:43:40 PM
Result?

In LRL R3 #1 SCONSET EXPRESS set the pace while the rider kept her glued to a dead rail. Not surprisingly, she began shortening stride as they turned for home and all but stopped in the stretch.

The win prob estimates inherent in the situational queries were a lot lower than those suggested by my speed-pace-form UDMs.

Merging the two together caused me to revise my strike price upward - and allowed me to avoid making an otherwise bad bet.


-jp

.

Reply
NYMike
3/20/2015
12:48:46 PM
Jeff,
Having watched that race, how do you look at it? Do you see a horse with a reasonable chance to win, good Fig, Early, Form but poor late? At decision time he was overbet and the actual odds were still too low. For the race itself, he led, was not pushed by a contender, and still the poor late ability stopped him. Do you see this as one of the expected random walk outcomes that over time will closely resemble the percentages you felt he had to win?

And to judge as to whether this was a good bet or a bad bet, is result irrelevant to you? At 3/1 it was a bad bet and at 7/1 it was a good bet assuming the same outcome?

NY


Reply
jeff
3/20/2015
1:44:28 PM

--quote:
"Having watched that race, how do you look at it? Do you see a horse with a reasonable chance to win, good Fig, Early, Form but poor late?"
--end quote


Having just watched the race I see a horse with a reasonable chance of showing a complete form reversal if one or more of the following were to take place:

• Change to a surface where the rail is good. For example, a night at CTX when the track is dry.

• Change in RailPosition/Gate Draw. Especially if she races again at LRL. Maybe next time out draws a middle post - better if outside whatever other speed is in the race - and because of the way she stopped in today's race - likely goes off at higher odds.

• Rider Change - In my opinion, Almodovar didn't do the horse, the connections (or the bettors for that matter) any favors by going to the lead while keeping her glued to a dead rail. Her "S" HDW Runstyle indicates she's not a need-the-lead type. Another rider - one more adept at avoiding LRL's dead rail - might well have been able to get a very different result out of today's race by sitting off the pace and/or making an effort to tip off the rail and get to a better part of the track surface.



--quote:
"Do you see this as one of the expected random walk outcomes that over time will closely resemble the percentages you felt he had to win?"
--end quote


I see today's result as anything but random - especially in light of the situational queries that I ran.

That said, had I not run the situational queries - then I'd have to base my interpretation on the info built into my UDMs - and that's all I would have been able to go on at post time.


--EDIT:

So far as a random walk type of thing...

Results from betting are almost always a random walk type of thing. The more accurate your prob estimates the lower the degree of variance. But short of getting to 100% accuracy in your prob estimates - there is always going to be at least some degree of variance in your results.

--end Edit
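The random-walk point can be illustrated with a quick simulation (my sketch, nothing to do with JCapper internals): even when the probability estimate is exactly right and the bet is priced exactly fair, individual betting sequences still wander well away from break-even.

```python
import random

random.seed(7)

def bankroll_after(n_bets, p=0.20, odds=4.0, stake=2.0):
    """P/L from flat win bets on a true 20% horse at fair 4-1 odds.
    Expected profit per bet is exactly zero."""
    pl = 0.0
    for _ in range(n_bets):
        pl += stake * odds if random.random() < p else -stake
    return pl

results = [bankroll_after(500) for _ in range(200)]
mean = sum(results) / len(results)    # clusters near 0 (the bet is fair)
spread = max(results) - min(results)  # but individual runs vary widely
```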


Remember what I said (or tried to say) earlier in this thread:

Each new (relevant) prob estimate (or piece of info) changes (or should change) your overall prob estimate.




--quote:
"And to judge as to whether this was a good bet or a bad bet, is result irrelevant to you? At 3/1 it was a bad bet and at 7/1 it was a good bet assuming the same outcome?"
--end quote


No. The result is not relevant to me.

However, whether it was a good bet or a bad bet IS relevant to me. Based on my situational queries I set my strike price in the mid 6/1 range.

Had the odds been say 8/1 on my horse facing up to the gate - knowing what I know based on my situational queries - I would have pulled the trigger.




-jp

.


~Edited by: jeff  on:  3/20/2015  at:  1:44:28 PM~

Reply
Charlie James
3/23/2015
10:45:27 PM
Jeff, MUCHAS GRACIAS for the situational query post.

Maybe you recall before we became automated I used to keep tallies for sires and damssires in a spiral notebook. Simpler days back then because I only followed one circuit at a time: SOCAL.

No SOCAL. It's Monday. But with Mnr not cancelling tonight [Ha!] I thought why not.

Maybe you know where this is going already---

Mnr race 1 horse #3 --
#'s from the html report: a sprinkling of 1's, 2's and 3's for this and that plus 20/1 morning line odds.

Right there maybe enough for a smallish bet?? -- especially at a place like Mnr where anything can happen and does.

Normally I'd stop there. But after reading your post about the situational I decided to dig deeper.

So I copy and paste your top 600 railposition sql query into the data window -- I change track to 'mnr' and dist to 1210 and break the data out by railposition:

15 percent winners 0.71 roi. [nothing to see here]

Next I rerun the same query but this time with the data broken out by early consensus rank:

30 percent winners 0.97 roi

And the words speed bias flash before me.

Instantly followed by the words yeah right.

You see in the entire history of my betting life I have proven myself to be expertly adept at seeing speed biases after the fact only.

So let's forget that I mentioned 30 percent winners and 0.97 roi for early consensus -- for the time being.

Now HERE'S the kicker. The reason I'm posting--

When I run a query for the damssire of the #3 horse:

SELECT TOP 600 * FROM STARTERHISTORY
WHERE DAMSSIRE='BOUNDARY'
AND INTSURFACE=1
AND DIST=1210
AND [DATE] >= #01-01-2014#
AND [DATE] <= #12-31-2015#
ORDER BY [DATE] DESC

13 percent winners and 1.48 roi.

And from my days of breeding stats in a spiral notebook I ask the age old relevant question:

Q. What is this horse bred to do?

A. Apparently run 5.5 furlongs on the dirt.

But HERE'S where it gets interesting --

When I run a query for the damssire of the favorite:

SELECT TOP 600 * FROM STARTERHISTORY
WHERE DAMSSIRE='FORESTRY'
AND INTSURFACE=1
AND DIST=1210
AND [DATE] >= #01-01-2014#
AND [DATE] <= #12-31-2015#
ORDER BY [DATE] DESC

10 percent winners 0.56 roi.

And from my days of breeding stats in a spiral notebook I ask the flipside of the age old relevant question:

Q. What is the favorite not bred to do?

A. Apparently run 5.5 furlongs on the dirt.

At this point I connect the dots.

Now what I need to know is this:

Did she wire her field because of early consensus and a speed bias?

Or did she do what I think she was bred to do?

Again THANK YOU.

No way do I dig beyond the numbers on the html report without reading your post about the situational.


Reply
jeff
3/24/2015
2:36:33 PM

--quote:
"Now what I need to know is this:

Did she wire her field because of early consensus and a speed bias?

Or did she do what I think she was bred to do? "
--end quote


The short answer:

Sometimes they do a little of both. Admit it. There is such a thing as a speed bias.


Little bit longer answer:

A close look at CPace rank stats (not counting last night's results) suggests the dirt surface for the 5.5f distance at MNR has been playing to early speed so far this meet.

I say that because CPace rank 1 and 2 are both well above historical norms vs. stats for CPace rank 1 and 2 in large samples:


query start: 3/24/2015 9:57:39 AM
query end: 3/24/2015 9:57:39 AM
elapsed time: 0 seconds
`
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
Betting Instructions: Testing Purposes Only
`
UDM: _TrackDateExpression
`
SQL: SELECT * FROM STARTERHISTORY
WHERE TRACK='MNR'
AND INTSURFACE <= 3
AND DIST = 1210
AND [DATE] >= #03-01-2015#
AND [DATE] <= #03-22-2015#
`
`
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 279.00 267.00 188.00
Bet -354.00 -354.00 -354.00
-----------------------------------------------------
P/L -75.00 -87.00 -166.00
`
Wins 34 66 49
Plays 177 177 177
PCT .1921 .3729 .2768
`
ROI 0.7881 0.7542 0.5311
Avg Mut 8.21 4.05 3.84
`
`
By: SQL-F20 Rank (CPace)
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 6.60 62.00 1.1065 12 31 .3871 2.0152
2 13.20 62.00 1.2129 9 31 .2903 1.5114
`
3 -24.20 62.00 0.6097 5 31 .1613 0.8397
4 -54.60 62.00 0.1194 2 31 .0645 0.3359
5 25.60 52.00 1.4923 4 26 .1538 0.8009
6 -21.60 34.00 0.3647 2 17 .1176 0.6125
7 -14.00 14.00 0.0000 0 7 .0000 0.0000
8 -4.00 4.00 0.0000 0 2 .0000 0.0000
9 -2.00 2.00 0.0000 0 1 .0000 0.0000


Whether or not this trend continues is anyone's guess.

However, if I run the same query through the Data Window with the data broken out as a Track Weight Report I get the following:


query start: 3/24/2015 10:07:22 AM
query end: 3/24/2015 10:07:22 AM
elapsed time: 0 seconds
`
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
Betting Instructions: Testing Purposes Only
`
UDM: _TrackDateExpression
`
SQL: SELECT * FROM STARTERHISTORY
WHERE TRACK='MNR'
AND INTSURFACE <= 3
AND DIST = 1210
AND [DATE] >= #03-01-2015#
AND [DATE] <= #03-22-2015#
`
`
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 279.00 267.00 188.00
Bet -354.00 -354.00 -354.00
-----------------------------------------------------
P/L -75.00 -87.00 -166.00
`
Wins 34 66 49
Plays 177 177 177
PCT .1921 .3729 .2768
`
ROI 0.7881 0.7542 0.5311
Avg Mut 8.21 4.05 3.84
`
`
Actual Expected Actual/Expected
Track Weight Analysis Winners Winners Ratio
CPace 123 24.00 29.84 0.8043
CompoundLate 123 17.00 17.41 0.9767
`
`
Win
Roi
1.1065 (CPace 1)
0.7258 (CompoundLate 1)
`
` Actual Expected Actual/Expected
Run Style Analysis Winners Winners Ratio
E 5.00 8.08 0.6187
EP 2.00 2.09 0.9579
P 12.00 14.73 0.8147
S 11.00 5.45 2.0200
NA 1.00 0.66 1.5259
`
Approximate Track Weight: 5.00 (Speed Tiring)


From the above sample I draw your attention to the following:

Approximate Track Weight: 5.00 (Speed Tiring)

Q. How can this be? After all, CPace rank 1 and 2 are off the charts, right?

A. This is what happens when you work with small samples and look at one thing only.
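For reference, the Actual/Expected Ratio column in the Track Weight report is simply actual winners divided by expected winners - e.g. the CPace row above:

```python
# Actual/Expected Ratio from the Track Weight report.
actual_winners, expected_winners = 24.00, 29.84   # CPace row above
ratio = actual_winners / expected_winners
assert round(ratio, 4) == 0.8043                  # matches the report
```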



Let's look at the same sample broken out by a few of the program's early and late factors:


query start: 3/24/2015 10:42:34 AM
query end: 3/24/2015 10:42:34 AM
elapsed time: 0 seconds
`
Data Window Settings:
Connected to: C:\JCapper\exe\JCapper2.mdb
999 Divisor Odds Cap: None
Betting Instructions: Testing Purposes Only
`
UDM: _TrackDateExpression
`
SQL: SELECT * FROM STARTERHISTORY
WHERE TRACK='MNR'
AND INTSURFACE <= 3
AND DIST = 1210
AND [DATE] >= #03-01-2015#
AND [DATE] <= #03-22-2015#
`
`
Data Summary Win Place Show
-----------------------------------------------------
Mutuel Totals 279.00 267.00 188.00
Bet -354.00 -354.00 -354.00
-----------------------------------------------------
P/L -75.00 -87.00 -166.00
`
Wins 34 66 49
Plays 177 177 177
PCT .1921 .3729 .2768
`
ROI 0.7881 0.7542 0.5311
Avg Mut 8.21 4.05 3.84
`
`
`
`
EARLY:
`
By: 73 rankForPaceFig_2F_InLast
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -14.40 64.00 0.7750 7 32 .2188 1.1388
2 45.40 60.00 1.7567 11 30 .3667 1.9088
3 3.40 62.00 1.0548 7 31 .2258 1.1755

4 -57.60 66.00 0.1273 2 33 .0606 0.3155
5 -21.20 48.00 0.5583 4 24 .1667 0.8676
6 -34.00 34.00 0.0000 0 17 .0000 0.0000
7 7.40 16.00 1.4625 3 8 .3750 1.9522
8 -2.00 2.00 0.0000 0 1 .0000 0.0000
9 -2.00 2.00 0.0000 0 1 .0000 0.0000
`
`
`
By: 74 rankForPaceFig_4F_InLast
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -18.00 64.00 0.7188 8 32 .2500 1.3015
2 -25.00 68.00 0.6324 9 34 .2647 1.3780
3 79.60 56.00 2.4214 9 28 .3214 1.6733
4 -45.60 60.00 0.2400 2 30 .0667 0.3471
5 -34.40 56.00 0.3857 4 28 .1429 0.7437
6 -32.00 32.00 0.0000 0 16 .0000 0.0000
7 -8.20 12.00 0.3167 1 6 .1667 0.8676
8 10.60 4.00 3.6500 1 2 .5000 2.6029
9 -2.00 2.00 0.0000 0 1 .0000 0.0000
`
`
`
By: CompoundAP Rank
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 16.00 62.00 1.2581 14 31 .4516 2.3510
2 -1.80 62.00 0.9710 7 31 .2258 1.1755

3 -30.20 62.00 0.5129 4 31 .1290 0.6717
4 7.00 62.00 1.1129 3 31 .0968 0.5038
5 -30.40 52.00 0.4154 4 26 .1538 0.8009
6 -34.00 34.00 0.0000 0 17 .0000 0.0000
7 -10.20 14.00 0.2714 1 7 .1429 0.7437
8 10.60 4.00 3.6500 1 2 .5000 2.6029
9 -2.00 2.00 0.0000 0 1 .0000 0.0000
`
`
`
By: CompoundE1 Rank
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -16.20 62.00 0.7387 7 31 .2258 1.1755
2 -0.60 62.00 0.9903 11 31 .3548 1.8472
3 -13.60 62.00 0.7806 5 31 .1613 0.8397
4 -37.00 62.00 0.4032 4 31 .1290 0.6717
5 18.00 52.00 1.3462 4 26 .1538 0.8009
6 -24.00 34.00 0.2941 1 17 .0588 0.3062
7 -10.20 14.00 0.2714 1 7 .1429 0.7437
8 10.60 4.00 3.6500 1 2 .5000 2.6029
9 -2.00 2.00 0.0000 0 1 .0000 0.0000
`
`
`
By: CompoundE2 Rank
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -13.20 62.00 0.7871 9 31 .2903 1.5114
2 10.00 62.00 1.1613 10 31 .3226 1.6793

3 -9.20 62.00 0.8516 6 31 .1935 1.0076
4 -54.00 62.00 0.1290 2 31 .0645 0.3359
5 27.00 52.00 1.5192 5 26 .1923 1.0011
6 -34.00 34.00 0.0000 0 17 .0000 0.0000
7 -10.20 14.00 0.2714 1 7 .1429 0.7437
8 10.60 4.00 3.6500 1 2 .5000 2.6029
9 -2.00 2.00 0.0000 0 1 .0000 0.0000
`
`
`
By: CPace Rank
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 6.60 62.00 1.1065 12 31 .3871 2.0152
2 13.20 62.00 1.2129 9 31 .2903 1.5114

3 -24.20 62.00 0.6097 5 31 .1613 0.8397
4 -54.60 62.00 0.1194 2 31 .0645 0.3359
5 25.60 52.00 1.4923 4 26 .1538 0.8009
6 -21.60 34.00 0.3647 2 17 .1176 0.6125
7 -14.00 14.00 0.0000 0 7 .0000 0.0000
8 -4.00 4.00 0.0000 0 2 .0000 0.0000
9 -2.00 2.00 0.0000 0 1 .0000 0.0000
`
`
`
By: PMI Rank
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 18.80 62.00 1.3032 15 31 .4839 2.5190
2 -22.00 62.00 0.6452 5 31 .1613 0.8397
3 -6.80 62.00 0.8903 7 31 .2258 1.1755
4 14.00 62.00 1.2258 4 31 .1290 0.6717
5 -25.00 52.00 0.5192 3 26 .1154 0.6007
6 -34.00 34.00 0.0000 0 17 .0000 0.0000
7 -20.00 20.00 0.0000 0 10 .0000 0.0000
8 0.00 0.00 0.0000 0 0 .0000 0.0000
9 0.00 0.00 0.0000 0 0 .0000 0.0000
`
`
`
By: Avg E1 Rank
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -23.00 62.00 0.6290 8 31 .2581 1.3435
2 -12.20 62.00 0.8032 9 31 .2903 1.5114
3 -2.40 62.00 0.9613 6 31 .1935 1.0076
4 36.60 62.00 1.5903 5 31 .1613 0.8397

5 -32.40 52.00 0.3769 3 26 .1154 0.6007
6 -26.60 34.00 0.2176 2 17 .1176 0.6125
7 -14.00 14.00 0.0000 0 7 .0000 0.0000
8 1.00 4.00 1.2500 1 2 .5000 2.6029
9 -2.00 2.00 0.0000 0 1 .0000 0.0000
`
`
`
By: EarlyConsensus Rank
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 0.40 64.00 1.0063 12 32 .3750 1.9522
2 -13.60 62.00 0.7806 6 31 .1935 1.0076
3 -9.80 62.00 0.8419 6 31 .1935 1.0076
4 -41.20 66.00 0.3758 5 33 .1515 0.7888
5 24.80 46.00 1.5391 3 23 .1304 0.6790
6 -15.60 34.00 0.5412 2 17 .1176 0.6125
7 -14.00 14.00 0.0000 0 7 .0000 0.0000
8 -4.00 4.00 0.0000 0 2 .0000 0.0000
9 -2.00 2.00 0.0000 0 1 .0000 0.0000
`
`
`
By: Q Speed Points Rank
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -3.80 78.00 0.9513 12 39 .3077 1.6018
2 22.20 68.00 1.3265 8 34 .2353 1.2249

3 -42.60 74.00 0.4243 4 37 .1081 0.5628
4 -41.40 62.00 0.3323 3 31 .0968 0.5038
5 -0.40 36.00 0.9889 3 18 .1667 0.8676
6 -3.80 22.00 0.8273 2 11 .1818 0.9465
7 -4.20 8.00 0.4750 1 4 .2500 1.3015
8 1.00 4.00 1.2500 1 2 .5000 2.6029
9 -2.00 2.00 0.0000 0 1 .0000 0.0000
`
`
`
By: Q Speed Points Number
`
Q SpdPts P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
0 -33.40 76.00 0.5605 6 38 .1579 0.8220
1 -5.80 32.00 0.8188 3 16 .1875 0.9761
2 -26.80 56.00 0.5214 3 28 .1071 0.5578
3 25.80 66.00 1.3909 8 33 .2424 1.2620
4 -9.20 22.00 0.5818 2 11 .1818 0.9465
5 -31.00 48.00 0.3542 3 24 .1250 0.6507
6 -7.20 20.00 0.6400 3 10 .3000 1.5618
7 4.00 16.00 1.2500 2 8 .2500 1.3015
8 8.60 18.00 1.4778 4 9 .4444 2.3137


`
`
`
`
`
LATE:
`
By: SQL-F23 Rank (CompoundLate)
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -17.00 62.00 0.7258 7 31 .2258 1.1755
2 -29.40 62.00 0.5258 5 31 .1613 0.8397
3 -24.00 62.00 0.6129 6 31 .1935 1.0076
4 53.60 54.00 1.9926 8 27 .2963 1.5425
5 -40.00 50.00 0.2000 2 25 .0800 0.4165
6 -22.60 38.00 0.4053 3 19 .1579 0.8220
7 7.40 18.00 1.4111 2 9 .2222 1.1569
8 3.00 2.00 2.5000 1 1 1.0000 5.2059
9 -6.00 6.00 0.0000 0 3 .0000 0.0000
`
`
`
By: LateConsensus Rank
`
Rank P/L Bet Roi Wins Plays Pct Impact
-----------------------------------------------------------------------
1 -9.80 72.00 0.8639 10 36 .2778 1.4461
2 -37.60 62.00 0.3935 4 31 .1290 0.6717
3 -17.40 58.00 0.7000 6 29 .2069 1.0771
4 23.60 66.00 1.3576 8 33 .2424 1.2620
5 -11.80 44.00 0.7318 2 22 .0909 0.4733
6 -13.00 38.00 0.6579 3 19 .1579 0.8220
7 -8.00 8.00 0.0000 0 4 .0000 0.0000
8 -4.00 4.00 0.0000 0 2 .0000 0.0000
9 3.00 2.00 2.5000 1 1 1.0000 5.2059




The above sample shows 5.5f races on the dirt at MNR from opening day March 01, 2015 through March 22, 2015. I purposely did not include data from last night's card (the 23rd) because I did not want the sample to include the horse Chuck was discussing in R1.

I've broken the above Data Window sample out into two sections. The first contains most of the program's early-based factors. The second contains the program's primary late factors, CompoundLate and LateConsensus.

I've also highlighted rows in the Early section using red text wherever the row for a given rank had significantly higher win rate and/or roi than what you'd normally expect to see for that factor and rank if you were looking at a large sample.

I did the same for the matrices in the Late section too.

Each of the early factor matrices shows something that suggests (at least in the sample presented) that the individual factor has been outperforming its historical norm.

Both of the late factor matrices show the row for rank=4 outperforming one or more of the higher-ranked rows.

Taken individually, and viewed within the context that we are looking at a matrix for a single factor in a small sample only, my conclusion tends to be that I am looking at small sample noise.

However, taken collectively, and viewed within the context that the matrices for many different factors are behaving in a similar manner that is different from the norm - even though we are looking at a sample spanning only a few weeks - my conclusion changes.

In this case, if only one or two early factors were outperforming the norm, and the others weren't, my conclusion would tend to be that I was looking at small sample noise.

But when most, if not all, of the factor matrices are displaying a similar pattern, I start to see something along the lines of a preponderance of the evidence in the data.

And based on that I have to start asking myself if the surface is biased.
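One quick way to put a number on the "small sample noise" question is a binomial tail probability: given a factor rank's historical win rate, how likely is it to see this many winners (or more) in this many plays by chance alone? Here's a rough sketch in Python - the 0.19 base rate below is a stand-in assumption for illustration, not an actual JCapper historical figure:

```python
from math import comb

def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p): the chance of seeing k or more
    winners in n plays if the true win rate were p."""
    return sum(comb(n, i) * (p ** i) * ((1 - p) ** (n - i))
               for i in range(k, n + 1))

# Rank 4 of the CompoundLate matrix above: 8 wins from 27 plays.
# 0.19 is a hypothetical historical win rate for that rank.
print(round(binom_tail(27, 0.19, 8), 4))
```

A single row clearing (or failing) a test like this proves little by itself. But running it across every factor matrix turns "most of them look high" into something you can actually measure.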

When analyzing data:

It often pays to look at a sample many different ways. Try to apply some critical thinking to what you are seeing.

If you look at something simplistic, or look at one thing only, you increase your chances of being misled.

But if you look at many different data points (think lots of trees if you will) and apply some critical thinking to what you are seeing - I think your chances of being misled go down exponentially.


-jp

.

Reply
NYMike
4/17/2015
2:16:22 PM
Jeff,
Back to R.

You wrote:
summary(mlogit(winpayoff ~ odds + valf13 + valf27 -1, data = x))

How would you write this same equation in R if you wanted to add gapF18 but only those above -10?

Mike

Reply
jeff
4/19/2015
11:42:56 AM
This falls within an area called Data Transformations.

FYI, this is an area where you get your chance to shine as a modeler. Imo, the degree of creativity, ingenuity, and critical thinking that goes into transforming the data is often where you as a player separate yourself from the crowd.

Put another way: Data Transformations is often the area where you as a player generate your edge.




To answer your question:

I would add an additional column to my .csv file and give it a unique name that describes the data in it. Something like F18MinGap would probably do the trick.

After populating the .csv file in the normal manner, I would write - and then run - a special routine to populate the new F18MinGap column.

There are probably several options here, but speaking from experience - I've had success handling similar situations as follows:

• Populate the column with a numeric 0 (to indicate False) whenever GAPF18 fails to meet the >= -10 min value constraint.

• Populate the column with a numeric 1 (to indicate True) whenever GAPF18 meets the >= -10 min value constraint.
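The two bullet points above amount to a simple 0/1 encoding, which can also be sketched in a few lines of code instead of Excel. A hypothetical Python version (the GAPF18 and F18MinGap column names and the -10 cutoff follow the post; the row layout is an assumption):

```python
# Hypothetical sketch of the 0/1 encoding step described above.
# Each row is a dict keyed by .csv column name.
def add_f18mingap(rows, cutoff=-10.0):
    out = []
    for row in rows:
        row = dict(row)  # copy so the original rows are left untouched
        row["F18MinGap"] = 1 if float(row["GAPF18"]) >= cutoff else 0
        out.append(row)
    return out

rows = [{"horsename": "Alpha", "GAPF18": "-4.5"},
        {"horsename": "Bravo", "GAPF18": "-12.0"}]
print([r["F18MinGap"] for r in add_f18mingap(rows)])  # → [1, 0]
```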

From there, once the .csv file has been populated and the data verified, I'd launch R and turn it loose on the file...

library(csvread)

library(mlogit)

# read the exported .csv file
y <- read.csv("c:/jcapper/exe/tam7f2014.csv")

# inspect the column types detected in the first 100 rows
map.coltypes("c:/jcapper/exe/tam7f2014.csv", header = TRUE, nrows = 100, delimiter = ",")

# reshape to the long format mlogit expects: one row per horse per race
x <- mlogit.data(y, choice = "winpayoff", shape = "long", id.var = "id", alt.var = "horsename")

# note: R is case sensitive - the factor name in the formula must match
# the .csv column header exactly (F18MinGap here)
summary(mlogit(winpayoff ~ odds + valf13 + valf27 + F18MinGap - 1, data = x))


Keep in mind that if the report generated by R (or whatever stat package you are using) indicates that the 0's and 1's in the F18MinGap column are significant enough that you decide to add F18MinGap as another factor to your model:

It should be obvious that F18MinGap as described above (or any new data point you create for that matter) doesn't exist in JCapper.

Because of that, if you decide to use it, you'll need to write a routine you can run on race day that evaluates the GAPF18 field in the StartersToday table and writes the 0 or 1 F18MinGap indicator to a custom file or table - so that it can be read and, from there, fed into your custom pricing model as an input.
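The race-day routine described above could look something like the following Python sketch. The input layout here (a .csv dump with horsename and GAPF18 columns) is an assumption for illustration; in practice you'd read whatever file or table your own setup exports from StartersToday:

```python
import csv
import io

# Hypothetical race-day routine: evaluate GAPF18 against the -10 cutoff
# for each of today's starters and emit the 0/1 F18MinGap indicator.
def write_indicators(src_text, cutoff=-10.0):
    reader = csv.DictReader(io.StringIO(src_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["horsename", "F18MinGap"])
    writer.writeheader()
    for row in reader:
        flag = 1 if float(row["GAPF18"]) >= cutoff else 0
        writer.writerow({"horsename": row["horsename"], "F18MinGap": flag})
    return out.getvalue()

today = "horsename,GAPF18\nAlpha,-3.2\nBravo,-15.0\n"
print(write_indicators(today))
```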

Hope I managed to explain most of that in a way that makes sense.


-jp

.


~Edited by: jeff  on:  4/19/2015  at:  11:42:56 AM~

Reply

Copyright © 2018 JCapper Software              www.JCapper.com