JCapper Message Board

JCapper 101
-- Starting Factors

Caveat
8/9/2012
7:36:50 AM

Hi All...

I am still new to this and learning. I spent a lot of time in the old message boards over the last week or so.

I'm playing around with constructing UDMs and would like to ask a question.

In doing a track UDM, I believe that a good starting factor would be UPR, CPACE, AFR, CFA, etc.
What would you do when none of those are near the top in ROI?
I have factors like these near the top...

code:

     FASTSLOWFINAL       103     32      0.3107  2.5241    1.1738    48        0.466   0.9112  
     CXN                 103     31      0.301   2.4453    1.1563    51        0.4951  1.0165  
     WILLTOWIN           176     19      0.108   0.8774    1.1313    44        0.25    0.9568  
     LASTRACEBRISFIG     111     33      0.2973  2.4152    1.1       52        0.4685  0.9221  
     COMPOUNDLATE        103     18      0.1748  1.42      1.0893    30        0.2913  0.7447  
     LATESLANT           103     25      0.2427  1.9716    1.0592    42        0.4078  0.932   
     BETTORSTOTEPROB     103     45      0.4369  3.5493    1.0282    65        0.6311  0.984   
     CONSISTENCY         103     26      0.2524  2.0504    1.0243    46        0.4466  0.9995  
     POST TIME FAVS      115     49      0.4261  3.4616    1.0209    71        0.6174  0.9752  
     QSPEEDPOINTS        131     33      0.2519  2.0464    1.0038    58        0.4427  1.0011  
     TURNTIME            103     26      0.2524  2.0504    0.9728    42        0.4078  0.8068  
     COMPOUNDE1          103     25      0.2427  1.9716    0.967     41        0.3981  0.8573  
     AVGE1               105     26      0.2476  2.0115    0.9638    43        0.4095  0.8124

Thx
Mike

Charlie James
8/9/2012
11:28:03 AM

Imho, you are [still] ignoring some very good advice given out in the private section of this board. That advice went something like this: Model the big picture and don't cherry pick from among small sample results.

The danger in cherry picking from small sample results is that U R going down a path likely 2 get U the world's biggest back-fit. And when that back-fitted model doesn't produce the same good results going fwd U sit there wondering why.

[Not U personally -- but U meaning newbie players in general.]

We were lucky enough to have the program's author present us with a "what it takes" write up -- backed up by 4 years of data.

I'm going to ask a serious question:

Do U believe U have the requisite talent and insight into what it takes in this game to go down a path that ignores the advice U were given?

~Edited by: Charlie James on: 8/9/2012 at: 11:25:43 AM~

~Edited by: Charlie James on: 8/9/2012 at: 11:27:31 AM~

~Edited by: Charlie James on: 8/9/2012 at: 11:28:03 AM~

Charlie James
8/9/2012
12:07:25 PM

Example -- SAR dirt sprints 2011:

code:

     query start:         8/9/2012 9:37:21 AM
     query end:           8/9/2012 9:37:23 AM
     elapsed time:        2 seconds

     Data Window Settings:
     Connected to: C:\JCapper\exe\JCapper2.mdb
     999 Divisor  Odds Cap: None
     Betting Instructions: Testing Purposes Only

     UDM: 0_1TrackDateExpression

     SQL:  SELECT * FROM STARTERHISTORY
           WHERE TRACK='SAR' 
           AND INTSURFACE <= 3 
           AND DIST < 1760 
           AND [YEAR] = 2011


     Data Summary         Win     Place      Show
     Mutuel Totals    2067.00   2189.90   2122.30
     Bet             -2628.00  -2628.00  -2628.00
     Gain             -561.00   -438.10   -505.70

     Wins                 177       356       505
     Plays               1314      1314      1314
     PCT                .1347     .2709     .3843

     ROI               0.7865    0.8333    0.8076
     Avg Mut            11.68      6.15      4.20


     ****************************************************************************************
     Key Factors Rank = 1 sorted by Win ROI                  Run Date: 8/9/2012 9:37:23 AM
     ****************************************************************************************
                                            WIN  WIN          WIN               PLACE   PLACE
     FACTOR           PLAYS    WINS         PCT  IMPACT       ROI  PLACES         PCT     ROI
     ****************************************************************************************
     COMPOUNDAP F18      182     59      0.3242  2.4068    1.1637    82        0.4505  0.9255  
     FIGCONSENSUS F13    191     59      0.3089  2.2932    1.1301    92        0.4817  1.0099  
     JPRCLASS F28        172     57      0.3314  2.4602    1.0718    86        0.5     0.9945  
     JRATING F29         172     50      0.2907  2.1581    1.0613    75        0.436   0.9087  
     USERFACTOR3 F33     172     51      0.2965  2.2011    1.048     74        0.4302  0.8547  
     POWERCONSENSUS F32  180     59      0.3278  2.4335    1.0342    86        0.4778  0.8928  
     UPR                 172     55      0.3198  2.3741    1.0337    86        0.5     0.9721  
     COMPOUNDSP F24      182     51      0.2802  2.0801    1.0327    84        0.4615  0.9819  
     USERFACTOR4 F34     172     52      0.3023  2.2442    1.0203    78        0.4535  0.8919  
     TPACE F11           182     55      0.3022  2.2435    1.0091    87        0.478   1.0195  
     CLASSCONSENSUS F27  193     63      0.3264  2.4231    1.0085    94        0.487   0.9554  
     WEIGHTEDFIG F12     182     49      0.2692  1.9985    1.0077    74        0.4066  0.8624  
     UPRMLPROB           173     61      0.3526  2.6176    1.0061    91        0.526   0.9624  
     JPRMLPROB           173     57      0.3295  2.4461    0.9971    87        0.5029  0.9318  
     PACEFIG F10         184     53      0.288   2.138     0.9965    83        0.4511  0.9383  
     JPR                 172     53      0.3081  2.2873    0.9826    82        0.4767  0.9238...

Now -- SAR dirt sprints 2012:

code:

     query start:         8/9/2012 9:40:15 AM
     query end:           8/9/2012 9:40:16 AM
     elapsed time:        1 seconds

     Data Window Settings:
     Connected to: C:\JCapper\exe\JCapper2.mdb
     999 Divisor  Odds Cap: None
     Betting Instructions: Testing Purposes Only

     UDM: 0_1TrackDateExpression

     SQL:  SELECT * FROM STARTERHISTORY
           WHERE TRACK='SAR' 
           AND INTSURFACE <= 3 
           AND DIST < 1760 
           AND [YEAR] = 2012


     Data Summary         Win     Place      Show
     Mutuel Totals    1039.60   1014.40    919.40
     Bet             -1088.00  -1088.00  -1088.00
     Gain              -48.40    -73.60   -168.60

     Wins                  69       140       210
     Plays                544       544       544
     PCT                .1268     .2574     .3860

     ROI               0.9555    0.9324    0.8450
     Avg Mut            15.07      7.25      4.38




     ****************************************************************************************
     Key Factors Rank = 1 sorted by Win ROI                  Run Date: 8/9/2012 9:40:16 AM
     ****************************************************************************************
                                            WIN  WIN          WIN               PLACE   PLACE
     FACTOR           PLAYS    WINS         PCT  IMPACT       ROI  PLACES         PCT     ROI
     ****************************************************************************************
     PEDIGREE F15        70      13      0.1857  1.4641    1.2836    31        0.4429  1.3221  
     CPACE F20           69      16      0.2319  1.8283    1.158     25        0.3623  0.813   
     EARLYCONSENSUS F19  74      17      0.2297  1.811     1.1473    28        0.3784  0.9014  
     MORNINGLINE         71      29      0.4085  3.2206    1.0718    39        0.5493  0.8915  
     AFR F01             69      15      0.2174  1.714     0.9746    24        0.3478  0.7645  
     USERFACTOR4 F34     69      18      0.2609  2.057     0.9565    25        0.3623  0.6891  
     BETTORSTOTEPROB     70      26      0.3714  2.9281    0.9129    40        0.5714  0.92    
     USERFACTOR3 F33     69      19      0.2754  2.1713    0.8877    29        0.4203  0.8739  
     WOBRILL F04         70      14      0.2     1.5768    0.8814    21        0.3     0.7121  
     PRIME F31           73      22      0.3014  2.3763    0.8589    38        0.5205  0.9164  
     POST TIME FAVS      71      25      0.3521  2.776     0.8542    41        0.5775  0.931   
     CLASSCONSENSUS F27  71      20      0.2817  2.2209    0.8275    29        0.4085  0.7549  
     FORM F03            72      14      0.1944  1.5327    0.8174    23        0.3194  0.8694  
     BASICFITNESS F02    105     19      0.181   1.427     0.7633    30        0.2857  0.8033  
     CFA F08             69      16      0.2319  1.8283    0.7616    27        0.3913  0.7355  
     RACESTRENGTH F16    70      19      0.2714  2.1397    0.7593    29        0.4143  0.7021  
     JPRCLASS F28        69      19      0.2754  2.1713    0.7543    31        0.4493  0.7725  
     JPR                 69      17      0.2464  1.9426    0.7442    30        0.4348  0.7812  
     OPTIMIZATION F30    79      14      0.1772  1.3971    0.7437    22        0.2785  0.6468  
     USERFACTOR5 F05     69      11      0.1594  1.2567    0.7196    21        0.3043  0.9645  
     FORMCONSENSUS F07   72      13      0.1806  1.4239    0.7076    19        0.2639  0.6979  
     UPRMLPROB           69      19      0.2754  2.1713    0.6978    30        0.4348  0.6957  
     COMPOUNDSP F24      70      17      0.2429  1.915     0.6836    29        0.4143  0.7364  
     FASTSLOWFINAL F09   70      16      0.2286  1.8023    0.6829    30        0.4286  0.8007  
     COMPOUNDAP F18      69      17      0.2464  1.9426    0.6812    30        0.4348  0.8565...

In 2011 CompAP rank=1 was the top factor [32% winners 1.16 roi.] But so far in 2012 CompAP rank=1 has only about 25% winners and 0.68 roi.
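For anyone new to reading these reports, here is a rough sketch of how the summary numbers can be reproduced by hand. It assumes ROI is mutuel total divided by amount bet, and that WIN IMPACT is the factor's win pct divided by the win pct of all starters in the sample. That reading is an assumption on my part (it agrees with the figures in the two reports to within rounding), not a statement of how the program computes them internally. The function names are made up for illustration.

```python
def win_pct(wins, plays):
    """Fraction of plays that won."""
    return wins / plays

def roi(mutuel_total, amount_bet):
    """Return per $1.00 wagered; anything below 1.0 is a loss."""
    return mutuel_total / amount_bet

def impact(factor_wins, factor_plays, all_wins, all_plays):
    """Factor win pct relative to the win pct of every starter in the sample."""
    return win_pct(factor_wins, factor_plays) / win_pct(all_wins, all_plays)

# Figures copied from the SAR dirt sprint reports above
roi_2011 = roi(2067.00, 2628.00)            # report shows 0.7865
roi_2012 = roi(1039.60, 1088.00)            # report shows 0.9555
impact_2011 = impact(59, 182, 177, 1314)    # COMPOUNDAP, report shows 2.4068
impact_2012 = impact(17, 69, 69, 544)       # COMPOUNDAP, report shows 1.9426

print(f"win ROI 2011: {roi_2011:.4f}   win ROI 2012: {roi_2012:.4f}")
print(f"CompAP impact 2011: {impact_2011:.4f}   2012: {impact_2012:.4f}")
```

The impact values come out a hair different from the report in the fourth decimal, which looks like rounding of the percentages before dividing.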

Q. Why?

A. Speaking strictly for myself, I haven't the 1st clue. The best insight I can come up with after looking at this and hundreds of similar [small] data samples is:

Because that's the way horse racing data behaves.

Q. Caveat, can you [or anyone else?] provide me with insight -- backed up by reasoning and data samples -- that flies in the face of this? -- Can U or anyone else tell me WHEN to expect cherry picked results to perform well going fwd and why?

Because if I could only know ahead of time when to cherry pick and what to cherry pick [and why] -- this game would be ridiculously EZ to beat.

Until such time as that happens I will continue on the path that has produced reasonably good results -- at least 4 me: I will stick 2 modeling the big picture.
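One way to put a number on how loose these small samples are is a confidence interval around the win pct. The sketch below uses the Wilson score interval, which is a standard statistics formula and not anything taken from JCapper. It only looks at win pct, not ROI, so treat it as a rough illustration of sample size and nothing more.

```python
import math

def wilson_interval(wins, plays, z=1.96):
    """Approximate 95% Wilson score interval for a win percentage."""
    p = wins / plays
    denom = 1 + z**2 / plays
    centre = (p + z**2 / (2 * plays)) / denom
    half = z * math.sqrt(p * (1 - p) / plays + z**2 / (4 * plays**2)) / denom
    return centre - half, centre + half

# COMPOUNDAP rank=1 wins and plays, copied from the two reports above
samples = {"SAR 2011": (59, 182), "SAR 2012": (17, 69)}

intervals = {}
for label, (wins, plays) in samples.items():
    low, high = wilson_interval(wins, plays)
    intervals[label] = (low, high)
    print(f"{label}: {wins}/{plays} = {wins / plays:.4f}, "
          f"95% interval {low:.3f} to {high:.3f}")
```

With samples this size the two intervals overlap, so on win pct alone the 2011 and 2012 results are not clearly different from each other. That fits the point about not reading too much into a top-of-the-list factor from one meet.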

~Edited by: Charlie James on: 8/9/2012 at: 12:05:21 PM~

~Edited by: Charlie James on: 8/9/2012 at: 12:07:25 PM~

jeff
8/9/2012
1:58:04 PM

Searching through past posts on the same topic, I came up with the following thread:
http://www.jcapper.com/messageboard/TopicReader.asp?topic=1105&forum=JCapper%20101

I thought Steve's reply was relevant and the bolded text from his quote was put there by me to emphasize what he said:

--quote:

"I agree with the above. All too often, factors which do a great job of filtering out losers and boosting your ROI in your development sample, end up filtering out winners in your fresh data. You always want to keep independent data available to test anything you're doing."

--end quote

In my opinion, after creating a UDM, any UDM, be it based on any concept that involves R&D using large sample or small: You want to validate performance going forward by confronting the UDM with races from outside the sample used when developing the UDM. And hold off betting real money on the UDM until you see clear evidence that the concept encapsulated in the UDM is "validating" or performing well going forward in time.
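Outside of the Data Window, the same idea can be sketched in a few lines: split the plays at a cutoff date, develop on the earlier part, and only look at the later part once the UDM is frozen. The records, the cutoff date, and the function names below are all made up for illustration, and a flat $2.00 win bet is assumed.

```python
from datetime import date

def roi(plays, bet_size=2.00):
    """plays: list of (race_date, win_mutuel); win_mutuel is 0.0 for a loser."""
    if not plays:
        return None
    returned = sum(mutuel for _, mutuel in plays)
    return returned / (bet_size * len(plays))

def split_by_date(plays, cutoff):
    """Development sample is everything before the cutoff; validation is the rest."""
    development = [p for p in plays if p[0] < cutoff]
    validation = [p for p in plays if p[0] >= cutoff]
    return development, validation

# Hypothetical plays flagged by a UDM (not real results)
plays = [
    (date(2011, 7, 23), 0.0),
    (date(2011, 8, 5), 9.40),
    (date(2011, 8, 20), 0.0),
    (date(2011, 9, 1), 6.20),
    (date(2012, 7, 21), 0.0),
    (date(2012, 7, 28), 0.0),
    (date(2012, 8, 4), 5.00),
]

dev, val = split_by_date(plays, date(2012, 1, 1))
print("development plays:", len(dev), "ROI:", roi(dev))
print("validation plays:", len(val), "ROI:", roi(val))
```

A gap between the development ROI and the validation ROI like the one in this toy data is the back-fit warning sign. The sample here is far too small to mean anything; it only shows the mechanics.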

I also found the following thread on Track Profile Theory:
http://www.jcapper.com/MessageBoard/TopicReader.asp?topic=135&forum=General

Speaking strictly for myself (and admittedly it is an acquired skill) I have had success in the past when I am able to relate Data Window results to a physical cause.

For example, in the SAR 2012 dirt sprint results that Chuck posted above, I see clear evidence of an early speed bias. I say that because I know at a glance that CPace and EarlyConsensus performance in those results is above historical norms.

But what cements it for me is watching races run there so far this meet. The leaders aren't getting tired. Also, a look at the overhead camera "snapshot" talked about in the Track Profile Theory thread indicates (at least to me) that the numbers in the Data Window aren't the result of some random cosmic accident - that they are in fact being produced because there is an actual speed bias.

Q. What causes that bias? Is it the weather? Humidity? Track maintenance? Or are other unexplained phenomena at work?

A. I haven't the first clue. But I do know from looking at video of the horses and from looking at overhead snapshots of where the horses are when the winner breaks the plane of the finish line and from numbers in the Data Window that a speed bias is there.

Q. Will that speed bias continue?

A. I haven't the first clue. If you believe in track profile theory - go for it.

If it suddenly reverses tomorrow: don't be surprised.

If it holds up through closing day - but weather, humidity, track maintenance next year cause the same surface to favor closers: don't be surprised by that either.

-jp

.

Charlie James
8/9/2012
4:52:01 PM

Jeff, Love the overhead snapshot concept. Brilliant if U ask me.

Fwiw I've never personally been able to lock into a speed bias early enough to take advantage. By the time I see it so has everybody else -- and the boxcar prices have already been paid out. By the time I pick up on it, oh the bias is still there -- but the public k_n_o_w_s and the result is a chalk parade [which causes the trainers to complain to management who in turn tells the track super to harrow deeper.]

For some reason I have no problem using speed and separation as the universal bias. Imho still the single best trip to the winners circle [even after the advent of polyshite.] I also have no problem working to educate myself -- to come up with a short list of trainers with the proven talent to prep their babies in such a way to take full advantage of the so called universal bias. -- Made easier I guess by years of going to KEE each fall to follow who bought what and for whom -- and then watch it all unfold the following spring as the babies grow up and work their way through the condition book.

A good buddy of mine used to go to Fla and then later in life Ariz every March to watch the kids in the farm system vie for spots in the big leagues. Point is -- follow this or any game closely enough and going beyond the numbers [I like that phrase] is suddenly within reach.

Different ways to skin a cat I guess.

~Edited by: Charlie James on: 8/9/2012 at: 4:50:18 PM~

~Edited by: Charlie James on: 8/9/2012 at: 4:52:01 PM~

jeff
8/10/2012
12:06:16 AM

Mike, I wanted to make a few specific comments about your post.

In my way of doing things, there are two different types of UDMs and each has a very different yet specific purpose:

1. The Business UDM - When Chuck posted the words "model the big picture" (and bluntly I might add) I'm about 99% sure he was talking about Business UDMs.

In JCapper terminology, a Business UDM is a UDM designed to point out horses that are very close to being automatic bets - those that have lots of hidden positive attributes in their past performance records. In the "what it takes" write up, I hope I was able to make the point that the phrase positive hidden attributes is in no way limited to "handicapping" in the traditional sense. (In fact, the topic of "handicapping" was purposely avoided.) Instead of basing the modeling process around attributes tied to the horse - the process was turned on its head - and the "handicapping" was instead focused on (less than perfect) public betting behavior.

When a business UDM flags a horse on one of my reports - it (rightly) deserves my intense focus. I say that because years of Data Window R&D and wager history analysis very clearly tells me those are the UDMs driving my profits.

That last sentence describes what is meant by the term "Business UDM."

By the way, I fully agree with Chuck. When creating a Business UDM, forget the small sample and the track specific. Model the big picture instead.

2. The Layering UDM - In JCapper terminology, the Layering UDM can be anything (small sample or large) that adds an additional "layer" of knowledge to the player's understanding of something related to either the race or individual horses in the race. A layering UDM can literally be based on anything.

Last Thurs I spent a day at DMR and ran into John Doyle. For those of you who may not be aware, John was the overall winner of the NHC tournament in Jan 2011. (No, John is not a JCapper guy.) He mostly uses the DRF and a ball point pen.

IMHO, if horse racing were chess John would be a grand master while the rest of us (myself included) would be local tournament players at best. It's both scary and amazing how quickly he subconsciously homes in on patterns.

Anyway, we came to a Mclm race with a Sadler FTS in it. John instantly knows that Sadler is 3 for 8 at DMR with FTS... win pct .375. Quick odds conversion = approx 8/5.

Conversely, John says that Sadler is 1 for 11 with FTS in SPLWT at DMR and wouldn't touch a Sadler FTS in a SPLWT at DMR with my money... well, maybe with MY money! but definitely not his money.

Me? I sense small sample syndrome at work. Not liking anything else in the race I pass.

John loads up on what he sees as very generous odds (7/2) and watches his Sadler FTS stalk the pace while in hand, pull even at about the 1/8th pole - and then win going away late.
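The quick odds conversion in that story is simple arithmetic, sketched below. Fair odds-to-1 are taken as (1 - p) / p, and the edge is the expected profit per $1.00 if the estimated win pct were exactly right. The function names are made up, and the big assumption is that 3 for 8 is a trustworthy win probability, which is exactly the small sample question raised above.

```python
from fractions import Fraction

def fair_odds(wins, starts):
    """Break-even odds-to-1 implied by a win record: (1 - p) / p."""
    return Fraction(starts - wins, wins)

def edge_per_dollar(wins, starts, offered_odds):
    """Expected profit per $1.00 bet at the offered odds-to-1, given the record."""
    p = Fraction(wins, starts)
    return p * offered_odds - (1 - p)

fair = fair_odds(3, 8)                            # 5/3, a bit over 8/5
edge = edge_per_dollar(3, 8, Fraction(7, 2))      # offered 7/2

print(f"fair odds: {fair} to 1 (about {float(fair):.2f}-1)")
print(f"edge at 7/2: {float(edge):.4f} per $1.00")

# The 1 for 11 record mentioned for the other race type, for comparison
print(f"fair odds off 1 for 11: {fair_odds(1, 11)} to 1")
```

At 7/2 against a fair price of roughly 8/5 the bet looks very generous on paper. Whether it really is depends on how much weight an 8 race sample deserves.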

Later that night over beers and dinner at a nice place, John and I are talking "shop." It occurs to me that in this case, even though he's not a JCapper guy - John had a "Layering" UDM in his head based on Sadler FTS's in MClm races at DMR.

It also occurs to me that John won the NHC and I didn't. That fact is not lost on me.

(It's not like I didn't have a good day myself. I did. But it's just like Chuck says... many ways to skin a cat.)

Below are a couple of links to some older posts in the private section of the board where I laid out a few of my thoughts on Layering UDMs and how best to use them.

JCapper Under The Hood - Jan, 2011:
http://www.jcapper.com/MessageBoard/TopicReader.asp?topic=903&forum=Private

Stunned by JCapper yet again - June, 2012:
http://www.jcapper.com/MessageBoard/TopicReader.asp?topic=1277&forum=Private

Wrapping this up... if that's even possible... the Layering UDM absolutely CAN be based on the small sample and the track specific. Some of mine are.

My best successes always seem to come about where Business UDM and Layering UDM meet.

-jp

.

~Edited by: jeff on: 8/10/2012 at: 12:06:16 AM~

Caveat
8/10/2012
11:30:03 AM

Thxs Charlie!!
While I was at work yesterday, I took a peek to see if I got a response. I glanced over it quickly, and after I saw the charts that you put up, it hit me.
At first when you mentioned back-fitting and the big picture, I wasn't sure what you were talking about and was shy to ask.
Now, with those charts, I can see that if I had built a UDM based on last year's numbers, it would have been disastrous come 2012.
2011 showed average paced horses having the advantage...this year it would be early horses having the advantage.
You asked what could be cherry picked. I'm guessing maybe trainers, connections, post bias, others...if the data shows up again on those factors.
Databases are completely new to me, so please be patient :)
Moving forward, I now have data on what happened way back and data on what's happening now, so I would put an emphasis toward early.
Jeff, thxs for taking the time to post a reply. I will get to your stuff soon.

Mike

Things have to sink in little by little

~Edited by: Caveat on: 8/10/2012 at: 9:25:08 AM~

Stupid me...I was thinking that there was only one page in JCP 101, because I didn't find page numbers at the bottom...
There's a back button!!..WOW...Tons of reading :)

~Edited by: Caveat on: 8/10/2012 at: 11:06:26 AM~

~Edited by: Caveat on: 8/10/2012 at: 11:06:53 AM~

~Edited by: Caveat on: 8/10/2012 at: 11:30:03 AM~

Windoor
8/11/2012
9:47:02 PM

I will add my two cents on why one year's results can be so different from another's, and what you might be able to do about it.

Need I say it? All in my most humble opinion.

I believe there are many different kinds of races other than simply Maiden, Claiming, Allowance, Stakes, Handicap, and variations of same. Having said that, I believe they all can be broken down by Track, Distance, Class, Age, Surface (including today's variant), Sex, and time of year.

I call this "The Seven," and it is how I separate the different kinds of races. When you consider all of the subcategories for each and the possible combinations of them, there are a great many to take into consideration. You can now see just how complex our problem can be when deciding on what factors to use in our UDMs.
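As a rough sketch of what breaking plays down by the Seven might look like outside the program, the snippet below groups a list of plays by any subset of the seven categories and reports plays and win ROI per group. The field names, the sample records, and the flat $2.00 bet are assumptions for illustration only, not JCapper fields.

```python
from collections import defaultdict

# Hypothetical field names standing in for the seven categories
SEVEN = ("track", "distance", "race_class", "age", "surface", "sex", "season")

def breakdown(plays, keys, bet_size=2.00):
    """Group plays by the given category keys; return plays and win ROI per group."""
    totals = defaultdict(lambda: [0, 0.0])   # group -> [play count, mutuel total]
    for play in plays:
        group = tuple(play[k] for k in keys)
        totals[group][0] += 1
        totals[group][1] += play["win_mutuel"]
    return {
        group: {"plays": n, "roi": total / (bet_size * n)}
        for group, (n, total) in totals.items()
    }

# Made-up plays (not real results)
plays = [
    {"track": "SAR", "distance": "sprint", "race_class": "MSW", "age": 2,
     "surface": "dirt", "sex": "F", "season": "summer", "win_mutuel": 0.0},
    {"track": "SAR", "distance": "sprint", "race_class": "MSW", "age": 2,
     "surface": "dirt", "sex": "F", "season": "summer", "win_mutuel": 7.00},
    {"track": "SAR", "distance": "route", "race_class": "CLM", "age": 4,
     "surface": "turf", "sex": "M", "season": "summer", "win_mutuel": 0.0},
]

by_track_distance = breakdown(plays, ("track", "distance"))
for group, stats in by_track_distance.items():
    print(group, stats)
```

Each added category splits the sample further, so the play counts per group shrink quickly. That is the complexity problem described above in concrete form.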

Some factors can transcend many categories. Others, like some performance factors (speed and pace) can drastically change with the track conditions, and this can happen on a daily basis at some tracks. I tend to stay away from them (performance factors) even though they can indeed show a healthy win percent. It's the average odds I object to.

So what changed from this year to last year that made such a drastic difference in the ROI? As mentioned above, it could be a simple track bias. It also could be the "type" of races being run. Maybe a lot more of "non winners of three" or one, or two, or conditional allowance races, etc. It can be many things. Knowing what is going to be the dominant factor today, for this "type" of race is the real challenge in my view.

I now have Seven Key factors (only one is based on speed), Seven Primary factors, and Seven Secondary factors. They all have value, but some really shine when a specific "type" of race comes along.

I used to have a signature statement that said, "The Numbers Have Hinges". This is a reference to factors whose "Value" has changed due to the type of race being run. A top ranked factor that used to give us a lot of winners may now be nearly useless due to changes in the track surface or "types" of races being run.

I would recommend to anyone who is still struggling to maintain a profit, to break down their plays by the Seven. Pick one from each category, and build a UDM for it. Test it for a three year period (or more), or at least a few hundred consecutive races. Then support it with a small bank to see if it grows. It may very well work at other tracks, class levels, distances or any of the seven. Only a large database that can show enough consecutive plays can tell you if it has value or not.

Even then, there is no guarantee that it will work going forward. I start each with a very small bank, and only increase the wager when the bank has grown enough to support it.
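A small-bank approach along those lines might be sketched like this. The starting bank, the $2.00 unit, and the rule that the bank must cover 50 bets at the next size before stepping up are all hypothetical numbers chosen for illustration, not a recommendation and not a description of anyone's actual staking plan.

```python
def run_bank(results, start_bank=100.0, unit=2.0, bets_required=50):
    """Track a small bank over a series of plays.

    results: win mutuel per $2.00 for each play, in order (0.0 for a loser).
    The wager starts at one unit and steps up by one unit only when the bank
    covers bets_required wagers at the next size.
    """
    bank, bet = start_bank, unit
    for mutuel in results:
        if bet > bank:
            break                              # bank can no longer cover the wager
        bank += bet * (mutuel / 2.0) - bet     # collect the payoff, less the stake
        while bank >= (bet + unit) * bets_required:
            bet += unit                        # bank has grown enough to support more
    return bank, bet

# Made-up result sequences
print(run_bank([0.0, 10.0]))    # one loser, then a $10.00 winner
print(run_bank([300.0]))        # a single boxcar payoff
```

The wager never rises on a hunch in this sketch, only when the bank itself has earned it, which is the discipline described above.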

Greed and impatience kills. The discipline to wait for them is also mandatory.

Regards,

Windoor.

~Edited by: Windoor on: 8/11/2012 at: 9:45:37 PM~

~Edited by: Windoor on: 8/11/2012 at: 9:47:02 PM~

Charlie James
8/11/2012
11:21:08 PM

--Quote:

"

So what changed from this year to last year that made such a drastic difference in the ROI? As mentioned above, it could be a simple track bias. It also could be the "type" of races being run. Maybe a lot more of "non winners of three" or one, or two, or conditional allowance races, etc. It can be many things. Knowing what is going to be the dominant factor today, for this "type" of race is the real challenge in my view.

"
--End Quote.

Re: the bolded part -- Truer words never spoken.

Sharp post from start to finish.

~Edited by: Charlie James on: 8/11/2012 at: 11:21:08 PM~

Caveat
8/12/2012
7:48:06 AM

Thxs guys...the knowledge is building :)

Mike
