JCapper Message Board

JCapper 101
-- How to create a UDM

Home	Register Log In

How to create a UDM

CalvinJ
8/19/2009
12:40:57 AM

I would like to establish a UDM which alerts me to horses which have gained 1 length in the first call, of either the last race or the next to last race. Also I would like to develope a UDM which alerts me to horses which have gained 1 length in the last call of either the last race or the next to last race. Is this possible?

I'm fairly new to JCapper and am probably asking a dumb question; please forgive me, I've a lot to learn. But I have learned this; the program is a lot of fun. Fun, because if one follows the instructions Jeff provides, "things" actually work the way he says they work. I must say, I've experienced very few problem areas. I think the main reason why I'm enjoying myself, is because the instructions are clear and workable, with some practice.

Anyway, thanks much, if anyone has any advice.

calvinj...

jeff
8/19/2009
1:58:40 PM

Welcome to the board. Grin

Right now there isn't a way to flag lengths gained/lost from one call to the next in a UDM. Very early on (mid 1990's) I did R&D into that and didn't see enough promise to warrant adding the capability.

The "gotcha" that happens when UDMs are based strictly on lengths gained / lost from one call to the next is context. Context being the strength of the competition from the race represented by the running line being looked at.

For example: Stretch Gain is seen by many as a positive sign. But stretch gain against weaker competition might not be nearly as strong a positive as lack of stretch gain vs. stronger foes - or stretch loss vs. significantly stronger foes.

For that reason I found myself getting inconsistent results (remember this was back in the mid 1990's) using raw lengths gained in my early UDMs.

Later, I found better results roi-wise by focusing on the CONCEPT that lengths gained / lost represented.

For example, the concept (at least for me) that stretch gain represents is closing ability. Factors that represent closing ability in a way that measures the closing ability of the horses in today's field relative to each other might be: LateSlant, LatePace3, CompoundLate, etc.

Raw lengths gained or lost fail to capture the relative to each other part.

OK, that said - it doesn't mean the capability to measure raw lengths gained / lost from one call to the next wouldn't be useful in UDMs when combined with factors added to the program since the last time I looked at it.

I happen to think it would.

This post is getting long. So I'll stop here. To be continued...

-jp

.

CalvinJ
8/19/2009
2:18:29 PM

Thanks much Jeff for the info, which is helpful. I grabbed this idea from Jim Lehane's work, "Calibration Handicapping", and was going to see in the Data Window, if there is any promise to this concept. His theory isn't based upon ROI, necessarily, it is based upon locating horses that project readiness for their next racing event. I purchased his book about the same time I purchased JCapper and haven't really investigated the elements he describes in his material.

Thanks
calvinj...

jeff
8/19/2009
2:50:02 PM

Continued...

In one of my posts from a few days ago in the Private area of the message board I hinted that I was working on a Data Window capable of running queries 20 times faster than the Data Window you have now. Depending on the type of query run "20 times faster" might be an understatement.

Right now I have about 18 months of data loaded into a StarterHistory Table. Total starters is just over 335,000 and total races is just over 40,000.

In the new Data Window, most of my UDMs come back with results in 4 to 15 seconds when run against that database. Running the same UDMs against a playlist file containing the same starter history takes several minutes.

Where am I going with this and how does it relate to lengths gained and lost from one call to the next?

The Data Window that I'm describing here uses a completely different architecture than the one you have now. In the new Data Window the user has the ability to write and execute a SQL expression.

For example, executing the following SQL Expression in the new Data Window is exactly the same thing as running an ALL Button Data Window Query in the current Data Window:

SELECT * FROM StarterHistory

One key difference being that results come back faster.

But wait - it gets better.

In the current program architecture the user can't do any programming. He or she (yes there are a handful of she's now.) has to rely on me Jeff the programmer to develop an interface so that ideas like lengths gained or lost can be captured in a UDM Definition.

Well what if you the user had the ability to do a little programming of your own? What if you could execute something like the following SQL Expression in the new Data Window?

SELECT * FROM StarterHistory WHERE posstrcall <> 1 AND posfincall <> 1 AND (lenstrcall - lenfincall) >= 1

In the new Data Window the above SQL Expression would return results for one case (horses that were not 1st at either the stretch call or finish call) that gained at least 1L in the stretch. Other SQL Expressions could be created to capture the remaining cases needed to identify horses that gained at least 1L in the stretch.

Under the new program architecture, the user has the ability to do a little programming of their own - and SQL Expressions like the above can be executed in the Data Window - and later made part of a UDM Defintion.

The result being:

1. Results in the Data Window come back FASTER

2. Users with ideas don't have to lean on ME as much to test their ideas. If the raw data exists in the StarterHistory table - then there's a way to write a SQL Expression to return results based on the user's idea.

So instead of you the user having to ask me to create new preset filters or new factors to get at your ideas, most of the time you just might find all you have to do is ask: "Hey, what SQL Expression do I need to do X?"

Did I mention those results are going to come back faster? Grin

Just so you know, phase one of the new architecture will be the new Data Window that I've mentioned. It's looking like early September, 2009 is a real possibility for a release date. Phase two will have the same architecture built into the UDM Wizard and Profile Marker - to follow as shortly afterwards as humanly possible.

-jp

.

~Edited by: jeff on: 8/19/2009 at: 2:50:02 PM~

CalvinJ
8/19/2009
3:02:44 PM

I must admit, this sounds very exciting. I guess I'll need to learn some "code". It sounds like we're entering a whole "new" era of development.

Very exciting, Jeff. Thanks.

jeff
8/19/2009
5:30:33 PM

Just for fun, I ran the SQL Expression from my above post through the new Data Window broken out by my own UPR numeric value.

Here it is:


     query start:         8/19/2009 2:14:15 PM
     query end:           8/19/2009 2:14:48 PM
     elapsed time:        33 seconds
     `
     Data Window Settings:
     999 Divisor  Odds Cap: None
     `
     SQL: SELECT * FROM STARTERHISTORY 
          WHERE POSSTRCALL <> 1 
          AND POSFINCALL <> 1 
          AND (LENSTRCALL - LENFINCALL) >= 1
     `
     `
     Data Summary         Win     Place      Show
     Mutuel Totals   65495.10  65074.90  67581.60
     Bet            -87342.00 -87342.00 -87342.00
     Gain           -21846.90 -22267.10 -19760.40
     `
     Wins                5748     11497     17248
     Plays              43671     43671     43671
     PCT                .1316     .2633     .3950
     `
     ROI               0.7499    0.7451    0.7738
     Avg Mut            11.39      5.66      3.92
     `
     `
     By: SQL-F13 (UPR)
     `
     >=Min      < Max      Gain       Bet       Roi   Wins  Plays     Pct   Impact
   -999.00       0.00      0.00      0.00    0.0000      0      0   .0000   0.0000
      0.00       5.00      0.00      0.00    0.0000      0      0   .0000   0.0000
      5.00      10.00      0.00      0.00    0.0000      0      0   .0000   0.0000
     10.00      15.00      0.00      0.00    0.0000      0      0   .0000   0.0000
     15.00      20.00      0.00      0.00    0.0000      0      0   .0000   0.0000
     20.00      25.00      0.00      0.00    0.0000      0      0   .0000   0.0000
     25.00      30.00      0.00      0.00    0.0000      0      0   .0000   0.0000
     30.00      35.00      0.00      0.00    0.0000      0      0   .0000   0.0000
     35.00      40.00    -56.00     56.00    0.0000      0     28   .0000   0.0000
     40.00      45.00   -368.00    586.00    0.3720      5    293   .0171   0.1297
     45.00      50.00   -612.40   1624.00    0.6229     25    812   .0308   0.2339
     50.00      55.00  -2170.80   2982.00    0.2720     32   1491   .0215   0.1631
     55.00      60.00  -2793.40   5846.00    0.5222    115   2923   .0393   0.2989
     60.00      65.00  -3171.80  10826.00    0.7070    361   5413   .0667   0.5067
     65.00      70.00  -3128.10  16054.00    0.8052    748   8027   .0932   0.7080
     70.00      75.00  -4616.00  18688.00    0.7530   1137   9344   .1217   0.9245
     75.00      80.00  -2897.10  15542.00    0.8136   1335   7771   .1718   1.3052
     80.00      85.00  -1438.50   9800.00    0.8532   1142   4900   .2331   1.7707
     85.00      90.00   -455.90   4248.00    0.8927    647   2124   .3046   2.3143
     90.00  999999.00   -138.90   1090.00    0.8726    201    545   .3688   2.8021

Granted, this is just one of the possible cases (not 1st at either the stretch or finish call) for horses that gained at least 1L in the stretch -

But...

Results returned by this query suggest that stretch gainers don't appear to be world beaters roi-wise... and certainly aren't the world beaters that a lot of players (and authors) might tell you they are.

Flat win bet roi for this sample is lower than what you'd get with just an ALL Button Data WIndow query. And flat win bet roi for horses rated high using my own UPR is lower than what you'd get with just an ALL Button Data Window query broken out by UPR numeric value.

Looks oddly similar to results from my initial R&D back in the day.

-jp

.

CalvinJ
8/19/2009
10:09:16 PM

Yes, this doesn't look very promising...thanks again for the search. Very helpful.

calvinj...

dvlander
8/20/2009
12:06:43 PM

Jeff, extremely exciting possibilities as you describe it. However, your forward thinking always prompts more questions.

Given these are SQL statements, would "OR" logic be accomodated or is that still a pipe dream?

At some point, would the SQL statement capabilities be able to be added to UDM construction?

Great work and even if the answer to my questions is a resounding "NO", I'm still thrilled by the faster data window and by the power that SQL adds.

Dale

jeff
8/20/2009
1:32:53 PM

Dale,

Let's take your questions one at a time...

you asked:
Q. Given these are SQL statements, would "OR" logic be accomodated or is that still a pipe dream?

my reply:
A. Yes. OR logic is absolutely supported. Grin

As a matter of fact OR logic is what sent me down this path.

My apologies beforehand if I use programmer's geek-talk. I'll try and translate to plain English as much as possible.

I started work on a proof of concept project where OR Conditionals were made part of JCapper under the current program architecture. At first I embedded an OR Conditional in its own XML node in the FactorXML. And then added code blocks in the Data Window to parse OR Conditionals XML nodes one factor at a time and enforce them as factor constraints, again, one factor at a time. I did this for each supported factor - about 60 factors in all. Next I did the same thing in the Profile Marker. And followed it up by creating an OR button in the UDM Wizard that launches a new OR Conditionals Module that I created. The new module empowers users to create OR Conditionals FactorXML as a non programmer. About 3 weeks worth of work in all.

It's slow when it runs. But it works.

But I didn't want slow.

I started looking at why it was slow and ways I could make it faster. The prospects for making it truly fast didn't look good. Parsing XML is no different than parsing strings. Parsing strings is one of the slowest operations you can tell a computer to do. Compound repeating a slow operation hundreds of times for the user who has hundreds of UDMs multiplied by a hundred or more supported factors and I had to ask myself if this was what I really wanted to publish. Yes it worked. But it had no chance of ever being fast. And it would only get slower going forward as more and more factors are added to the program. So in the end, even though I have OR Conditionals working in a prototype version right now at my end I decided not to publish.

A couple of weeks ago I had to go to Yavapai for meetings with track management and horsemen in AZ. On the drive over, it's over 400 miles one way, a light bulb went off in my head. What about using SQL expressions in the Data Window much the same way I used them in the Name Selection Tool? We had some really good meetings at Yavapai and I think positive change has a chance of happening in AZ a lot sooner than most people think. That Saturday I hit a $60.00 horse in the last race at Arlington which paid for my trip and then some. But on the drive back all I could think about was how best to implement SQL expressions in the Data Window.

Once I got back I immediately went to work making it happen. I've been on it nearly 24-7 for almost 2 weeks and can't seem to tear myself away from it. I've made tremendous progress and envision a phase one release date (Data Window Only) in early September.

you asked:
At some point, would the SQL statement capabilities be able to be added to UDM construction?

my reply:
Absolutely! I'm going to do that in phase two - which will happen as soon after phase one as humanly possible. I'm extremely exited about what this means for JCapper users going forward. Grin

you asked:
Great work and even if the answer to my questions is a resounding "NO", I'm still thrilled by the faster data window and by the power that SQL adds.

my reply:
Thanks! But it looks right now like the answer to all of your questions Dale is "yes" - just need enough time to get it done.

-jp

.

~Edited by: jeff on: 8/20/2009 at: 1:32:53 PM~

dvlander
8/20/2009
1:52:49 PM

Awesome Jeff, this is really good stuff. Shoot me down from this high if you need to but I'm guessing that you would also be able to place parameters on a factor based on the contents of another field?

A simple example for a UDM might be something like Post Odds >= (Field Size * .8).

Dale

jeff
8/20/2009
2:59:45 PM

Yes. That absolutely works.

As would SQL Expressions like the following, where factors F10 and F16 are two factors that you really like:


     query start:         8/20/2009 11:54:46 AM
     query end:           8/20/2009 11:54:51 AM
     elapsed time:        5 seconds
     `
     Data Window Settings:
     999 Divisor  Odds Cap: None
     `
     SQL: SELECT * FROM STARTERHISTORY 
          WHERE ODDS >= (FIELDSIZE * .5) 
          AND (VALF10 + VALF16) / 2 >= 70 
          AND RANKF10 = 1 
          AND RANKF16 = 1
     `
     `
     Data Summary         Win     Place      Show
     Mutuel Totals   11123.00   9911.80   9358.10
     Bet            -10560.00 -10560.00 -10560.00
     Gain              563.00   -648.20  -1201.90
     `
     Wins                 767      1568      2290
     Plays               5280      5280      5280
     PCT                .1453     .2970     .4337
     `
     ROI               1.0533    0.9386    0.8862
     Avg Mut            14.50      6.32      4.09

I'm really exited about the possibilities here.

-jp

.

Charlie James
8/20/2009
4:33:21 PM

Can't code my way out of a brown paper bag Jeff. Kind of scary but I mostly understand what you're talking about.

Question: Any chance this new stuff works with sire track surface distance combinations?

jeff
8/20/2009
4:56:44 PM

Chuck,

Without giving your work away by posting specific combos Grin

...

Yes. You'll absolutely be able to flag sires to track code, surface, and distance using the new program architecture. Trainers too.

-jp

.

kingfish
8/20/2009
7:33:28 PM

Since you will be able to speed up the data window query. Would it be possible to speed up the process of building databases? Interested in building multiple databases. But it takes so long to build the database then test your upr and userfactors on your database. Also is this sql expression the same as building mini userfactors?

kingfish

Hdcper
8/20/2009
8:03:56 PM

Hey Jeff,

This sounds like great stuff!!! Just wondering anyway I can beta test this new data window?

Also with regard to SQL Expressions is there a good tutorial or reference guide to learn effectively how to write the Expressions. It might be advantagous if you are aware of one, that you post it so users can do some learning before introducing the upgrade.

In the last example you displayed, could you add fieldsize >=7 and change the last statement to read: or RankF16 = 1 (if so would this mean either one of these ranks could be 1, but the other one could be 1 or greater?).

If so, could you run that and see how the output looks.

Anyway nice idea there Jeff,

Bill

jeff
8/20/2009
8:16:31 PM

Herman,

In answer to your question... Yes. I most certainly do envision phase two of this project speeding up database builds - and speeding them up significantly.

Part of the reason DB Builds currently take so long is that a LOT of extra data fields get written to a full playlist file. But I don't want to remove them because I know doing that it will cause problems for those who have taken the time to write their own import routines to get playlist file data into 3rd party apps like Excel, Access, File Maker, Fox Pro, and even SAS. If I remove experimental fields from the playlist file data structure then a fair percentage of JCapper users will need to modify apps THEY have written. Something along the lines of "If it aint broke don't fix it!"

Another reason that DB Builds take so long is that almost all of the playlist file content is text... or in programming geek-talk "string" data type. Computer handling of strings is about the slowest thing you can make a computer do.

When phase two of this project is done I plan to have a couple of new user options on the System Defintions Screen. One of those options will be a checkbox labeled something like "Build Database to StarterHistory Table Only." When that box is checked a DB Build is going to bypass writing to a playlist file and data is going to go straight to the StarterHistory Table... a much faster process.

The other option that I envision is a checkbox labeled "Build Database for UserFactors and UPR Only." Think about this one for a second. Let's say for example you already have an existing built database for a calendar quarter or a full calendar year folder. You go into UPR Tools and create a new GroupName that you want to test out as UserFactor1. When you do a rebuild from scratch on an existing folder NONE of the factor ranks, values, or gaps are going to change for something like 98 percent of the recorded data points. The ONLY data points that CAN change belong to UserFactors 1 through 5 and UPR. So WHY rebuild the entire thing from scratch?

Why not create a routine that:

1. Simply reads needed data from the existing StarterHistory Table instead of recalculating everything from scratch?

2. Takes those existing factor values and makes Userfactor and UPR calculations.

3. Writes the new UserFactor and UPR values back to the StarterHistory Table.

Suddenly a DB Build on a large folder under those conditions takes - I'm guessing here - What? Maybe 15-20 minutes to run from from start to finish?!!!

Wait, it gets better.

Suppose the calculation itself for your new UserFactor1 is something fairly straightforward. Depending on the calculation, in a lot of cases it's going to be possible ahead of time to write a SQL Expression that makes use of mathematical calculations in a way that will enable you to get a reasonable idea of what to expect out of your new UserFactor1.

Example: Your UserFactor1 is a simple average of CPace and CompoundLate. You create the following SQL Expression:

SELECT * FROM StarterHistory WHERE (VALCPACE + VALCOMPOUNDLATE) / 2 >= 80

Then you execute the SQL Expression. A few seconds later, your query results show you what to expect when UserFactor1 is greater than or equal to 80! From there you can embed factor weights into the SQL Expression and re-execute it. Then adjust the weights a little with each new query until you LIKE what you see.

LOTS and lots of new possibilities. But this post is getting long so I'll stop here.

-jp

.

jeff
8/20/2009
10:57:01 PM

posted by Hdcper:
Hey Jeff,

This sounds like great stuff!!! Just wondering anyway I can beta test this new data window?

my reply:
Sure why not? Not a problem letting you beta test the new Data Window. What I have ready as of right now isn't a beta yet... more of an alpha. It takes programming intervention on my part to make it run. Probably should have a beta version ready by Mon or Tues of next week and would be more than happy to let a few of you download it, run it, and kick the tires.

posted by Hdcper:
Also with regard to SQL Expressions is there a good tutorial or reference guide to learn effectively how to write the Expressions. It might be advantagous if you are aware of one, that you post it so users can do some learning before introducing the upgrade.

my reply:
Here's one - Introduction to SQL:
http://www.w3schools.com/SQl/sql_intro.asp

Of course I'll also need to write up a table schema so that you guys without Access can see the field names to use in your sql expressions.

posted by Hdcper:
In the last example you displayed, could you add fieldsize >=7 and change the last statement to read: or RankF16 = 1 (if so would this mean either one of these ranks could be 1, but the other one could be 1 or greater?).

If so, could you run that and see how the output looks.

Anyway nice idea there Jeff,

Bill

my reply:
Thanks Bill! Here's what it looks like - and yes, it means rank for F10 = 1 OR rank for F16 = 1 whereas in the original sql expression ranks for both had to be = 1:


     query start:         8/20/2009 6:44:26 PM
     query end:           8/20/2009 6:44:45 PM
     elapsed time:        19 seconds
     `
     Data Window Settings:
     999 Divisor  Odds Cap: None
     `
     SQL: SELECT * FROM STARTERHISTORY 
          WHERE ODDS >= (FIELDSIZE * .5) 
          AND FIELDSIZE >= 7 
          AND (VALF10 + VALF16) / 2 >= 70 
          AND (RANKF10 = 1 OR RANKF16 = 1)
     `
     `
     Data Summary         Win     Place      Show
     Mutuel Totals   18904.40  17585.20  17281.30
     Bet            -19498.00 -19498.00 -19498.00
     Gain             -593.60  -1912.80  -2216.70
     `
     Wins                1191      2501      3839
     Plays               9749      9749      9749
     PCT                .1222     .2565     .3938
     `
     ROI               0.9696    0.9019    0.8863
     Avg Mut            15.87      7.03      4.50

which can be proven by adding up the starters from:



     query start:         8/20/2009 6:47:46 PM
     query end:           8/20/2009 6:47:50 PM
     elapsed time:        4 seconds
     `
     Data Window Settings:
     999 Divisor  Odds Cap: None
     `
     SQL: SELECT * FROM STARTERHISTORY 
          WHERE ODDS >= (FIELDSIZE * .5) 
          AND FIELDSIZE >= 7 
          AND (VALF10 + VALF16) / 2 >= 70 
          AND RANKF10 = 1 
          AND RANKF16 = 1
     `
     `
     Data Summary         Win     Place      Show
     Mutuel Totals    8759.20   7745.90   7259.80
     Bet             -8002.00  -8002.00  -8002.00
     Gain              757.20   -256.10   -742.20
     `
     Wins                 551      1101      1623
     Plays               4001      4001      4001
     PCT                .1377     .2752     .4056
     `
     ROI               1.0946    0.9680    0.9072
     Avg Mut            15.90      7.04      4.47

and:


     query start:         8/20/2009 6:48:51 PM
     query end:           8/20/2009 6:48:54 PM
     elapsed time:        3 seconds
     `
     Data Window Settings:
     999 Divisor  Odds Cap: None

     SQL: SELECT * FROM STARTERHISTORY 
          WHERE ODDS >= (FIELDSIZE * .5) 
          AND FIELDSIZE >= 7 
          AND (VALF10 + VALF16) / 2 >= 70 
          AND RANKF10 = 1 
          AND RANKF16 > 1
     `
     `
     Data Summary         Win     Place      Show
     Mutuel Totals    4841.90   4400.40   4410.50
     Bet             -5356.00  -5356.00  -5356.00
     Gain             -514.10   -955.60   -945.50
     `
     Wins                 310       633       983
     Plays               2678      2678      2678
     PCT                .1158     .2364     .3671
     `
     ROI               0.9040    0.8216    0.8235
     Avg Mut            15.62      6.95      4.49

and:


     query start:         8/20/2009 6:49:42 PM
     query end:           8/20/2009 6:49:45 PM
     elapsed time:        3 seconds
     `
     Data Window Settings:
     999 Divisor  Odds Cap: None
     `
     SQL: SELECT * FROM STARTERHISTORY 
          WHERE ODDS >= (FIELDSIZE * .5) 
          AND FIELDSIZE >= 7 
          AND (VALF10 + VALF16) / 2 >= 70 
          AND RANKF10 > 1 
          AND RANKF16 = 1
     `
     `
     Data Summary         Win     Place      Show
     Mutuel Totals    5303.30   5438.90   5611.00
     Bet             -6140.00  -6140.00  -6140.00
     Gain             -836.70   -701.10   -529.00
     `
     Wins                 330       767      1233
     Plays               3070      3070      3070
     PCT                .1075     .2498     .4016
     `
     ROI               0.8637    0.8858    0.9138
     Avg Mut            16.07      7.09      4.55

Adding up the starters as a way of ensuring that the sql expression does in fact return an honest or conditional for F10, F16:

9749 = 4001 + 2678 + 3070

-jp

.

~Edited by: jeff on: 8/20/2009 at: 10:57:01 PM~

Charli125
8/21/2009
2:43:45 PM

I'm thrilled that we're going to be able to use SQL in the data window. That's the one language I know how to use, and I am actually pretty proficient at it. I'm also thrilled about the speed thing.

Thanks for all the updates Jeff.

JCapper Message Board

JCapper 101 -- How to create a UDM

JCapper 101
-- How to create a UDM