Batting Stats 1: The Easy Stuff

The goal here is to create a table (or view) for all offensive stats for each player-year-stint-split.  Let’s go back and parse that briefly:  Stats are collected for each player who accumulates them.  Each player gets his own row.  For each year that a player accumulates stats, a new row of data is created for that player.  For each team that a player plays in a given year (stint), a new row of data is created for that player.  Stats are accumulated and placed into three splits for each player-year-stint: Overall, vs. Left, and vs. Right.

I will most likely create views for each split, but for the moment I am going to focus only on Overall (split_id=1).

As this view is where I have gotten tripped up in the past, I’m going to take it slowly here, checking against game data periodically.  So, easy bit first: Let’s get the counting stats and the more traditional rate stats out of the way:

#Calculated batting stats for OOTP
    CREATE OR REPLACE VIEW CalcBatting AS

    SELECT b.year
    , b.league_id
    , b.player_id
    , b.stint #We can eventually move this down the list
    , b.split_id #We can eventually remove this
    , b.team_id #We can eventually move this down the list
    , l.abbr as Lg
    , t.abbr as Team
    , b.g
    , b.ab
    , b.ab+b.bb+b.sh+b.sf+b.hp as PA
    , b.r 
    , b.h
    , b.d
    , b.t
    , b.hr
    , b.rbi
    , b.sb
    , b.cs
    , b.bb
    , b.k
    , b.ibb
    , b.hp
    , b.sh
    , b.sf
    , b.gdp
        FROM 
      players_career_batting_stats b 
      INNER JOIN leagues l ON b.league_id=l.league_id 
      INNER JOIN teams t ON b.team_id=t.team_id
    WHERE b.ab<>0 AND b.split_id=1
    ORDER BY b.player_id, b.year

We’ll pick a random player from in-game and make sure we line up.  Let’s pick player #14 from 2015 and compare to OOTP.  Not expecting any problems on the easy stuff.  Still, just to be sure.

Output from calcbatting
Frank Garcia

So far so good.  Next post will finish up with the standard stats and some of the easier advanced stats.

Run Environment 4: Run Values A Different Way

My attempt in the last post resulted in a view of linear weights that, while it seemed to work, was very unwieldy.  It was so because for each of the linear weights, I had to spell out fully the calculations for runPlus, runMinus, and wobaScale.

I’ve been aware of this code from Colin Wyers for a while now, but wanted to try my hand at translating Tango’s views into MySQL on my own.  I’ve done that.  Now, I’m going to try to pick apart how Wyers uses variables to simplify the process and configure this back to my OOTP tables.  Probably best to have that link open in another tab while I walk through this.

I’ll be breaking this across several posts.

We’re going to ignore the first statement that creates a primary position (PrimPos) table.  As mentioned before, this table would serve to eliminate stats created by non-pitchers pitching.  My feeling is that they are part of the run environment and should be included and, besides, this is a rarer event in OOTP than it is in real life.

The second thing to note is that Wyers is creating a table rather than a view.  This is actually a good idea.  Views save disk space but have slower performance than tables.  That’s because views are essentially stored queries that are run every time they are accessed.  Tables are created once and saved as a data object in their own right.  Since space is not a concern for me and all of the advanced stats will require complex queries, I think that opting for performance here is the right thing to do.

Next, I am going to leave in place my view vLeagueRunsPerOut – the foundational view/table.  They’re too similar for me to make any changes.  Besides, I like that I thought to total up league Plate Appearances in mine.  It comes in handy later.

Minor changes to the RunValues table give me this:

DROP TABLE IF EXISTS tblRunValues;
CREATE TABLE IF NOT EXISTS tblRunValues
AS SELECT year
, league_id
, RperOut 
, @rb := RperOut+0.14 AS runBB 
, @rb+0.025 AS runHB 
, @rs := @rb+0.155 AS run1B 
, @rd := @rs+0.3 AS run2B 
, @rd+0.27 AS run3B 
, 1.4 AS runHR 
, 0.2 AS runSB 
, 2*RperOut+0.075 AS runCS 
FROM vLeagueRunsPerOut;

 

Two things are happening here that I didn’t think we could do.  First, declaring variables with just the ‘@’ and the ‘:=’ operator.  Murach, whose book I’ve been using to learn MySQL, doesn’t mention this as an option.  He does usually point out shortcuts that some devs take before advising against using them yourself.  This squares with the other web searches I’ve done regarding variables in MySQL.  But, it seems to work, so…

Second, I could have sworn that I had read that you can’t rely on how MySQL will evaluate variables when used like this.  That is, there’s not a guarantee that it will evaluate ‘@rb’ first and use that value to evaluate ‘@rs’.  It could decide to do ‘@rs’ first and force ‘@rb’ to null.  That didn’t happen here, and I am a little confused. (NOTE: I emailed Wyers about this and will update if he gets back to me.)

Bottom line, though, is that this worked.  Comparing the results from my original view without variables to Wyers’ table with them shows identical results if we disregard all of the trailing 0’s:

Oirignal View created without variables
Results from modified RunValues table with variables

OK then.  On to the next one.

Run Environment 3: The Second Run Values View

Here’s what we’re doing for the next view.  We’re taking all of the run expectancy values from the previous view, making 3 more calculations that I will explain in a moment, and creating linear weights for each non-out batting event that we’ll use to derive wOBA.

runMinus establishes a run value for outs and other events that are not part of OBP.  It’s AB – H + SF, and is equivalent to a batting out.

runPlus gives us the average run value for all non-out batting events: Hits, Hit-by-Pitches, and Walks.

wOBA will give us the Weighted On Base Average for the league

wobaScale will give us the factor to apply to all of the other linear weights and for each league year in order to scale players’ wOBAs to their matching, unweighted On Base Percentage.

Two notes before the code.  First, we’re using stats from the players_career_batting_stats table.  I wrote a join to the league_history_stats table when I was thinking of doing it another way.  I may go back and switch it out to the other table as it will save us doing the sums.

Second, there’s got to be a more elegant way to do this.  I think there is, and it’s to do with defining runPlus, runMinus, and wobaScale as variables so that I don’t have to elaborate the code each time they are mentioned.  I know that I can’t use column aliases within the select statement.  So my options are leaving it as is (I am getting results that look reasonable), try my luck with variables, or create another table or view to store these values and call them from this view.

I will come back to this to try to clean it up.

 

CREATE OR REPLACE VIEW vRunValues2 AS
SELECT 
r.year
, r.league_id
, r.RperOut 
, r.runBB 
, r.runHB 
, r.run1B 
, r.run2B 
, r.run3B 
, r.runHR 
, r.runSB 
, r.runCS 
, Sum(r.runBB* (b.bb - b.ibb) + r.runHB * b.HP + r.run1B * (b.H -b.d - b.t -b.hr)
    + r.run2B * b.d + r.run3B * b.t + 1.4*b.HR + r.runSB * b.SB - r.runCS *b.CS)
    / Sum(b.ab -b.h + b.SF) AS runMinus 
, Sum(r.runBB * (b.BB - b.ibb) + r.runHB * b.HP +r.run1B * (b.H -b.d -b.t -b.hr)
    + r.run2B *b.d + r.run3B * b.t + 1.4 *b.hr + r.runSB * b.SB - r.runCS * b.CS)
    / Sum(b.BB - b.IBB + b.HP +b.H) AS runPlus 

, #1/([runPlus]+[runMinus]) AS wOBAscale 
1/ ((Sum(r.runBB * (b.BB - b.ibb) + r.runHB * b.HP +r.run1B * (b.H -b.d -b.t -b.hr)
    + r.run2B *b.d + r.run3B * b.t + 1.4 *b.hr + r.runSB * b.SB - r.runCS * b.CS))
    / (Sum(b.BB - b.IBB + b.HP +b.H) + Sum(r.runBB* (b.bb - b.ibb) + r.runHB * b.HP
    + r.run1B * (b.H -b.d - b.t -b.hr) + r.run2B * b.d + r.run3B * b.t + 1.4*b.HR
    + r.runSB * b.SB - r.runCS *b.CS) / Sum(b.ab -b.h + b.SF))) AS wOBAscale

, #([runBB]+[runMinus])*[wOBAscale] AS wobaBB 
(r.runBB + (Sum(r.runBB* (b.bb - b.ibb) + r.runHB * b.HP + r.run1B
    * (b.H -b.d - b.t -b.hr) + r.run2B * b.d + r.run3B * b.t + 1.4*b.HR + r.runSB
    * b.SB - r.runCS *b.CS) / Sum(b.ab -b.h + b.SF))) 
    * (1/ ((Sum(r.runBB * (b.BB - b.ibb) + r.runHB * b.HP +r.run1B
    * (b.H -b.d -b.t -b.hr) + r.run2B *b.d + r.run3B * b.t + 1.4 *b.hr
    + r.runSB * b.SB - r.runCS * b.CS)) / (Sum(b.BB - b.IBB + b.HP +b.H)
    + Sum(r.runBB* (b.bb - b.ibb) + r.runHB * b.HP + r.run1B 
    * (b.H -b.d - b.t -b.hr) + r.run2B * b.d + r.run3B * b.t + 1.4*b.HR 
    + r.runSB * b.SB - r.runCS *b.CS) / Sum(b.ab -b.h + b.SF)))) AS wobaBB

, #([runHB]+[runMinus])*[wOBAscale] AS wobaHB 
(r.runHB +(Sum(r.runBB* (b.bb - b.ibb) + r.runHB * b.HP + r.run1B 
   * (b.H -b.d - b.t -b.hr) + r.run2B * b.d + r.run3B * b.t + 1.4*b.HR 
   + r.runSB * b.SB - r.runCS *b.CS) / Sum(b.ab -b.h + b.SF))) 
   * (1/ ((Sum(r.runBB * (b.BB - b.ibb) + r.runHB * b.HP +r.run1B * (b.H -b.d -b.t -b.hr) 
   + r.run2B *b.d + r.run3B * b.t + 1.4 *b.hr + r.runSB * b.SB - r.runCS * b.CS)) 
   / (Sum(b.BB - b.IBB + b.HP +b.H) + Sum(r.runBB* (b.bb - b.ibb) + r.runHB * b.HP 
   + r.run1B * (b.H -b.d - b.t -b.hr) + r.run2B * b.d + r.run3B * b.t + 1.4*b.HR 
   + r.runSB * b.SB - r.runCS *b.CS) / Sum(b.ab -b.h + b.SF)))) AS wobaHB

, #([run1B]+[runMinus])*[wOBAscale] AS woba1B 
(r.run1b + (Sum(r.runBB* (b.bb - b.ibb) + r.runHB * b.HP + r.run1B * (b.H -b.d - b.t -b.hr) 
   + r.run2B * b.d + r.run3B * b.t + 1.4*b.HR + r.runSB * b.SB - r.runCS *b.CS) 
   / Sum(b.ab -b.h + b.SF))) * (1/ ((Sum(r.runBB * (b.BB - b.ibb) + r.runHB * b.HP 
   +r.run1B * (b.H -b.d -b.t -b.hr) + r.run2B *b.d + r.run3B * b.t + 1.4 *b.hr 
   + r.runSB * b.SB - r.runCS * b.CS)) / (Sum(b.BB - b.IBB + b.HP +b.H) 
   + Sum(r.runBB* (b.bb - b.ibb) + r.runHB * b.HP + r.run1B * (b.H -b.d - b.t -b.hr) 
   + r.run2B * b.d + r.run3B * b.t + 1.4*b.HR + r.runSB * b.SB - r.runCS *b.CS) 
   / Sum(b.ab -b.h + b.SF)))) AS woba1B

, #([run2B]+[runMinus])*[wOBAscale] AS woba2B 
(r.run2b + (Sum(r.runBB* (b.bb - b.ibb) + r.runHB * b.HP + r.run1B * (b.H -b.d - b.t -b.hr) 
   + r.run2B * b.d + r.run3B * b.t + 1.4*b.HR + r.runSB * b.SB - r.runCS *b.CS) 
   / Sum(b.ab -b.h + b.SF))) * (1/ ((Sum(r.runBB * (b.BB - b.ibb) + r.runHB * b.HP 
   +r.run1B * (b.H -b.d -b.t -b.hr) + r.run2B *b.d + r.run3B * b.t + 1.4 *b.hr 
   + r.runSB * b.SB - r.runCS * b.CS)) / (Sum(b.BB - b.IBB + b.HP +b.H) 
   + Sum(r.runBB* (b.bb - b.ibb) + r.runHB * b.HP 
   + r.run1B * (b.H -b.d - b.t -b.hr) + r.run2B * b.d + r.run3B * b.t + 1.4*b.HR 
   + r.runSB * b.SB - r.runCS *b.CS) / Sum(b.ab -b.h + b.SF)))) AS woba2B

, #([run3B]+[runMinus])*[wOBAscale] AS woba3B 
(r.run3B + (Sum(r.runBB* (b.bb - b.ibb) + r.runHB * b.HP + r.run1B * (b.H -b.d - b.t -b.hr) 
   + r.run2B * b.d + r.run3B * b.t + 1.4*b.HR + r.runSB * b.SB - r.runCS *b.CS) 
   / Sum(b.ab -b.h + b.SF))) * (1/ ((Sum(r.runBB * (b.BB - b.ibb) + r.runHB * b.HP 
   +r.run1B * (b.H -b.d -b.t -b.hr) + r.run2B *b.d + r.run3B * b.t + 1.4 *b.hr 
   + r.runSB * b.SB - r.runCS * b.CS)) / (Sum(b.BB - b.IBB + b.HP +b.H) 
   + Sum(r.runBB* (b.bb - b.ibb) + r.runHB * b.HP + r.run1B * (b.H -b.d - b.t -b.hr) 
   + r.run2B * b.d + r.run3B * b.t + 1.4*b.HR + r.runSB * b.SB - r.runCS *b.CS) 
   / Sum(b.ab -b.h + b.SF)))) AS woba3B

, #([runHR]+[runMinus])*[wOBAscale] AS wobaHR 
(r.runHR + (Sum(r.runBB* (b.bb - b.ibb) + r.runHB * b.HP + r.run1B * (b.H -b.d - b.t -b.hr) 
   + r.run2B * b.d + r.run3B * b.t + 1.4*b.HR + r.runSB * b.SB - r.runCS *b.CS) 
   / Sum(b.ab -b.h + b.SF))) * (1/ ((Sum(r.runBB * (b.BB - b.ibb) + r.runHB * b.HP 
   +r.run1B * (b.H -b.d -b.t -b.hr) + r.run2B *b.d + r.run3B * b.t + 1.4 *b.hr 
   + r.runSB * b.SB - r.runCS * b.CS)) / (Sum(b.BB - b.IBB + b.HP +b.H) 
   + Sum(r.runBB* (b.bb - b.ibb) + r.runHB * b.HP + r.run1B * (b.H -b.d - b.t -b.hr) 
   + r.run2B * b.d + r.run3B * b.t + 1.4*b.HR + r.runSB * b.SB - r.runCS *b.CS) 
   / Sum(b.ab -b.h + b.SF)))) AS wobaHR

, #[runSB]*[wOBAscale] AS wobaSB 
r.runSB * (1/ ((Sum(r.runBB * (b.BB - b.ibb) + r.runHB * b.HP +r.run1B * (b.H -b.d -b.t -b.hr) 
   + r.run2B *b.d + r.run3B * b.t + 1.4 *b.hr + r.runSB * b.SB - r.runCS * b.CS)) 
   / (Sum(b.BB - b.IBB + b.HP +b.H) + Sum(r.runBB* (b.bb - b.ibb) + r.runHB * b.HP 
   + r.run1B * (b.H -b.d - b.t -b.hr) + r.run2B * b.d + r.run3B * b.t + 1.4*b.HR 
   + r.runSB * b.SB - r.runCS *b.CS) / Sum(b.ab -b.h + b.SF)))) AS wobaSB

, #[runCS]*[wOBAscale] AS wobaCS
r.runCS * (1/ ((Sum(r.runBB * (b.BB - b.ibb) + r.runHB * b.HP +r.run1B * (b.H -b.d -b.t -b.hr) 
   + r.run2B *b.d + r.run3B * b.t + 1.4 *b.hr + r.runSB * b.SB - r.runCS * b.CS)) 
   / (Sum(b.BB - b.IBB + b.HP +b.H) + Sum(r.runBB* (b.bb - b.ibb) + r.runHB * b.HP 
   + r.run1B * (b.H -b.d - b.t -b.hr) + r.run2B * b.d + r.run3B * b.t + 1.4*b.HR 
   + r.runSB * b.SB - r.runCS *b.CS) / Sum(b.ab -b.h + b.SF)))) AS wobaCS

FROM 
vRunValues r
INNER JOIN players_career_batting_stats b ON r.year=b.year AND r.league_id=b.league_id
INNER JOIN league_history_batting_stats lhb ON r.year=lhb.year AND r.league_id=lhb.league_id

GROUP BY 
r.year
, r.league_id
, r.RperOut 
, r.runBB 
, r.runHB 
, r.run1B 
, r.run2B 
, r.run3B 
, r.runHR 
, r.runSB 
, r.runCS


ORDER BY 
r.year DESC;

Run Environment 2: Run Values Part 1

I am not a statistician by any stretch of the imagination.  Nor am I a sabermetrician.  I can barely follow along with the conversation threads with the big brains on Tango’s website.  As a result, I have to take some things on faith.  Sometimes I feel good about that, sometimes not.  The things I am taking on faith for this first Run Values view are giving me heartburn.  Here’s what’s happening:

In the previous view, we established a run environment for each league year.  That is, we determined how many runs were scored for every plate appearance and for every out.  In this next view, we are adding a run value to the runs per out for every non-out batting outcome.  For example, let’s say that in 2009 the RperOut for my league was .172.  That’s basically saying that, no matter the outcome, stepping up to the plate in 2009 was worth at least .172 runs on average.  That RperOut establishes a baseline to which successful outcomes will add value.  How much value?  Here’s where the heartburn comes in.

The value added is a constant for each event.  Across all years and leagues.  A walk is worth 0.14 runs more than an out in every year, league, planet, park, etc.  It’s difficult to accept for real world historical MLB and it’s even harder to accept in OOTP where the baseball environment can be much different.  I have been looking, and will continue to look, for an explanation of how these constants were derived.  Haven’t found one yet and I’m anxious to move forward, so I will accept them for now.  However, when I start getting wacky results for my OOTP stats, this is the first place I am going to look.

It’s to do with Run Expectancy, and this Tango post and this article talk about it, but I was having some trouble parsing it, so I am leaving it for now, but with some misgivings.  So to recap, this view we are adding an expected run value to the runs per out for every non-out batting outcome.  Here’s that view:

CREATE OR REPLACE VIEW vRunValues AS
SELECT l.year
, l.league_id
, l.rperout
, l.rperout+0.14 AS runBB
, l.rperout+0.14+0.025 AS runHB
, l.rperout+0.14+0.155 AS run1b
, l.rperout+0.14+0.155+0.3 AS run2b
, l.rperout+0.14+0.155+0.3+0.27 AS run3b
, 1.4 AS runHR
, 0.2 AS runSB
, (2*l.RperOut)+0.075 AS runCS
FROM vLeagueRunsPerOut AS l;

Run Environment 1: League Stats

Here’s where things start to get fun. I’m going to be setting up a number of views that will allow me to calculate some of the more advanced statistics. The ones that I am most interested in are Weighted On-Base Average (wOBA), Weighted Runs Created (wRC), Weighted Runs Above Average (wRAA), and Weighted Runs Created+ (wRC+). If this blog ever becomes a book or major motion picture, I will come back and fill in some detail about what these stats are and why they are meaningful. Until then, I will leave the links to do the explaining.

I am basing all of this Run Environment work on a couple of posts and formulas posted by Tom Tango on his blog.

wOBA is the gatekeeper to all of these other stats, and it can’t be derived from a player’s individual stats alone.  It requires some baseline statistics about the run environment in which a batter plays.  The first step is to determine, for each league year, the number of Runs, Outs, Plate Appearances, Runs per Out, and Runs per PA.  I don’t have a good way, yet, to separate out sub_leagues, and Tango doesn’t seem to care at this point, so I leave it alone.

The other thing that Tango does and that I have decided to omit is limiting the summed pitching stats to those accumulated by players whose primary position is pitcher.  My experience is that non-pitchers pitch so rarely in OOTP that it is not worth creating a new view to determine primary position.

I create a view, as such:

CREATE OR REPLACE VIEW vLeagueRunsPerOut AS
SELECT p.year
, p.league_id
, SUM(p.r)/sum(p.outs) AS "RperOut"
, sum(p.r) AS "totR"
, sum(p.outs) AS "totOuts"
, sum(p.outs)+sum(p.ha)+sum(p.bb)+ sum(p.iw)+ sum(p.sh)
   + sum(p.sf) AS "totPA"
, round(sum(p.r)/(sum(p.outs)+sum(p.ha)+sum(p.bb)+ sum(p.iw)+ sum(p.sh)
   + sum(p.sf)),8) AS "RperPA"
FROM players_career_pitching_stats AS p
GROUP BY p.year, p.league_id;