Pitching Stats 7: FIP-

FIP- does for FIP just what ERA- does for ERA: it scales it to 100 and accounts for park factors and league run environment.  I am still searching for a definitive formula, but I know that to begin with I will need the league FIP for comparison purposes.  I’ll go back and adjust the sub-league-pitching-stats table to include it.

In fact, I am going one step further, and generating a FIP for each sub-league.  I am guessing that this will cause me to deviate a bit from the game’s generated scores, but in this case, I think this will lead me to more accurate results.

The revised sub-league-pitching-stats table now looks like this:

DROP TABLE IF EXISTS sub_league_history_pitching;
CREATE TABLE IF NOT EXISTS sub_league_history_pitching AS

SELECT
       year
     , league_id
     , sub_league_id
     , round((totER/totIP)*9,2) AS slgERA 
     , round((adjHRA + adjBB + adjHP - adjK)/totIP+FIPConstant,2) AS slgFIP
     #FIP = ((13*HR)+(3*(BB+HBP))-(2*K))/IP + constant
FROM  (        
     SELECT p.year
          , p.league_id
          , t.sub_league_id
          , ((sum(ip)*3)+sum(ipf))/3 AS totIP
          , sum(er) AS totER
          , 13*sum(hra) AS adjHRA
          , 3*sum(bb) AS adjBB
          , 3*sum(hp) AS adjHP
          , 2*sum(k) AS adjK
          , f.FIPConstant
     FROM CalcPitching AS p INNER JOIN team_relations AS t ON p.team_id=t.team_id AND p.league_id=t.league_id
          INNER JOIN FIPConstant AS f ON p.year=f.year AND p.league_id=f.league_id
     GROUP BY p.year, p.league_id, t.sub_league_id, f.FIPConstant
      ) AS x ;

The calculation for FIP- is exactly the same as ERA-:

FIP Minus = 100*((FIP + (FIP – FIP*(PF/100)) )/ AL or NL FIP)
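As a quick sanity check on the formula, here is a minimal Python sketch.  The function names and sample numbers are made up for illustration, and the park factor is passed as a ratio (1.00 = neutral), i.e. PF/100 already applied.  It also shows that the expression collapses to FIP * (2 - PF) over the sub-league FIP, which can make typos in the parentheses easier to spot.

```python
def fip_minus(fip, park_ratio, league_fip):
    """FIP- as written above; park_ratio is PF/100 (1.00 = neutral park)."""
    return 100 * (fip + (fip - fip * park_ratio)) / league_fip


def fip_minus_simplified(fip, park_ratio, league_fip):
    """Same value after collecting terms: FIP + (FIP - FIP*PF) = FIP * (2 - PF)."""
    return 100 * fip * (2 - park_ratio) / league_fip


# Hypothetical pitcher: 3.50 FIP, hitter-friendly park (1.05), sub-league FIP of 4.00.
long_form = fip_minus(3.50, 1.05, 4.00)
short_form = fip_minus_simplified(3.50, 1.05, 4.00)
print(round(long_form), round(short_form))
```

Both forms should agree to within floating-point noise; a league-average FIP in a neutral park comes out at 100.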

We’ve already got all of the data points we need, so let’s plug it in and see what happens.

Pretty good. 25 of 30 within 5 points.  Two that were ridiculously off and 3 that are meh.  I can rely on this stat to be game equivalent 83% of the time; in the right ballpark 93% of the time; so ridiculously off that I will be able to spot it immediately 7% of the time.  I wouldn’t want my real life money riding on this, maybe, but it’s good enough for video games.

The script for CalcPitching table is now:

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , r.sub_league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := ((3*ip)+ipf)/3 AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra+i.sf),3) AS BABIP
    , @ERA := round((i.er/@InnPitch)*9,2) AS ERA
    , @FIP := round(((13*i.hra)+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS FIP 
    , round(((13*(i.fb*f.hr_fb_pct))+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS xFIP
    , round(100*((@ERA + (@ERA - @ERA*(p.avg)))/slg.slgERA),0) AS ERAminus
    , round(100*(slg.slgERA/@ERA)*p.avg,0) AS ERAplus
    , round(100*((@FIP + (@FIP - @FIP*(p.avg)))/slg.slgFIP),0) AS FIPminus
    FROM players_career_pitching_stats AS i
    INNER JOIN team_relations AS r ON i.team_id=r.team_id AND i.league_id=r.league_id
    INNER JOIN FIPConstant AS f ON i.year=f.year AND i.league_id=f.league_id
    INNER JOIN sub_league_history_pitching AS slg ON i.year=slg.year AND i.league_id=slg.league_id AND r.sub_league_id=slg.sub_league_id
    INNER JOIN teams AS t ON i.team_id=t.team_id
    INNER JOIN parks AS p ON t.park_id=p.park_id
WHERE i.split_id=1 AND i.league_id<>0;

Pitching Stats 6: ERA+

This one may prove to be tricky, if only because there are a couple of ways to calculate it.  Baseball-Reference says they calculate it one way, Wikipedia says that bb-ref used to calculate it that way, but then they changed.  So, we may see some variation here.  Frankly, I’m not even sure why I’d want to use this counter-intuitive + stat anyway.  However, it’s in the game, and I’d like to be able to use it as a sanity check if nothing else.

Here’s the first way I am going to try it.  Defined by Wikipedia as the way bb-ref currently does the calculation:

ERA+ = 100 * (2 - (ERA/lgERA) * 1/ParkFactor)

No additional joins are needed for this, so we can just plug it in.  Let’s do it and check our results.

Awful.  Just awful.  A third of the result set was hugely off, and only about half was within 5 points.

We’ll try the original recipe for this stat and see if we get better luck.  That one is:

ERA+ = 100 * (lgERA/ERA) * ParkFactor
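To see how far apart the two definitions can land, here is a small Python sketch that runs both on the same made-up pitchers.  The function names and sample ERAs are illustrative only, and the park factor is treated as a ratio (1.00 = neutral).

```python
def era_plus_original(era, league_era, park_factor):
    """ERA+ = 100 * (lgERA/ERA) * ParkFactor"""
    return 100 * (league_era / era) * park_factor


def era_plus_revised(era, league_era, park_factor):
    """ERA+ = 100 * (2 - (ERA/lgERA) * 1/ParkFactor)"""
    return 100 * (2 - (era / league_era) * (1 / park_factor))


# Hypothetical pitchers in a 4.00 ERA league and a neutral park.
for era in (2.00, 4.00, 6.00):
    original = era_plus_original(era, 4.00, 1.00)
    revised = era_plus_revised(era, 4.00, 1.00)
    print(era, round(original, 1), round(revised, 1))
```

The two versions agree for a league-average pitcher and drift apart as ERA moves away from the league mark, so a comparison against the game would be expected to look worst for the best and worst pitchers if the game uses the other definition.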

And the results:

Much, much better.  24 within 5 points and only 1 more than 10.  And that one is also one that was way off on the first try.  I am good to keep this version and even to use it for evaluative purposes.  I think the difference between this and the game is probably down to park factors.  Here’s the CalcPitching table to this point:

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , r.sub_league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := ((3*ip)+ipf)/3 AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra+i.sf),3) AS BABIP
    , @ERA := round((i.er/@InnPitch)*9,2) AS ERA
    , round(((13*i.hra)+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS FIP 
    , round(((13*(i.fb*f.hr_fb_pct))+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS xFIP
    , round(100*((@ERA + (@ERA - @ERA*(p.avg)))/slg.slgERA),0) AS ERAminus
    , round(100*(slg.slgERA/@ERA)*p.avg,0) AS ERAplus
    
    FROM players_career_pitching_stats AS i
    INNER JOIN team_relations AS r ON i.team_id=r.team_id AND i.league_id=r.league_id
    INNER JOIN FIPConstant AS f ON i.year=f.year AND i.league_id=f.league_id
    INNER JOIN sub_league_history_pitching AS slg ON i.year=slg.year AND i.league_id=slg.league_id AND r.sub_league_id=slg.sub_league_id
    INNER JOIN teams AS t ON i.team_id=t.team_id
    INNER JOIN parks AS p ON t.park_id=p.park_id
WHERE i.split_id=1 AND i.league_id<>0;

Pitching Stats 5: ERA-

It’s that time, once again, to try to deal with park adjusted stats.  Again, and against counsel, I will be pulling the park factors from the teams table rather than doing the calculations myself.  I got within spitting distance of a good result set for wRC+, so I am hoping for similar with these park-adjusted pitching stats.

First up is ERA-.  ERA- takes a pitcher’s ERA and puts it in the context of his league and his home park.  This makes it possible to compare players across eras and leagues, essentially normalizing the data.  100 is league average.  Every point below 100 is 1 percent better than average.

The formula is pretty straight-forward:
ERA Minus = 100*((ERA + (ERA – ERA*(PF/100)) )/ AL or NL ERA)
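Here is a short worked example in Python, with made-up numbers, to show which direction the park adjustment pushes.  The function name is hypothetical, and the park factor is on the 100 scale used in the formula (100 = neutral).

```python
def era_minus(era, league_era, park_factor):
    """ERA- with park_factor on the 100 scale (100 = neutral park)."""
    pf = park_factor / 100
    return 100 * (era + (era - era * pf)) / league_era


# Hypothetical 3.60 ERA in a 4.00 ERA sub-league, across three park factors.
for park_factor in (95, 100, 105):
    print(park_factor, round(era_minus(3.60, 4.00, park_factor), 1))
```

A hitter-friendly park (above 100) lowers ERA-, giving the pitcher credit for a tougher environment.  One thing to watch: the formula divides PF by 100, so a script that multiplies by the stored park factor directly is only equivalent if that column is already a ratio near 1.00.  That is an assumption about the parks table, not something verified here.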

A few things have to happen in order to run this calc.  First, we’ll need sub-league ERAs.  As mentioned in the first FIP post, the league_history_pitching_stats table appears to carry these but gives no way to tell which sub-league each row belongs to.  Better to roll our own from the players_career_pitching_stats data (by way of CalcPitching).  We’ll do this in the same manner that we did it for batting: joining to the team_relations table to get the sub-league.

Here’s how:

DROP TABLE IF EXISTS sub_league_history_pitching;
CREATE TABLE IF NOT EXISTS sub_league_history_pitching AS

SELECT
       year
     , league_id
     , sub_league_id
     , round((totER/totIP)*9,2) AS slgERA 
FROM  (        
     SELECT p.year
          , p.league_id
          , t.sub_league_id
          , ((sum(ip)*3)+sum(ipf))/3 AS totIP
          , sum(er) AS totER
     FROM CalcPitching AS p INNER JOIN team_relations AS t ON p.team_id=t.team_id AND p.league_id=t.league_id
     GROUP BY p.year, p.league_id, t.sub_league_id
      ) AS x ;

Before we move on to the park factor, we have to make sure that we can associate a player’s team with his sub-league.  As usual, I’m sure that there’s a more elegant way to go about this than where I landed.  The problem I needed to solve was that sub-leagues do not have unique identifiers; they are uniquely identified only as composites of league_id and sub_league_id.  So, it’s not enough to refer to a sub-league as sub-league-1.  There are as many sub-league-1’s as there are leagues.  To make matters more complicated, the teams table does not carry a sub-league field.  That’s why we had to refer to the team_relations table.  Unfortunately, the team_relations table is the only table that contains all three necessary data points to pin down a team/sub-league relationship.  When I tried to let the database do the thinking for me by joining to it, it wasn’t consistently choosing the correct sub-league for each team.

I decided to add sub-league as a field to the already-crowded CalcPitching table.  It worked in testing, correctly pulling the right slgERA for each league-sub_league-year.  Like I said, I bet there’s a way to do this only with joins, but I wasn’t able to figure it out.  I am going to go back to the CalcBatting table and do the same thing.  Here’s the code for the new joins:

INNER JOIN team_relations AS r ON i.team_id=r.team_id AND i.league_id=r.league_id
INNER JOIN sub_league_history_pitching AS slg ON i.year=slg.year AND i.league_id=slg.league_id AND r.sub_league_id=slg.sub_league_id
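The composite-key problem is easy to reproduce outside the database.  This is a toy Python sketch with invented rows (the IDs and ERAs are not from my save file): looking a sub-league up by sub_league_id alone silently collapses leagues together, while the full (year, league_id, sub_league_id) key keeps every row distinct.

```python
# Hypothetical sub-league ERA rows: (year, league_id, sub_league_id, slgERA)
rows = [
    (2017, 100, 0, 4.10),
    (2017, 100, 1, 4.35),
    (2017, 101, 0, 3.80),
    (2017, 101, 1, 3.95),
]

# Keyed by sub_league_id only: later rows overwrite earlier ones.
by_sub_league_only = {sub: era for _, _, sub, era in rows}

# Keyed by the full composite: one entry per row.
by_composite_key = {(year, lg, sub): era for year, lg, sub, era in rows}

print(len(rows), len(by_sub_league_only), len(by_composite_key))
print(by_composite_key[(2017, 100, 1)])
```

The joins above do the same job as the composite dictionary key: every column that makes the sub-league unique has to appear in the ON clause.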

The next thing is to return the park factor for each pitcher-stint-year.  We’ll do this by joining to the teams table, then to the parks table:

INNER JOIN teams AS t ON i.team_id=t.team_id
INNER JOIN parks AS p ON t.park_id=p.park_id

With all that done, we’ve got to go back and define ERA as a variable so that we can reference it here without elaborating it.  Then, the formula is simple.  OOTP doesn’t track this stat either, so it’s hard to say with any certainty how well this works or how much the hard-coded park factors are skewing the results.  I did a quick sniff test, looking at ranges of ERAs in my league and the ERA- stats for each.  It looks OK, I guess?

OOTP uses ERA+ instead, which seems to be more or less the same stat scaled up from 100 rather than down.  I will tackle that one next.

Here’s the full script for CalcPitching so far:

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , r.sub_league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := ((3*ip)+ipf)/3 AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra+i.sf),3) AS BABIP
    , @ERA := round((i.er/@InnPitch)*9,2) AS ERA
    , round(((13*i.hra)+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS FIP 
    , round(((13*(i.fb*f.hr_fb_pct))+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS xFIP
    , round(100*((@ERA + (@ERA - @ERA*(p.avg)))/slg.slgERA),0) AS ERAminus
      
FROM players_career_pitching_stats AS i
    INNER JOIN team_relations AS r ON i.team_id=r.team_id AND i.league_id=r.league_id
    INNER JOIN FIPConstant AS f ON i.year=f.year AND i.league_id=f.league_id
    INNER JOIN sub_league_history_pitching AS slg ON i.year=slg.year AND i.league_id=slg.league_id AND r.sub_league_id=slg.sub_league_id
    INNER JOIN teams AS t ON i.team_id=t.team_id
    INNER JOIN parks AS p ON t.park_id=p.park_id
WHERE i.split_id=1 AND i.league_id<>0;

 

Pitching Stats 4: xFIP

xFIP is almost the same thing as FIP, just with something ‘xtra’.  The idea is that while pitchers are responsible for the 3 True Outcomes (HR, BB/HP, and K), home runs can also be subject to luck.  For example, a fence-scraper over the short right porch in Fenway might not be a home run through the marine layer at Dodger Stadium.  What does this tell us about the pitcher’s expected performance?

Well, to account for the vagaries of chance, xFIP takes all of a pitcher’s fly balls and multiplies them by the league average HR/FB rate.  Basically, it assumes a number of HR a pitcher would have given up based on the number of fly balls their opponents hit rather than the number of HR they actually did give up.

It feels like splitting hairs to me, but hey.  That’s baseball.  The formula for xFIP is just like FIP with that one change:
xFIP = ((13*(Fly balls * lgHR/FB%))+(3*(BB+HBP))-(2*K))/IP + constant
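To make the difference from FIP concrete, here is a minimal Python sketch using two invented pitchers who are identical except for home runs allowed.  All names, stat lines, and the 3.10 constant are made up for illustration.

```python
def fip_like(hr, bb, hbp, k, ip, constant):
    """Shared FIP-style formula; hr may be actual or expected home runs."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fly_balls, lg_hr_per_fb, bb, hbp, k, ip, constant):
    """xFIP: swap actual HR for fly balls times the league HR/FB rate."""
    expected_hr = fly_balls * lg_hr_per_fb
    return fip_like(expected_hr, bb, hbp, k, ip, constant)


# Two hypothetical pitchers: same fly balls, walks, HBP, strikeouts and innings.
# One allowed 20 HR, the other 30.  League HR/FB is assumed to be 10%.
shared = dict(bb=50, hbp=5, k=180, ip=200.0, constant=3.10)
fip_lucky = fip_like(hr=20, **shared)
fip_unlucky = fip_like(hr=30, **shared)
xfip_both = xfip(fly_balls=200, lg_hr_per_fb=0.10, **shared)
print(round(fip_lucky, 2), round(fip_unlucky, 2), round(xfip_both, 2))
```

The two FIPs differ by more than half a run, but because both pitchers allowed 200 fly balls they share a single xFIP, which here matches the FIP of the pitcher whose HR total sits right at the league rate.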

The constant is the same FIPConstant we calculated for FIP.  So, this one is pretty straight-forward, except that we need the HR/FB% for the league.  We’ll go back to our FIPConstant table and add it there for each league year.  Our FIPConstant table now looks like this:

DROP TABLE IF EXISTS FIPConstant;
CREATE TABLE IF NOT EXISTS FIPConstant AS

SELECT
      year
    , league_id
    , hra_totals/fb_totals AS hr_fb_pct
    , @HRAdj := 13*hra_totals AS Adjusted_HR
    , @BBAdj := 3*bb_totals AS Adjusted_BB
    , @HPAdj := 3*hp_totals AS Adjusted_HP
    , @KAdj  := 2*k_totals AS Adjusted_K
    , @InnPitch := ((ip_totals*3)+ipf_totals)/3 AS InnPitch
    , @lgERA := round((er_totals/@InnPitch)*9,2) AS lgERA
    , round(@lgERA - ((@HRAdj+@BBAdj+@HPAdj-@KAdj)/@InnPitch),2) AS FIPConstant
FROM (
         SELECT year
                , league_id
                , sum(hra) as hra_totals
                , sum(bb) as bb_totals
                , sum(hp) as hp_totals
                , sum(k) as k_totals
                , sum(er) as er_totals
                , sum(ip) as ip_totals
                , sum(ipf) as ipf_totals
                , sum(fb) as fb_totals
          FROM players_career_pitching_stats
          GROUP BY year, league_id
      ) AS x;

I added the formula above to the CalcPitching table.  OOTP doesn’t track xFIP (at least in v18), so there’s nothing to compare it to.  This one’s done.

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := ((3*ip)+ipf)/3 AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra+i.sf),3) AS BABIP
    , round((i.er/@InnPitch)*9,2) AS ERA
    , round(((13*i.hra)+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS FIP 
    , round(((13*(i.fb*f.hr_fb_pct))+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS xFIP
    
    
FROM players_career_pitching_stats AS i
    INNER JOIN FIPConstant AS f ON i.year=f.year AND i.league_id=f.league_id
WHERE i.split_id=1 AND i.league_id<>0;

Pitching Stats 3: FIP – The Conclusion

I redid the FIPConstant table to pull summed data from the players_career_pitching_stats table.  That table now looks like this:

DROP TABLE IF EXISTS FIPConstant;
CREATE TABLE IF NOT EXISTS FIPConstant AS

SELECT
      year
    , league_id
    , hra_totals
    , bb_totals
    , hp_totals
    , k_totals
    , er_totals
    , ip_totals
    , ipf_totals
    , @HRAdj := 13*hra_totals AS Adjusted_HR
    , @BBAdj := 3*bb_totals AS Adjusted_BB
    , @HPAdj := 3*hp_totals AS Adjusted_HP
    , @KAdj  := 2*k_totals AS Adjusted_K
    , @InnPitch := ((ip_totals*3)+ipf_totals)/3 AS InnPitch
    , @lgERA := round((er_totals/@InnPitch)*9,2) AS lgERA
    , round(@lgERA - ((@HRAdj+@BBAdj+@HPAdj-@KAdj)/@InnPitch),2) AS FIPConstant
FROM (
         SELECT year
                , league_id
                , sum(hra) as hra_totals
                , sum(bb) as bb_totals
                , sum(hp) as hp_totals
                , sum(k) as k_totals
                , sum(er) as er_totals
                , sum(ip) as ip_totals
                , sum(ipf) as ipf_totals
          FROM players_career_pitching_stats
          GROUP BY year, league_id
      ) AS x;

And how did it work?  Better.

9 within 0.05; 26 within 0.11.  I’m still curious as to why I’m not matching up even better.  I still have a lingering suspicion that HBP is behind this, but I am going to let it lie for now unless it comes back to bite me on other calculations.

Our CalcPitching table to this point:

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := ((3*ip)+ipf)/3 AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra+i.sf),3) AS BABIP
    , round((i.er/@InnPitch)*9,2) AS ERA
    , round(((13*i.hra)+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS fip
    
    
FROM players_career_pitching_stats AS i
    INNER JOIN FIPConstant AS f ON i.year=f.year AND i.league_id=f.league_id
WHERE i.split_id=1;

Pitching Stats 2: FIP – The False Start

FIP, or Fielding Independent Pitching, is based on the idea that pitchers are only in control of the “3 true outcomes” of a plate appearance: Strikeouts, Home Runs, and Free Passes (HBP and BB’s).  Everything else relies on defense which is largely beyond the pitcher’s control.  FIP is scaled, through the use of a constant, to a league’s ERA.

The formula to derive FIP is:

FIP = ((13*HR)+(3*(BB+HBP))-(2*K))/IP + constant

and the formula for deriving the constant is similar:

FIP Constant = lgERA – (((13*lgHR)+(3*(lgBB+lgHBP))-(2*lgK))/lgIP)
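A useful property for debugging: because the constant is defined as league ERA minus the league's own FIP core, plugging league totals back into the FIP formula must return league ERA exactly.  Here is a small Python sketch of that check; the function names and league totals are invented for illustration.

```python
def fip_core(hr, bb, hbp, k, ip):
    """The part of FIP before the constant is added."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip


def fip_constant(lg_era, lg_hr, lg_bb, lg_hbp, lg_k, lg_ip):
    """FIP Constant = lgERA - league FIP core."""
    return lg_era - fip_core(lg_hr, lg_bb, lg_hbp, lg_k, lg_ip)


def fip(hr, bb, hbp, k, ip, constant):
    return fip_core(hr, bb, hbp, k, ip) + constant


# Hypothetical league totals for one league-year.
constant = fip_constant(lg_era=4.20, lg_hr=5000, lg_bb=15000,
                        lg_hbp=1500, lg_k=35000, lg_ip=43000.0)
league_fip = fip(5000, 15000, 1500, 35000, 43000.0, constant)
print(round(constant, 3), round(league_fip, 2))
```

If a constant looks unreasonable, running league totals through the player formula and comparing the result to league ERA is a fast way to find a misplaced parenthesis.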

We’re going to make a quick table to calculate the FIPConstant for each league year that we’ll reference when calculating FIP for each player stint.  Happily, the game gives us league ERA in the league_history_pitching_stats table, so we’ve been spared a step.  Because I am, apparently, not very good with the order of operations and parentheses, I have spent the last hour pulling my hair out trying to get a FIP Constant that looks reasonable.  In an attempt to save some of my last remaining hairs, I made a very inelegant table.  Behold my genius:

DROP TABLE IF EXISTS FIPConstant;
CREATE TABLE IF NOT EXISTS FIPConstant AS

SELECT
    lhps_id
    , year
    , league_id
    , level_id
    , hra
    , bb
    , hp
    , k
    , @HRAdj := 13*hra AS Adjusted_HR
    , @BBAdj := 3*bb AS Adjusted_BB
    , @HPAdj := 3*hp AS Adjusted_HP
    , @KAdj  := 2*k AS Adjusted_K
    , @InnPitch := ((ip*3)+ipf)/3 AS InnPitch
    , era
    , era - ((@HRAdj+@BBAdj+@HPAdj-@KAdj)/@InnPitch) AS FIPConstant
FROM league_history_pitching_stats;

On the CalcPitching table, we’re adding FIP and disregarding left/right splits for the moment.  Our table script now looks like this:

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := ((3*ip)+ipf)/3 AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra+i.sf),3) AS BABIP
    , round((i.er/@InnPitch)*9,2) AS ERA
    , round(((13*i.hra)+(3*(i.bb+i.hp))-(2*i.k))/@InnPitch+f.FIPConstant,2) AS fip
    
    
FROM players_career_pitching_stats AS i
    INNER JOIN FIPConstant AS f ON i.year=f.year AND i.league_id=f.league_id
WHERE i.split_id=1;

So, how did it go?  Not great.  I took a random sample from my database and compared it to the game’s generated stats.  I wanted my FIP calculations to be within .05 of the game’s.

While most were in the medium range, it seems that there’s something different in the way the game calculates FIP.  Our numbers are close enough that it can’t be a major difference.  I’m going to follow a hunch and guess that it’s Hit By Pitch.  I will remove HBP as a factor in both the FIPConstant and FIP calculations and see what that does to our results.

I got about a third of the way through the revised calcs when I noticed a problem with the FIPConstant table.  This table pulls data from the league_history_pitching_stats table.  The problem is there.  You see, as I mentioned in the table setup posts and then promptly forgot about, there are a couple of columns in the league_history tables that attempt to distinguish between subleagues but do not give any indication of which is which. (They are the team_id and game_id columns.)  What this does is create two records for each league (one for each subleague) with different totals but no way to identify the subleague being referenced.  This is no good.

My new hunch is that HBP is not the issue.  The formula is probably fine, I will just have to change the FIPConstant table to sum data from players_career_pitching_stats.  I’m going to publish this post as a testament to my naiveté and get to work on the revised table.

Pitching Stats 1: The Easy Stuff

Here’s the same explanation of how the stats tables are organized as we used in the first Batting Stats post:

Stats are collected for each player who accumulates them.  Each player gets his own row.  For each year that a player accumulates stats, a new row of data is created for that player.  For each team that a player plays in a given year (stint), a new row of data is created for that player.  Stats are accumulated and placed into three splits for each player-year-stint: Overall, vs. Left, and vs. Right.

As we did for the batting stats, we’ll be creating a new table for all of the pitching stats together in one place; counting stats provided by the game and calculated stats that we’ll derive here.

We’re carrying over all of the counting stats, plus WPA and WAR.  The calculated stats we’re adding in this post fall in the category of Easy Stuff:

  • InnPitch – I set this as a variable to avoid having to elaborate every time. This is the IP integer plus the IPF (innings pitched fraction) x 0.33, which approximates one third of an inning for each unit of IPF:
    round(IP + (IPF * .33),1).
  • All of the “x9” stats: K/9, BB/9 etc.
  • WHIP
  • GB/FB – Ground Ball/Fly Ball outs
  • BABIP (see the batting post for more on this)
  • ERA
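The 0.33 multiplier and the rounding to one decimal both nudge the innings total a little.  Here is a small Python sketch, with hypothetical helper names and a made-up stat line, comparing that approximation with exact thirds and showing the knock-on effect on ERA.

```python
from fractions import Fraction


def innings_approx(ip, ipf):
    """Mirror of round(IP + (IPF * .33), 1)."""
    return round(ip + ipf * 0.33, 1)


def innings_exact(ip, ipf):
    """Treat each unit of IPF as exactly one third of an inning."""
    return Fraction(3 * ip + ipf, 3)


def era(er, innings):
    return 9 * er / innings


# Hypothetical line: 6 full innings plus 2 outs, 3 earned runs.
approx = innings_approx(6, 2)
exact = innings_exact(6, 2)
print(approx, float(exact))
print(round(era(3, approx), 2), round(float(era(3, exact)), 2))
```

The gap is small for a single outing, but it touches every rate stat that divides by innings, so exact thirds are the safer choice when trying to match the game to two decimal places.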

Here’s the code:

DROP TABLE IF EXISTS CalcPitching;
CREATE TABLE IF NOT EXISTS CalcPitching AS

SELECT
    i.player_id
    , i.year
    , i.stint
    , i.team_id
    , i.league_id
    , split_id
    , i.ip
    , i.ab
    , i.tb
    , i.ha
    , i.k
    , i.bf
    , i.rs
    , i.bb
    , i.r
    , i.er
    , i.gb
    , i.fb
    , i.pi
    , i.ipf
    , i.g
    , i.gs
    , i.w
    , i.l
    , i.s
    , i.sa
    , i.da
    , i.sh
    , i.sf
    , i.ta
    , i.hra
    , i.bk
    , i.ci
    , i.iw
    , i.wp
    , i.hp
    , i.gf
    , i.dp
    , i.qs
    , i.svo
    , i.bs
    , i.ra
    , i.cg
    , i.sho
    , i.sb
    , i.cs
    , i.hld
    , i.ir
    , i.irs
    , i.wpa
    , i.li
    , i.outs
    , i.war
    , @InnPitch := round(i.ip + (i.ipf*.33),1) AS InnPitch
    , round((9*i.k)/@InnPitch,1) AS 'k9'
    , round((9*i.bb)/@InnPitch,1) AS 'bb9'
    , round((9*i.hra)/@InnPitch,1) AS 'HR9'
    , round((i.bb+i.ha)/@InnPitch,2) AS WHIP
    , round(i.k/i.bb,2) AS 'K/BB'
    , i.gb/i.fb AS 'gb/fb'
    , round((i.ha-i.hra)/(i.ab-i.k-i.hra+i.sf),3) AS BABIP
    , round((i.er/@InnPitch)*9,2) AS ERA
    
    
FROM players_career_pitching_stats AS i;

Batting Stats 10: wRC+

I will be writing this as I work through it, so this may be a little disjointed and have some false starts, but what the heck.

wRC+ is similar to wRC and wRAA in that it measures runs created by a batter in a particular league-year context.  The most significant difference is that wRC+ is a rate statistic rather than a counting stat, put on a 100 scale for easy interpretation.  The second biggest difference is that this stat is league and park adjusted.  This allows us to compare players across years and leagues.

The park and league adjustments present some challenges, though.  First, I was warned not to use the park factors from the park tables.  This is a huge bummer because it would have saved me a ton of difficult work.  In fact, I am going to try using the park factors from that table as a starting point, just to be sure it won’t work.  I am really not looking forward to doing those calcs myself.

The league adjustments won’t be as big of an issue. I can use the team_affiliations table to determine subleague (i.e. AL vs NL) for each player – or I can ignore it altogether.  I am thinking of disregarding subleagues because I haven’t noticed much difference between my leagues in game play.  I never use the DH, so offense isn’t skewed by that.  I will try it and see how easy it is.

The formula, per Fangraphs, is:
(((wRAA/PA + League Runs Per PA) + (League Runs Per PA - (Park Factor*League Runs Per PA))) / (Subleague wRC/PA)) x 100

Let’s break this down bit by bit.

  • Step 1: (Player wRAA / PA +
    • Will have to decide whether to elaborate the wRAA formula or use a variable in its definition, but very easy aside from that.  I will try it as a variable first.
  • Step 2: League Runs Per PA) +
    • Knowing that this would end up as part of the wRC+ calculation, I added it to the League Runs Per Out view I created in the Run Environment section.  And since we already referenced that view for wRAA, we can simply reference it here as RperPA
  • Step 3: (League Runs Per PA –
    • Same as Step 2
  • Step 4: (Park Factor * League Runs Per PA) / 
    • If the park factor from the parks table can be used as a reasonable substitute – and I’m really hoping it can – then this is pretty straightforward.  I would join to parks on team_id and return the park factor (avg).  If it can’t be used, then I’m off down a rabbit hole to calculate those park factors.
  • Step 5: (Subleague wRC/Subleague PA)
    • Here’s a weird thing: Subleague wRC.  Just like a league wOBA is really just OBP, wouldn’t league wRC just be runs?  Let’s look at the wRC formula again as it would apply to the league:
      • League_wRC = (((League_wOBA-League_wOBA)/wOBA Scale)+(League_R/League_PA))*League_PA
        • League_wOBA – League_wOBA = 0.  And 0 divided by anything is 0.  So, we evaluate to: (0 + (League_R/League_PA))*League_PA
        • League Runs Per Plate Appearance times Plate Appearances = League Runs.
    • So that leaves us with Subleague_Runs divided by Subleague_PA – the same RperPA stat that we’ve used above, just at the subleague level.
  • Step 6: x 100
    • Self explanatory, really.
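The six steps above can be sketched in a few lines of Python.  This is only an illustration with invented inputs: the function and parameter names are hypothetical, and the park factor is treated as a ratio (1.00 = neutral).

```python
def wrc_plus(wraa, pa, lg_r_per_pa, park_factor, sub_lg_runs, sub_lg_pa):
    """wRC+ assembled step by step from the breakdown above."""
    player_rate = wraa / pa + lg_r_per_pa                     # Steps 1-2
    park_adjustment = lg_r_per_pa - park_factor * lg_r_per_pa  # Steps 3-4
    sub_lg_r_per_pa = sub_lg_runs / sub_lg_pa                 # Step 5
    return 100 * (player_rate + park_adjustment) / sub_lg_r_per_pa  # Step 6


# Hypothetical league: 0.12 runs per PA overall and in the sub-league.
average = wrc_plus(wraa=0, pa=600, lg_r_per_pa=0.12, park_factor=1.00,
                   sub_lg_runs=12000, sub_lg_pa=100000)
slugger = wrc_plus(wraa=20, pa=600, lg_r_per_pa=0.12, park_factor=1.00,
                   sub_lg_runs=12000, sub_lg_pa=100000)
print(round(average), round(slugger))
```

An exactly average hitter in a neutral park should land on 100, which makes a handy first test before comparing against the game's numbers.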

I have to do something about identifying subleagues and summing their data before I start coding this formula.  At this point, I only need runs and PA.  Since I am eager to keep moving, those are the only stats I’ll create.  I created a quick table with those summed stats, joining on the team relations table:

DROP TABLE IF EXISTS sub_league_history_batting;
CREATE TABLE IF NOT EXISTS sub_league_history_batting AS
   (SELECT b.year
      , b.league_id
      , t.sub_league_id
      , sum(b.PA) as slg_PA
      , sum(b.r) as slg_r
    FROM CalcBatting b
      INNER JOIN team_relations t ON b.team_id=t.team_id AND b.league_id=t.league_id
      INNER JOIN players p ON b.player_id=p.player_id
    WHERE p.position<>1
    GROUP BY b.year, b.league_id, t.sub_league_id
   );

OK, before the results, let’s set some initial expectations.  Fangraphs’ rule of thumb chart is below.  It suggests, I think, that if I get within 10 points, I can trust that I’m in the right ballpark.  It won’t be super accurate for precise comparisons between players, but I can probably trust it for general analysis.

And here are the results – randomly selected player years.

Pretty ugly, actually.  19 of 30 are in the happy zone.  Of course, the happy zone is really much bigger than I would like it to be.  8 are borderline, and the remaining 3 are just awful.  Some are high; some are low.

At this point, I am not comfortable using this stat as a basis for any decision-making.  I’m also not ready to dive in and determine the park factors.  So, at least for the time being, I am going to leave this here and move on to pitching stats.

Here’s the script:

#Calculated batting stats for OOTP
    DROP TABLE IF EXISTS CalcBatting;
    CREATE TABLE IF NOT EXISTS CalcBatting AS

    SELECT b.year
    , b.league_id
    , b.player_id
    , b.stint #We can eventually move this down the list
    , b.split_id #We can eventually remove
    , b.team_id #We can eventually move this down the list
    , l.abbr as Lg
    , t.abbr as Team
    , b.g
    , b.ab
    , @PA := b.ab+b.bb+b.sh+b.sf+b.hp AS PA
    , b.r 
    , b.h
    , b.d
    , b.t
    , b.hr
    , b.rbi
    , b.sb
    , b.cs
    , b.bb
    , b.k
    , b.ibb
    , b.hp
    , b.sh
    , b.sf
    , b.gdp
    , b.ci
    , @BA := round(b.h/b.ab,3) AS ba
    , round(b.k/@PA,3) as krate
    , round((b.bb)/@PA,3) as bbrate
    , @OBP := round((b.h + b.bb + b.hp)/(@PA-b.sh-b.ci),3) AS obp
    , round(100*(@OBP/r.woba),0) as OBPplus
    , @SLG := round((b.h+b.d+2*b.t+3*b.hr)/b.ab,3) as slg
    , round(@OBP+@SLG,3) as ops
    , round(@SLG-@BA,3) as iso
    , round((b.h-b.hr)/(b.ab-b.k-b.hr+b.sf),3) as babip
    , @woba := round((r.wobaBB*(b.bb-b.ibb) + r.wobaHB*b.hp + r.woba1B*(b.h-b.d-b.t-b.hr) +
       r.woba2B*b.d + r.woba3B*b.t + r.wobaHR*b.hr)
       /(b.ab+b.bb-b.ibb+b.sf+b.hp),3) as woba
    , @wRAA := round(((@woba-r.woba)/r.wOBAscale)*@PA,1) as wRAA
    , round((((@woba-r.woba)/r.wOBAscale)+(lro.totr/lro.totpa))*@PA,1) as wRC
    , ROUND((((@wRAA/@PA + lro.RperPA) + (lro.RperPA - p.avg*lro.RperPA))/(slg.slg_r/slg.slg_pa))*100,0) as 'wRC+'
    FROM 
      players_career_batting_stats b 
      INNER JOIN leagues l ON b.league_id=l.league_id 
      INNER JOIN teams t ON b.team_id=t.team_id
      INNER JOIN tblRunValues2 r ON b.year=r.year AND b.league_id=r.league_id
      INNER JOIN vLeagueRunsPerOut lro ON b.year=lro.year AND b.league_id=lro.league_id
      INNER JOIN parks p ON t.park_id=p.park_id
      INNER JOIN sub_league_history_batting slg ON b.year=slg.year AND t.sub_league_id=slg.sub_league_id AND b.league_id=slg.league_id
    WHERE b.ab<>0 AND b.split_id=1
    ORDER BY b.player_id, b.year;