I think I found one of the culprits: Intentional Walks. Our two guys, Sink and McDonough, have IBB’s in that season well over the average. I queried the database to find their IBB’s as well as league average for those seasons:
SELECT b.player_id
, concat(p.first_name, " ", p.last_name) as player
, l.abbr
, b.year
, b.league_id as league
, b.ibb
, x.lg_avg_ibb
, b.ibb-x.lg_avg_ibb as diff
FROM (
SELECT b.year
, b.league_id
, round(avg(b.ibb),0) as lg_avg_ibb
FROM CalcBatting b
INNER JOIN players p ON b.player_id-p.player_id
WHERE b.ab>200 AND p.position<>1
GROUP BY b.year, b.league_id
) as x
INNER JOIN CalcBatting b ON x.year=b.year AND x.league_id=b.league_id
INNER JOIN players p ON b.player_id=p.player_id
INNER JOIN leagues l ON b.league_id=l.league_id
WHERE b.player_id IN (2574, 472) AND b.year=2015;
You’ll see that both were well above average:
Sink is about 1 standard deviation above average while McDonough is off the charts at almost 6.
I took a look at the 10 highest IBB variances to see if we can say with any certainty that we’re handling IBB’s incorrectly:
Looking at this, I think it’s pretty clear that if nothing else, IBB is at least contributing to the differences in wOBA by weighting it too highly.
It turns out that it’s not necessarily the weighting, but rather for formula for wOBA itself. I removed IBB from the formula and recalculated wOBA for these 10 player-seasons:
Every single one of them improved. Not yet perfect, but now that I have the outliers at least within the medium range, I feel comfortable moving on for real. So, here’s the revised script for CalcBatting with the corrected formula for wOBA:
#Calculated batting stats for OOTP
DROP TABLE IF EXISTS CalcBatting;
CREATE TABLE IF NOT EXISTS CalcBatting AS
SELECT b.year
, b.league_id
, b.player_id
, b.stint #We can eventually move this down the list
, b.split_id #We can eventually remove
, b.team_id #We can eventually move this down the list
, l.abbr as Lg
, t.abbr as Team
, b.g
, b.ab
, @PA := b.ab+b.bb+b.sh+b.sf+b.hp AS PA
, b.r
, b.h
, b.d
, b.t
, b.hr
, b.rbi
, b.sb
, b.cs
, b.bb
, b.k
, b.ibb
, b.hp
, b.sh
, b.sf
, b.gdp
, b.ci
, @BA := round(b.h/b.ab,3) AS ba
, round(b.k/@PA,3) as krate
, round((b.bb)/@PA,3) as bbrate
, @OBP := round((b.h + b.bb + b.hp)/(@PA-b.sh-b.ci),3) AS obp
, round(100*(@OBP/r.woba),0) as OBPplus
, @SLG := round((b.h+b.d+2*b.t+3*b.hr)/b.ab,3) as slg
, round(@OBP+@SLG,3) as ops
, round(@SLG-@BA,3) as iso
, round((b.h-b.hr)/(b.ab-b.k-b.hr+b.sf),3) as babip
, round((r.wobaBB*(b.bb-b.ibb) + r.wobaHB*b.hp + r.woba1B*(b.h-b.d-b.t-b.hr) +
r.woba2B*b.d + r.woba3B*b.t + r.wobaHR*b.hr)
/(b.ab+b.bb-b.ibb+b.sf+b.hp),3) as woba
/* NOT yet modified for OOTP and MySQL
, round((([woba]-r.lgwoba)/r.wobascale)*[PA],1) as wRAA
, round(((([woba]-r.lgwoba)/r.wobascale)+(l.totr/l.totpa))*[PA],0) as wRC
, ((([wRAA]/[PA] + l.RperPA) + (l.RperPA - t.bpf*l.RperPA))/(lb.wRC/lb.PA))*100 AS [wRC+]
*/
FROM
players_career_batting_stats b
INNER JOIN leagues l ON b.league_id=l.league_id
INNER JOIN teams t ON b.team_id=t.team_id
INNER JOIN tblRunValues2 r ON b.year=r.year AND b.league_id=r.league_id
WHERE b.ab<>0 AND b.split_id=1
ORDER BY b.player_id, b.year