Batting Stats 11: wRC+ Revisited

The results that we got from testing the wRC+ stat weren’t great.  In testing some other stuff, I realized that we’re getting duplicate entries for batters on the CalcBatting table:

Clearly the same player, same year, stint, team, and stats.  Well, all stats except for wRC+.  The issue is not being able to return a unique subleague for each row.  To deal with this, we’re going to make a couple of adjustments:

First, we’re going to change the sub_league_history_batting table to mirror the structure of the sub_league_history_pitching table.  This doesn’t directly solve the problem, but on reflection, I didn’t like how this table was calling data from a table it was sending data to.  It seems like a circular reference to me, even if the data is static.

DROP TABLE IF EXISTS sub_league_history_batting;
CREATE TABLE IF NOT EXISTS sub_league_history_batting AS

SELECT
       year
     , league_id
     , sub_league_id
     , slg_PA
     , slg_r
     
     FROM  (        
     SELECT p.year
          , p.league_id
          , t.sub_league_id
          , sum(pa) AS slg_PA
          , sum(r) AS slg_r
     FROM players_career_batting_stats AS p INNER JOIN team_relations AS t ON p.team_id=t.team_id
         INNER JOIN players ON p.player_id=players.player_id
     WHERE p.split_id=1 AND players.position<>1
     GROUP BY year, league_id, sub_league_id
      ) AS x ;

Now we have the subleague data pulling from the game-generated players career batting stats table.  A quick check shows that subleague runs are unchanged by this, but PA counts have changed a tiny bit – less than 10 over tens of thousands.  Not sure why, but I think the raw data from players_career_batting_stats table is more accurate.

Next, we’re going to add a sub_league column to CalcBatting, but we are going to do it in a way that avoids circular references. We’re actually going to alter the players_career_batting_stats table to include a subleague.  Then, we’ll get fancy and write a trigger that adds subleague to new records while leaving old ones unchanged.  Let’s start with altering the existing records.

We’ll add the subleague column thusly:

ALTER TABLE players_career_batting_stats
ADD COLUMN sub_league_id INT AFTER league_id;

Then, populate it:

UPDATE players_career_batting_stats2 AS b
INNER JOIN team_relations AS t ON b.league_id=t.league_id AND b.team_id=t.team_id
SET b.sub_league_id=t.sub_league_id;

We’ll come back to the trigger in another post.  We include the sub_league_id column near the top of CalcBatting returning the sub_league column we just included above:

USE mystruggle;
#Calculated batting stats for OOTP
    DROP TABLE IF EXISTS CalcBatting;
    CREATE TABLE IF NOT EXISTS CalcBatting AS

    SELECT b.year
    , b.league_id
    , b.player_id
    , b.stint 
    , b.split_id 
    , b.team_id 
    , b.sub_league_id
    , b.g
    , b.ab
Etc, etc, etc

You’ll notice I did a little other cleanup as well; I removed the league_abbr and team_abbr from the table.  We don’t need these columns taking up space in the table when we can easily pull them in when we need them.

Finally, and the cause of the duplicate records problem, was adding a missing JOIN element between players_career_batting_stats and sub_league_history_batting.  I had joined on league and sub-league, but not year.  Adding year brought me back to normal.

After I made a few more adjustments that I will talk about in  a later post, I re-ran a test for wRC+, adjusting my Happy Zone down to 5 points.

  • 20 out of 30 are within 5 points of the game
  • 24 out of 30 are within 7 points of the game
  • 27 out of 30 are within 10 points of the game
  • 3 missed by more than 10, the highest being 14 points.

14 points is really a lot.  However, a 90% pass rate is really pretty good considering where I was before I straightened out the sub-league situation.

 

Leave a Reply

Your email address will not be published. Required fields are marked *