Project Scoresheet Scorebook and Scorecard

I like to score baseball games while I’m attending them. I know there’s no reason to do this since the site I manage (http://www.baseball-reference.com) will have the full boxscores with play-by-play the next day anyway. But I find it sort of meditative.

That said, I’ve never been entirely happy with the traditional scoring methods. I’ve found them either too messy or simply too confusing. Since working at B-R, however, I have become very familiar with the Retrosheet method of recording plays, and recently decided to start recording games the way Project Scoresheet used to record them. If you aren’t familiar with it, Project Scoresheet was an effort to record every baseball game so that baseball analysts (sabermetricians) would have useful data to work with. The Project Scoresheet scoring system was developed by Craig Wright and improved on by Gary Gillette, with the goal of making the recording of games as unambiguous as possible. More details about the project are available at the Retrosheet website.

One problem with trying to use this system is that no one really uses it anymore. With the advent of modern technology, all games are recorded automatically, so there is no need to record them manually. Because of that, there aren’t really any sources for Project Scoresheet-style scoresheets or scorebooks. I decided to create one myself based on images from the original Project Scoresheet (with some modifications). At the end of this post you’ll find links to PDFs that can be printed at home or uploaded to a print service like FedEx and bound into a nice scorebook. Here’s the final result:

[Images: Project Scoresheet scorebook cover and two interior pages]

I won’t go into how to score in this method (if you found this post, you probably already know), but I thought making these scoresheets available might help the few people out there that are crazy enough to want to use them. Note that I didn’t include a place for tracking pitches (balls, strikes, etc.). As I said above, I find this more meditative than anything. Tracking pitches is anything but “meditative.”

I printed my copy at FedEx with the following options:

  • Double sided
  • Black & White
  • Laser 32# paper
  • Print first page in color
  • Coil Binding
  • Card and Cover Stocks Front Cover
  • Gloss Cover
  • Vinyl Black Back Cover

Cover Page [pdf] – This is for the front cover of the book.

Project Scoresheet Book [pdf] – This is the inside matter of the book. It contains enough pages for 81 games.

Project Scoresheet Scorecard [pdf] – This is a single scorecard for one game.

How to Use SQLite from C/C++ in Xcode

This is a brief tutorial on how to get SQLite up and running in Xcode in a C++ project. If you are going to use it in an actual project, you will obviously want to wrap it up in a class and make it more friendly to use. This is just a small example to see the basics of how SQLite works in C++ (really C).

First, you’ll want to install the library. This is easiest to do with a package manager. I use MacPorts, but Homebrew will work just as well (with corresponding changes to the library and header search paths in your Xcode project). To install with MacPorts, run:

sudo port install sqlite3

Let’s create a basic test database:

$ sqlite3 test.db
SQLite version 3.10.2 2016-01-20 15:27:19
Enter ".help" for usage hints.
sqlite> .mode column
sqlite> .headers on
sqlite> create table test (id integer primary key, col1 float, col2 float);
sqlite> insert into test values (1, 1.0, 2.0);
sqlite> insert into test values (2, 2.0, 3.0);
sqlite> select * from test;
id          col1        col2
----------  ----------  ----------
1           1.0         2.0
2           2.0         3.0
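If you’d rather not use the sqlite3 shell, the same test database can be created and checked with Python’s standard-library sqlite3 module. This is just a convenience sketch that mirrors the shell session above; it isn’t required for the C++ example:

```python
import sqlite3

# Create the same test database as the shell session above.
# "insert or ignore" makes the script safe to run more than once.
conn = sqlite3.connect("test.db")
conn.execute("create table if not exists test (id integer primary key, col1 float, col2 float)")
conn.executemany("insert or ignore into test values (?, ?, ?)",
                 [(1, 1.0, 2.0), (2, 2.0, 3.0)])
conn.commit()

for row in conn.execute("select * from test"):
    print(row)

conn.close()
```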

Next, create a command-line C++ project in Xcode. Then add the SQLite3 library: click on your project, then Build Phases, then expand Link Binary With Libraries. Click the + (plus) button and add the file /opt/local/lib/libsqlite3.a.


Next, under Build Settings, edit Header Search Paths to include /opt/local/include.


Now, we can actually get to the code. All this code does is open the database, prepare a basic SELECT statement, loop through the results (outputting the column names and data for each column), and then finalize the statement and close the connection. It assumes the first column is an integer and the rest are doubles:

#include <iostream>
#include <sqlite3.h>

int main(int argc, const char * argv[]) {

    sqlite3 *db;
    int rc;
    
    rc = sqlite3_open("/path/to/db/test.db", &db);
    if (rc) {
        std::cout << "Can't open database: " << sqlite3_errmsg(db) << std::endl;
        sqlite3_close(db);  // close even on failure to release the handle
        return 1;
    }
    
    sqlite3_stmt *stmt;
    
    rc = sqlite3_prepare_v2(db, "SELECT * FROM test", -1, &stmt, NULL);
    if (rc != SQLITE_OK) {
        std::cout << "SQL Error: " << sqlite3_errstr(rc) << std::endl;
        sqlite3_close(db);
        return 1;
    }
    
    while (sqlite3_step(stmt) == SQLITE_ROW) {
        int cnt = sqlite3_column_count(stmt);
        
        for (int i=0; i < cnt; i++) {
            std::cout << sqlite3_column_name(stmt, i) << std::endl;
        }
        std::cout << std::endl;
        
        for (int i=0; i < cnt; i++) {
            switch (i) {
                case 0:
                    std::cout << sqlite3_column_int(stmt, i) << std::endl;
                    break;
                    
                default:
                    std::cout << sqlite3_column_double(stmt, i) << std::endl;
                    break;
            }

        }
        
    }
    
    sqlite3_finalize(stmt);
    
    sqlite3_close(db);
    
    return 0;
}

And the results:

id
col1
col2

1
1
2
id
col1
col2

2
2
3

Generating Linear Weights from the Retrosheet Database

As with the run expectancy matrix from the previous article, generating linear weights from the Retrosheet database is relatively straightforward (and builds directly on that article’s queries).

First, we generate the run expectancies for all base/out situations (here we are just running data from 1993-2015 — the modern expansion era):

# Only process complete innings. Exclude home innings in 9th inning or later.
DROP TABLE IF EXISTS tmp_complete_innings;
CREATE TEMPORARY TABLE tmp_complete_innings
  SELECT game_id, inn_ct, bat_home_id
    FROM events
   WHERE inn_end_fl = 'T'
         AND (event_outs_ct + outs_ct = 3)   # make sure it's a complete half-inning
         AND NOT (inn_ct >= 9 and bat_home_id=1) # exclude home innings from 9th on 
;
ALTER TABLE tmp_complete_innings ADD INDEX (game_id, inn_ct, bat_home_id);

DROP TABLE IF EXISTS tmp_re;
CREATE TABLE tmp_re
  SELECT start_bases_cd,
         cds.bases_long_desc, 
         outs_ct, 
         AVG(event_runs_ct + fate_runs_ct) AS RE
    FROM events e
    JOIN tmp_complete_innings inns USING (game_id, inn_ct, bat_home_id)
    JOIN bases_cds cds ON e.start_bases_cd = cds.bases_cd
   WHERE year_id BETWEEN 1993 AND 2015
GROUP BY start_bases_cd, cds.bases_long_desc, outs_ct
;
ALTER TABLE tmp_re ADD INDEX (start_bases_cd, outs_ct);

Now, we use those run expectancies to find the average value of each event type. For now, we are averaging across all base/out situations, but this method can be easily modified to generate linear weights for each event in each base/out situation as well:

  SELECT e.event_cd,
         ecds.event_name,
         AVG(postevent.RE - preevent.RE + e.event_runs_ct) AS linwgt
    FROM events e
    JOIN tmp_complete_innings inns USING (game_id, inn_ct, bat_home_id)
    # find the RE value before this event
    JOIN tmp_re preevent USING (start_bases_cd, outs_ct)
    # find the RE value after this event (notice that we increment any outs)
    JOIN tmp_re postevent ON e.end_bases_cd = postevent.start_bases_cd 
                       AND (e.outs_ct + e.event_outs_ct) = postevent.outs_ct
    # get pretty names for event codes
    JOIN event_cds ecds USING (event_cd)
   WHERE e.event_cd IN ( 20, 21, 22, 23, 16, 9, 8, 6, 3, 2, 4, 14, 15 ) 
         AND year_id BETWEEN 1993 AND 2015
GROUP BY event_cd
;

With the results:

+----------+------------------+-------------+
| event_cd | event_name       | linwgt      |
+----------+------------------+-------------+
|        2 | Generic out      | -0.26870669 |
|        3 | Strikeout        | -0.29438586 |
|        4 | Stolen Base      |  0.18298537 |
|        6 | Caught Stealing  | -0.55215834 |
|        8 | Pickoff          | -0.24726686 |
|        9 | Wild Pitch       |  0.26448573 |
|       14 | Walk             |  0.31750011 |
|       15 | Intentional Walk |  0.17602767 |
|       16 | Hit by Pitch     |  0.34045109 |
|       20 | Single           |  0.46835426 |
|       21 | Double           |  0.77295931 |
|       22 | Triple           |  1.06242835 |
|       23 | Home Run         |  1.39543685 |
+----------+------------------+-------------+

A few notes:

  1. This was a quick and dirty run. The fact that strikeouts != generic outs (along with a few other oddities) tells you that I need to filter out some events to get “cleaner” results (e.g. some strikeout events also involve baserunners moving on the play).
  2. The year range you run this over will also cause some variation between these results and other published linear weights.
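The query above is just averaging, for each event type, the quantity RE(after) - RE(before) + runs scored on the play. Here’s a toy sketch of that arithmetic for a single event; the RE values are illustrative stand-ins, not the exact 1993-2015 figures:

```python
# Linear weight of one event = RE(end state) - RE(start state) + runs scored on the play.
# These RE values are illustrative, not the exact 1993-2015 numbers.
re_table = {
    ("empty", 0): 0.555,   # bases empty, 0 outs
    ("1B", 0): 0.953,      # runner on first, 0 outs
}

def event_linear_weight(start_state, end_state, runs_scored):
    """Run value of a single play, relative to the pre-play expectation."""
    return re_table[end_state] - re_table[start_state] + runs_scored

# A leadoff single: bases empty/0 outs -> runner on first/0 outs, no runs score.
print(round(event_linear_weight(("empty", 0), ("1B", 0), 0), 3))  # 0.398
```

The SQL does exactly this for every event in the sample and then averages by event_cd.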

Recreating Tango’s Run Expectancy Matrix

Here’s a simple example of how to use your own personal Retrosheet database: recreating Tango’s run expectancy matrix from 1999-2002.

First, build a temporary table of all innings for which we want to process data. We only want to include complete innings and we want to exclude home innings in the 9th inning or later.

DROP TABLE IF EXISTS tmp_complete_innings;
CREATE TEMPORARY TABLE tmp_complete_innings
  SELECT game_id, inn_ct, bat_home_id
    FROM events
   WHERE inn_end_fl = 'T'
         # make sure it's a complete half-inning
         AND (event_outs_ct + outs_ct = 3) 
         # exclude home innings from 9th on
         AND NOT (inn_ct >= 9 and bat_home_id=1)  
;
ALTER TABLE tmp_complete_innings 
            ADD INDEX (game_id, inn_ct, bat_home_id);

Now, we simply find the averages for every base/out situation in the innings we specified above.

  SELECT cds.bases_long_desc, 
         outs_ct, 
         AVG(event_runs_ct + fate_runs_ct) AS RE24
    FROM events e
    JOIN tmp_complete_innings inns USING (game_id, inn_ct, bat_home_id)
    JOIN bases_cds cds ON e.start_bases_cd = cds.bases_cd
   WHERE year_id BETWEEN 1999 AND 2002
GROUP BY start_bases_cd, cds.bases_long_desc, outs_ct
;

And the results:

+-----------------+---------+--------+
| bases_long_desc | outs_ct | RE24   |
+-----------------+---------+--------+
| Bases Empty     |       0 | 0.5553 |
| Bases Empty     |       1 | 0.2972 |
| Bases Empty     |       2 | 0.1166 |
| 1B              |       0 | 0.9529 |
| 1B              |       1 | 0.5725 |
| 1B              |       2 | 0.2509 |
| 2B              |       0 | 1.1885 |
| 2B              |       1 | 0.7243 |
| 2B              |       2 | 0.3444 |
| 1B & 2B         |       0 | 1.5711 |
| 1B & 2B         |       1 | 0.9706 |
| 1B & 2B         |       2 | 0.4649 |
| 3B              |       0 | 1.4841 |
| 3B              |       1 | 0.9834 |
| 3B              |       2 | 0.3874 |
| 1B & 3B         |       0 | 1.9024 |
| 1B & 3B         |       1 | 1.2400 |
| 1B & 3B         |       2 | 0.5359 |
| 2B & 3B         |       0 | 2.0513 |
| 2B & 3B         |       1 | 1.4673 |
| 2B & 3B         |       2 | 0.6341 |
| Bases Loaded    |       0 | 2.4163 |
| Bases Loaded    |       1 | 1.6499 |
| Bases Loaded    |       2 | 0.8128 |
+-----------------+---------+--------+
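The query above reduces to a simple grouped average. The same idea on toy data (in Retrosheet terms, the runs column is event_runs_ct + fate_runs_ct, i.e. runs scored from this play through the end of the inning):

```python
from collections import defaultdict

# Toy play-by-play rows: (base/out state, runs scored from this play
# through the end of the inning) -- i.e. event_runs_ct + fate_runs_ct.
plays = [
    (("empty", 0), 1),
    (("empty", 0), 0),
    (("1B", 1), 2),
    (("1B", 1), 0),
]

# Accumulate total runs and plate appearances per base/out state.
totals = defaultdict(lambda: [0.0, 0])
for state, runs_to_end in plays:
    totals[state][0] += runs_to_end
    totals[state][1] += 1

# Run expectancy = average runs to end of inning from each state.
re = {state: runs / n for state, (runs, n) in totals.items()}
print(re)  # {('empty', 0): 0.5, ('1B', 1): 1.0}
```

The real query does the same thing, but grouped over millions of events and restricted to complete, non-walk-off innings.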

Setting Up a Personal Retrosheet Database

Setting up a basic Retrosheet database is fairly straightforward (with a little knowledge of databases and scripting). Here’s how I did it:

  1. Install MySQL (this is left as an exercise for the reader).
  2. Install Python 3.X (ditto).
  3. Install the Chadwick Tools for processing Retrosheet data. I’m using version 0.6.5 (https://sourceforge.net/projects/chadwick/files/chadwick-0.6/chadwick-0.6.5/). To compile it yourself on OS X/Linux:

./configure --prefix="/usr/local"
make
sudo make install

  4. Create a database for the data, e.g.:

mysql -u root -p -e "create schema retrosheet"

  5. Create a directory to do all of your processing (something like ~/Projects/retrosheet on OS X/Linux, or C:\Projects\retrosheet on Windows).
  6. Download the season event files by decade from http://www.retrosheet.org/game.htm (or you can get them from GitHub: https://github.com/chadwickbureau/retrosheet).
  7. Unzip all of the event files into a subdirectory called data (i.e. ~/Projects/retrosheet/data).
  8. Run the following script, which converts all of the event files into two big files, playbyplay.csv and substitutions.csv:

    from glob import glob
    from os import chdir, getcwd
    from subprocess import call

    data_dir = "/Users/hvs/Projects/retrosheet/data"
    prev_dir = getcwd()

    chdir(data_dir)
    files = glob("*.EV[ANP]")

    for filename in files:
        print("Processing {}".format(filename))
        year = filename[:4]
        call("cwevent -q -y {} -f 0-96 -x 0-62 {} >> playbyplay.csv".format(year, filename), shell=True)
        call("cwsub -q -y {} {} >> substitutions.csv".format(year, filename), shell=True)

    chdir(prev_dir)

  9. Create the events table in your database with this script: https://github.com/cantpitch/baseball/blob/master/retrosheet_schema.sql
  10. Load playbyplay.csv into the events table with this script: https://github.com/cantpitch/baseball/blob/master/retrosheet_load.sql (you’ll need to update the file location in that script).
  11. Some helpful lookup tables can be built with this script: https://github.com/cantpitch/baseball/blob/master/lookup_tables.sql
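Before loading playbyplay.csv, a quick structural sanity check can save some debugging: since every cwevent call uses the same field flags, every row of the concatenated file should have the same number of fields, so a mixed set of field counts points at a bad run or a partially written file. A minimal sketch (the function name and the idea of checking field counts are mine, not part of the Chadwick tools):

```python
import csv

def field_counts(path):
    """Return the set of distinct field counts across all rows of a CSV."""
    with open(path, newline="") as f:
        return {len(row) for row in csv.reader(f)}

# A cleanly generated file yields a single count; more than one count
# means some rows are malformed.
# counts = field_counts("playbyplay.csv")
```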