Colley Method for Ranking NFL Teams

The NFL season started this week, which means it is a perfect time to set up a program for ranking teams! My fiancee bought me a book last Christmas called Who’s #1, which is an awesome book on how ranking and rating methods work. Specifically the book is geared towards sports, but I was surprised to learn how many companies use ranking algorithms — for example, Google and Netflix. I am — by and large — ignorant of most things sports related, but this past year I had lived with five other guys who love sports and who had invited me to join their March Madness bracket group. Since I don’t really follow college basketball, but I wanted a shot at winning our groups bracket challenge (there was a cash prize!), I used the methods in the book to rank college basketball teams and fill out the brackets algorithmically. I ended up taking the second place prize out of a group of nearly 25 brackets in our group, and in the top 5% nationally. I took first in my bracket group I did with the other graduate students…but I didn’t put any money in that one! (Bummer…)

One of the first methods I learned is called the Colley Method, and you can read all about it here. You might recognize it as one of the key computer methods in college football’s Bowl Championship Series (no comment on that). The method takes the win-loss record of each team, assigns them a numerical rating based on who they beat (or lost to), and ranks them according to that rating. The big idea is that if you beat a very good team, it should count more than if you beat a poor team, and if you lose to a very good team, you should not be ‘penalized’ as much as if you lose to a terrible team. In a sense, it works by interconnecting teams based on who they won and lost against — kind of like a pseudo-social network. If you beat me, and I beat your friend, then we assume you would have beaten your friend, even if you never go to play them. Hopefully that makes sense!

Anyway, if you want to try it yourself, here is a link to the code that ranks both the 2012 and 2013 NFL teams. I included an option for preseason games, since there is (at the time of this writing) only one regular season game that has been played in the NFL. Note that you will need the numpy module for this to work.

To run it, type

python -e -y 2013

This will call all the other python files to fetch the data from and will then process it an implement the Colley method, giving you a colley.txt file containing the ratings of each team (sorted of course). The flag -y gives input for the year (2012 or 2013), and the -e flag, if present, includes exhibition matches as well (e.g. preseason games).

Now for the guts of the code (for those who are interested).

First we need a way to get the data from, and put the games into an array as well as get our teams. Because the games use an integer to represent the teams, I create a python dictionary to map the number to the team name. So does precisely this:

def get_teams():
""" Get team data from,
parse it into a dictionary with team number
as the key, team name as the value

f = urllib.urlopen("")
s =
my_teamnames = {}
for i in range(0,len(s)/2):
my_teamnames.update({i : s[i*2 + 1]})
return my_teamnames

def get_games(year,exhibition):
""" Get game data from,
parse it into a numpy array.

if year == 2013:
if exhibition == True:
f = urllib.urlopen('')
elif exhibition == False:
f = urllib.urlopen('')
sys.exit('"exhibition" must be "True" or "False"')
elif year == 2012:
if exhibition == True:
f = urllib.urlopen('')
elif exhibition == False:
f = urllib.urlopen('')
sys.exit('"exhibition" must be "True" or "False"')
sys.exit('Not a valid year')
s =
if exhibition == False:
file_name = str('games_'+str(year)+'.txt')
elif exhibition == True:
file_name = str('games_'+str(year)+'_exhib.txt')
k = open(file_name,'w')

my_games = genfromtxt(file_name, dtype = None, delimiter=',')

return my_games

Once we have our games and teams, we need to set up the Colley matrix (‘A’), and solve it against Colley’s ‘b’ vector. The takes care of this. We take data about the games, and set up a square matrix with the number of teams as the dimension. Where teams intersect contains data about how they fared against each other. The mathematical details can be found in the Who’s #1 and in the link to the Colley Method I gave. The end result of this code is a rating vector (called rating here). Anyway, here is that code:

def colley(num_teams,num_games,my_teams,my_games):

A = np.zeros((num_teams,num_teams))
wins = np.zeros(num_teams)
losses = np.zeros(num_teams)
total = np.zeros(num_teams)

for game in range(0,num_games):
w = my_games[game,2] - 1.0
l = my_games[game,5] - 1.0

A[w,w] = A[w,w] + 1.0
A[l,l] = A[l,l] + 1.0
A[l,w] = A[l,w] - 1.0
A[w,l] = A[w,l] - 1.0

for i in range(0,num_teams):
total[i] = total[i] + wins[i] + losses[i]

b = np.zeros(num_teams)
for i in range(0,num_teams):
A[i,i] = 2 + total[i] + A[i,i]
b[i] = 1 + (wins[i] - losses[i])/2

rating = np.linalg.solve(A,b)
return rating

Once we have these, we pair it up with our dictionary element containing the team names, and sort it. This is the code. I wrapped everything in the file if you are checking this out in my repo. It takes care of making all the files play nicely together.

So now the big question is, how does it fare? Here is the list as of today (9/7/13):

    Seattle, 0.806187648116
Detroit, 0.76229943304
Washington, 0.734350377447
Cleveland, 0.691797811483
New_England, 0.666157176065
New_Orleans, 0.657992419938
NY_Jets, 0.652519130556
San_Francisco, 0.648158042535
Arizona, 0.647296343048
Houston, 0.647271431013
Denver, 0.614105025386
Carolina, 0.600878984866
Buffalo, 0.563322480133
Cincinnati, 0.557969510878
Philadelphia, 0.53328007273
Indianapolis, 0.525416655862
Chicago, 0.522801819437
Kansas_City, 0.479567506516
Dallas, 0.453378981006
San_Diego, 0.437407308856
Miami, 0.414388770684
Oakland, 0.406726811416
Green_Bay, 0.37888674304
Tampa_Bay, 0.36195341794
Minnesota, 0.357868414451
Baltimore, 0.356824183447
St_Louis, 0.340268960559
NY_Giants, 0.339410132678
Tennessee, 0.288458533025
Jacksonville, 0.280125144891
Pittsburgh, 0.192367833585
Atlanta, 0.0805628953732

Not surprisingly, Seattle is at the top (Woot woot! Go Hawks!), having finished the preseason perfectly against tough teams. Atlanta is at the bottom, having lost every preseason game. Generally, the ratings stack up similar to how the teams did in the preseason.

Also not surprising? How poor of a representation I think it really is. Preseason games are the time where teams test out their second and third strings and avoid playing their top players. So I think, for example, Atlanta will fare a lot better than this preseason ranking suggests. While I like the Colley Method’s simplicity, I think it performs poorly as a predictive algorithm. The Massey method, on the other hand, is much better at predicting future match-ups. This was my experience with it in March Madness anyway. Google it if you are curious. If you are serious about doing any sports ranking, I also want to suggest that you take a look at weighting methods…for example, games played later in the season ‘count more’. I think that is a critical step to getting predictive algorithms. How to do that well, though, is more of an art :)

If you have any questions or comments, let me know!