07 September 2006

The Gift of Good Data Analysis

A few months back, I bagged on some of the analysis I was getting second-hand from a Malcolm Gladwell review of this book. I did go to the website of one of the authors in order to better argue my points, but I hadn't read the book and was therefore treading on shaky ground. I picked up the book shortly thereafter and started reading it, but was distracted by other shiny things. I'm coming back to it today and re-reading the second chapter. Unless there's some serious scholarship being edited out for the sake of sales, I've already got big problems with the way Berri, Schmidt, and Brook do their jobs.(1)

Chapter Two is entitled "Much Talking, Little Walking". The issue is work stoppages and, as the title suggests, their effect on attendance and viewership once play resumes. At least, it should be about that. Unfortunately, the authors made a crucial error in selecting their data, one that makes a hash of their conclusion.

Economics(2), like all applied math fields, is data-driven. Moreover, it is data-sensitive. A very small change in a constant can play havoc with results. Choosing inputs poorly can have a similar effect. (I just started to write the formulaic "far be it from me to question..." and decided to avoid the hypocrisy. Of course I'm going to question. Of course I'm that arrogant.) Here is their conclusion, summarized in their own words:

To hammer home this point, consider one last piece of evidence. Population since 1980 in the United States and Canada has grown approximately 30%. In this same time period, average attendance for Major League Baseball teams grew from 1.65 million in 1980 to 2.43 million in 2004. A bit of quick math reveals that this represents a 47% increase. Despite repeated labor disputes people are coming out to the ballpark in much greater numbers. Given these numbers it is hard to believe the conventional wisdom that the repeated fights between owners and players dramatically harm professional sports.

Although we can go back and forth on the 1994-95 experience, we should not lose sight of the larger picture the data are telling. In our study of the NBA, NHL, NFL, and the 1981 strike in baseball, the data speak clearly. Again, we began this research believing that fans did respond negatively to strikes and lockouts. Despite our prior belief, we go where the data take us. And the data clearly take us to a very different answer. Consumers of sport may publicly talk, but the data show no evidence that many fans choose to walk.
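The authors' "bit of quick math" is easy to reproduce. Here is a minimal Python sketch using only the figures in the quote above; the per-capita step, which divides out the roughly 30% population growth, is an illustrative addition rather than something the authors show. Like their numbers, it still counts only the people who walk through the turnstiles.

```python
# Figures quoted from the book's Chapter Two conclusion.
ATTENDANCE_1980 = 1.65    # average MLB team attendance, millions
ATTENDANCE_2004 = 2.43    # average MLB team attendance, millions
POPULATION_GROWTH = 0.30  # approximate US + Canada population growth since 1980


def percent_change(old: float, new: float) -> float:
    """Percentage change going from old to new."""
    return (new / old - 1.0) * 100.0


def per_capita_growth(total_growth_pct: float, population_growth: float) -> float:
    """Attendance growth per head of population, in percent (illustrative addition)."""
    return ((1.0 + total_growth_pct / 100.0) / (1.0 + population_growth) - 1.0) * 100.0


raw = percent_change(ATTENDANCE_1980, ATTENDANCE_2004)
adjusted = per_capita_growth(raw, POPULATION_GROWTH)
print(f"raw growth: {raw:.0f}%")
print(f"per-capita growth: {adjusted:.0f}%")
```

Run as-is, it prints the 47% the authors report and a per-capita figure of roughly 13%.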

In the endnotes for the chapter, the authors give us their data sources:

For our published academic articles, the attendance data were taken from a number of sources. For MLB, the attendance data were obtained for the years 1901-2000 from The Sporting News Complete Baseball Record Book (2001). The NFL data began in 1936 and concluded in 1999 and were obtained from The Sporting News Pro Football Guide (2000). Finally, attendance data for the NHL were obtained for the period 1960-2000 from Total Hockey (2001). Updates for these series, as well as the NBA attendance data employed, were taken from the web site of noted sports economist Rodney Fort. www.rodneyfort.com/SportsData/BizFrame.htm.

Did you see it? What negates the conclusion?

The datasets chosen look at a very special, self-selecting group of fans: those who attend games in person. Lori and I have season tickets to the Suns. We go to a lot of games. When we can't go, we make a very big effort to get someone in our seats because there's a financial incentive to do so.(3) Besides that, we're basketball fans and enjoy going to games. On any given night, there are about 16K fans in the stands. The number doesn't really vary much. What does vary is television viewership.

Here's a quick example, from Baseball Almanac, of the sort of ratings data that should have been included. I'm not qualified to analyze the data; in fact, I'm not sure I can even define correctly how it should be analyzed,(4) but I can say a few things. First, the drop-off of 7.4 ratings points in the NBA Finals from 1998 to 1999 ('99 being the lockout year), roughly 8.1 million households,(5) was probably not caused by the Jordan Effect alone. Second, compared with the roughly 12.4 million households that did watch in '99, the 16K or so fans who can fit into an arena on a given evening (about .1% of total viewers) probably amount to less than the margin of error in the Nielsen measurements.(6)
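For anyone who wants to check the arithmetic in that paragraph, here's a small Python sketch. It assumes the 2005 figure of 1.1 million households per ratings point (footnote 5) also holds for 1998-99, and it compares arena seats (people) with Nielsen households, so treat the result as an order-of-magnitude comparison only. The constant and function names are mine, for illustration.

```python
# Assumption: the 2005 households-per-point figure also applies to 1998-99.
HOUSEHOLDS_PER_POINT = 1.1e6
RATING_DROP = 7.4          # NBA Finals rating drop, 1998 to 1999
HOUSEHOLDS_1999 = 12.4e6   # approximate 1999 Finals audience, in households
ARENA_CROWD = 16_000       # roughly one full NBA arena (people, not households)


def points_to_households(points: float, per_point: float = HOUSEHOLDS_PER_POINT) -> float:
    """Convert ratings points to an approximate household count."""
    return points * per_point


lost = points_to_households(RATING_DROP)
arena_share = ARENA_CROWD / HOUSEHOLDS_1999 * 100  # percent of the '99 audience
implied_rating = HOUSEHOLDS_1999 / HOUSEHOLDS_PER_POINT

print(f"households lost: {lost / 1e6:.1f} million")
print(f"arena crowd as share of audience: {arena_share:.2f}%")
print(f"implied 1999 rating: {implied_rating:.1f}")
```

The 16K crowd works out to about 0.13% of the '99 audience, which the paragraph rounds to .1%, and 12.4 million households implies a rating of about 11.3.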

Again, I'm no economist. Frankly, I don't know my ass from a Laffer curve, but I do know quite a bit about measurement and data analysis. If the analysis in this second chapter is indicative of the type of work in the rest of the book, I may be boring y'all some more in the next few days. This is just sloppy scholarship.

(1) No, I do not expect serious math in a book directed at the mass market; neither do I expect sloppy, indefensible logic based on incorrect or incomplete suppositions.
(2) Did I study econ? No. I studied math, physics, and software engineering. The relationship? I know all too well the problems of inaccurate measurements and incorrect inputs. While I am not skilled with the specific tools in the Economist's Utility Belt, they bear striking similarities to the ones I do know, in both intended effect and side effect. Aim that Econo-rang half a degree off and you don't stop Doc Stagflation.
(3) Because it's in the team's best interest to have sellouts (good for PR and concessions), they give us a rebate on the following year's ticket package (typically in the 5-10% range) if our seats are always occupied.
(4) I want to be clear: I know these World Series ratings are almost useless, so don't bother pointing that out to me. Ratings for televised sports are most affected by the teams (and maybe the matchups) involved, so except in cases where the same teams appeared in the WS in both pre- and post-strike years, there's not much to be gained. I wanted an example of ratings data and this was readily available. I'd think the authors, given a few hours' effort, could do much better. Besides, the fragmentation of television viewership due to the increase in entertainment choices over the past two or three decades is most certainly the biggest factor in the downward trend in these ratings.
(5) A ratings point, in 2005, represented approximately 1.1 million households.
(6) Sorry for relying on Wikipedia data for so much of this paragraph. I got lazy about tracking down more reliable and definitive sources.


R.A. Porter said...

Um...okay. I'm happy with what I wrote, but not so happy with the formatting. I have gone over the posting four or five times now, and I can't figure out why the leading below the second blockquote is reduced so much. I can only assume it's something goofy in the CSS for the page.

That's one of the many prices of using the beta version of Blogger.

EarlsDonuts said...

Oy. You clearly don't have it.

I know for a damn fact that NBA attendance was down 100% during the fall of 1998. You can look it up. I can take just about any point in time and prove anything I want, even with the 1999 NBA Finals. You took math; why don't you run the numbers? I would hazard that the 1999 Finals ratings are easily within 2 standard deviations of the last non-Jordan Finals in '94 and '95. The Finals weren't televised live in prime time until 1982.

The authors' point is that, LONG-TERM, attendance does not seem to be affected by labor disputes/work stoppages (at least in baseball).

(I'd be interested to see pre/post-lockout attendance for the NHL, but I think that was the point of that lockout--nobody was going/watching the games anyway.)

The operative phrase in their analysis is "consumer of sports".
Fans who buy tickets and attend games are making a classic supply/demand economic decision--putting their money where their mouth is. TV viewers are mercurial by nature and aren't making that same conscious economic decision--they have no monetary stake in watching or not watching. I'm not an economics/math person either, but I'm pretty sure the supply/demand curve breaks when you have an UNLIMITED supply (free TV). Again, in the classic sense, free TV is not 'commodified', and you don't know what something is worth if you give it away. The experience of buying a ticket and going to a game has changed the least.

I can't be sure (because I changed the channel), but I would bet that the MSG crowd for Game 5 in '94 didn't empty out because of OJ, the way the TV audience did. The point being that TV ratings can be very volatile.

When you open the discussion to TV ratings you bring in many, many uncontrollable factors--promotion, counter-programming, cable/dish/TV, Al Michaels, a preponderance of crappy graphics/John Tesh-penned theme music, and Al Michaels. And to dismiss the Jordan effect is extremely short-sighted--NBC built their entire operation around promoting Jordan's exploits for 8 years--he was the most popular person on the planet, for chrissakes.

Ah jesus and the comment about 16k being within the margin of error for Nielsen ratings... awesome display of understanding significant sample size.

R.A. Porter said...

I'll comment further later in the weekend, but I'm worn out after our ridiculously long argument this afternoon. I think it's correct to say that you agree with the authors' disregard of TV ratings (whether intentionally excluded or not) and think I'm insane for finding them relevant. Likewise, I think it's idiotic to look only at the fraction of fans who show up for games, ignoring the vast majority whose dollars are indirectly funneled to the teams and leagues (implicitly through advertising revenue and TV contracts; explicitly through merchandising and pay media like mlb.com and NBA League Pass).

I stand by my point that a theoretical league with 20M fans that stops work and loses 19M fans will still sell out its stadia, but lose revenue over time. I don't know if that league would lose 19M fans...for all I know it would gain 5M when play resumed. What I do know is that Berri et al. don't address that.

Quick points: from '82 to '06, the average Finals rating was 13.22, with a standard deviation of 2.95. The 11.3 for the post-strike, post-Jordan Finals is within one deviation of the Rockets/Magic Finals in '95. I was a bit worried last night that I'd underestimated the Jordan effect, and you're absolutely right to nail me on it. I hope I was clear that I don't think Finals/WS ratings are actually useful, but they were readily available as a touchstone. I now think I'd have been better served by excluding those numbers entirely.
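The one-deviation check is a simple z-score. The Python sketch below measures the '99 rating against the 1982-2006 mean rather than against the '95 Finals specifically (whose rating isn't given here), and it treats the yearly ratings as if they came from one stable distribution, which is a simplification given the long downward trend in ratings.

```python
MEAN_RATING = 13.22  # average NBA Finals rating, 1982-2006 (from the comment above)
SD_RATING = 2.95     # standard deviation over the same years
RATING_1999 = 11.3   # 1999 post-lockout Finals rating


def z_score(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations x lies from the mean (negative = below)."""
    return (x - mean) / sd


z = z_score(RATING_1999, MEAN_RATING, SD_RATING)
print(f"z = {z:.2f}")
print("within one SD of the mean:", abs(z) < 1)
print("within two SDs of the mean:", abs(z) < 2)
```

That puts the '99 Finals about 0.65 standard deviations below the long-run average, inside the two-deviation band suggested in the earlier comment.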

Also, ask and ye shall receive. Here's a spreadsheet with the pre- and post-lockout attendance numbers for the NHL and the deltas. Here's a direct link to ESPN's source data. Note that while they're truly Blue in St. Lou, overall league attendance was up 12K/game. Obviously, comparing national ratings for the years would be pointless as OLN isn't widely available. Comparison of team-by-team numbers for local broadcasts would be the only effective way, I'd think.

R.A. Porter said...

I was just about to shut down and go to sleep when I realized I had no context for the NHL attendance numbers...were arenas reconfigured to seat more or fewer patrons (the latter helps increase the number of sellouts the teams can report and seems most common in baseball), or did something else affect the outlier teams? I don't know dick all about hockey, so I took a gander at the Blues' website. Looks like they were in an ownership transition last year, which might have impacted attendance. Then again, the Blues might just suck.