Small sample size. It’s a phrase that’s been beaten to death in the statistical community. Since this is a statistically leaning blog, I thought it would be a good idea to touch on the topic briefly. “SSS,” in its simplest form, merely means that we need more data before we can draw a conclusion. How much data you need varies with what you’re trying to measure, but the general idea is that more data is always better.
The biggest problem with SSS arises when people use statistics in a misleading way to try to make an argument. You’ll often see this from fans and sportswriters alike who are grasping at straws.
To think of this another way, let’s do the old player comparison routine and look at some numbers for two unnamed players.
Player A: 44 PA, 2 BB, 12 SO, 7 H, 0 2B, 0 3B, 0 HR, .167/.205/.167
Player B: 48 PA, 4 BB, 15 SO, 8 H, 2 2B, 1 3B, 0 HR, .186/.250/.279
So, yeah. Player A and Player B both looked pretty bad over this stretch. Neither hit much (batting average-wise), got on base much, or hit for much power. Player A is Buster Posey from May 1st to May 12th this year; Player B is Brandon Belt from June 25th to July 15th. I don’t think anyone is making the argument that Buster Posey sucks. And yet, the noise on the topic of Brandon Belt, and whether or not he’s any good, seems to get louder each day. Most of that noise seems to be centered around SSS tomfoolery.
Over a full season, baseball serves up more noise than clarity, and that’s one of the reasons I love the game so much. You have to look at the big picture. On a game-by-game basis, nothing much really matters. Heck, in a 10-to-15-game sample nothing is going to be significant. That’s not to say that watching your favorite player go 4-for-4 isn’t something to behold, because it is. But when you try to craft narratives out of a series of chaotic events that are noisy by nature, you’re heading down the path toward disingenuousness. And really, we’re better than that.
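To put a rough number on how noisy those stretches are, here is a minimal Python sketch that computes an approximate 95% Wilson score interval for a batting average. A few things here are my assumptions, not facts from the box scores: the at-bat totals (42 for Player A, 43 for Player B) are backed out of the slash lines above, the 100-for-600 season is purely hypothetical, and treating every at-bat as an independent trial with a fixed chance of a hit is a simplification.

```python
import math


def wilson_interval(hits, at_bats, z=1.96):
    """Approximate 95% Wilson score interval for a batting average.

    Simplifying assumption: each at-bat is an independent trial with
    the same underlying probability of a hit.
    """
    if at_bats <= 0:
        raise ValueError("at_bats must be positive")
    p = hits / at_bats
    z2 = z * z
    denom = 1 + z2 / at_bats
    center = (p + z2 / (2 * at_bats)) / denom
    half = z * math.sqrt(p * (1 - p) / at_bats + z2 / (4 * at_bats ** 2)) / denom
    return center - half, center + half


# At-bat totals are inferred from the slash lines (assumption);
# the full-season line is hypothetical, included only for contrast.
samples = {
    "Player A (7-for-42)": (7, 42),
    "Player B (8-for-43)": (8, 43),
    "Hypothetical full season (100-for-600)": (100, 600),
}

for label, (hits, at_bats) in samples.items():
    low, high = wilson_interval(hits, at_bats)
    print(f"{label}: {hits / at_bats:.3f} observed, "
          f"plausible range {low:.3f} to {high:.3f}")
```

Under those assumptions, the plausible range for Player A’s “true” average over that stretch runs from below .100 to above .300, which is another way of saying that 40-odd at-bats can’t tell a bad hitter from a good one. The same observed average over 600 at-bats produces a far narrower range.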
So, the next time you read your favorite sportswriter droning on about how Player A, or Player B, or Player C has become the greatest (or worst) player ever, and they’re using the smallest of samples to make the argument, you can, and should, just move on.