Sunday, November 4, 2012

Overfitting models

We're looking at Nate Silver's The Signal and the Noise: Why So Many Predictions Fail-but Some Don't, starting here. Silver discusses some of the inherent problems and mistakes people make with statistical models.

One of the most important is overfitting.

The name overfitting comes from the way that statistical models are “fit” to match past observations. The fit can be too loose—this is called underfitting—in which case you will not be capturing as much of the signal as you could. Or it can be too tight—an overfit model—which means that you’re fitting the noise in the data rather than discovering its underlying structure. The latter error is much more common in practice.

It can lead to serious problems.
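To make the distinction concrete, here is a small sketch in Python (everything in it is made up for illustration: a sine-wave "signal", an arbitrary noise level, and a few polynomial degrees). It fits polynomials of increasing degree to noisy samples and then scores them on fresh data the model never saw, which is where overfitting reveals itself.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical underlying "signal", observed with noise.
def signal(x):
    return np.sin(2 * np.pi * x)

x_train = np.sort(rng.uniform(0, 1, 20))
y_train = signal(x_train) + rng.normal(0, 0.2, x_train.size)

# Fresh data the model has never seen.
x_test = np.sort(rng.uniform(0, 1, 200))
y_test = signal(x_test) + rng.normal(0, 0.2, x_test.size)

for degree in (1, 3, 15):
    coeffs = np.polyfit(x_train, y_train, degree)   # fit a polynomial of this degree
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE = {train_err:.3f}, test MSE = {test_err:.3f}")

# Typical outcome: degree 1 underfits (high error on both sets), degree 3
# tracks the signal, and degree 15 drives the training error toward zero
# while doing worse on the unseen data. The overfit model "looks better on
# paper but performs worse in the real world."

The only thing the high-degree polynomial has learned is the noise in those particular twenty points, which is exactly the failure Silver describes.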

 

As obvious as this might seem when explained in this way, many forecasters completely ignore this problem. The wide array of statistical methods available to researchers enables them to be no less fanciful—and no more scientific—than a child finding animal patterns in clouds. “With four parameters I can fit an elephant,” the mathematician John von Neumann once said of this problem. “And with five I can make him wiggle his trunk.” Overfitting represents a double whammy: it makes our model look better on paper but perform worse in the real world. Because of the latter trait, an overfit model eventually will get its comeuppance if and when it is used to make real predictions.

This is one of the great stories of financial markets. People are forever trying to come up with the equivalent of quantitative alchemy to transform historical data into gold. It is quite easy to tune a model so that it performs very well on past data, matching its undulations with surprising precision. And it is amazingly easy to lose your shirt when that same model goes awry the moment it is used to predict where the market will go next.
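Here is a hedged sketch of that trap (again, everything is invented for the example: the "market" is pure random noise, and the trading rules are arbitrary long/short signals). Pick whichever rule looked best on the historical half of the data, then see how it fares on the other half.

import numpy as np

rng = np.random.default_rng(1)

# Hypothetical market: daily "returns" that are pure noise, so no rule
# can genuinely predict them.
returns = rng.normal(0, 0.01, 1000)
in_sample, out_of_sample = returns[:500], returns[500:]

# Try many arbitrary trading rules (random long/short signals) and keep
# whichever one looked best on the historical data.
best_rule, best_gain = None, -np.inf
for _ in range(2000):
    rule = rng.choice([-1, 1], size=500)   # a random long/short position per day
    gain = np.sum(rule * in_sample)
    if gain > best_gain:
        best_rule, best_gain = rule, gain

print(f"in-sample profit of the 'winning' rule:  {best_gain:+.2f}")
print(f"out-of-sample profit of the same rule:   {np.sum(best_rule * out_of_sample):+.2f}")

# The rule that matched the undulations of the past so precisely was,
# by construction, just fitted noise; on fresh data its edge evaporates.

The backtest looks like gold; the live performance is the comeuppance.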

 
