KDnuggets : News : 2008 : n13 : item18

Publications

From: Bruce Ratner
Date: 2-Jul-2008
Subject: Overfitting: Old Problem, New Solution

Overfitting, a problem akin to model inaccuracy, is as old as model building itself, because it is part and parcel of the modeling process. An overfitted model is one that comes close to reproducing the training data on which it is built, by "capitalizing on the idiosyncrasies" of that data. The model takes on the complexity of those idiosyncrasies by including extra, unnecessary variables, interactions, and constructed variables, none of which are part of the sought-after predominant pattern in the data.

As a result, a major characteristic of an overfitted model is that it has too many variables. Ergo, the overfitted model can be thought of as a perfect "picture" of the predominant pattern. As such, individuals in holdout data drawn from the same population as the training data, strangers who are unacquainted with the training data, cannot be expected to "fit into" the model's perfect picture of the predominant pattern, and so the model does not predict them well. When a model's accuracy on the holdout data is "out of the neighborhood" of its accuracy on the training data, the problem is one of overfitting, and the model is said to be overfitted.
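The comparison of training accuracy with holdout accuracy can be illustrated with a small sketch. Everything in it is an assumption made for illustration and not taken from the article: the synthetic data (one variable, a threshold at 0.5 as the "predominant pattern", 20% of labels flipped as the "idiosyncrasies"), the use of a 1-nearest-neighbor rule as a stand-in for a model that reproduces its training data, and the hypothetical `looks_overfitted` helper with its 0.10 tolerance for "out of the neighborhood".

```python
import random


def make_data(n, rng, noise=0.2):
    """Synthetic rows (x, y): y is 1 when x > 0.5, flipped with probability `noise`."""
    rows = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < noise:
            y = 1 - y  # an idiosyncrasy, not part of the predominant pattern
        rows.append((x, y))
    return rows


def simple_model(x):
    """Captures only the predominant pattern."""
    return int(x > 0.5)


def memorizing_model(train):
    """1-nearest-neighbor: returns the label of the closest training row,
    so it reproduces the training data, idiosyncrasies included."""
    def predict(x):
        nearest = min(train, key=lambda row: abs(row[0] - x))
        return nearest[1]
    return predict


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def looks_overfitted(train_acc, holdout_acc, tolerance=0.10):
    """Hypothetical check: holdout accuracy falls more than `tolerance`
    below training accuracy. The tolerance is an arbitrary choice."""
    return train_acc - holdout_acc > tolerance


rng = random.Random(0)
train = make_data(1000, rng)
holdout = make_data(1000, rng)  # same population, unseen by the models

memorizer = memorizing_model(train)
mem_train, mem_holdout = accuracy(memorizer, train), accuracy(memorizer, holdout)
sim_train, sim_holdout = accuracy(simple_model, train), accuracy(simple_model, holdout)

print("memorizing model:", mem_train, mem_holdout, looks_overfitted(mem_train, mem_holdout))
print("simple model:    ", sim_train, sim_holdout, looks_overfitted(sim_train, sim_holdout))
```

Under these assumptions the memorizing model scores perfectly on its own training rows but noticeably worse on the holdout rows, while the simple model's two accuracies stay close to each other. How large a gap counts as "out of the neighborhood" is a judgment call that the sketch does not settle.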





Copyright © 2008 KDnuggets.