Sooner or later, almost every machine learning beginner runs into the same problem and asks the same question: why does the model I am developing make near-perfect predictions on the training data set, yet miss more often than a fairground shotgun when it sees new data? At BETWEEN we know the culprit, and it is called overtraining, or overfitting.
Overfitting is a phenomenon in which a predictive algorithm achieves a low success rate on new data because its forecasts have high variance: they are overly sensitive to the particular sample the model was trained on. This happens if the sample used to train the model:
The opposite problem is underfitting, which also makes a model's predictions unreliable, in this case because they have high bias. In underfitting, the cause is that the input data are insufficient to establish generalizations, or carry little information about the question the model is meant to answer. A common mistake, and an example of the latter, is to insist on building a linear regression to predict what will happen in the future from a sample drawn from too short a period.
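High bias is easy to reproduce in a few lines. The following is a minimal sketch under stated assumptions, not the article's own method: it uses hypothetical synthetic data in which the target is x², and fits a straight line to it by ordinary least squares (the helper names `fit_line` and `r_squared` are illustrative). Over a range symmetric around zero, x carries almost no linear information about x², so the line explains almost none of the variance, on the training records and on new ones alike.

```python
import random

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    a = mean_y - b * mean_x
    return a, b

def r_squared(xs, ys, a, b):
    """Share of the variance of ys explained by the line a + b*x."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

rng = random.Random(42)
# Hypothetical data: the target depends on x quadratically,
# a pattern a straight line cannot capture.
train_x = [rng.uniform(-1, 1) for _ in range(200)]
test_x = [rng.uniform(-1, 1) for _ in range(200)]
train_y = [x ** 2 for x in train_x]
test_y = [x ** 2 for x in test_x]

a, b = fit_line(train_x, train_y)
train_r2 = r_squared(train_x, train_y, a, b)
test_r2 = r_squared(test_x, test_y, a, b)
print(f"R^2 train: {train_r2:.2f}  R^2 test: {test_r2:.2f}")
```

An R² near zero (or even negative) on both sets is the high-bias pattern: the model is wrong in the same way everywhere, so training it longer on the same inputs does not help.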
There is an unmistakable sign that a machine learning model is overfitting: its success rate on the training data set is close to 100%, but on new records it falls to half that or less. Overtraining has taught the model to reproduce what it already knows with pinpoint precision, but it has left it unable to generalize to data it has not seen before.
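This sign can be reproduced with a model that simply memorizes its training data. The sketch below is illustrative and rests on assumptions that are not in the article: hypothetical synthetic records whose true label is 1 when x > 0.5, with 30% of the labels flipped as noise (`make_data` and the noise rate are made up for the example), a 1-nearest-neighbour classifier that copies the label of the closest training record, and a plain threshold rule for comparison.

```python
import random

def make_data(rng, n, noise=0.3):
    """Hypothetical records: the true label is 1 when x > 0.5, but a
    fraction `noise` of the labels is flipped at random."""
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5)
        if rng.random() < noise:
            label = 1 - label
        data.append((x, label))
    return data

def accuracy(predict, data):
    """Share of records whose label the model predicts correctly."""
    return sum(predict(x) == label for x, label in data) / len(data)

rng = random.Random(0)
train = make_data(rng, 500)
test = make_data(rng, 500)

def memorise(x):
    # 1-nearest neighbour: copy the label of the closest training record,
    # so every noisy training label is reproduced exactly.
    _, label = min(train, key=lambda rec: abs(rec[0] - x))
    return label

def threshold(x):
    # Simple rule that captures the real pattern and ignores the noise.
    return int(x > 0.5)

for name, model in [("1-NN", memorise), ("threshold", threshold)]:
    print(f"{name}: train {accuracy(model, train):.2f}, "
          f"test {accuracy(model, test):.2f}")
```

The 1-NN model scores 100% on its own training records, because each record is its own nearest neighbour, yet its accuracy on new records drops well below that, while the threshold rule should score roughly the same on both sets. That gap between training and test results is what to watch for.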
Underfitting, on the other hand, is diagnosed when the machine learning model performs poorly on both the training sample and previously unseen input records.
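The two diagnostic rules can be condensed into a rough helper that compares success rates on the training sample and on new records. The function name `diagnose` and the cut-off values `good=0.8` and `max_gap=0.15` are hypothetical, chosen only for illustration; sensible thresholds depend on the problem and on the metric used.

```python
def diagnose(train_score, test_score, good=0.8, max_gap=0.15):
    """Rough rule of thumb; scores are success rates in [0, 1].

    The thresholds `good` and `max_gap` are hypothetical illustration values.
    """
    # Poor results on both the training sample and new records: underfitting.
    if train_score < good and test_score < good:
        return "underfitting"
    # Training score much higher than the score on new records: overfitting.
    if train_score - test_score > max_gap:
        return "overfitting"
    return "no clear sign of either"

print(diagnose(1.00, 0.50))  # overfitting
print(diagnose(0.55, 0.50))  # underfitting
print(diagnose(0.92, 0.90))  # no clear sign of either
```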
To prevent or correct overfitting, we can turn to various techniques that improve model training and correct unwanted deviations in the results. Some of them are:
Getting into machine learning involves a lot of trial and error, so don't despair if your models don't work the first time. IT professionals know well that their daily work involves experimenting, correcting and learning from mistakes before achieving success. If this sounds familiar, why not come to BETWEEN and keep growing in your professional career with our constantly updated range of IT job offers? Put yourself in the best position to experience the developments the sector will see in the coming years, such as the expansion of big data or the widespread adoption of the HTTP/3 protocol for a faster Internet. With BETWEEN you will have no limits!