Bias and Variance in the Deep Learning era

Bias and variance is one of those machine learning concepts that is easy to learn but difficult to master. It’s a bit like Artificial Intelligence for newbies: everyone talks about it, but few know how it actually works under the hood (no offense, guys).

Understanding Bias and Variance

Whenever we train a model, what we want is for it to reach a minimal loss and give us a decision boundary that separates the classes with high accuracy. But that is not always what we get.

We take a dataset and divide it into 3 parts: a train set, a validation set, and a test set. The train set is what we use to train our model, the validation set is used to check how well the model is learning, and the test set evaluates the model on unseen data. In our case, we’ll use the train set and validation set to define bias and variance.
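
To make the split concrete, here is a minimal sketch using scikit-learn’s train_test_split. The toy data and the 70/15/15 ratio are illustrative assumptions, not something the rest of the article depends on.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data standing in for a real feature matrix and labels (illustrative only).
X = np.random.rand(1000, 20)             # 1000 examples, 20 features
y = np.random.randint(0, 2, size=1000)   # binary labels, e.g. 0 = cat, 1 = dog

# First hold out 30% of the data, then split that half-and-half
# into validation and test sets, giving a 70/15/15 split overall.
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.30, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.50, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 700 150 150
```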

In a utopian world, both our training error and our validation error would be zero, i.e. our model would be just perfect, but this is rarely the case.

Let’s understand bias and variance with an example. Suppose we want to classify cats and dogs. In the image below, the green crosses represent dogs and the red dots represent cats, and we need to find the best-fit function that separates these two classes.

Model fitting on the train set

We’ll understand bias and variance with the analogy of a school-going child who has a maths exam tomorrow. In this analogy, our model is the child, the textbook questions are our training set, and the exam paper is our validation set.

Understanding Bias: Suppose the child is not well equipped to practice the textbook questions, or simply doesn’t have the capacity to solve them; this will result in a poor performance in the exam. Similarly, if our model is not powerful enough or doesn’t have good-quality data, it will perform badly on the training set as well as on the validation set. This is called bias or underfitting, since our decision boundary doesn’t fit the train set well (refer to the image above).

Understanding Variance: Going by the same analogy, suppose we gave the child all the resources to prepare, but rather than understanding the concepts and generalizing from them, the child memorized the solutions. In this case, the child will perform very well on the textbook questions but will fail on the new questions in the exam. Similarly, if our model is too large or too complex, it will drive the training error down to a very small value, but it will fail to perform on the validation set (new examples) because it hasn’t learned to generalize. This is called variance or overfitting.
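
To see both failure modes side by side, here is a small, hedged sketch using scikit-learn. The dataset and the decision-tree models are illustrative assumptions; a depth-1 tree stands in for the “not good enough” model, and an unrestricted tree stands in for the one that memorizes its training data.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# A noisy two-class toy dataset (illustrative stand-in for cats vs dogs).
X, y = make_moons(n_samples=500, noise=0.3, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Underfitting / high bias: a depth-1 tree is too simple to capture the boundary.
underfit = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)

# Overfitting / high variance: an unrestricted tree memorizes the training noise.
overfit = DecisionTreeClassifier(max_depth=None).fit(X_train, y_train)

for name, model in [("underfit", underfit), ("overfit", overfit)]:
    print(name,
          "train acc:", round(model.score(X_train, y_train), 2),
          "val acc:", round(model.score(X_val, y_val), 2))
```

Typically the underfit model scores poorly on both sets, while the overfit model scores close to 100% on the train set but noticeably worse on the validation set. That gap is exactly the pattern we use to diagnose bias and variance below.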

Identifying bias and variance in our model

Let’s have a look at the table below to see what different combinations of training and validation set error actually mean. We’ll consider human-level error on this dataset to be approximately zero.

Identifying bias and variance

  Train set error    Validation set error    Diagnosis
  Low                High                    High variance
  High               High                    High bias
  High               Even higher             High bias and high variance
  Low                Low                     Low bias and low variance (ideal)

High Variance: When our model has a low training error and a high validation set error, it means it is suffering from high variance.

Dealing with High Variance

The following methods can be used to handle high variance in our model.

  • Using regularization and dropout techniques (see the sketch after this list).
  • Using a different model architecture.
  • Training on a greater quantity and variety of data (think about how this will help!).
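
As an example of the first bullet, here is a minimal Keras sketch combining L2 weight regularization with dropout. The layer sizes, dropout rate, and regularization strength (0.01) are illustrative assumptions, not prescribed values.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

model = keras.Sequential([
    layers.Dense(64, activation="relu", input_shape=(20,),
                 kernel_regularizer=regularizers.l2(0.01)),  # L2 penalty on weights
    layers.Dropout(0.5),                                     # randomly drop 50% of units
    layers.Dense(64, activation="relu",
                 kernel_regularizer=regularizers.l2(0.01)),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),                   # binary cat-vs-dog output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```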

High Bias: When our model has a high training error as well as a high validation set error, it means it is suffering from high bias.

Dealing with High Bias

The following methods can be used to handle high bias in our model.

  • Using a different model architecture.
  • Using a larger and deeper neural network (as sketched below).
  • Increasing the number of epochs.
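
As a sketch of the last two bullets, the hypothetical snippet below builds a larger, deeper network and trains it for more epochs; the width, depth, and epoch count are illustrative assumptions only.

```python
from tensorflow import keras
from tensorflow.keras import layers

# A wider and deeper network than before, to increase model capacity.
bigger_model = keras.Sequential([
    layers.Dense(128, activation="relu", input_shape=(20,)),
    layers.Dense(128, activation="relu"),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
bigger_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Train for more epochs (assumes X_train, y_train, X_val, y_val already exist).
# history = bigger_model.fit(X_train, y_train,
#                            validation_data=(X_val, y_val),
#                            epochs=100)
```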

High Bias and High Variance: This is a slightly tricky case in which the training error is high but the validation error is even higher. It can happen when our model behaves in a weird way and learns some of the training data very well (without generalizing from it) but fails to learn the rest of the data.
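
One way to make these three cases concrete is a tiny diagnostic helper like the sketch below. The 5% threshold and the example error values are purely illustrative assumptions; in practice you would judge the gaps relative to human-level error for your task.

```python
def diagnose(train_error, val_error, human_error=0.0, threshold=0.05):
    """Toy diagnosis of bias/variance from train and validation errors."""
    high_bias = (train_error - human_error) > threshold      # big gap to human level
    high_variance = (val_error - train_error) > threshold    # big gap between the two sets
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias"
    if high_variance:
        return "high variance"
    return "low bias and low variance"

print(diagnose(train_error=0.01, val_error=0.11))  # high variance
print(diagnose(train_error=0.15, val_error=0.16))  # high bias
print(diagnose(train_error=0.15, val_error=0.30))  # high bias and high variance
```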

Dealing with High Bias and High Variance

First, try to reduce the bias using the methods mentioned above; once you have a low training error, you can then try to reduce the variance.

Homework Time

Enough from my side; now it’s time for some effort from you. Try to plot a graph of error (both training and validation error) against model complexity, and show how bias and variance shape these curves.
