Why, in a world where the only constant is change, we need to take a Continual Learning approach to AI models.

Imagine you have a small robot that is designed to walk around your garden and water your plants. Initially, you spend a few weeks collecting data to train and test the robot, investing considerable time and resources. The robot learns to navigate the parts of the garden where there is grass and bare soil.

However, as the weeks go by, flowers begin to bloom and the appearance of the garden changes significantly. The robot, trained on data from a different season, now fails to recognize its surroundings accurately and struggles to complete its tasks. To fix this problem, you need to update the model with new examples of the blooming garden.

Your first thought is to incorporate new data examples into the training and retrain the model from scratch. But this approach is expensive and you do not want to do this every time the environment changes. In addition, you have just realized that you do not have all the necessary historical training data.

Next, you consider just fine-tuning the model with new samples. But this approach is risky because the model may lose some of its previously learned capabilities, leading to catastrophic forgetting (a situation where the model loses previously acquired knowledge and skills when it learns new information).

So, is there an alternative? Yes: Continual Learning (CL)!

Of course, a robot watering plants in a garden is only an illustrative example of the problem. Later in this article you will see more realistic applications.

Learn adaptively with Continual Learning

It is not possible to foresee and prepare for all the possible scenarios with which a model may be confronted in the future. Therefore, in many cases, adaptive training of the model as new samples arrive can be a good option.

In CL we want to find a balance between the stability of a model and its plasticity. Stability is the ability of a model to retain previously learned information, and plasticity is its ability to adapt to new information as new tasks are introduced.

“(…) in the Continual Learning scenario, a learning model is required to incrementally build and dynamically update internal representations as the distribution of tasks dynamically changes across its lifetime.” [2]

But how do we control stability and plasticity?

Researchers have identified a number of approaches to building adaptive models. These approaches are commonly grouped into the following categories [3]:

1. Regularization-based approaches

  • In this approach, we add a regularization term that should balance the effects of old and new tasks on the model structure.
  • Weight regularization, for example, aims to control the variation of the parameters by adding a penalty term to the loss function. This term penalizes changes to each parameter in proportion to how much that parameter contributed to previous tasks.

2. Replay-based approaches

  • Methods in this category focus on recovering some of the historical data so that the model can still reliably solve previous tasks. One of the limitations of this approach is that we need access to historical data, and accessing it is not always possible.
  • Experience replay, for example, involves preserving and replaying a sample of old training data. When training a new task, some examples from previous tasks are added to expose the model to a mixture of old and new task types, thereby limiting catastrophic forgetting.

3. Optimization-based approaches

  • Here, we want to manipulate the optimization methods to maintain performance for all tasks, while reducing the effects of catastrophic forgetting.
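  • Gradient projection, for example, is a method where gradients computed for new tasks are projected onto directions that interfere as little as possible with what was learned for previous tasks, for instance directions orthogonal to the gradients of those tasks.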

4. Representation-based approaches

  • Methods in this category focus on obtaining and using robust feature representations to avoid catastrophic forgetting.
  • Self-supervised learning, for example, involves having a model learn a robust representation of the data before being trained on specific tasks. The idea is for the model to learn high-quality features that reflect good generalization across different tasks that a model may encounter in the future.

5. Architecture-based approaches

  • The previous methods assume a single model with a single parameter space, but there are also a number of techniques in CL that exploit a model’s architecture.
  • Parameter allocation, for example, involves giving each new task a dedicated subspace in a network during training, so that tasks no longer destructively interfere with each other's parameters. However, if the size of the network is not fixed, it will grow with the number of new tasks.
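
To make the first three categories more concrete, here is a minimal sketch in plain Python. It is an illustration under simplifying assumptions, not a reference implementation of any specific published method: parameters and gradients are plain lists of floats, the importance weights are assumed to be given, and all names (weight_regularized_loss, ReplayBuffer, project_gradient) are hypothetical.

```python
import random


def weight_regularized_loss(new_task_loss, params, old_params, importance, strength=1.0):
    """Regularization: new-task loss plus a penalty for moving important parameters.

    `importance` is assumed to be precomputed, one non-negative weight per parameter.
    """
    penalty = sum(
        imp * (p - p_old) ** 2
        for p, p_old, imp in zip(params, old_params, importance)
    )
    return new_task_loss + 0.5 * strength * penalty


class ReplayBuffer:
    """Replay: fixed-size memory of old examples, filled by reservoir sampling."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Every example seen so far stays in memory with equal probability.
            slot = self.rng.randrange(self.seen)
            if slot < self.capacity:
                self.items[slot] = example

    def mixed_batch(self, new_examples, n_old):
        """Return the new examples followed by up to n_old replayed old ones."""
        n_old = min(n_old, len(self.items))
        return list(new_examples) + self.rng.sample(self.items, n_old)


def project_gradient(new_grad, old_direction):
    """Optimization: remove from new_grad its component along old_direction."""
    dot = sum(g * d for g, d in zip(new_grad, old_direction))
    norm_sq = sum(d * d for d in old_direction)
    if norm_sq == 0.0:
        return list(new_grad)
    return [g - (dot / norm_sq) * d for g, d in zip(new_grad, old_direction)]
```

In a training loop, one would typically pick one of these ideas rather than all three: add the penalty to the loss, or build each batch with mixed_batch, or project each gradient before the optimizer step. Real methods differ mainly in how they estimate parameter importance, which examples they keep, and which directions they protect.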

And how do we evaluate the performance of CL models?

The basic performance of CL models can be measured from a number of angles [3]:

  • Overall performance evaluation: average performance across all tasks
  • Memory stability evaluation: the difference between the maximum performance a task reached earlier in training and its current performance after continual training
  • Learning plasticity evaluation: the difference between the performance of joint training (training on all data at once) and the performance achieved when training with CL
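
As a rough illustration, the three measures above can be computed from a table of accuracies. The sketch below assumes a hypothetical matrix acc, where acc[i][j] is the accuracy on task j after training on tasks 0 to i, and a hypothetical list joint_acc with the accuracy of a jointly trained model on each task. Measuring plasticity right after each task is learned is one possible convention, chosen here as an assumption; the function names are made up for this example.

```python
def average_accuracy(acc):
    """Overall performance: mean accuracy over all tasks after the final step."""
    final = acc[-1]
    return sum(final) / len(final)


def forgetting(acc):
    """Memory stability: average drop from each old task's earlier best to its final value."""
    last = len(acc) - 1
    drops = []
    for task in range(last):
        best_before = max(acc[step][task] for step in range(task, last))
        drops.append(best_before - acc[last][task])
    return sum(drops) / len(drops) if drops else 0.0


def plasticity_gap(acc, joint_acc):
    """Learning plasticity: average gap between joint training and CL right after each task."""
    gaps = [joint_acc[t] - acc[t][t] for t in range(len(acc))]
    return sum(gaps) / len(gaps)


# Made-up numbers for three tasks; entries above the diagonal are unused placeholders.
acc = [
    [0.90, 0.00, 0.00],
    [0.80, 0.85, 0.00],
    [0.70, 0.75, 0.80],
]
joint_acc = [0.92, 0.90, 0.88]

print(round(average_accuracy(acc), 2))
print(round(forgetting(acc), 2))
print(round(plasticity_gap(acc, joint_acc), 2))
```

With these made-up numbers, the average accuracy is about 0.75, the average forgetting about 0.15, and the plasticity gap about 0.05. Lower values are better for the last two.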

So why don’t all AI researchers switch to Continual Learning right away?

If you have access to historical training data and are not worried about the computational cost, it may seem easier to just train from scratch.

Why? One reason is that the interpretability of what happens in the model during continual training is still limited. If training from scratch gives the same or better results than continual training, then it may be preferable to take the easier approach, i.e. retrain from scratch rather than spend time trying to understand the performance problems of CL methods.

In addition, current research tends to focus on the evaluation of models and frameworks that may not accurately reflect practical business use cases. Many synthetic incremental benchmarks do not reflect real-world situations, where tasks evolve naturally. [6]

Finally, many papers on the topic of CL focus on storage rather than computational costs. In reality, storing historical data is much less costly and energy-intensive than retraining the model. [4]

If the computational and environmental costs of model retraining received more attention, more researchers might be interested in improving the current state of the art in CL methods, as they would see measurable benefits. For example, model retraining can exceed 10,000 GPU days of training for the latest large models. [4]

Why should we work on improving CL models?

Continual Learning seeks to address one of the most challenging bottlenecks of current AI models: the fact that data distribution changes over time. Retraining is expensive and requires large amounts of computation, which is not sustainable from either an economic or an environmental perspective. Therefore, in the future, well-developed CL methods may allow for models that are more accessible and reusable by a larger community of people.

A list of applications that inherently require or could benefit from well-developed CL methods includes [4]:

1. Model Editing, or selective editing of an error-prone part of a model without damaging other parts of the model. Continual Learning techniques could help to continuously correct model errors at much lower computational cost.

2. Personalization and specialization. General purpose models sometimes need to be adapted to be more personalized for specific users. With Continual Learning, we could update only a small set of parameters without introducing catastrophic forgetting into the model.

3. On-device learning. Small devices have limited memory and computational resources, so methods that can efficiently train the model in real time as new data arrives, without having to start from scratch, could be useful here.

4. Faster retraining with a warm start. Models need to be updated when new samples become available or when the distribution shifts significantly. With Continual Learning, this process can be made more efficient by updating only the parts affected by new samples, rather than retraining from scratch.

5. Reinforcement learning, which involves agents interacting with an environment that is often non-stationary. Efficient Continual Learning methods could therefore be useful for this use case.

Learn more

As you can see, there is still a lot of room for improvement in Continual Learning methods. If you are interested, you can start with the materials below:

  • An introductory course: [Continual Learning Course] Lecture #1: Introduction and Motivation from ContinualAI on YouTube
  • A paper about the motivation for Continual Learning: Continual Learning: Application and the Road Forward [4]
  • A paper about state-of-the-art techniques in Continual Learning: Comprehensive Survey of Continual Learning: Theory, Method and Application [3]

If you have any questions or comments, please feel free to share them in the comments section.



