Imagine you have a small robot that is designed to walk around your garden and water your plants. Initially, you spend a few weeks collecting data to train and test the robot, investing considerable time and resources. The robot learns to navigate the parts of the garden where there is grass and bare soil.
However, as the weeks go by, flowers begin to bloom and the appearance of the garden changes significantly. The robot, trained on data from a different season, now fails to recognize its surroundings accurately and struggles to complete its tasks. To fix this problem, you need to expose the model to new examples of the blooming garden.
Your first thought is to incorporate new data examples into the training and retrain the model from scratch. But this approach is expensive and you do not want to do this every time the environment changes. In addition, you have just realized that you do not have all the necessary historical training data.
Next, you consider just fine-tuning the model with new samples. But this approach is risky because the model may lose some of its previously learned capabilities, leading to catastrophic forgetting (a situation where the model loses previously acquired knowledge and skills when it learns new information).
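To see how easily naive fine-tuning can erase earlier knowledge, here is a minimal, self-contained sketch (a toy logistic-regression model in NumPy; the tasks and all names are invented for illustration). The model first learns task A, is then fine-tuned only on task B, whose labels deliberately contradict task A, and its accuracy on task A collapses:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_task(flip_labels):
    """Toy binary task: the label is the sign of the first feature."""
    X = rng.normal(size=(500, 2))
    y = (X[:, 0] > 0).astype(float)
    return X, (1.0 - y) if flip_labels else y

def train(w, X, y, epochs=200, lr=0.5):
    """Plain logistic-regression gradient descent."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))
        w = w - lr * X.T @ (p - y) / len(y)
    return w

def accuracy(w, X, y):
    return float(np.mean(((X @ w) > 0) == y))

Xa, ya = make_task(flip_labels=False)  # task A
Xb, yb = make_task(flip_labels=True)   # task B contradicts task A

w = train(np.zeros(2), Xa, ya)
acc_a_before = accuracy(w, Xa, ya)     # high: the model has learned task A

w = train(w, Xb, yb)                   # naive fine-tuning on task B only
acc_a_after = accuracy(w, Xa, ya)      # collapses: catastrophic forgetting

print(f"task A accuracy before: {acc_a_before:.2f}, after: {acc_a_after:.2f}")
```

This is of course an extreme, engineered case, but the mechanism is the same one that degrades real models when the new data distribution conflicts with the old one.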
So… is there an alternative? Yes: Continual Learning (CL)!
Of course, a robot watering plants in a garden is only an illustrative example of the problem. Later in this article you will see more realistic applications.
It is not possible to foresee and prepare for all the possible scenarios with which a model may be confronted in the future. Therefore, in many cases, adaptive training of the model as new samples arrive can be a good option.
In CL we want to find a balance between the stability of a model and its plasticity. Stability is the ability of a model to retain previously learned information, and plasticity is its ability to adapt to new information as new tasks are introduced.
Researchers have identified a number of approaches to building adaptive models. The following categories have been established for these approaches [3]:
1. Regularization-based approaches, which add a penalty term to the loss function to discourage changes to parameters that were important for previously learned tasks.
2. Replay-based approaches, which store a subset of past samples (or learn to generate them) and mix them into training on new data.
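As a concrete (and deliberately simplified) sketch of these two families, here is a tiny reservoir-sampling replay buffer and an EWC-style quadratic penalty that anchors new weights to the old solution. The class and function names are my own invention, not from any specific library:

```python
import numpy as np

class ReservoirBuffer:
    """Fixed-size memory of past samples, filled by reservoir sampling so
    every sample seen so far has an equal chance of being stored."""
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.n_seen = 0
        self.rng = np.random.default_rng(seed)

    def add(self, item):
        self.n_seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.integers(0, self.n_seen)  # uniform in [0, n_seen)
            if j < self.capacity:
                self.items[j] = item

def anchor_penalty(w, w_old, importance, lam=1.0):
    """EWC-style quadratic penalty: changing parameters that were important
    for earlier tasks (high `importance`) is penalized more strongly."""
    return lam * float(np.sum(importance * (w - w_old) ** 2))

# keep a 10-sample memory of a 1,000-sample stream
buf = ReservoirBuffer(capacity=10)
for i in range(1_000):
    buf.add(i)

w_old = np.array([1.0, -2.0, 0.5])
penalty_at_old = anchor_penalty(w_old, w_old, np.ones(3))       # 0.0
penalty_moved = anchor_penalty(w_old + 1.0, w_old, np.ones(3))  # 3.0
```

In a training loop, the replayed samples would be mixed into each new batch and the penalty added to the task loss; real methods differ mainly in how they choose what to store and how they estimate parameter importance.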
The basic performance of CL models can be measured from a number of angles [3]: overall performance (e.g., average accuracy across all tasks seen so far), stability (how well performance on previously learned tasks is retained, often reported as a forgetting measure or backward transfer), and plasticity (how well new tasks are learned, e.g., forward transfer).
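In practice, these quantities are often computed from an accuracy matrix, where entry (i, j) is the accuracy on task j measured after training on tasks 0..i. A minimal sketch (the matrix values below are made up purely for illustration):

```python
import numpy as np

# acc[i, j]: accuracy on task j, measured after training on tasks 0..i
# (values are invented for illustration)
acc = np.array([
    [0.90, 0.00, 0.00],
    [0.80, 0.85, 0.00],
    [0.75, 0.80, 0.90],
])
T = acc.shape[0]

# Overall performance: average accuracy over all tasks after the last one
avg_accuracy = acc[-1].mean()                                    # ~0.817

# Stability: average forgetting = peak past accuracy minus final accuracy
forgetting = np.mean([acc[:T - 1, j].max() - acc[-1, j]
                      for j in range(T - 1)])                    # 0.10

# Backward transfer: how learning later tasks changed earlier-task accuracy
bwt = np.mean([acc[-1, j] - acc[j, j] for j in range(T - 1)])    # -0.10
```

A good CL method pushes average accuracy up while keeping forgetting low; negative backward transfer is the numerical signature of forgetting, while positive values would mean later tasks actually improved earlier ones.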
So why don’t all AI researchers switch to Continual Learning right away?
If you have access to historical training data and are not worried about the computational cost, it may seem easier to just train from scratch.
Why? One reason is that the interpretability of what happens in the model during continual training is still limited. If training from scratch gives the same or better results than continual training, then it may be preferable to take the easier approach, i.e. retrain from scratch rather than spend time trying to understand the performance problems of CL methods.
In addition, current research tends to evaluate models and frameworks on synthetic incremental benchmarks that do not accurately reflect practical business use cases, where tasks evolve naturally over time [6].
Finally, many papers on the topic of CL focus on storage rather than computational costs. In reality, storing historical data is much less costly and energy-intensive than retraining the model. [4]
If there were more focus on the inclusion of computational and environmental costs in model retraining, more researchers might be interested in improving the current state of the art in CL methods, as they would see measurable benefits. For example, retraining can exceed 10,000 GPU days for the latest large models [4].
Continual learning seeks to address one of the most challenging bottlenecks of current AI models. This challenge is the fact that data distribution changes over time. Retraining is expensive and requires large amounts of computation, which is not a very sustainable approach from both economic and environmental perspectives. Therefore, in the future, well-developed CL methods may allow for models that are more accessible and reusable by a larger community of people.
A list of applications that inherently require or could benefit from well-developed CL methods includes [4]:
1. Model Editing, or selective editing of an error-prone part of a model without damaging other parts of the model. Continual Learning techniques could help to continuously correct model errors at much lower computational cost.
2. Personalization and specialization. General purpose models sometimes need to be adapted to be more personalized for specific users. With Continual Learning, we could update only a small set of parameters without introducing catastrophic forgetting into the model.
3. On-device learning. Small devices have limited memory and computational resources, so methods that can efficiently train the model in real time as new data arrives, without having to start from scratch, could be useful here.
4. Faster retraining with a warm start. Models need to be updated when new samples become available or when the distribution shifts significantly. With Continual Learning, this process can be made more efficient by updating only the parts affected by new samples, rather than retraining from scratch.
5. Reinforcement learning, which involves agents interacting with an environment that is often non-stationary. Therefore, efficient Continual Learning methods and approaches could be potentially useful for this use case.
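As a small illustration of the warm-start idea from point 4 (a toy linear-regression setup with invented data; the size of the speed-up depends entirely on how similar the new distribution is to the old one), starting gradient descent from the previous solution converges in far fewer steps than starting from scratch:

```python
import numpy as np

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 5))
w_old_true = rng.normal(size=5)
w_new_true = w_old_true + 0.1 * rng.normal(size=5)  # small distribution shift
y_old, y_new = X @ w_old_true, X @ w_new_true

def fit(w, X, y, lr=0.01, tol=1e-3, max_steps=10_000):
    """Gradient descent on MSE; returns the weights and steps taken."""
    for step in range(max_steps):
        if np.mean((X @ w - y) ** 2) < tol:
            return w, step
        w = w - lr * 2 * X.T @ (X @ w - y) / len(y)
    return w, max_steps

w_old, _ = fit(np.zeros(5), X, y_old)        # model trained on the old data
_, steps_cold = fit(np.zeros(5), X, y_new)   # retrain from scratch
_, steps_warm = fit(w_old.copy(), X, y_new)  # warm start from old weights

print(f"from scratch: {steps_cold} steps, warm start: {steps_warm} steps")
```

The same intuition carries over to large models: the closer the old solution is to the new optimum, the more compute a warm start saves compared with retraining from scratch.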
Learn more
As you can see, there is still a lot of room for improvement with respect to Continual Learning methods. If you are interested, you can start with the references listed at the end of this article.
If you have any questions or comments, please feel free to share them in the comments section.
Cheers!
References
[1] Awasthi, A., & Sarawagi, S. (2019). Continual Learning with Neural Networks: A Review. In Proceedings of the ACM India Joint International Conference on Data Science and Management of Data (pp. 362–365). Association for Computing Machinery.
[2] Continual AI Wiki Introduction to Continual Learning https://wiki.continualai.org/the-continualai-wiki/introduction-to-continual-learning
[3] Wang, L., Zhang, X., Su, H., & Zhu, J. (2024). A Comprehensive Survey of Continual Learning: Theory, Method and Application. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(8), 5362–5383.
[4] Verwimp, E., Aljundi, R., Ben-David, S., et al. (2024). Continual Learning: Applications and the Road Forward. https://arxiv.org/abs/2311.11908
[6] Garg, S., Farajtabar, M., Pouransari, H., et al. (2024). TiC-CLIP: Continual Training of CLIP Models.