Foundation models for neural data are large-scale, pre-trained models designed to learn general representations from brain recordings — spanning EEG, ECoG, fMRI, and intracortical spike trains — that can then be fine-tuned or adapted for specific downstream tasks. Borrowing architectures and training paradigms from natural language processing and computer vision, these models apply masked autoencoding, contrastive learning, and transformer-based sequence modelling to neural time series. The central promise is that a single pre-trained model, exposed to data from many subjects and sessions, can capture shared structure in neural dynamics that transfers across individuals, recording modalities, and experimental paradigms.
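The masked-modelling idea above can be illustrated with a minimal sketch: hide part of a multi-channel recording and train a model to reconstruct it from the visible context. Everything here is synthetic and simplified — the data are a toy latent-variable simulation, and an ordinary least-squares regression stands in for the transformer encoder a real foundation model would use.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy multi-channel "recording": a shared low-dimensional latent drives all
# channels, so masked channels are predictable from visible ones. This is a
# hypothetical stand-in for real neural data, not a realistic simulation.
n_channels, n_time, n_latent = 16, 2000, 3
latents = rng.normal(size=(n_latent, n_time))
mixing = rng.normal(size=(n_channels, n_latent))
data = mixing @ latents + 0.1 * rng.normal(size=(n_channels, n_time))

# Masked-modelling objective: hide a random subset of channels and learn to
# reconstruct them from the channels that remain visible.
masked = rng.choice(n_channels, size=4, replace=False)
visible = np.setdiff1d(np.arange(n_channels), masked)

train, test = data[:, :1500], data[:, 1500:]

# Linear least-squares map from visible to masked channels — the role a
# transformer plays in an actual masked autoencoder.
w, *_ = np.linalg.lstsq(train[visible].T, train[masked].T, rcond=None)
pred = (test[visible].T @ w).T

mse_model = np.mean((pred - test[masked]) ** 2)
mse_baseline = np.mean(
    (test[masked] - train[masked].mean(axis=1, keepdims=True)) ** 2
)
print(mse_model < mse_baseline)  # reconstruction beats the mean baseline
```

Because the channels share latent structure, the reconstruction error on held-out time bins falls well below the per-channel mean baseline — the same signal that masked pre-training exploits at scale.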

The emergence of scaling laws for neural data — demonstrated by models such as BrainLM — marked a turning point for the field in 2023. These results showed that decoder performance improves predictably with increasing dataset size and model capacity, mirroring trends seen in large language models. Practically, foundation models address one of the most persistent challenges in brain-computer interfaces: the need for lengthy per-user calibration. By pre-training on pooled multi-session, multi-participant datasets and then fine-tuning with minimal individual data, these approaches can dramatically reduce calibration time while maintaining or improving decoding accuracy. Universal decoders trained in this way have shown positive transfer across recording configurations, participants, and even species.
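A scaling law of the kind described here is typically expressed as a power law, loss ≈ a·N^(−b), which becomes a straight line in log-log coordinates and can be fitted and extrapolated with a simple regression. The numbers below are illustrative placeholders, not results from BrainLM or any published model.

```python
import numpy as np

# Hypothetical pre-training losses at increasing dataset sizes
# (hours of recording); values are invented for illustration.
hours = np.array([10, 30, 100, 300, 1000, 3000], dtype=float)
loss = np.array([0.92, 0.74, 0.60, 0.48, 0.39, 0.31])

# Power-law ansatz loss = a * N^(-b) is linear in log-log space:
# log(loss) = log(a) - b * log(N).
slope, intercept = np.polyfit(np.log(hours), np.log(loss), 1)
a, b = np.exp(intercept), -slope

# Extrapolate the fitted law to a larger dataset (10,000 hours).
predicted = a * 10_000 ** (-b)
print(f"exponent b = {b:.3f}, predicted loss at 10k h = {predicted:.3f}")
```

The fitted exponent b summarises how quickly performance improves with data; "predictable" scaling means such extrapolations stay accurate as datasets grow, which is what makes the trend useful for planning data collection.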

Foundation models also open new possibilities for neuroscience discovery beyond BCI engineering. Zero-shot inference of brain states, cross-modal alignment between neural and behavioural data, and unsupervised identification of latent neural structure are all active research directions. However, significant challenges remain, including the heterogeneity of neural recording formats, the relatively small scale of neuroscience datasets compared to text or image corpora, and the need for careful validation that learned representations capture genuine neural phenomena rather than recording artefacts.
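Cross-modal alignment between neural and behavioural data is often framed as a CLIP-style contrastive objective: embeddings of matched neural/behaviour pairs should score higher than all mismatched pairs in a batch. The sketch below computes that InfoNCE-style loss on synthetic embeddings that are correlated by construction; the feature dimensions, noise level, and temperature are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical paired embeddings for the same trials: neural and behavioural
# features share a common signal, plus independent noise.
n_trials, dim = 8, 32
shared = rng.normal(size=(n_trials, dim))
neural = shared + 0.1 * rng.normal(size=(n_trials, dim))
behav = shared + 0.1 * rng.normal(size=(n_trials, dim))
neural /= np.linalg.norm(neural, axis=1, keepdims=True)
behav /= np.linalg.norm(behav, axis=1, keepdims=True)

# Contrastive (InfoNCE) loss: cosine-similarity matrix scaled by a
# temperature, with matched pairs on the diagonal.
logits = neural @ behav.T / 0.07
log_softmax = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
loss = -np.mean(np.diag(log_softmax))

# With strongly correlated pairs the diagonal dominates, so the loss is small.
print(f"contrastive loss = {loss:.4f}")
```

In practice the embeddings come from trained encoders rather than being correlated by construction, and minimising this loss is what pulls the two modalities into a shared space where zero-shot inference across them becomes possible.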