Brewing Stronger Features: Dual-Teacher Distillation for Multispectral Earth Observation

University of Ljubljana, Faculty of Computer and Information Science
CVPR 2026
Code (coming soon) arXiv

DEO, our dual-teacher pretraining framework, combines a DINO-style architecture for learning rich multispectral representations with simultaneous distillation from DINOv3 to capture strong optical features. This approach delivers state-of-the-art performance on multispectral Earth observation tasks while staying competitive in optical-only settings.

Abstract

Foundation models are transforming Earth Observation (EO), yet the diversity of EO sensors and modalities makes a single universal model unrealistic. Multiple specialized EO foundation models (EOFMs) will likely coexist, making efficient knowledge transfer across modalities essential. Most existing EO pretraining relies on masked image modeling, which emphasizes local reconstruction but provides limited control over global semantic structure. To address this, we propose a dual-teacher contrastive distillation framework for multispectral imagery that aligns the student’s pretraining objective with the contrastive self-distillation paradigm of modern optical vision foundation models (VFMs). Our approach combines a multispectral teacher with an optical VFM teacher, enabling coherent cross-modal representation learning. Experiments across diverse optical and multispectral benchmarks show that our model adapts to multispectral data without compromising performance on optical-only inputs, achieving state-of-the-art results in both settings, with an average improvement of 3.64 percentage points in semantic segmentation, 1.21 in change detection, and 1.31 in classification tasks. This demonstrates that contrastive distillation provides a principled and efficient approach to scalable representation learning across heterogeneous EO data sources.

Contributions

  • Dual-teacher pretraining strategy: We introduce a dual-teacher pretraining strategy that unifies a contrastive self-distillation multispectral teacher with distillation from an optical teacher, combining global representation learning with the transfer of semantic priors.
  • Effective feature alignment: We demonstrate that matching the student's pretraining objective to that of a VFM teacher (e.g., DINOv3) enables more effective and data-efficient transfer of optical priors to a multispectral student.

Dual-Teacher Pretraining


Our pretraining data comes from the fMoW-Sentinel dataset, augmented with high-resolution aerial images from fMoW-RGB where available. We apply random crops and other DINO-style augmentations to the input images, then pass the multispectral and optical images to the multispectral teacher (red), optical teacher (blue), or student (green). The multispectral branch is a contrastive self-distillation setup in which the teacher is updated as an exponential moving average (EMA) of the student. In the optical branch, the student distills from a frozen VFM teacher, e.g., DINOv3. The resulting model can then be used in various downstream tasks.
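The mechanics of the two branches can be sketched as follows. This is a minimal NumPy illustration of DINO-style distillation (temperature-sharpened teacher targets, cross-entropy to the student, EMA teacher update), not the paper's implementation; the function names and hyperparameter values are ours.

```python
import numpy as np

def softmax(x, temp):
    # temperature-scaled softmax over the last axis (numerically stabilized)
    z = np.exp((x - x.max(axis=-1, keepdims=True)) / temp)
    return z / z.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, t_s=0.1, t_t=0.04):
    # cross-entropy between the sharpened teacher distribution and the student;
    # the lower teacher temperature sharpens its targets, as in DINO
    p_t = softmax(teacher_logits, t_t)
    log_p_s = np.log(softmax(student_logits, t_s) + 1e-9)
    return float(-(p_t * log_p_s).sum(axis=-1).mean())

def ema_update(teacher_params, student_params, m=0.996):
    # the multispectral teacher tracks an exponential moving average of the student;
    # the optical VFM teacher stays frozen and never receives this update
    return {k: m * v + (1 - m) * student_params[k]
            for k, v in teacher_params.items()}
```

In this sketch the total pretraining objective would combine both branches, e.g. `distill_loss(student_ms, ema_teacher_ms) + lam * distill_loss(student_opt, frozen_vfm_opt)`, with `ema_update` applied only to the multispectral teacher after each optimizer step.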

Effective Feature Alignment


Our model aligns features between the VFM teacher and the multispectral teacher more effectively than competing methods. Here, PCA features are visualized and compared across Copernicus-FM, DINOv3-LS, and DEO (ours); note the similarity of our method's features to those of DINOv3. Additionally, we provide a quantitative Centered Kernel Alignment (CKA) analysis. On optical inputs, DEO aligns more closely with DINOv3 than Cop.-FM does (0.65 vs. 0.49), supporting our claim that objective-level compatibility with contrastive self-distillation enables effective optical knowledge transfer. On MS inputs, alignment with DINOv3 is lower (0.15 vs. 0.50), reflecting the intended integration of additional spectral information beyond RGB.
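Alignment scores of this kind are commonly computed with linear CKA between two sets of features extracted on the same samples. A minimal NumPy sketch (our illustration, not the paper's exact evaluation code):

```python
import numpy as np

def linear_cka(X, Y):
    # Linear Centered Kernel Alignment between feature matrices of shape
    # (n_samples, dim); the two feature dimensions may differ.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # squared Frobenius norm of the cross-covariance, normalized by the
    # self-covariance norms; the result lies in [0, 1]
    hsic = np.linalg.norm(X.T @ Y, 'fro') ** 2
    norm_x = np.linalg.norm(X.T @ X, 'fro')
    norm_y = np.linalg.norm(Y.T @ Y, 'fro')
    return float(hsic / (norm_x * norm_y))
```

Linear CKA is invariant to isotropic scaling and orthogonal rotation of either feature space, which makes it a reasonable way to compare representations from differently trained backbones.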

Strong Performance

DEO achieves state-of-the-art performance across a range of optical and multispectral benchmarks, most notably on semantic segmentation, with an average improvement of 3.64 percentage points over the best competing method. It further improves by 1.21 points on change detection and 1.31 points on classification tasks, and remains competitive in low-data regimes. Finally, we report the overall rank of methods across the evaluated datasets, where DEO again places first.

Qualitative Results

BibTeX

@article{wolf2026brewing,
  title={Brewing Stronger Features: Dual-Teacher Distillation for Multispectral Earth Observation},
  author={Wolf, Filip and Rolih, Blaž and Čehovin Zajc, Luka},
  journal={arXiv preprint arXiv:2602.19863},
  year={2026},
  url={https://arxiv.org/abs/2602.19863}
}