INRIA - Institut National de Recherche en Informatique et en Automatique
Conference Paper, Year: 2023

Investigating Large Vision Model Training Challenges on Satellite Datasets

Abstract

Contrastive learning methods that align textual descriptions with images, such as Contrastive Language-Image Pre-training (CLIP), have achieved remarkable advances. These foundation models have shown exceptional performance on zero-shot image classification, improving zero-shot ImageNet accuracy from the prior state of the art of 12% to 76%. However, these models saw few satellite images during training, which leads to suboptimal performance on geospatial data. This limitation raises a pivotal question: can these foundation models, which have shown promise across many domains, be trained on geospatial imagery out of the box? To answer this question, we study training CLIP on diverse geospatial datasets. We examine the challenges unique to this setting and discuss the strategies we use to address them. We demonstrate that handling resolution is crucial when training CLIP-like models on a large, multi-resolution dataset.
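For readers unfamiliar with how CLIP-style models learn from image-text pairs, the sketch below shows the general form of the symmetric contrastive objective they optimize: each image in a batch should score highest against its own caption, and each caption against its own image. It is a minimal, pure-Python illustration, not the authors' training code. The function names and the fixed temperature of 0.07 are assumptions for illustration (CLIP itself learns the temperature during training).

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def diagonal_cross_entropy(logits: Matrix) -> float:
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max before exp() for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def clip_contrastive_loss(image_emb: Matrix, text_emb: Matrix,
                          temperature: float = 0.07) -> float:
    """Symmetric image-text contrastive loss over a batch of N matched pairs.

    Hypothetical helper: image_emb[i] and text_emb[i] are assumed to describe
    the same scene; every other pairing in the batch acts as a negative.
    The fixed temperature is a simplifying assumption (CLIP learns it).
    """
    if len(image_emb) != len(text_emb) or not image_emb:
        raise ValueError("need the same non-zero number of image and text embeddings")
    imgs = [l2_normalize(v) for v in image_emb]
    txts = [l2_normalize(v) for v in text_emb]
    # logits[i][j] = cosine(image_i, text_j) / temperature
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
              for img in imgs]
    # Transposing gives the text-to-image direction.
    logits_t = [list(col) for col in zip(*logits)]
    # Average the image->text and text->image cross-entropies.
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))


if __name__ == "__main__":
    images = [[1.0, 0.0], [0.0, 1.0]]
    captions = [[0.9, 0.1], [0.2, 0.8]]
    print(clip_contrastive_loss(images, captions))        # matched pairs: low loss
    print(clip_contrastive_loss(images, captions[::-1]))  # mismatched pairs: higher loss
```

Swapping the captions so that each image is paired with the wrong description sharply increases the loss, which is the signal that pulls matching image and text embeddings together during training. A real training loop would compute this on GPU tensors over large batches; the list-based version here only makes the arithmetic explicit.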
Main file
Investigating_LVM_on_GeoSpatial_Data.pdf (7.86 MB)
Origin: Files produced by the author(s)

Dates and versions

hal-04231035, version 1 (6 October 2023)

Identifiers

  • HAL Id: hal-04231035, version 1

Cite

Hitesh Jain, Sagar Verma, Siddharth Gupta. Investigating Large Vision Model Training Challenges on Satellite Datasets. InGARSS 2023 - India Geoscience and Remote Sensing Symposium, IEEE, Dec 2023, Bengaluru, India. ⟨hal-04231035⟩
