Our work on audio-visual generalized zero-shot learning (GZSL) using large multi-modal models was accepted at the L3D-IVU workshop at CVPR 2024.