technology hard Fill in the Blank

The technique that enables LLMs to reason about visual inputs by combining vision and language models is called ________.

  1. SWAYAM / NPTEL
  2. e-Sign / Aadhaar e-KYC
  3. dimensions
  4. Multimodal Learning

Answer: Multimodal Learning

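Multimodal models jointly process more than one input type, such as text, images, and audio. CLIP, for example, maps images and text into a shared embedding space, while LLaVA connects a vision encoder to a language model. This enables visual question answering, image captioning, and cross-modal retrieval, all of which require interpreting images alongside text.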
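To make cross-modal retrieval concrete, here is a minimal sketch in Python. It assumes that images and captions have already been encoded into the same vector space. The embedding values, the dictionary names, and the retrieve_image helper are all made up for illustration and do not come from CLIP, LLaVA, or any real model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy, hand-written vectors standing in for encoder outputs (hypothetical values).
image_embeddings = {
    "dog_photo": [0.9, 0.1, 0.0],
    "car_photo": [0.0, 0.2, 0.9],
}
text_embeddings = {
    "a photo of a dog": [0.8, 0.2, 0.1],
    "a photo of a car": [0.1, 0.1, 0.95],
}

def retrieve_image(caption):
    """Return the name of the image whose embedding is closest to the caption's."""
    query = text_embeddings[caption]
    return max(
        image_embeddings,
        key=lambda name: cosine_similarity(query, image_embeddings[name]),
    )

print(retrieve_image("a photo of a dog"))
print(retrieve_image("a photo of a car"))
```

In a real system the vectors would come from trained image and text encoders. The retrieval step itself, picking the nearest neighbour by cosine similarity, would look much the same.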

Topic Advanced AI/ML
Exam Relevance UPSC, Banking, SSC