GK Question

technology medium mcq

Which technique enables LLMs to learn from human feedback to align with preferences?

  1. Supervised Fine-Tuning
  2. RLHF
  3. Prompt Engineering
  4. All of these

Answer: RLHF

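Reinforcement Learning from Human Feedback (RLHF) trains a reward model on human rankings of model outputs, then uses reinforcement learning to optimize the LLM so that it maximizes that reward. Supervised fine-tuning learns from example outputs rather than preference comparisons, and prompt engineering does not update the model at all, so RLHF is the best fit. It is a key technique for aligning AI systems with human values and safety requirements.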
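
Illustration: the reward-model step can be sketched with a pairwise preference loss. This is only a minimal sketch under stated assumptions. The Bradley-Terry style loss below is one common formulation, not something the question itself specifies, and the function and parameter names are hypothetical. It does not cover the later reinforcement learning step.

```python
import math


def pairwise_preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Loss for one human comparison: -log(sigmoid(r_chosen - r_rejected)).

    Assumption: the reward model has already scored both responses, and a
    human ranked the "chosen" response above the "rejected" one.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch to avoid overflow.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# The loss is small when the reward model agrees with the human ranking
# and large when it disagrees.
agrees = pairwise_preference_loss(reward_chosen=2.0, reward_rejected=-1.0)
disagrees = pairwise_preference_loss(reward_chosen=-1.0, reward_rejected=2.0)
print(agrees < disagrees)
```

Minimizing this loss over many human comparisons pushes the reward model to score preferred responses higher, which is the signal the LLM is later optimized against.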

Topic: Advanced AI/ML
Exam Relevance: UPSC, Banking, SSC