Wondering what the difference is between RLHF and fine-tuning? Fine-tuning is actually the first step of a RL
2023-06-17