Siddartha Pullakhandamsiddartha10.hashnode.dev·Sep 20, 2024Getting Started with Reinforcement Learning with Human FeedbackReinforcement Learning with Human Feedback(RLHF) is a technique combined with Reinforcement learning and human feedback to better align the LLMs with human preferences. This blog covers most of the concepts that i learnt. Before diving deep, let's un...Discuss·20 likes·34 readsPolicy gradient