This is dope, had me laughing reading this. One thing though: the reason one epoch is enough comes down to computing power. When you have large datasets, it's only reasonable to train your LLM for half an epoch or 1.2 epochs to save compute, and that's the standard for training LLMs.
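For what it's worth, a "fractional epoch" is usually implemented by capping the number of optimizer steps rather than literally stopping mid-pass. A minimal sketch of the arithmetic (the function name and numbers here are just illustrative, not from any real training setup):

```python
def max_steps_for(dataset_size, batch_size, epochs):
    """Number of training steps needed to see `epochs` worth of data."""
    steps_per_epoch = dataset_size // batch_size
    return int(steps_per_epoch * epochs)

# e.g. 1M examples, batch size 512:
print(max_steps_for(1_000_000, 512, 0.5))  # half an epoch -> 976 steps
print(max_steps_for(1_000_000, 512, 1.2))  # 1.2 epochs    -> 2343 steps
```

Trainers like Hugging Face's expose this as a `max_steps`-style argument, so you never have to think in whole epochs at all.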
John Raphael
Machine learning
This is funny and knowledgeable at the same time. It's nice to always try things out when building models; you never can tell what the best epoch count or parameters are.
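The "just try it and see" approach really is how it's done: sweep a hyperparameter and keep whatever validates best. A toy sketch on a tiny 1-D linear regression (full-batch gradient descent, so one update per epoch), nothing like real LLM training but the same idea:

```python
def train(epochs, lr=0.1):
    """Fit slope w on four points of y = 2x by full-batch gradient descent."""
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [0.0, 2.0, 4.0, 6.0]
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

def val_loss(w):
    """Squared error on a held-out point (4, 8)."""
    return (w * 4.0 - 8.0) ** 2

# Sweep epoch counts, keep the one with the lowest validation loss.
best = min([1, 5, 20, 100], key=lambda e: val_loss(train(e)))
```

Same pattern scales up: swap the toy loop for your real trainer and the sweep for a grid or random search over whatever knobs you're unsure about.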