This is dope, had me laughing reading this. One thing though: the reason one epoch is enough comes down to compute. When you have a large dataset, it's often more reasonable to train your LLM for a fraction of an epoch (say 0.5) or just over one (say 1.2) to save compute, and that's pretty standard practice for LLM training.
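For anyone wondering what "half an epoch" means in practice: trainers usually just cap the optimizer step count instead of looping over the whole dataset. A rough sketch of that arithmetic (all numbers here are hypothetical, just for illustration):

```python
# Minimal sketch: turning a fractional-epoch target into a step budget.
# Every number below is made up for illustration.
dataset_size = 1_000_000_000   # total tokens in the corpus (hypothetical)
batch_tokens = 4_000_000       # tokens consumed per optimizer step (hypothetical)
target_epochs = 0.5            # train on half the data, once

# Step budget: how many optimizer steps cover the target fraction of the data.
max_steps = int(target_epochs * dataset_size / batch_tokens)
print(max_steps)  # 125
```

You'd then pass that `max_steps` value to whatever training loop or framework you're using as the stopping condition.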