Researchers warn of catastrophic overtraining in LLMs

The researchers compared two versions of OLMo-1B: one pre-trained on 2.3 trillion tokens and another on 3 trillion tokens.
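A minimal sketch of what such a side-by-side comparison could look like, assuming the two pre-training checkpoints are available as Hugging Face revisions (the model ID and revision names below are illustrative, not the authors' exact setup; the study itself compared performance after further fine-tuning, while this snippet only loads both checkpoints and runs a quick perplexity sanity check):

```python
# Load two pre-training checkpoints of the same model and compare them on a
# simple metric. Revision names are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

CHECKPOINTS = {
    "2.3T tokens": ("allenai/OLMo-1B-hf", "step-tokens2300B"),  # hypothetical intermediate revision
    "3.0T tokens": ("allenai/OLMo-1B-hf", "main"),              # final checkpoint
}

def perplexity(model, tokenizer, text: str) -> float:
    """Compute perplexity of `text` under `model` as a quick sanity check."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return torch.exp(loss).item()

sample = "Language models can become harder to fine-tune when pre-trained too long."
for label, (model_id, revision) in CHECKPOINTS.items():
    tokenizer = AutoTokenizer.from_pretrained(model_id, revision=revision)
    model = AutoModelForCausalLM.from_pretrained(model_id, revision=revision)
    model.eval()
    print(f"{label}: perplexity = {perplexity(model, tokenizer, sample):.2f}")
```

In the study proper, each checkpoint would then be instruction-tuned under the same recipe and evaluated on downstream benchmarks to measure how the extra pre-training tokens affect post-fine-tuning performance.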
