Zyphra debuts Zyda, a 1.3T-token language modeling dataset it claims outperforms Pile, C4, and arXiv
Zyphra's Zyda is a 1.3T-token open dataset that combines RefinedWeb, StarCoder, C4, Pile, SlimPajama, peS2o, and arXiv to help train large language models.