Knowledge Infusion Scaling Law for Pre-Training Large Language Models

(arxiv.org)

26 points | by PaulHoule 3 hours ago

2 comments

adsharma an hour ago
I wish the authors had plotted model size (number of params) against the number of triples a model can hold before memory collapse happens.
It's hard to map the frequency of knowledge injection onto a real-world answer to the question: how much knowledge can a 4B-param model hold?
gdiamos 2 hours ago
I wonder if this depends on what is inside the domain specific data.
I’m happy to see ML papers on Hacker News.