Have any studies been done on the use of newer or less popular programming languages in the era of LLMs? I'd guess that having relatively few examples, and little publicly available code overall, in a particular language means that LLM output in that language is less likely to be good.
If the hypothesis is correct, it sets an incredibly high bar for starting a new programming language today. Not only does one need to develop a compiler, runtime, libraries, and IDE support (which is a tall order by itself), but one must also provide enough data for LLMs to be trained on, or even provide a custom fine-tuned snapshot of one of the open models for the new language.
Research takes some time, both to do and to publish. In my area (programming languages), we have 4 major conferences a year, each with something like a 6-to-8-month lag between submission and publication, assuming the submission is accepted through double-blind peer review.
I don't work in this area (I have a very unfavorable view of LLMs broadly), but I have colleagues who are working on various aspects of what you ask about, e.g., developing testing frameworks to help ensure output is valid or having the LLMs generate easily-checkable tests for their own generated code, developing alternate means of constraining output (think of, like, a special kind of type system), using LLMs in a way similar to program synthesis, etc. If there is fruit to be borne from this, I would expect to start seeing more publications about it at high-profile venues in the next year or two (or next week, which is when ICFP and SPLASH and their colocated workshops will convene this year, but I haven't seen the publications list to know if there's anything LLM-related yet).
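As a rough illustration of the "easily-checkable tests" idea mentioned above (a sketch with made-up names, not taken from any of the work referred to in this comment): checking that a sort result is correct is much simpler than writing the sort, so a generated function can be run against random inputs and validated by a small, auditable checker. Here `candidate_sort` and `buggy_sort` are hypothetical stand-ins for LLM-generated code.

```python
import random
from collections import Counter


def candidate_sort(xs):
    # Hypothetical stand-in for a function produced by an LLM.
    return sorted(xs)


def buggy_sort(xs):
    # Hypothetical faulty output: silently drops duplicate elements.
    return sorted(set(xs))


def is_valid_sort(inp, out):
    # The checker: output must be non-decreasing and contain exactly
    # the same elements (with multiplicity) as the input.
    ordered = all(a <= b for a, b in zip(out, out[1:]))
    same_elements = Counter(inp) == Counter(out)
    return ordered and same_elements


def check(fn, trials=200, seed=0):
    # Run fn on small random lists; return the first input for which
    # the checker rejects the output, or None if all trials pass.
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-5, 5) for _ in range(rng.randint(0, 8))]
        if not is_valid_sort(xs, fn(list(xs))):
            return xs
    return None


if __name__ == "__main__":
    print("candidate counterexample:", check(candidate_sort))
    print("buggy counterexample:", check(buggy_sort))
```

Passing such a check is evidence, not proof: it only says no counterexample was found among the sampled inputs.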
It's not only the amount of code but also the quality of the available code. If a language has a low barrier to entry (e.g. Python, JavaScript), there will be a lot of beginner code. If a language has good static analysis and type checking (e.g. Rust, Scala, Haskell), the available code is free of certain classes of errors.
I see that difference in LLM-generated code when switching languages. Generated Rust code is of much higher quality than generated Python code, for example.
> Not only does one need to develop compiler, runtime, libraries, and IDE support (which is a tall order by itself)
CC can do that by itself in a loop, in ~3 months apparently. https://cursed-lang.org/
I know it's a meme project, but it's still impressive. And CC is at the point where you can take the repo of that language, ask it to "make it support emoji variables", and $5 later it works. So yeah ... pretty impressive that we're already there.
On the other hand, it opens up the opportunity to build a language that is extremely easy to use with LLMs. I suspect a lot of the issues in LLM usage come from the fact that programming languages are built for humans.
See also Opalang or Ur/Web for very similar ideas, both released ~15 years ago.