If LLMs can be taught to write assembly (or LLVM IR) very efficiently, what would it take to create a fully or semi-automatic LLM compiler from high-level languages, or even from pseudo-code or human language?
The advantages could be monumental:
- arguably much more efficient utilization of resources on every compile target
- compilation is flexible and not rule-based. An LLM won’t complain about a missing “;” as it can “understand” the intent
- it can rewrite much of the software we have today based only on the disassembled binaries, to squeeze more out of the HW
- can we convert an assembly block from ARM to RISC-V, and vice versa?
- potentially, iterative compilation (à la Open Interpreter) can also understand runtime issues and exceptions, giving “live” assembly code that changes as issues arise
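
To make the iterative idea in the last bullet a bit more concrete, here is a minimal sketch of a generate, test, and retry loop. Everything in it is an assumption for illustration: `fake_model` is a hypothetical stand-in for an LLM call, `reference` stands in for trusted behaviour (for example, the output of a conventional compiler), and plain Python functions stand in for assembly.

```python
from typing import Callable, List, Optional, Tuple

# A candidate "compiled" routine. Python callables stand in for assembly here.
Candidate = Callable[[int], int]


def reference(x: int) -> int:
    """Trusted behaviour to match (assumed to come from a conventional compiler)."""
    return x * 2 + 1


def fake_model(feedback: Optional[str]) -> Candidate:
    """Hypothetical stand-in for an LLM: wrong at first, corrected after feedback."""
    if feedback is None:
        return lambda x: x * 2        # first attempt forgets the +1
    return lambda x: (x << 1) + 1     # retry uses a shift instead of a multiply


def first_mismatch(candidate: Candidate, inputs: List[int]) -> Optional[str]:
    """Differential test: describe the first input where candidate and reference differ."""
    for x in inputs:
        got, want = candidate(x), reference(x)
        if got != want:
            return f"input {x}: got {got}, expected {want}"
    return None


def compile_loop(inputs: List[int], max_rounds: int = 3) -> Tuple[Optional[Candidate], int]:
    """Request candidates until one passes every test or the round budget runs out."""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        candidate = fake_model(feedback)
        feedback = first_mismatch(candidate, inputs)
        if feedback is None:
            return candidate, round_no
    return None, max_rounds


candidate, rounds = compile_loop(list(range(-5, 6)))
```

Passing a finite set of tests only shows agreement on those inputs, so a loop like this gives weaker guarantees than a rule-based compiler does.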
>> Any projects exploring this?
>> I feel it is an issue of dimensionality (i.e., “context” size), very similar to having a latent space for entire repos. Do you agree?
That’s a big if. The bar is not human-written code but code that has already been optimized.
That is an interesting angle, if you could build in concerns that aren’t currently taken into consideration.
I think that’s a separate issue, and is closer to code completion than compilation. I don’t know why there aren’t automatic linters for the specific problem you mentioned.
You could probably get the behaviour you want from fine-tuning/RAG on a specific codebase. It will still require a large context size.
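
As a rough sketch of the retrieval side of that, assuming a naive token-overlap score and a token budget standing in for the context limit (a real setup would more likely use embeddings; `select_context` and the file names in the usage lines are made up for illustration):

```python
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Split text into lowercase identifier-like tokens."""
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text.lower())


def score(query: str, chunk: str) -> int:
    """Naive relevance: number of query tokens that also occur in the chunk."""
    q, c = Counter(tokenize(query)), Counter(tokenize(chunk))
    return sum(min(q[t], c[t]) for t in q)


def select_context(query: str, chunks: Dict[str, str], budget: int) -> List[str]:
    """Greedily keep the most relevant chunks whose total token count fits the budget."""
    ranked = sorted(chunks, key=lambda name: score(query, chunks[name]), reverse=True)
    chosen: List[str] = []
    used = 0
    for name in ranked:
        cost = len(tokenize(chunks[name]))
        # Skip irrelevant chunks and anything that would overflow the budget.
        if score(query, chunks[name]) > 0 and used + cost <= budget:
            chosen.append(name)
            used += cost
    return chosen


# Hypothetical mini "codebase" used only to exercise the function.
chunks = {
    "parser.py": "def parse_statement(tokens): return tokens",
    "codegen.py": "def emit_add(reg_a, reg_b): return 'add ' + reg_a + ', ' + reg_b",
    "readme.md": "Project overview and build instructions",
}
selected = select_context("emit add instruction for reg_a", chunks, budget=20)
```

The budget check is where the context-size limit bites: the smaller the budget, the less of the repo the model sees per request.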