Joined 1 year ago
Cake day: October 28th, 2023

  • semi-automatic LLM compiler from high languages or even from pseudo-code or human language.

    Mmm, reminds me of a short story I read recently with exactly this, but with annoying censorship alignment and no ability to reset the state, which made it not so helpful. Hopefully such a compiler will not be written like that.

    - can we convert an assembly block from ARM to RISC? and vice versa?

    ARM and RISC-V are both RISC architectures, so that is the easy case. It is also not that slow to emulate RISC architectures (like ARM) on CISC architectures (like amd64), but it is substantially slower to emulate CISC architectures on RISC architectures. So I think a better example would be converting from amd64 to arm64.

    - it can rewrite many of the software we have today just based on the disassembled binaries to squeeze more out of HW

    Imagine an LLM that can natively understand and edit assembly (with each instruction and byte of data being its own token, perhaps?). It could, just, rewrite an entire binary to do whatever you want, effortlessly translate from one assembly language to another, or even translate the entire thing to fully functional, well-organised, commented code in your higher-level language of choice! Train it on optimised vs non-optimised assembly (and other code) so it is good at optimising as well, and then refine it directly on its own results, rewarding whatever is fastest while still getting the correct output and not being buggy, to take that even further. Such a program would be insanely capable. GPTs are already insanely good at writing in and translating between different human languages, so I think they could also do quite well with machine languages.
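
To make the "each byte is its own token" idea concrete, here is a minimal sketch of what such a tokenizer could look like. Everything here is an assumption for illustration: the vocabulary layout (ids 0-255 for raw bytes, plus three special tokens) and the function names are made up, and a real model might well want something smarter, such as one token per opcode.

```python
from typing import List

# Hypothetical vocabulary: ids 0-255 are raw bytes, followed by special tokens.
BOS, EOS, SEP = 256, 257, 258
VOCAB_SIZE = 259


def encode_instructions(instructions: List[bytes]) -> List[int]:
    """Turn a list of encoded machine instructions into token ids.

    Each byte becomes one token; SEP marks instruction boundaries so the
    model does not have to rediscover them from the byte stream.
    """
    tokens = [BOS]
    for i, ins in enumerate(instructions):
        if i > 0:
            tokens.append(SEP)
        tokens.extend(ins)  # iterating over bytes yields ints in 0..255
    tokens.append(EOS)
    return tokens


def decode_instructions(tokens: List[int]) -> List[bytes]:
    """Inverse of encode_instructions: split the token stream on SEP/EOS."""
    out: List[bytes] = []
    cur = bytearray()
    for t in tokens:
        if t == BOS:
            continue
        if t in (SEP, EOS):
            if cur:
                out.append(bytes(cur))
            cur = bytearray()
            if t == EOS:
                break
        else:
            cur.append(t)
    return out


# x86-64 example: `mov eax, 1` (b8 01 00 00 00) followed by `ret` (c3).
program = [bytes.fromhex("b801000000"), bytes.fromhex("c3")]
tokens = encode_instructions(program)
```

The appeal of a layout like this is that the vocabulary is tiny and nothing is ever out-of-vocabulary; the cost is long sequences, since every immediate and address is spelled out byte by byte.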

    This would potentially need less training than an entirely general-purpose LLM, especially for a simple proof of concept, so I wonder how hard it would be to make an open source program that does this. Since we already have programs (compilers) that convert from code to assembly, one could generate a huge amount of synthetic data relatively easily for a proof of concept covering this small subset of tasks, one that acts as a compiler only. It could then serve as an experiment in producing higher-quality output than the original training data, by training and evaluating it on whichever instructions consistently get the correct output most quickly. Making it adversarial, with another AI trying to induce bugs, would probably be useful here in ensuring the faster output is not buggy in some way.
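
A rough sketch of that loop, scaled down to a toy so it runs anywhere. Assumptions, all mine: a tiny stack machine stands in for real assembly, `compile_expr` stands in for a real compiler, instruction count stands in for measured speed, and the fixed `inputs` list stands in for test cases that an adversary would otherwise be hunting for.

```python
import random
from typing import List, Optional, Tuple

OPS = {"+": "ADD", "-": "SUB", "*": "MUL"}


def random_expr(rng: random.Random, depth: int = 3):
    """Random expression tree: an int, the variable "x", or (op, left, right)."""
    if depth == 0 or rng.random() < 0.3:
        return rng.choice(["x", rng.randint(0, 9)])
    return (rng.choice("+-*"), random_expr(rng, depth - 1), random_expr(rng, depth - 1))


def to_source(e) -> str:
    """Render the tree as 'high-level' source text."""
    if isinstance(e, tuple):
        op, left, right = e
        return f"({to_source(left)} {op} {to_source(right)})"
    return str(e)


def compile_expr(e) -> List[str]:
    """Reference 'compiler': naive, unoptimised stack-machine code."""
    if isinstance(e, tuple):
        op, left, right = e
        return compile_expr(left) + compile_expr(right) + [OPS[op]]
    return ["LOAD x"] if e == "x" else [f"PUSH {e}"]


def run(program: List[str], x: int) -> Optional[int]:
    """Execute a program; return None if it is malformed."""
    stack: List[int] = []
    try:
        for ins in program:
            parts = ins.split()
            if parts[0] == "PUSH":
                stack.append(int(parts[1]))
            elif parts[0] == "LOAD":
                stack.append(x)
            else:
                b, a = stack.pop(), stack.pop()
                stack.append({"ADD": a + b, "SUB": a - b, "MUL": a * b}[parts[0]])
    except (IndexError, KeyError, ValueError):
        return None
    return stack[0] if len(stack) == 1 else None


def make_dataset(n: int, seed: int = 0) -> List[Tuple[str, List[str]]]:
    """Synthetic (source, assembly) training pairs from the reference compiler."""
    rng = random.Random(seed)
    exprs = [random_expr(rng) for _ in range(n)]
    return [(to_source(e), compile_expr(e)) for e in exprs]


def score(candidate: List[str], reference: List[str], inputs: List[int]) -> float:
    """0.0 if any output differs from the reference; otherwise the speed-up,
    using instruction count as a stand-in for run time (>1.0 beats the reference)."""
    for x in inputs:
        if run(candidate, x) != run(reference, x):
            return 0.0
    return len(reference) / len(candidate) if candidate else 0.0


reference = compile_expr(("+", "x", 0))   # LOAD x, PUSH 0, ADD
inputs = [0, 1, 2, -7]
better = score(["LOAD x"], reference, inputs)   # correct and shorter
broken = score(["PUSH 1"], reference, inputs)   # right only for x == 1
```

The model would first be trained on `make_dataset` pairs, then refined on its own outputs ranked by `score`. The weak point is obviously `inputs`: a candidate that is only correct on the tested values scores well, which is exactly where the adversarial bug-finder would come in.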

    I think it could be perfectly feasible to make this without being a big organisation. Maybe it could even edit its own inference code to go even faster. And if it could somehow be smart enough to understand high-level software and machine learning architectures, and maybe even Hardware Description Language… Maybe it could even help enable an intelligence explosion?

    I think at least an LLM compiler might be feasible to make at a small scale and now I really want to try making one. Linking could be complex though, and probably some other things I haven’t thought of yet.