GitHub link: https://github.com/desik1998/MathWithLLMs

Although LLMs can handle many tasks such as coding and science questions, they often fail at math without a calculator (including state-of-the-art models).

Our intuition for why models cannot do math is that examples on the internet look like a x b = c and do not show the procedure humans follow. For example, when asked to compute 123 x 45, a human applies digit-wise multiplication with carries, computes a partial product for each digit, and then adds the resulting numbers. On the internet, however, the procedure is not shown; only the final value is written. So when an LLM is trained on a x b = c, it has to reverse-engineer the multiplication algorithm.
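The digit-wise procedure described above can be sketched in a few lines of Python (illustrative only; the project's actual instruction format lives in the linked repo):

```python
def long_multiplication_steps(a: int, b: int) -> list[str]:
    """Expand a * b into the digit-wise partial products a human writes out."""
    steps = []
    total = 0
    # Multiply a by each digit of b (right to left), shifted by its place value.
    for place, digit in enumerate(reversed(str(b))):
        partial = a * int(digit) * 10 ** place
        steps.append(f"{a} x {digit} x 10^{place} = {partial}")
        total += partial
    steps.append(f"sum of partials = {total}")
    return steps

for line in long_multiplication_steps(123, 45):
    print(line)
# 123 x 5 x 10^0 = 615
# 123 x 4 x 10^1 = 4920
# sum of partials = 5535
```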

Most of the existing literature gives instructions to the LLM instead of showing the procedure, and we think this may not be the best way to teach an LLM.

What does this project do?

This project aims to show that LLMs can learn math when trained on a step-by-step procedure, similar to how humans do it, breaking the notion that LLMs cannot do math without a calculator. To illustrate this, the project showcases how LLMs can learn multiplication; the rationale for choosing multiplication is that GPT-4 cannot multiply numbers with more than 3 digits. Instead of teaching the model multiplication as 23 * 34 = 782, we teach it to do digit-wise multiplication, compute a partial product for each digit, and add the resulting numbers to get the final result.

Instruction tuning: We further fine-tuned OpenAI's GPT-3.5 to teach it math.

Close to 1300 multiplication instructions were created for training and 200 for validation. The examples were generated keeping OpenAI GPT-3.5's 4096-token limit in mind: a 5-digit x 5-digit multiplication generally fits within the limit, but a 6 x 6 one does not. If one number has 6 digits, the other can have at most 4; if one has 7 digits, the other can have at most 3.
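One way to encode those digit-limit constraints when sampling training pairs (the "digit counts sum to at most 10" rule is our reading of the stated limits, not the repo's exact code):

```python
import random

def sample_pair(max_digit_sum: int = 10) -> tuple[int, int]:
    """Sample a multiplication pair whose step-by-step trace should fit
    in the 4096-token context: 5x5 fits, 6x6 does not, a 6-digit number
    pairs with <=4 digits, a 7-digit with <=3 -- i.e. digit counts sum
    to at most 10."""
    d1 = random.randint(1, 7)                      # digits in the first number
    d2 = random.randint(1, min(7, max_digit_sum - d1))  # digits in the second
    a = random.randint(10 ** (d1 - 1), 10 ** d1 - 1)
    b = random.randint(10 ** (d2 - 1), 10 ** d2 - 1)
    return a, b

print(sample_pair())  # e.g. (48213, 907)
```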

Also, instead of using * for multiplication and + for addition, different operators <<*>> and <<<+>>> are used. The rationale is that the familiar * and + might tap into the network's existing weights, which jump directly to the result of a multiplication in one step rather than following the step-by-step procedure.
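A sketch of how a training instruction might be rendered with the custom operators (the exact template in the repo may differ):

```python
MUL_OP = "<<*>>"    # replaces *, so the model can't reuse its one-step multiplication weights
ADD_OP = "<<<+>>>"  # replaces +

def build_instruction(a: int, b: int) -> str:
    """Render a step-by-step multiplication trace using the custom operators."""
    lines = [f"{a} {MUL_OP} {b}"]
    partials = []
    for place, digit in enumerate(reversed(str(b))):
        partial = a * int(digit) * 10 ** place
        partials.append(partial)
        lines.append(f"{a} {MUL_OP} {digit} (place {place}) = {partial}")
    # Add the shifted partial products to get the final answer.
    lines.append(f" {ADD_OP} ".join(str(p) for p in partials) + f" = {a * b}")
    return "\n".join(lines)

print(build_instruction(23, 34))
```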

Sample Instruction

The overall training/validation loss drops close to 0 within 0.1 epochs.

Results

Benchmarking was done on 200 test cases, each with two randomly generated numbers. Of the 200 samples tested, the multiplication was correct in all but 3 cases, an overall accuracy of 98.5%. (We're also looking for feedback from the community on how to test this more rigorously.)
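The reported figure follows directly from the counts:

```python
def accuracy(results: list[bool]) -> float:
    """Fraction of test cases the model got right."""
    return sum(results) / len(results)

# 197 of the 200 benchmark multiplications were correct, 3 were not:
print(f"{accuracy([True] * 197 + [False] * 3):.1%}")  # 98.5%
```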

Future Improvements

  • Reach out to the AI and open-source communities to improve this proposal or identify any flaws.
  • Repeat the fine-tuning process with open-source LLMs.
  • Figure out the smallest LLM that can do math accurately when trained in a procedural manner (a 10-year-old kid can do math). Check this for both standard and distilled models.

Requesting Feedback from the AI Community!

  • undefdev@alien.top · 1 year ago

    I don’t understand the motivation behind this.

    Fine, you’ve run an experiment out of curiosity and gotten the result, but why would you want to fine-tune more language models on this?

    It’s not like we need models that are almost as good at things computers are excellent at, while using orders of magnitude more resources.

    It would be way more useful to train tiny models to predict when a calculator should be used.

    • wishtrepreneur@alien.top · 1 year ago

      It’s not like we need models that are almost as good at things computers are excellent at, while using orders of magnitude more resources.

      One of the arguments for learning math (calculus, linear algebra, etc.) in school is that it supposedly helps with critical thinking, logical reasoning, etc.

      If this can be tested in LLMs, then it gives weight to that argument, because let’s face it, 99% of the population doesn’t use anything more complicated than exponential equations in their everyday lives.

      • undefdev@alien.top · 1 year ago

        Calculus, linear algebra, and mathematics in general are a good idea. Arithmetic probably is not. To me that’s like training LLMs to count up to high numbers correctly. I’m arguing that instead of reading a book on “the first 10^12 natural numbers”, one should read a book on linear algebra.

    • Desik_1998@alien.top (OP) · 1 year ago

      I don’t understand the motivation behind this. It’s not like we need models that are almost as good at things computers are excellent at, while using orders of magnitude more resources. It would be way more useful to train tiny models to predict when a calculator should be used

      I’m 100% with you that we should use calculators directly instead of LLMs for math; we humans use calculators too. But the thing is, many claim LLMs simply cannot do math. This project is just to prove that claim wrong: when an LLM learns the process, especially in the case of math, it can do it. I mean, if we consider the LLM to be a brain, it should ideally be able to master math similar to how the human brain does, right?

      Fine, you’ve ran an experiment out of curiosity and you got the result, but why would you want to finetune more language models on this?

      I mentioned this in the GitHub repo but missed it in the Reddit post. Here is the rationale: when initially tested with in-context learning on GPT-4, the model was able to follow the procedure shown in the 1-shot prompt but sometimes failed on the overall result due to errors in the addition step. And given this is a procedural technique, n-shot prompting would require even more tokens. Given these limitations, it was evident that fine-tuning would solve these issues. The model also learned the procedure quickly: the overall training and validation loss went close to 0 within 0.1 epochs.

      • BackwardsPuzzleBox@alien.top · 1 year ago

        I mean if we consider the LLM to be the brain, it should ideally be able to master Math similar to how human brain does right

        Except it’s not a brain. It’s, at best, a tiny portion of a brain’s processing, hence the move toward multi-modal models.

        It’s a bit like trying to do poetry with your visual cortex, or math using your autonomic system. I mean, more strength to you, but while you can use a hammer on a screw, it doesn’t mean you should.

        • Desik_1998@alien.top (OP) · 1 year ago

          Yes, I agree that multimodal is the way to go, but the current text-based LLMs are also building a world view.