A new paper focuses on improving arithmetic skills in LLMs, primarily GPT-like models. It identifies three main challenges LLMs face with arithmetic:

  1. Complex Calculations: LLMs struggle with intricate arithmetic, especially multiplication of large numbers, because the intermediate steps of the calculation must be carried out implicitly inside the model.
  2. Length Limitations: LLMs can only handle numbers with roughly as many digits as those seen during training, which limits practicality.
  3. Integration with Language: Merging arithmetic and natural language data is hampered by differences in surface format, which produce conflicting position-dependent representations.

To address these challenges, the paper introduces two data-formatting techniques to improve multiplication (sketched in code after the list):

  • Padding: Each factor is padded to a fixed 15-digit length, so every digit position has the same meaning regardless of how long the number actually is.
  • Reordering: The digits of the product are written in reverse order (least significant first), matching the order in which long multiplication produces digits and carries.
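
To make the format concrete, here is a minimal sketch of how a single training example might be built with both tricks. The function name, the use of zero-padding, and the exact separators are my own illustration, not the paper's actual pipeline; only the ideas (fixed 15-digit padding and a reversed product) come from the summary above.

```python
# A minimal sketch of the two formatting tricks above (names and the exact
# output format are illustrative assumptions, not the paper's pipeline).

PAD_WIDTH = 15  # fixed digit length mentioned above

def format_multiplication_example(a: int, b: int) -> str:
    """Build one training string: padded factors, product written reversed."""
    # Padding: every factor occupies the same number of digit positions,
    # so a given position has the same meaning regardless of operand length.
    a_str = str(a).zfill(PAD_WIDTH)
    b_str = str(b).zfill(PAD_WIDTH)

    # Reordering: emit the product least-significant digit first, the order in
    # which long multiplication actually produces digits and carries.
    product_reversed = str(a * b)[::-1]

    return f"{a_str} * {b_str} = {product_reversed}"

print(format_multiplication_example(1234, 5678))
# 000000000001234 * 000000000005678 = 2566007   (i.e. 7006652 reversed)
```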

The outcomes are impressive. In testing, their approach achieves 99% accuracy when computing products of numbers up to 12 digits. By comparison, simply asking GPT-4 to multiply two 4-digit numbers yields less than 1% accuracy.

To overcome length limitations, the paper explores data formats and positional encodings, including random spacing between digits and alternative positional encodings (a random-spacing sketch follows below). These changes let the model generalize addition to numbers with more digits than it saw during training.
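
For illustration, here is a rough sketch of what random spacing could look like as a data augmentation for addition examples. The helper names and the spacing probability are assumptions made for this example, not details from the paper.

```python
# A minimal sketch of random spacing as a data augmentation for addition.
# The probability, spacing token, and function names are assumptions.
import random

def randomly_space(digits: str, space_prob: float = 0.3) -> str:
    """Insert a space after each digit with some probability, so a digit's
    absolute position in the sequence varies across training examples."""
    out = []
    for d in digits:
        out.append(d)
        if random.random() < space_prob:
            out.append(" ")
    return "".join(out).strip()

def format_addition_example(a: int, b: int) -> str:
    # Only the operands are perturbed; the target sum stays in canonical form.
    return f"{randomly_space(str(a))} + {randomly_space(str(b))} = {a + b}"

random.seed(0)
print(format_addition_example(123456, 789))
# e.g. "12 34 56 + 7 89 = 124245" (spacing varies without the fixed seed)
```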

The paper also addresses the integration of arithmetic and language data: randomizing surface formats and using alternative positional encodings lets the two kinds of data be trained on together effectively.

TLDR: As the paper title says, positional description matters for transformer arithmetic.

Full summary is here. Paper is here.