Grokking Modular Polynomials
Authors: Darshil Doshi, Tianyu He, Aritra Das, Andrey Gromov
Summary: Neural networks readily learn a subset of the modular arithmetic tasks, while failing to generalize on the rest. This limitation remains unchanged by the choice of architecture and training strategy. On the other hand, an analytical solution for the weights of Multi-layer Perceptron (MLP) networks that generalize on the modular addition task is known in the literature. In this work, we (i) extend the class of analytical solutions to include modular multiplication as well as modular addition with many terms. Additionally, we show that real networks trained on these datasets learn similar solutions upon generalization (grokking). (ii) We combine these “expert” solutions to construct networks that generalize on arbitrary modular polynomials. (iii) We hypothesize a classification of modular polynomials into learnable and non-learnable via neural network training, and provide experimental evidence supporting our claims.
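The analytical modular-addition solutions referenced in the summary build MLP weights from cosine (Fourier) features of the inputs. Below is a minimal NumPy sketch of that kind of construction, not the paper's exact parameterization: the modulus `p = 11`, the 4-point phase grid, and the resulting hidden width are illustrative choices. A two-layer network with quadratic activation and hand-set cosine weights then computes `(a + b) mod p` exactly, with no training.

```python
import numpy as np

p = 11                      # modulus (illustrative small prime)
M = 4                       # phases per frequency; a 4-point grid cancels cross terms exactly
freqs = np.arange(1, p)     # frequencies sigma = 1 .. p-1
phases = 2 * np.pi * np.arange(M) / M

# One hidden neuron per (sigma, phi_a, phi_b) triple: width = (p-1) * M * M.
sig, pa, pb = np.meshgrid(freqs, phases, phases, indexing="ij")
sig, pa, pb = sig.ravel(), pa.ravel(), pb.ravel()

n = np.arange(p)
# First-layer weights acting on the one-hot encodings of a and b (cosine features).
W_a = np.cos(2 * np.pi * np.outer(sig, n) / p + pa[:, None])            # (width, p)
W_b = np.cos(2 * np.pi * np.outer(sig, n) / p + pb[:, None])            # (width, p)
# Readout weights: same frequencies, phase phi_a + phi_b, one row per class c.
W_out = np.cos(2 * np.pi * np.outer(n, sig) / p + (pa + pb)[None, :])   # (p, width)

def forward(a, b):
    z = W_a[:, a] + W_b[:, b]   # pre-activation for the one-hot pair (a, b)
    return W_out @ (z ** 2)     # quadratic activation, then linear readout

# Squaring turns the sum of cosines into a cos(2*pi*sigma*(a+b)/p + ...) term;
# the readout phases make every other term cancel over the phase grid, so the
# logits peak exactly at c = (a + b) mod p for every input pair.
ok = all(np.argmax(forward(a, b)) == (a + b) % p
         for a in range(p) for b in range(p))
print(ok)
```

The mechanism is the product-to-sum identity: each squared neuron contributes a `cos(2*pi*sigma*(a+b-c)/p)` term to the logit for class `c`, and summing over all frequencies concentrates the logits on the correct residue.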