Grokking Modular Polynomials
Authors: Darshil Doshi, Tianyu He, Aritra Das, Andrey Gromov
Abstract: Neural networks readily learn a subset of the modular arithmetic tasks, while failing to generalize on the rest. This limitation remains unchanged by the choice of architecture and training strategy. On the other hand, an analytical solution for the weights of Multi-layer Perceptron (MLP) networks that generalize on the modular addition task is known in the literature. In this work, we (i) extend the class of analytical solutions to include modular multiplication as well as modular addition with many terms, and show that real networks trained on these datasets learn similar solutions upon generalization (grokking); (ii) combine these “expert” solutions to construct networks that generalize on arbitrary modular polynomials; (iii) hypothesize a classification of modular polynomials into learnable and non-learnable via neural network training, and provide experimental evidence supporting our claims.
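To illustrate the flavor of the analytical solutions the abstract refers to, below is a hedged numerical sketch of the known cosine-based construction for modular addition: a two-layer MLP with quadratic activation whose hidden weights are cosines at the nonzero frequencies mod p, evaluated in closed form. The modulus `p`, the phase-grid size `J`, and the normalization are illustrative choices of mine, not the paper's exact parameterization.

```python
import numpy as np

p = 11  # modulus (illustrative choice)
J = 3   # phase-grid size; J >= 3 makes the spurious cross terms cancel exactly

# One hidden neuron per (frequency m, input-a phase, input-b phase) triple.
freqs, phases_a, phases_b = np.meshgrid(
    np.arange(1, p),               # nonzero frequencies m = 1..p-1
    2 * np.pi * np.arange(J) / J,  # phase grid for the 'a' input
    2 * np.pi * np.arange(J) / J,  # phase grid for the 'b' input
    indexing="ij",
)
m, pa, pb = freqs.ravel(), phases_a.ravel(), phases_b.ravel()

def logits(a, b):
    """Hand-constructed two-layer MLP with quadratic activation, in closed form."""
    # First layer + quadratic activation:
    # h_k = (cos(2*pi*m_k*a/p + pa_k) + cos(2*pi*m_k*b/p + pb_k))^2
    h = (np.cos(2 * np.pi * m * a / p + pa)
         + np.cos(2 * np.pi * m * b / p + pb)) ** 2
    # Second layer: a cosine read-out for each candidate residue class c;
    # the phase shift pa + pb isolates the frequency component at (a + b).
    c = np.arange(p)[:, None]
    W2 = np.cos(2 * np.pi * m[None, :] * c / p + (pa + pb)[None, :])
    return W2 @ h

# With these weights the network computes (a + b) mod p exactly, no training needed.
preds = [[int(np.argmax(logits(a, b))) for b in range(p)] for a in range(p)]
assert all(preds[a][b] == (a + b) % p for a in range(p) for b in range(p))
```

Expanding the quadratic activation shows why this works: the product of hidden activations and read-out weights contains a phase-free term proportional to cos(2πm(a+b−c)/p), while every other term carries a phase factor that sums to zero over the J-point grids, so the logits peak exactly at c ≡ a+b (mod p).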