1. PAC-tuning: Fine-tuning Pretrained Language Models with PAC-driven Perturbed Gradient Descent
Authors: Guangliang Liu, Zhiyu Xue, Xitong Zhang, Kristen Marie Johnson, Rongrong Wang
Summary: Fine-tuning pretrained language models (PLMs) for downstream tasks is a large-scale optimization problem, in which the choice of training algorithm critically determines how well the trained model generalizes to unseen test data, especially in the context of few-shot learning. To achieve good generalization performance and avoid overfitting, techniques such as data augmentation and pruning are often applied. However, adding these regularizations necessitates heavy tuning of the optimization algorithm's hyperparameters, such as those of the popular Adam optimizer. In this paper, we propose a two-stage fine-tuning method, PAC-tuning, to address this optimization challenge. First, based on PAC-Bayes training, PAC-tuning directly minimizes the PAC-Bayes generalization bound to learn a proper parameter distribution. Second, PAC-tuning modifies the gradient by injecting noise with the variance learned in the first stage into the model parameters during training, resulting in a variant of perturbed gradient descent (PGD). In the past, the few-shot scenario posed difficulties for PAC-Bayes training because the PAC-Bayes bound, when applied to large models with limited training data, might not be tight. Our experimental results across 5 GLUE benchmark tasks demonstrate that PAC-tuning successfully handles the challenges of fine-tuning and outperforms strong baseline methods by a visible margin, further confirming the potential to apply PAC-Bayes training to other settings where the Adam optimizer is currently used for training.
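The two stages described in the summary can be illustrated with a minimal, hedged sketch on a toy linear classifier (not the authors' implementation; the simplified complexity term, variable names, and hyperparameters are assumptions): stage 1 learns a per-parameter noise variance by minimizing a noisy loss plus a PAC-Bayes-style complexity term, and stage 2 fixes that variance and keeps injecting noise while updating only the weights, i.e., a perturbed-gradient-descent step.

```python
# Illustrative sketch of the two-stage idea, assuming a toy linear classifier.
import torch

torch.manual_seed(0)
X = torch.randn(64, 16)                 # toy few-shot inputs
y = (X[:, 0] > 0).long()                # toy labels

w = torch.zeros(16, 2, requires_grad=True)              # model parameters
logvar = torch.full((16, 2), -4.0, requires_grad=True)  # per-parameter noise log-variance

def noisy_loss(w, logvar):
    # Reparameterization: evaluate the loss under Gaussian parameter perturbation.
    eps = torch.randn_like(w)
    w_tilde = w + eps * torch.exp(0.5 * logvar)
    return torch.nn.functional.cross_entropy(X @ w_tilde, y)

# Stage 1 (sketch): jointly minimize the noisy loss plus a simplified
# KL-based complexity term to learn the posterior variance; the actual
# PAC-Bayes bound in the paper is more involved.
opt1 = torch.optim.Adam([w, logvar], lr=1e-2)
for _ in range(200):
    kl_proxy = 0.5 * (torch.exp(logvar) + w**2 - logvar - 1).sum() / X.shape[0]
    loss = noisy_loss(w, logvar) + kl_proxy
    opt1.zero_grad()
    loss.backward()
    opt1.step()

# Stage 2 (sketch): perturbed gradient descent -- keep the learned variance
# fixed and continue injecting noise while updating only the weights.
opt2 = torch.optim.Adam([w], lr=1e-2)
for _ in range(200):
    loss = noisy_loss(w, logvar.detach())
    opt2.zero_grad()
    loss.backward()
    opt2.step()
```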