An Analysis of Embedding Layers and Similarity Scores using Siamese Neural Networks
Authors: Yash Bingi, Yiqiao Yin
Abstract: Large Language Models (LLMs) are gaining popularity in a variety of use cases, from language understanding and writing to assistance in application development. One of the most important components for optimal functionality of LLMs is the embedding layer. Word embeddings are distributed representations of words in a continuous vector space. In the context of LLMs, words or tokens from the input text are transformed into high-dimensional vectors using algorithms specific to the model. Our research examines the embedding algorithms from leading companies in the industry, such as OpenAI, Google's PaLM, and BERT. Using medical data, we analyzed the similarity scores of each embedding layer and observed differences in performance among the algorithms. To enhance each model and provide an additional encoding layer, we also implemented Siamese Neural Networks. After observing changes in performance with the addition of the model, we measured the carbon footprint per epoch of training. The carbon footprint associated with large language models is a significant concern and should be taken into consideration when selecting algorithms for a variety of use cases. Overall, our research compared the accuracy of different leading embedding algorithms and their carbon footprints, allowing for a holistic review of each embedding algorithm.
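As a rough illustration of the two ideas the abstract combines, the sketch below scores a pair of embedding vectors and then scores them again after both pass through one shared encoder, which is the defining trait of a Siamese network. The abstract does not say which similarity measure or encoder architecture was used, so cosine similarity, the single linear layer, and the names `cosine_similarity`, `encode`, and `siamese_similarity` are assumptions made only for this example. The vectors are toy values, not data from the study.

```python
import math
from typing import Sequence, List


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def encode(vec: Sequence[float], weights: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical extra encoding layer: one linear map (no bias, no activation)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def siamese_similarity(
    a: Sequence[float],
    b: Sequence[float],
    weights: Sequence[Sequence[float]],
) -> float:
    """Encode both inputs with the SAME weights, then compare the encodings."""
    return cosine_similarity(encode(a, weights), encode(b, weights))


# Toy 3-dimensional "embeddings" (illustrative values only).
emb_a = [0.2, 0.7, 0.1]
emb_b = [0.25, 0.6, 0.2]

# Assumed 2x3 weight matrix; in practice these weights would be learned.
shared_weights = [
    [1.0, 0.0, 0.5],
    [0.0, 1.0, -0.5],
]

raw_score = cosine_similarity(emb_a, emb_b)
siamese_score = siamese_similarity(emb_a, emb_b, shared_weights)
print(f"raw embedding similarity:   {raw_score:.4f}")
print(f"after shared encoder layer: {siamese_score:.4f}")
```

In a trained Siamese network the shared weights would be learned from labeled pairs, so that similar inputs score higher than dissimilar ones. The fixed matrix here only shows that both inputs go through the same transformation.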