Find Text Similarity Using TensorFlow.js | by Kevin Hermawan | Apr, 2024

Finding related pieces of text is essential for applications such as search engines, chatbots, and recommendation systems. It helps provide users with more relevant information. In this article, we’ll learn how to use TensorFlow.js and the Universal Sentence Encoder model to find the similarity between different texts.

TensorFlow.js is a JavaScript library that enables the training and deployment of machine learning models in the browser or on the server side using Node.js.

The Universal Sentence Encoder (Cer et al., 2018) is a model designed to encode text into 512-dimensional embeddings. These embeddings can be used in various natural language processing tasks, including sentiment classification and textual similarity analysis.

First things first, we need to install the necessary TensorFlow.js packages. The installation process varies depending on your environment:

Node.js

For server-side applications using Node.js:

npm install @tensorflow/tfjs-node @tensorflow-models/universal-sentence-encoder

Web Browser

For use directly in web browsers:

npm install @tensorflow/tfjs @tensorflow-models/universal-sentence-encoder

GPU Acceleration

For optimized performance with GPU acceleration in Node.js (this requires a CUDA-capable setup):

npm install @tensorflow/tfjs-node-gpu @tensorflow-models/universal-sentence-encoder

To find text similarity, we need to set up an environment, load the necessary models, and define the computation functions. Here’s how we can do it step by step in more detail:

Import Libraries

First, we need to configure TensorFlow.js based on our environment. We import the required libraries using the following code:

// Adjust based on environment: tfjs-node for Node.js, tfjs for the browser
import * as tf from "@tensorflow/tfjs-node";
// The package exposes named exports such as load, so import the namespace
import * as use from "@tensorflow-models/universal-sentence-encoder";

Set Up the Similarity Calculation

Next, we define the main function calculateSimilarity that will handle the text similarity computation.

async function calculateSimilarity(inputText: string, compareTexts: string[]) {
  // Load the model and generate embeddings (row 0 is the input text)
  const model = await use.load();
  const embeddings = await model.embed([inputText, ...compareTexts]);
  const baseEmbedding = embeddings.slice([0, 0], [1]);
  const results = [];
  // Compare each text's embedding with the input text's embedding
  for (let i = 0; i < compareTexts.length; i++) {
    const compareEmbedding = embeddings.slice([i + 1, 0], [1]);
    const similarity = cosineSimilarity(baseEmbedding, compareEmbedding);
    const similarityScore = similarity.dataSync()[0].toFixed(4);
    results.push({
      "Input Text": inputText,
      "Comparison Text": compareTexts[i],
      "Similarity Score": similarityScore,
    });
  }
  // Sort results by similarity score in descending order
  return results.sort((a, b) => parseFloat(b["Similarity Score"]) - parseFloat(a["Similarity Score"]));
}
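One detail worth spelling out: toFixed(4) returns a string, which is why the final sort converts each score back to a number with parseFloat. The standalone sketch below isolates that sorting step. The ScoredText type, the sortByScoreDescending helper, and the scores are made up for illustration and are not part of the article's code.

```typescript
// Hypothetical result shape for illustration; the scores below are made up.
interface ScoredText {
  text: string;
  score: string; // produced by toFixed(4), so it is a string, not a number
}

function sortByScoreDescending(results: ScoredText[]): ScoredText[] {
  // Copy first so the caller's array is left untouched, then compare numerically.
  return [...results].sort((a, b) => parseFloat(b.score) - parseFloat(a.score));
}

const sample: ScoredText[] = [
  { text: "first", score: (0.05).toFixed(4) },
  { text: "second", score: (0.5).toFixed(4) },
  { text: "third", score: (-0.2).toFixed(4) },
];

const ranked = sortByScoreDescending(sample);
console.log(ranked.map((r) => `${r.text}: ${r.score}`));
```

Keeping the numeric score alongside the formatted string would avoid the round trip through parseFloat, at the cost of one more field in each result row.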

Cosine Similarity Function

The cosineSimilarity function calculates the cosine similarity between two vectors (in this case, the text embeddings). Cosine similarity is a measure of how similar two vectors are, based on the cosine of the angle between them.

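function cosineSimilarity(a: tf.Tensor, b: tf.Tensor) {
  // Scale each vector to unit length
  const normalizedA = a.div(tf.norm(a, 'euclidean'));
  const normalizedB = b.div(tf.norm(b, 'euclidean'));
  // The dot product of unit vectors is the cosine of the angle between them
  return tf.sum(tf.mul(normalizedA, normalizedB));
}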

Inside the function above, we first normalize the input vectors a and b using the div and norm methods from TensorFlow.js. Normalization gives each vector a length of 1, so the sum of their element-wise products (their dot product) equals the cosine of the angle between them.
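To see the same arithmetic without tensors, here is a minimal sketch on plain number arrays. The helper names (l2Norm, normalize, cosineSimilarityPlain) are illustrative only and do not come from TensorFlow.js. The zero-vector check is an added assumption about how to handle that edge case.

```typescript
// Plain-array sketch of cosine similarity; no TensorFlow.js required.
// All helper names here are illustrative, not part of any library.

function l2Norm(v: number[]): number {
  // Euclidean length: square root of the sum of squares
  return Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
}

function normalize(v: number[]): number[] {
  const norm = l2Norm(v);
  if (norm === 0) {
    throw new Error("Cannot normalize a zero vector");
  }
  return v.map((x) => x / norm);
}

function cosineSimilarityPlain(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  const unitA = normalize(a);
  const unitB = normalize(b);
  // Dot product of two unit vectors equals the cosine of the angle between them
  return unitA.reduce((sum, x, i) => sum + x * unitB[i], 0);
}

const sameDirection = cosineSimilarityPlain([1, 2, 3], [2, 4, 6]);
const orthogonal = cosineSimilarityPlain([1, 0], [0, 1]);
const opposite = cosineSimilarityPlain([1, 0], [-1, 0]);
console.log(sameDirection.toFixed(4), orthogonal.toFixed(4), opposite.toFixed(4));
```

Vectors pointing the same way score close to 1, perpendicular vectors score 0, and opposite vectors score -1, regardless of their lengths.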

Test the Text Similarity Model

Let’s test the functionality using a specific example. We’ll compare the input text “Secure technology” against a set of different comparison texts.

const inputText = "Secure technology";
const compareTexts = [
  "Geometry's elegant shapes define the space around us.",
  "Socratic questioning uncovers truth beneath societal norms.",
  "Blockchain technology revolutionizes security in digital transactions.",
  "Calculus captures the essence of change through derivatives and integrals.",
  "Utilitarian ethics seek the greatest good for the greatest number."
];
// Print the ranked results as a table once the promise resolves
calculateSimilarity(inputText, compareTexts).then((results) => {
  console.table(results);
});

The function output presents the similarity scores as follows:

┌───┬───────────────────┬────────────────────────────────────────────────────────────────────────────┬──────────────────┐
│   │ Input Text        │ Comparison Text                                                            │ Similarity Score │
├───┼───────────────────┼────────────────────────────────────────────────────────────────────────────┼──────────────────┤
│ 0 │ Secure technology │ Blockchain technology revolutionizes security in digital transactions.     │ 0.5221           │
│ 1 │ Secure technology │ Socratic questioning uncovers truth beneath societal norms.                │ 0.3258           │
│ 2 │ Secure technology │ Calculus captures the essence of change through derivatives and integrals. │ 0.2328           │
│ 3 │ Secure technology │ Utilitarian ethics seek the greatest good for the greatest number.         │ 0.2156           │
│ 4 │ Secure technology │ Geometry's elegant shapes define the space around us.                      │ 0.1840           │
└───┴───────────────────┴────────────────────────────────────────────────────────────────────────────┴──────────────────┘

Based on the output, the text “Blockchain technology revolutionizes security in digital transactions.” has the highest similarity score of 0.5221 with the input text “Secure technology”, while the text “Geometry’s elegant shapes define the space around us.” has the lowest similarity score of 0.1840.

In this article, we learned how to use TensorFlow.js and the Universal Sentence Encoder model to calculate text similarity. The step-by-step guide covered setting up the environment, importing the required libraries, and defining the core functions for similarity computation using cosine similarity on text embeddings. While powerful, it’s important to acknowledge the potential limitations of machine learning models, especially in complex scenarios.
