Finding related pieces of text is important for applications such as search engines, chatbots, and recommendation systems. It helps present users with more relevant information. In this article, we'll learn how to use TensorFlow.js and the Universal Sentence Encoder model to find the similarity between different texts.
TensorFlow.js is a JavaScript library that enables the training and deployment of machine learning models in the browser or on the server side using Node.js.
The Universal Sentence Encoder (Cer et al., 2018) is a model designed to encode text into 512-dimensional embeddings. These embeddings can be used in various natural language processing tasks, including sentiment classification and textual similarity analysis.
First things first, we need to install the necessary TensorFlow.js packages. The installation process varies depending on your environment:
Node.js
For server-side applications using Node.js:
npm install @tensorflow/tfjs-node @tensorflow-models/universal-sentence-encoder
Web Browser
For use directly in web browsers:
npm install @tensorflow/tfjs @tensorflow-models/universal-sentence-encoder
GPU Acceleration
For optimized performance with GPU acceleration in Node.js (requires CUDA):
npm install @tensorflow/tfjs-node-gpu @tensorflow-models/universal-sentence-encoder
To find text similarity, we need to set up the environment, load the necessary models, and define the computation functions. Here's how we can do it step by step in more detail:
Import Libraries
First, we need to configure TensorFlow.js based on the environment. We import the required libraries using the following code:
// Adjust based on environment: tfjs-node for Node.js, tfjs for the browser
import * as tf from "@tensorflow/tfjs-node";
import * as use from "@tensorflow-models/universal-sentence-encoder";
Set Up the Similarity Calculation
Next, we define the main function calculateSimilarity, which will handle the text similarity computation.
async function calculateSimilarity(inputText: string, compareTexts: string[]) {
  // Load the model and generate embeddings for all texts in one batch
  const model = await use.load();
  const embeddings = await model.embed([inputText, ...compareTexts]);
  const baseEmbedding = embeddings.slice([0, 0], [1]);
  const results = [];

  // Compare each text's embedding with the input text's embedding
  for (let i = 0; i < compareTexts.length; i++) {
    const compareEmbedding = embeddings.slice([i + 1, 0], [1]);
    const similarity = cosineSimilarity(baseEmbedding, compareEmbedding);
    const similarityScore = similarity.dataSync()[0].toFixed(4);
    results.push({
      "Input Text": inputText,
      "Comparison Text": compareTexts[i],
      "Similarity Score": similarityScore,
    });
  }

  // Sort results by similarity score in descending order
  return results.sort((a, b) => parseFloat(b["Similarity Score"]) - parseFloat(a["Similarity Score"]));
}
Cosine Similarity Function
The cosineSimilarity function calculates the cosine similarity between two vectors (in this case, the text embeddings). Cosine similarity is a measure of how similar two vectors are, based on the cosine of the angle between them.
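Formally, the cosine similarity between two vectors is their dot product divided by the product of their magnitudes:

```latex
\cos(\theta) = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
```

Dividing each vector by its norm first (as the code below does) makes the denominator 1, so the remaining dot product is the cosine similarity itself.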
function cosineSimilarity(a: tf.Tensor, b: tf.Tensor) {
  // Normalize both vectors to unit length, then take their dot product
  const normalizedA = a.div(tf.norm(a, 'euclidean'));
  const normalizedB = b.div(tf.norm(b, 'euclidean'));
  return tf.sum(tf.mul(normalizedA, normalizedB));
}
Inside the function above, we first normalize the input vectors a and b using the div and norm methods from TensorFlow.js. Normalization ensures that the vectors have a magnitude of 1, which is necessary for calculating the cosine similarity accurately.
Test the Text Similarity Model
Let's test the functionality using a specific example. We'll compare the input text "Secure technology" against a set of diverse comparison texts.
const inputText = "Secure technology";
const compareTexts = [
  "Geometry's elegant shapes define the space around us.",
  "Socratic questioning uncovers truth beneath societal norms.",
  "Blockchain technology revolutionizes security in digital transactions.",
  "Calculus captures the essence of change through derivatives and integrals.",
  "Utilitarian ethics seek the greatest good for the greatest number."
];
calculateSimilarity(inputText, compareTexts).then((results) => {
  console.table(results);
});
The function output presents the similarity scores as follows:
┌───┬───────────────────┬────────────────────────────────────────────────────────────────────────────┬──────────────────┐
│   │ Input Text        │ Comparison Text                                                            │ Similarity Score │
├───┼───────────────────┼────────────────────────────────────────────────────────────────────────────┼──────────────────┤
│ 0 │ Secure technology │ Blockchain technology revolutionizes security in digital transactions.     │ 0.5221           │
│ 1 │ Secure technology │ Socratic questioning uncovers truth beneath societal norms.                │ 0.3258           │
│ 2 │ Secure technology │ Calculus captures the essence of change through derivatives and integrals. │ 0.2328           │
│ 3 │ Secure technology │ Utilitarian ethics seek the greatest good for the greatest number.         │ 0.2156           │
│ 4 │ Secure technology │ Geometry's elegant shapes define the space around us.                      │ 0.1840           │
└───┴───────────────────┴────────────────────────────────────────────────────────────────────────────┴──────────────────┘
Based on the output, the text "Blockchain technology revolutionizes security in digital transactions." has the highest similarity score of 0.5221 with the input text "Secure technology", while the text "Geometry's elegant shapes define the space around us." has the lowest similarity score of 0.1840.
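In practice, a common next step is to keep only the comparisons that clear some relevance threshold before showing them to users. A minimal sketch, assuming the result shape produced by calculateSimilarity above (the 0.3 cutoff and the `filterRelevant` helper are illustrative choices, not part of any library):

```typescript
interface SimilarityResult {
  "Input Text": string;
  "Comparison Text": string;
  "Similarity Score": string; // toFixed(4) produces a string
}

// Keep only results whose score meets or exceeds the threshold.
function filterRelevant(results: SimilarityResult[], threshold = 0.3): SimilarityResult[] {
  return results.filter((r) => parseFloat(r["Similarity Score"]) >= threshold);
}

// Example with two of the scores from the table above:
const sample: SimilarityResult[] = [
  {
    "Input Text": "Secure technology",
    "Comparison Text": "Blockchain technology revolutionizes security in digital transactions.",
    "Similarity Score": "0.5221",
  },
  {
    "Input Text": "Secure technology",
    "Comparison Text": "Geometry's elegant shapes define the space around us.",
    "Similarity Score": "0.1840",
  },
];
console.log(filterRelevant(sample).length); // 1
```

Because the results are already sorted in descending order, the filtered list stays sorted as well.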
In this article, we learned how to use TensorFlow.js and the Universal Sentence Encoder model to efficiently calculate text similarity. The step-by-step guide covered setting up the environment, importing the required libraries, and defining the core functions for similarity computation using cosine similarity on text embeddings. While powerful, it's important to acknowledge the potential limitations of machine learning models, particularly in complex scenarios.