In this article you will learn the fundamentals of NLP.
Here is what you will learn:
1 - What is a tokenizer?
2 - What is texts_to_sequences?
3 - What is pad_sequences?
4 - What is Embedding?
5 - How to make a prediction model?
You can use this code for any binary NLP dataset that contains text data.
As we all know, machines can only understand numbers, so we need to convert words to numbers.
For example, if we want a machine to understand 'hello world', we should convert it to numbers like this:
hello is represented by 0
world is represented by 1
To do this we use a tokenizer.
With a tokenizer we convert words, subwords, or characters to numbers, and every word that has been converted to a number is a token.
In summary, a tokenizer converts text into tokens.
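To make the idea concrete, here is a tiny sketch using a plain Python dictionary (just an illustration, not the Keras Tokenizer we will use later):
vocab = {'hello': 0, 'world': 1}  # each word is mapped to a number
sentence = 'hello world'
tokens = [vocab[word] for word in sentence.split()]
print(tokens)  # [0, 1]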
First of all, we need to import the dataset. You can find the dataset used in this article through the link below:
Let's define a variable for the dataset:
import pandas as pd

dataset = pd.read_csv(r"D:\IT\ML project\Predict depression\depression_dataset_reddit_cleaned.csv")
Now we need to define two variables, one for the sentences and another for the labels:
sentences = dataset['clean_text']
labels = dataset['is_depression']
To train a model we need training data for training the model and test data for testing and optimizing the model.
So now we need to split the data into two parts, train and test.
The data contains 7731 rows (samples). We define the training data as indices 0 to 6000, meaning all samples before index 6000 are for training and all samples after it are for testing:
training_size = 6000
training_sentences = sentences[0:training_size]
testing_sentences = sentences[training_size:]
training_labels = labels[0:training_size]
testing_labels = labels[training_size:]
Let's work with the tokenizer.
'''
In this article we work with the Keras (TensorFlow) tokenizer
'''
from keras.preprocessing.text import Tokenizer  # import the tokenizer

vocab_size = 10000  # number of words the tokenizer will keep
tokenizer = Tokenizer(num_words=vocab_size, oov_token='<OOV>', lower=True)
tokenizer.fit_on_texts(training_sentences)  # build the word -> number mapping from the training sentences
# word_index = tokenizer.word_index  # shows the number (token) of each word
# print(word_index)
oov_token='<OOV>': this parameter helps the tokenizer handle words that were not in the vocabulary, as sketched below.
lower=True: converts all words to lower case.
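Here is a small sketch (assumed for illustration, not part of the main code) of how the <OOV> token works: any word that was not seen during fit_on_texts is replaced by the <OOV> token, which gets index 1.
from keras.preprocessing.text import Tokenizer

demo_tokenizer = Tokenizer(num_words=10, oov_token='<OOV>', lower=True)  # hypothetical example tokenizer
demo_tokenizer.fit_on_texts(['dog is a good animal'])
print(demo_tokenizer.texts_to_sequences(['cat is a good animal']))
# 'cat' was never seen during fitting, so it becomes the <OOV> token: [[1, 3, 4, 5, 6]]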
With the texts_to_sequences method, all the numbers that represent the words of a sentence are put together in a sequence.
Let's see an example:
sentence1 = 'dog is a good animal'
sentence2 = 'my name is omid'

# use a separate tokenizer for this small example so we don't overwrite the main one
example_tokenizer = Tokenizer(num_words=10, oov_token='<OOV>', lower=True)
example_tokenizer.fit_on_texts([sentence1, sentence2])
word_index = example_tokenizer.word_index
print(word_index)

sequences = example_tokenizer.texts_to_sequences([sentence1, sentence2])
print(sequences)
'''
Output:
{'<OOV>': 1, 'is': 2, 'dog': 3, 'a': 4, 'good': 5, 'animal': 6, 'my': 7, 'name': 8, 'omid': 9}
[[3, 2, 4, 5, 6], [7, 8, 2, 9]]
'''
As you can see, the sentences don't all have the same length. To handle this we use pad_sequences.
Imagine we have 2 sentences, one with 3 words and the other with 4 words. In this situation pad_sequences will make a 2x4 matrix, and for the sentence that has 3 words the last or first matrix element will be 0.
Let's see it with an example:
from keras.preprocessing.sequence import pad_sequences

sequences = example_tokenizer.texts_to_sequences([sentence1, sentence2])  # like the previous code
sentences_padded = pad_sequences(sequences)
print(sentences_padded)
'''
output:
[[3 2 4 5 6]
[0 7 8 2 9]]
'''
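By default pad_sequences puts the zeros at the beginning of the shorter sentence ('pre' padding). Here is a short sketch (assumed, not in the original example) of the padding parameter, which moves the zeros to the end instead:
print(pad_sequences(sequences, padding='post'))
'''
output:
[[3 2 4 5 6]
 [7 8 2 9 0]]
'''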
To get more details about it you can read its documentation.
Let's get back to our main code (depression prediction).
Now that we know what texts_to_sequences and pad_sequences are, let's process the data with them.
from keras.preprocessing.sequence import pad_sequences

max_length = 100  # the max length of a sentence that will be accepted
training_sequences = tokenizer.texts_to_sequences(training_sentences)
training_padded = pad_sequences(training_sequences, maxlen=max_length)
testing_sequences = tokenizer.texts_to_sequences(testing_sentences)
testing_padded = pad_sequences(testing_sequences, maxlen=max_length)
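As a quick sanity check (this step is assumed, not part of the original code), both splits should now be matrices with max_length columns:
print(training_padded.shape)  # expected (6000, 100)
print(testing_padded.shape)   # expected (1731, 100), the remaining 7731 - 6000 samples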
Now the data is ready and we can build the model, but before that, let's see what an embedding is.
With an embedding, words are converted to vectors. By doing this, the model can understand the relationships between words.
For instance, consider the words 'good' and 'bad'; but what about a phrase like 'not so bad'? This phrase carries a negative feeling, meaning bad, and an embedding helps the model understand this (the relationship between words).
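To see what this means in practice, here is a minimal sketch (assumed, with made-up sizes) of an embedding layer turning token numbers into vectors:
import numpy as np
from keras.layers import Embedding

demo_embedding = Embedding(input_dim=10, output_dim=4)  # 10 possible tokens, 4 numbers per word
demo_input = np.array([[3, 2, 4, 5, 6]])                # one tokenized sentence of 5 words
print(demo_embedding(demo_input).shape)                 # (1, 5, 4): every token became a 4-dimensional vector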
The model is trained with an embedding layer, followed by a GlobalAveragePooling1D layer, a dense (fully connected) layer with 24 units and a ReLU activation function, and a last layer with 1 dense unit and a sigmoid activation function, for 10 epochs.
Activation functions help the model understand the data better.
ReLU activation function
ReLU is an activation function that only passes values larger than 0:
R(x) = max(0,x)
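A minimal sketch of this formula (using NumPy just for illustration):
import numpy as np

def relu(x):
    return np.maximum(0, x)  # R(x) = max(0, x)

print(relu(np.array([-2.0, -0.5, 0.0, 1.5, 3.0])))  # [0.  0.  0.  1.5 3. ]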
Sigmoid activation function
We use the sigmoid activation function when the labels of the data are binary (0 or 1), exactly like the dataset we're using.
If the output of the sigmoid activation function (the last layer) is larger than 0.5 it is assigned to label 1, and if it's lower than 0.5 it is assigned to label 0.
In summary:
output > 0.5 → 1
output < 0.5 → 0
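A small sketch of the sigmoid function and the 0.5 threshold (again just an illustration with NumPy, not part of the model code):
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))  # squashes any value into the range (0, 1)

for x in [-2.0, 0.0, 2.0]:
    p = sigmoid(x)
    print(x, '->', round(p, 3), '-> label', 1 if p > 0.5 else 0)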
Code
from keras.models import Sequential
from keras.layers import Embedding, Dense, GlobalAveragePooling1D

embedding_dim = 16  # dimension of the embedding vectors

model = Sequential([
    Embedding(vocab_size, output_dim=embedding_dim, input_length=max_length),
    GlobalAveragePooling1D(),
    Dense(24, activation='relu'),
    Dense(1, activation='sigmoid')
])
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
num_epochs = 10
history = model.fit(training_padded, training_labels, epochs=num_epochs, validation_data=(testing_padded, testing_labels))
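Optionally (this call is not in the original article), we can also evaluate the trained model on the test split:
loss, accuracy = model.evaluate(testing_padded, testing_labels)
print(f"test loss: {loss:.3f}, test accuracy: {accuracy:.3f}")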
Plot
Let's see the progress of the model over 10 epochs.
import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'])
plt.plot(history.history['loss'])
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(['accuracy', 'loss'])
plt.show()
As you can see, with every epoch the model has improved: accuracy increased and loss decreased.
Testing the model
Now let's test the model. Don't forget that we need to apply texts_to_sequences and pad_sequences to the input text.
test_sentence = ['the life became so hard i can not take it any more i just wanna die ']
test_sentence = tokenizer.texts_to_sequences(test_sentence)
padded_test_sentence = pad_sequences(test_sentence, maxlen=max_length)
print(model.predict(padded_test_sentence))
'''
output :
[[0.6440944]]
'''
As you can see, there are clearly sad feelings in the input text (test_sentence), and the output of the model is 0.64, which is larger than 0.5, so as I mentioned before it is assigned to label 1, which means depression is positive.
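If we want the final 0/1 label instead of the probability, a small sketch (assumed) is:
probability = model.predict(padded_test_sentence)[0][0]
predicted_label = 1 if probability > 0.5 else 0
print(predicted_label)  # 1 means depression is positive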
The code is available on GitHub through the link below:
Thanks for reading, I hope you enjoyed it.