This article focuses on the use of artificial intelligence in an application aimed at facilitating analysis and increasing user productivity. Specifically, it involves building a sentiment analysis model integrated into the application. The service enables sentiment classification of social media posts by gathering comments and categorizing them as positive or negative. The generated results are visualized for the user very quickly, offering a clear answer as to whether or not their post has been well received.
The creation of the comment classification model is described in terms of how it is trained on a dataset using machine learning algorithms, its ability to make predictions on data it has never seen before, and the accuracy it achieves in classification.
The goal is a web application that lets companies which manage social media see user impressions of the content posted on their managed pages. The application classifies the sentiment of each comment through machine learning algorithms, based on features extracted from the dataset, and visualizes these classifications for the user.
The model is a program that is trained on a dataset and, after training, is able to make predictions for input data it has never seen before.
Creation of the ML model
1. Retrieving the dataset.
positive_tweets = twitter_samples.strings('positive_tweets.json')
negative_tweets = twitter_samples.strings('negative_tweets.json')
The ‘twitter_samples’ corpus contains a collection of positive and negative tweets, with about 5,000 tweets in each of the positive and negative groups.
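As a quick sanity check, the corpus can be inspected before training. This is a minimal sketch using the variables above; the corpus must be downloaded once via nltk.download(), and the counts in the comments reflect the standard NLTK distribution.

import nltk
nltk.download('twitter_samples')  # one-time download of the corpus

print(len(positive_tweets), len(negative_tweets))  # 5000 and 5000 in the standard corpus
print(positive_tweets[0])  # inspect a sample positive tweet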
2. Preprocess the Data.
The datasets are processed through the preprocess_comment() function, which removes non-word characters and extra whitespace from the text and converts all of the text to lowercase.
def preprocess_comment(comment):
    comment = comment.lower()
    comment = re.sub(r'\W', ' ', comment)  # Replace non-word characters with spaces
    comment = re.sub(r'\s+', ' ', comment)  # Collapse extra whitespace
    return comment
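To illustrate, here is what the function produces for a raw comment (the example string is our own):

example = "I LOVE this product!!!   10/10 :)"
print(preprocess_comment(example))
# -> 'i love this product 10 10 '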
3. Data Vectorization
Data vectorization is done using the TF-IDF (Term Frequency - Inverse Document Frequency) technique. TF-IDF is a numerical statistic reflecting the importance of a word in a document relative to a collection of documents.
TF measures how frequently each word occurs in the comment.
tf(t,d) = (number of times term t appears in document d) / (total number of terms in document d)
IDF is the logarithm of the total number of documents divided by the number of documents that contain the term.
idf(t) = log((number of documents in the corpus) / (number of documents where the term appears + 1))
IDF is needed because it assigns a small weight to frequent words like “is” and “are”; otherwise these words would carry a very large weight and dominate the results simply through their frequency. In other words, IDF measures the informativeness of a term: when we compute IDF, it comes out very low for common words (stop words).
tf-idf(t,d) = tf(t,d) * idf(t)
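As a worked example with made-up numbers, suppose the term “good” appears twice in a 6-word comment and occurs in 2 of the 4 documents in the corpus. (Note that scikit-learn's TfidfVectorizer uses a slightly smoothed variant of these formulas, so its exact values differ.)

import math

tf = 2 / 6                   # "good" appears 2 times among 6 words
idf = math.log(4 / (2 + 1))  # 4 documents, 2 contain "good"
print(tf * idf)              # ~ 0.333 * 0.288 ~ 0.096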
vectorizer = TfidfVectorizer()
X_transformed = vectorizer.fit_transform(X)
After vectorization, we place this data in a sparse matrix, which stores only the non-zero values together with their positions. This is memory efficient because no space is used for zero elements, of which there will be a great many in our case.
An illustration of what the sparse matrix looks like for our dataset is shown in the image below, where the expression inside the parentheses gives the position and the other expression the tf-idf value: (row, column) value.
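The same (row, column) value layout can be reproduced by printing the SciPy sparse matrix directly; the indices and values below are illustrative, not taken from the real dataset:

print(X_transformed[0])
#   (0, 8731)    0.51
#   (0, 2054)    0.43
#   (0, 977)     0.74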
4. Training the model
Model training is done with the SVM (Support Vector Machine) machine learning algorithm. SVM is a linear model for classification and regression problems. The idea of SVM is simple: the algorithm creates a line or a hyperplane that separates the data into classes.
SVM is an algorithm that takes data as input and outputs a line that separates the classes, if possible.
Types of SVM: Linear SVM (when the data are fully linearly separable; fully linearly separable means the data points can be divided into two classes using a single straight line) and Non-linear SVM (when the data are not linearly separable, that is, the data points cannot be divided into two classes by a straight line, we use more advanced techniques such as kernel methods to classify them).
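In scikit-learn this choice comes down to the kernel parameter; a minimal sketch:

from sklearn.svm import SVC

linear_clf = SVC(kernel='linear')  # linear SVM, for linearly separable data
rbf_clf = SVC(kernel='rbf')        # kernel trick, for non-linearly separable data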
How does SVM work?
SVM is defined in terms of the support vectors only; because of this, SVM enjoys a natural speed advantage. In our case, we have a dataset with two classes: positive and negative comments.
In our program, the linear SVM model is created using the sklearn.svm library. Through the fit() method, the model is trained on the provided dataset and the linear hyperplane is found, which represents a straight line in the feature space.
classifier = SVC(kernel='linear')
classifier.fit(X_transformed, y)
5. Using the model for prediction.
Next, the classify_sentiment() method is called, which accepts as parameters the input comment we want to classify, the trained classifier, and the vectorizer fitted on the dataset.
def classify_sentiment(comment, classifier, vectorizer):
    preprocessed_comment = preprocess_comment(comment)
    comment_transformed = vectorizer.transform([preprocessed_comment])
    sentiment = classifier.predict(comment_transformed)[0]
    return 'Positive' if sentiment == 1 else 'Negative'
First, the transformation step is applied to the incoming comment using the transform() method, which takes the input data and applies the learned transformation to it.
Classifying the sentiment of comments is done through the predict() method, which uses what was learned during training to make the prediction. If a comment is positive, it is labelled as Positive, and if it is negative, it is labelled as Negative.
As an example, take the comment “I like this product! It exceeded my expectations.”
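Assuming a classifier and vectorizer returned by train_sentiment_classifier() (defined in the full listing below), classifying this comment would look like the following sketch; the expected label is our assumption about what the trained model returns:

classifier, vectorizer = train_sentiment_classifier()
comment = "I like this product! It exceeded my expectations."
print(classify_sentiment(comment, classifier, vectorizer))  # expected: 'Positive'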
The application provides its services through two main endpoints:
1. classify_sentiments
2. generate_xlsx
classify_sentiments
This endpoint accepts as input an Excel file containing the name of the commenter, the date of the comment, and the message (the comment itself), and then uses the ML model to classify these comments. It generates a new Excel file with an additional sentiments column whose values are either Positive or Negative.
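A call to this endpoint could look like the sketch below, assuming the server runs locally on port 5000 and that comments.xlsx (a hypothetical file name) has the name, time, and message columns the endpoint expects:

import requests

with open('comments.xlsx', 'rb') as f:
    resp = requests.post('http://127.0.0.1:5000/classify_sentiments',
                         files={'file': f})
print(resp.json())  # counts of Positive/Negative comments plus a sample of rows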
generate_xlsx
This endpoint is mainly used by social media managers. It asks the user for the post id, page id, and access token so that the server side can access the post, fetch its comments, and then classify them. Comment retrieval is done through the Graph API, whose parameters also include the token generated free of charge on the Meta developer page. After the data is retrieved on the server side, an Excel file is created. During this request, another request is made to the classify_sentiments endpoint described above, and the response is returned in the same form as for that endpoint. Thus, the server side consists of the backend components that provide the services and the ML model that uses those backend components to carry the services out.
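A request to this endpoint could be sketched as follows; the placeholder values stand in for a real page id, post id, and Graph API token:

import requests

payload = {
    'page_id': 'PAGE_ID',          # placeholder
    'post_id': 'POST_ID',          # placeholder
    'access_token': 'GRAPH_TOKEN'  # placeholder token from the Meta developer page
}
resp = requests.post('http://127.0.0.1:5000/generate_xlsx', json=payload)
print(resp.text)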
Example of application usage
Fetching comments from a post on the social network Facebook, and classifying the comments of that post.
In this case, the server makes a request to the Facebook server asking for the comments of the specified post; after receiving the comments, it uses the ML model to classify them.
For a set of test comments, our model achieved an accuracy of 84.21%.
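The article does not show how this accuracy was computed; one standard way, sketched here under the assumption of a held-out split of the vectorized dataset (X_transformed and y as produced in the training step), is:

from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from sklearn.svm import SVC

X_train, X_test, y_train, y_test = train_test_split(
    X_transformed, y, test_size=0.2, random_state=42)

clf = SVC(kernel='linear')
clf.fit(X_train, y_train)
print(accuracy_score(y_test, clf.predict(X_test)))  # e.g. ~0.84 on a similar split

The complete server-side code of the application follows.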
import re
import nltk
import numpy as np
import pandas as pd
from sklearn.svm import SVC
from sklearn.feature_extraction.text import TfidfVectorizer
from nltk.tokenize import word_tokenize
from nltk.corpus import twitter_samples
from nltk.classify import NaiveBayesClassifier
from flask import Flask, request, send_file,jsonify
import os
from datetime import datetime
import requests
import json
from flask_cors import CORS
import random

nltk.download('punkt')
nltk.download('twitter_samples')
app = Flask(__name__)
CORS(app, origins='http://localhost:3000')
# Preprocess a single comment: lowercase, strip non-word characters and extra spaces
def preprocess_comment(comment):
    comment = comment.lower()
    comment = re.sub(r'\W', ' ', comment)  # Replace non-word characters with spaces
    comment = re.sub(r'\s+', ' ', comment)  # Collapse extra whitespace
    return comment

# Preprocess a list of comments
def preprocess_comments(comments):
    return [preprocess_comment(comment) for comment in comments]
# Train a sentiment classifier using SVM on the twitter_samples corpus
def train_sentiment_classifier():
    positive_tweets = twitter_samples.strings('positive_tweets.json')
    negative_tweets = twitter_samples.strings('negative_tweets.json')
    preprocessed_positive_tweets = preprocess_comments(positive_tweets)
    preprocessed_negative_tweets = preprocess_comments(negative_tweets)
    X = preprocessed_positive_tweets + preprocessed_negative_tweets
    # Label positive tweets 1 and negative tweets 0
    y = np.concatenate([np.ones(len(preprocessed_positive_tweets)), np.zeros(len(preprocessed_negative_tweets))])
    vectorizer = TfidfVectorizer()
    X_transformed = vectorizer.fit_transform(X)
    classifier = SVC(kernel='linear')
    classifier.fit(X_transformed, y)
    return classifier, vectorizer
# Classify sentiment using the trained classifier
def classify_sentiment(comment, classifier, vectorizer):
    preprocessed_comment = preprocess_comment(comment)
    comment_transformed = vectorizer.transform([preprocessed_comment])
    sentiment = classifier.predict(comment_transformed)[0]
    return 'Positive' if sentiment == 1 else 'Negative'
@app.route('/classify_sentiments', methods=['POST'])
def classify_sentiments():
    # Read the uploaded Excel file of comments
    file = request.files['file']
    df = pd.read_excel(file)
    comments = df['message'].tolist()
    classifier, vectorizer = train_sentiment_classifier()
    sentiments = []
    for comment in comments:
        sentiment = classify_sentiment(comment, classifier, vectorizer)
        sentiments.append(sentiment)
    # Write a new Excel file with an added sentiments column
    timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
    filename = os.path.splitext(file.filename)[0]
    output_filename = f"{filename}_{timestamp}_with_sentiments.xlsx"
    df['sentiments'] = sentiments
    df.to_excel(output_filename, index=False)
    positive_count = sentiments.count('Positive')
    negative_count = sentiments.count('Negative')
    response = {
        'Positive comments': positive_count,
        'Negative comments': negative_count
    }
    # Include a random sample of 10 classified rows in the response
    random_rows = df.sample(n=10)
    selected_rows = random_rows[['name', 'time', 'message', 'sentiments']]
    selected_data = selected_rows.to_dict(orient='records')
    response['Selected data'] = selected_data
    return jsonify(response)
@app.route('/generate_xlsx', methods=['POST'])
def generate_xlsx():
    page_id = request.json['page_id']
    post_id = request.json['post_id']
    access_token = request.json['access_token']
    # Fetch the post's comments from the Facebook Graph API
    url = f'https://graph.facebook.com/v16.0/{page_id}_{post_id}/comments?access_token={access_token}'
    response = requests.get(url)
    data = json.loads(response.text)

    def get_comment(comment):
        return {
            'name': comment['from']['name'],
            'time': comment['created_time'],
            'message': comment['message'],
            'sentiments': ''
        }

    excel_data = list(map(get_comment, data['data']))
    df = pd.DataFrame(excel_data)
    timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
    file_path = f'comments_{timestamp}.xlsx'
    df.to_excel(file_path, index=False)
    # Forward the generated file to the classify_sentiments endpoint
    classify_url = 'http://127.0.0.1:5000/classify_sentiments'
    files = {'file': open(file_path, 'rb')}
    response = requests.post(classify_url, files=files)
    result = response.content.decode('utf-8')
    return result

if __name__ == '__main__':
    app.run()
In this project, we successfully developed an application for classifying comments using AI techniques, categorizing them into positive and negative sentiments. The application shows promising results and has potential for various real-world uses. The generated results, and the step-by-step account of how we arrived at them, give insight into how we train a computer from a set of data and then put it to use to support future work.