To detect patterns in video frames, we fine-tune a BERT (Bidirectional Encoder Representations from Transformers) model in this guide. The model is trained on labelled text data using PyTorch and Hugging Face’s Transformers library. We then build a Streamlit app that lets users upload videos, extract frames, perform optical character recognition (OCR) with PyTesseract, and recognise patterns in real time. Starting with data preparation and model fine-tuning, this article walks you through the entire process of building the detection system with Streamlit.
Before starting, ensure you have the following libraries installed:
torch
transformers
datasets
pytesseract
Pillow
streamlit
tqdm
pandas
You can install these dependencies using a requirements.txt file:
torch==1.10.0
transformers==4.9.2
datasets==1.11.0
pytesseract==0.3.8
Pillow==8.4.0
streamlit==0.88.0
tqdm==4.62.3
pandas==1.3.3
Install the dependencies with the following command:
pip install -r requirements.txt
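Note that pytesseract is only a Python wrapper: the Tesseract OCR engine itself must be installed separately on your system. On Debian/Ubuntu, for example:

sudo apt-get install tesseract-ocr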
First, prepare your dataset as a CSV file with a text column containing the sequences and a label column for the labels. Save this as data.csv.
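For illustration only (these rows are made up), data.csv might look like this, with label 1 marking sequences that belong to the target pattern:

text,label
9876543210,1
0123456789,0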
Next, create a script fine_tune_bert.py to fine-tune the BERT model:
import torch
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from datasets import Dataset
import pandas as pd

# Load dataset from CSV
data = pd.read_csv("data.csv")
dataset = Dataset.from_pandas(data)

# Tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

def tokenize_function(examples):
    return tokenizer(examples['text'], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Split dataset into training and testing sets
train_test_split = tokenized_datasets.train_test_split(test_size=0.2)
train_dataset = train_test_split['train']
test_dataset = train_test_split['test']

# Load model and move to CPU
device = torch.device("cpu")
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2).to(device)

# Define training arguments
training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)

# Train model
trainer.train()

# Save the fine-tuned model and tokenizer
model.save_pretrained('fine_tuned_bert')
tokenizer.save_pretrained('fine_tuned_bert')
Explanation of the Fine-Tuning Script
- Load Dataset: The script loads a dataset from a CSV file containing text sequences and their labels.
- Tokenization: The text data is tokenized using the BertTokenizer from the Hugging Face library. The tokenize_function handles the tokenization step.
- Dataset Split: The tokenized dataset is split into training and testing sets.
- Model Initialization: A BERT model for sequence classification is loaded and moved to the CPU.
- Training Arguments: Training parameters such as the learning rate, batch size, and number of epochs are defined.
- Trainer Initialization: The Trainer class from Hugging Face handles the training loop.
- Training: The model is trained using the specified training arguments and datasets.
- Save Model: The fine-tuned model and tokenizer are saved for later use.
Run the script to fine-tune the BERT model:
python fine_tune_bert.py
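Once fine-tuning completes, you can sanity-check the saved model by loading it back and classifying a single sequence. This is a minimal sketch; the example number is made up, and the reading of label 1 as a valid pattern matches the assumption used later in the app:

import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load the saved fine-tuned model and tokenizer
tokenizer = BertTokenizer.from_pretrained('fine_tuned_bert')
model = BertForSequenceClassification.from_pretrained('fine_tuned_bert')
model.eval()

# Classify one example sequence (the value here is hypothetical)
inputs = tokenizer("9876543210", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits.argmax().item())  # 1 is assumed to mean "valid pattern"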
Create a new script streamlit_app.py for the Streamlit app:
import streamlit as st
import pytesseract
from PIL import Image
import torch
from transformers import BertTokenizer, BertForSequenceClassification
import re
import os
from tqdm import tqdm
from extract_frames import extract_frames

# Load the model and tokenizer and move to CPU
device = torch.device("cpu")
tokenizer = BertTokenizer.from_pretrained('fine_tuned_bert')
model = BertForSequenceClassification.from_pretrained('fine_tuned_bert').to(device)

st.title("Pattern Detection from Video Frames")
uploaded_file = st.file_uploader("Choose a video file...", type=["mp4", "avi", "mov"])

if uploaded_file is not None:
    # Extract frames
    frames_folder = "frames_folder"
    if not os.path.exists(frames_folder):
        os.makedirs(frames_folder)
    with open("uploaded_video.mp4", "wb") as f:
        f.write(uploaded_file.getbuffer())
    st.video(uploaded_file)
    extract_frames("uploaded_video.mp4", frames_folder)

    pattern_found = False
    progress_bar = st.progress(0)
    frame_files = sorted(os.listdir(frames_folder))

    for i, frame in enumerate(tqdm(frame_files, desc="Analyzing Frames", unit="frame")):
        if pattern_found:
            break
        img = Image.open(os.path.join(frames_folder, frame))
        st.image(img, caption=frame, use_column_width=True)

        # Extract text using OCR
        text = pytesseract.image_to_string(img)
        st.write(f"Extracted Text: {text}")

        # Detect any 10-digit number
        digit_sequences = re.findall(r'\d{10}', text)
        st.write(f"Detected Sequences: {digit_sequences}")

        for sequence in digit_sequences:
            inputs = tokenizer(sequence, return_tensors="pt").to(device)
            outputs = model(**inputs)
            prediction = outputs.logits.argmax().item()
            st.write(f"Prediction for sequence {sequence}: {prediction}")
            if prediction == 1:  # Assuming label 1 indicates a valid pattern
                st.write(f"Pattern has been found: {sequence}")
                pattern_found = True
                break

        # Update progress bar
        progress_bar.progress((i + 1) / len(frame_files))

    progress_bar.empty()  # Remove progress bar after completion

    if not pattern_found:
        st.write("No pattern found")
Explanation of the Streamlit App Script
- Import Libraries: The script imports the necessary libraries, including Streamlit, PyTesseract, PIL, PyTorch, and Transformers.
- Load Model and Tokenizer: The fine-tuned BERT model and tokenizer are loaded and moved to the CPU.
- Streamlit Interface: The Streamlit app interface is created, allowing users to upload a video file.
- Frame Extraction: The uploaded video file is saved locally, and frames are extracted using the helper function extract_frames (a minimal sketch of this helper follows this list).
- Frame Analysis: The app iterates over the extracted frames, displaying each frame and performing OCR to extract its text.
- Pattern Detection: The extracted text is scanned for 10-digit sequences using a regular expression. These sequences are then fed into the fine-tuned BERT model for prediction.
- Display Results: If a valid pattern is detected (indicated by the model’s prediction), the app displays the detected pattern and stops further analysis. Otherwise, it continues until all frames are processed.
- Progress Bar: A progress bar is updated to reflect the analysis progress.
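The app imports extract_frames from a helper module that is not shown here. As a rough sketch, assuming OpenCV (opencv-python) is installed and that sampling one frame per second is acceptable, extract_frames.py might look like this:

import os
import cv2

def extract_frames(video_path, output_folder, every_n_seconds=1):
    """Save one frame per `every_n_seconds` of video into output_folder."""
    os.makedirs(output_folder, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS metadata is missing
    step = max(1, int(fps * every_n_seconds))
    frame_index = 0
    saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_index % step == 0:
            # Zero-padded names keep frames in order when the folder is sorted
            cv2.imwrite(os.path.join(output_folder, f"frame_{saved:05d}.png"), frame)
            saved += 1
        frame_index += 1
    cap.release()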
To run the Streamlit app, use the following command:
streamlit run streamlit_app.py
Upload a video file (MP4, AVI, MOV) to start the analysis. The app will extract frames, apply OCR, and use the fine-tuned BERT model to detect 10-digit numbers in real time, displaying the results as soon as a match is found.
In this project, we fine-tuned a BERT model to detect a pattern (here, a 10-digit number sequence) in video frames and created a user-friendly Streamlit app to deploy the solution. The approach can be extended to other pattern detection tasks in various applications, demonstrating the versatility of combining state-of-the-art NLP models with accessible deployment tools like Streamlit.