To detect patterns in video frames, we fine-tune a BERT (Bidirectional Encoder Representations from Transformers) model in this guide. The model is trained on labelled text data using PyTorch and Hugging Face’s Transformers library. We then build a Streamlit app that lets users upload videos, extract frames, perform optical character recognition (OCR) with PyTesseract, and recognise patterns in real time. Starting with data preparation and model fine-tuning, this article walks you through the entire process of building the detection system with Streamlit.
Before starting, ensure you have the following libraries installed:
torch
transformers
datasets
pytesseract
Pillow
streamlit
tqdm
pandas
You can install these dependencies using a requirements.txt file:
torch==1.10.0
transformers==4.9.2
datasets==1.11.0
pytesseract==0.3.8
Pillow==8.4.0
streamlit==0.88.0
tqdm==4.62.3
pandas==1.3.3
Install the dependencies with the following command:
pip install -r requirements.txt
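Note that pytesseract is only a Python wrapper: the Tesseract OCR engine itself must be installed separately on your system. On Debian/Ubuntu, for example:

sudo apt-get install tesseract-ocr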
First, prepare your dataset as a CSV file with a text column containing the sequences and a label column for the labels. Save this as data.csv.
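For illustration only (these rows are made up), data.csv might look like this, with label 1 marking sequences that belong to the target pattern:

text,label
9876543210,1
0123456789,0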
Next, create a script fine_tune_bert.py to fine-tune the BERT model:
import torch
from transformers import BertTokenizer, BertForSequenceClassification, Trainer, TrainingArguments
from datasets import Dataset
import pandas as pd

# Load dataset from CSV
data = pd.read_csv("data.csv")
dataset = Dataset.from_pandas(data)

# Tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

def tokenize_function(examples):
    return tokenizer(examples['text'], padding="max_length", truncation=True)

tokenized_datasets = dataset.map(tokenize_function, batched=True)

# Split dataset into training and testing sets
train_test_split = tokenized_datasets.train_test_split(test_size=0.2)
train_dataset = train_test_split['train']
test_dataset = train_test_split['test']

# Load model and move to CPU
device = torch.device("cpu")
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2).to(device)

# Define training arguments
training_args = TrainingArguments(
    output_dir='./results',
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    weight_decay=0.01,
)

# Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
)

# Train model
trainer.train()

# Save the fine-tuned model and tokenizer
model.save_pretrained('fine_tuned_bert')
tokenizer.save_pretrained('fine_tuned_bert')
Explanation of the Fine-Tuning Script
- Load Dataset: The script loads a dataset from a CSV file containing text sequences and their labels.
- Tokenization: The text data is tokenized using the BertTokenizer from the Hugging Face library. The tokenize_function handles the tokenization step.
- Dataset Split: The tokenized dataset is split into training and testing sets.
- Model Initialization: A BERT model for sequence classification is loaded and moved to the CPU.
- Training Arguments: Training parameters such as the learning rate, batch size, and number of epochs are defined.
- Trainer Initialization: The Trainer class from Hugging Face handles the training loop.
- Training: The model is trained using the specified training arguments and datasets.
- Save Model: The fine-tuned model and tokenizer are saved for later use.
Run the script to fine-tune the BERT model:
python fine_tune_bert.py
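Once fine-tuning completes, you can sanity-check the saved model by loading it back and classifying a single sequence. This is a minimal sketch; the example number is made up, and the reading of label 1 as a valid pattern matches the assumption used later in the app:

import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Load the saved fine-tuned model and tokenizer
tokenizer = BertTokenizer.from_pretrained('fine_tuned_bert')
model = BertForSequenceClassification.from_pretrained('fine_tuned_bert')
model.eval()

# Classify one example sequence (the value here is hypothetical)
inputs = tokenizer("9876543210", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits.argmax().item())  # 1 is assumed to mean "valid pattern"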
Create a new script streamlit_app.py for the Streamlit app:
import streamlit as st
import pytesseract
from PIL import Image
import torch
from transformers import BertTokenizer, BertForSequenceClassification
import re
import os
from tqdm import tqdm
from extract_frames import extract_frames

# Load the model and tokenizer and move to CPU
device = torch.device("cpu")
tokenizer = BertTokenizer.from_pretrained('fine_tuned_bert')
model = BertForSequenceClassification.from_pretrained('fine_tuned_bert').to(device)

st.title("Pattern Detection from Video Frames")
uploaded_file = st.file_uploader("Choose a video file...", type=["mp4", "avi", "mov"])

if uploaded_file is not None:
    # Extract frames
    frames_folder = "frames_folder"
    if not os.path.exists(frames_folder):
        os.makedirs(frames_folder)
    with open("uploaded_video.mp4", "wb") as f:
        f.write(uploaded_file.getbuffer())
    st.video(uploaded_file)
    extract_frames("uploaded_video.mp4", frames_folder)

    pattern_found = False
    progress_bar = st.progress(0)
    frame_files = sorted(os.listdir(frames_folder))

    for i, frame in enumerate(tqdm(frame_files, desc="Analyzing Frames", unit="frame")):
        if pattern_found:
            break
        img = Image.open(os.path.join(frames_folder, frame))
        st.image(img, caption=frame, use_column_width=True)

        # Extract text using OCR
        text = pytesseract.image_to_string(img)
        st.write(f"Extracted Text: {text}")

        # Detect any 10-digit number
        digit_sequences = re.findall(r'\d{10}', text)
        st.write(f"Detected Sequences: {digit_sequences}")

        for sequence in digit_sequences:
            inputs = tokenizer(sequence, return_tensors="pt").to(device)
            outputs = model(**inputs)
            prediction = outputs.logits.argmax().item()
            st.write(f"Prediction for sequence {sequence}: {prediction}")
            if prediction == 1:  # Assuming label 1 indicates a valid pattern
                st.write(f"Pattern has been found: {sequence}")
                pattern_found = True
                break

        # Update progress bar
        progress_bar.progress((i + 1) / len(frame_files))

    progress_bar.empty()  # Remove progress bar after completion

    if not pattern_found:
        st.write("No pattern found")
Explanation of the Streamlit App Script
- Import Libraries: The script imports the necessary libraries, including Streamlit, PyTesseract, PIL, PyTorch, and Transformers.
- Load Model and Tokenizer: The fine-tuned BERT model and tokenizer are loaded and moved to the CPU.
- Streamlit Interface: The Streamlit app interface is created, allowing users to upload a video file.
- Frame Extraction: The uploaded video file is saved locally, and frames are extracted using the helper function extract_frames (a minimal sketch of this helper follows this list).
- Frame Analysis: The app iterates over the extracted frames, displaying each frame and performing OCR to extract its text.
- Pattern Detection: The extracted text is scanned for 10-digit sequences using a regular expression. These sequences are then fed into the fine-tuned BERT model for prediction.
- Display Results: If a valid pattern is detected (indicated by the model’s prediction), the app displays the detected pattern and stops further analysis. Otherwise, it continues until all frames are processed.
- Progress Bar: A progress bar is updated to reflect the analysis progress.
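The app imports extract_frames from a helper module that is not shown here. As a rough sketch, assuming OpenCV (opencv-python) is installed and that sampling one frame per second is acceptable, extract_frames.py might look like this:

import os
import cv2

def extract_frames(video_path, output_folder, every_n_seconds=1):
    """Save one frame per `every_n_seconds` of video into output_folder."""
    os.makedirs(output_folder, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0  # fall back if FPS metadata is missing
    step = max(1, int(fps * every_n_seconds))
    frame_index = 0
    saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_index % step == 0:
            # Zero-padded names keep frames in order when the folder is sorted
            cv2.imwrite(os.path.join(output_folder, f"frame_{saved:05d}.png"), frame)
            saved += 1
        frame_index += 1
    cap.release()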
To run the Streamlit app, use the following command:
streamlit run streamlit_app.py
Upload a video file (MP4, AVI, MOV) to start the analysis. The app will extract frames, apply OCR, and use the fine-tuned BERT model to detect 10-digit numbers in real time, displaying the results as soon as a match is found.
In this project, we fine-tuned a BERT model to detect a pattern (here, a 10-digit number sequence) in video frames and created a user-friendly Streamlit app to deploy the solution. The approach can be extended to other pattern detection tasks in various applications, demonstrating the versatility of combining state-of-the-art NLP models with accessible deployment tools like Streamlit.