Andrew Ng once said that implementing many scientific papers is one of the best paths to becoming a great Machine Learning Engineer or Data Scientist. If someone is starting in the field right now (like me), they probably won't be capable of understanding and implementing most papers just from a first read. The path will look much more like gradually reducing the level of pain and confusion until one can write some code and see some results.
This is the first machine-learning paper I've ever implemented. To make the most of it and share it with others, I decided to also write about what I've learned. This article is about "Employing deep learning and transfer learning for accurate brain tumor detection", published in Nature. It's a computer vision work in which the authors are looking for a workaround for the lack of publicly available medical images to train deep learning models.
It wasn't that painful to understand at first read. But this was a choice, not a coincidence. As my first machine learning paper implementation, I decided to go for a simpler one. This Nature paper uses a Kaggle dataset and leverages well-known deep learning architectures like ResNet and DenseNet. I also had a deadline, since this article is part of a computer vision college course project. So, let's not make big leaps.
What is deep learning?
Deep learning is a type of artificial intelligence that teaches computers to do what comes naturally to humans: learn from experience. It's a specific machine learning method that uses neural networks with many layers (hence the "deep" in the name). These networks are inspired by our understanding of the biology of the human brain and are designed to gradually identify and understand complex patterns in data. Neural network layers can be broadly summarized as numerous logistic regressions or other similar functions layered together, depending on the programmer's choice.
For instance, when you show a deep learning model thousands of images of cats and dogs (in our case, brain tumors), it learns to differentiate between the two without being explicitly programmed to recognize specific features like whiskers or tails. Instead, it figures out what makes a cat a cat and a dog a dog all by itself, and despite this feeling like witchery, it's just mathematics. To train a deep learning model means to figure out the best parameters for its layered functions. This capability makes deep learning exceptionally good at tasks such as voice recognition, language translation, and yes, even identifying medical conditions from images like brain scans.
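To make "layers of logistic-regression-like functions" concrete, here is a hypothetical toy network in Keras (my illustration, not code from the paper; the layer sizes are arbitrary):
# A hypothetical toy network (not from the paper): each Dense layer computes
# activation(Wx + b), i.e. many logistic-regression-like units side by side.
from tensorflow.keras import layers, models

toy_model = models.Sequential([
    layers.Input(shape=(256, 256, 3)),        # e.g. a 256x256 RGB brain scan
    layers.Flatten(),                         # unroll the pixels into one long vector
    layers.Dense(128, activation='sigmoid'),  # 128 "logistic regressions" over the pixels
    layers.Dense(64, activation='sigmoid'),   # 64 more over the previous layer's outputs
    layers.Dense(4, activation='softmax'),    # probabilities for 4 classes
])
toy_model.summary()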
What is transfer learning?
Before figuring out the parameters of the layered functions so they correctly infer the output value of an input, we need to start somehow, picking initial parameters. One approach would be to initialize everything as zero or as completely random numbers, but this may not be practical. Properly choosing the initial parameters can reduce the training time necessary to reach good results, help escape local optima (if you studied calculus, the same local optima concept as in any other function), and even help solve the vanishing gradient problem.
One strategy is to use a previously trained model as the initial parameters for the new model. This strategy assumes that a deep learning model that already knows how to differentiate a tree from a car will be able to learn faster or better how to distinguish a glioma from a meningioma tumor. That's why this strategy is called transfer learning: we try to leverage some of the model's previous knowledge so it doesn't have to learn everything from scratch. Despite the two contexts being completely different, some capabilities may be useful.
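As a concrete sketch of the idea (my illustration, not code from the paper): with Keras, the difference between training from scratch and transfer learning is simply which initial parameters you load. Freezing the pretrained layers, shown at the end, is one common option, not necessarily what the paper did.
from tensorflow.keras.applications import DenseNet169

# random initial parameters: the model must learn everything from scratch
scratch_model = DenseNet169(weights=None, include_top=False,
                            input_shape=(256, 256, 3))

# initial parameters learned on ImageNet: the starting point for transfer learning
pretrained_model = DenseNet169(weights='imagenet', include_top=False,
                               input_shape=(256, 256, 3))

# optionally freeze the pretrained layers so only newly added layers are trained
for layer in pretrained_model.layers:
    layer.trainable = False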
In "Employing deep learning and transfer learning for accurate brain tumor detection", four models previously trained on the ImageNet dataset were used as initial parameters. ImageNet is a large-scale dataset consisting of over 14 million annotated images, categorized into more than 20,000 classes, which is widely used for training and benchmarking image recognition algorithms in machine learning and computer vision. The model architectures were ResNet152, DenseNet169, VGG19, and MobileNetV3.
My partner on this computer vision course project (shout out to Caproni) has made a video going deep into each one of these architectures. In this article, I'm only going to give a general idea of each architecture and try to express the abstract idea behind it.
{Video Placeholder}
ResNet152
Traditional deep neural networks can struggle with vanishing gradients, where information weakens as it travels through layers. ResNet152 tackles this by introducing "skip connections." These connections act like shortcuts, allowing gradients to flow directly from earlier layers to later ones. This mitigates the vanishing gradient problem and helps the network retain important information throughout the learning process. With 152 layers, ResNet boasts significant depth, making it particularly adept at capturing complex patterns in medical imagery.
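A minimal sketch of a residual block (my simplification; the real ResNet152 blocks also use bottleneck convolutions and batch normalization):
from tensorflow.keras import layers

def residual_block(x, filters=64):
    # assumes x already has `filters` channels; the real ResNet uses a
    # 1x1 projection on the shortcut when the shapes differ
    shortcut = x                                   # the "skip connection"
    y = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.Add()([shortcut, y])                # output = F(x) + x
    return layers.Activation('relu')(y)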
DenseNet169
DenseNet169 champions the concept of feature reuse. Unlike traditional models where each layer connects only to the next, DenseNet connects every layer to all subsequent layers. This fosters collaboration between layers, allowing each layer to benefit from the features learned by all preceding ones. This improves feature extraction and reduces the number of required parameters, making DenseNet169 a more efficient model, especially considering its 169 layers.
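A simplified dense-block sketch (the real DenseNet169 adds batch normalization, bottleneck layers, and transition layers between blocks):
from tensorflow.keras import layers

def dense_block(x, num_layers=3, growth_rate=32):
    features = [x]
    for _ in range(num_layers):
        # each new layer sees the concatenation of every earlier output
        inputs = layers.Concatenate()(features) if len(features) > 1 else features[0]
        y = layers.Conv2D(growth_rate, 3, padding='same', activation='relu')(inputs)
        features.append(y)
    return layers.Concatenate()(features)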
VGG19
VGG19 takes a more conventional approach, relying on a simple architecture with stacked convolutional layers. Each layer extracts progressively intricate features from the input image. While VGG19 lacks the fancy connections of ResNet or DenseNet, its straightforward design (19 layers) makes it easy to understand and implement. However, its sheer number of parameters can make it computationally expensive compared to more modern architectures.
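For contrast with the blocks above, a sketch of one VGG-style stage (the real VGG19 stacks five such stages, with 64 up to 512 filters, before its fully connected layers):
from tensorflow.keras import layers

def vgg_stage(x, filters):
    # plain stacked 3x3 convolutions: no skip connections or concatenations
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return layers.MaxPooling2D(2)(x)   # halve the spatial size before the next stage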
MobileNetV3
MobileNetV3 prioritizes efficiency, making it ideal for resource-constrained environments like mobile devices. It uses depthwise separable convolutions, a technique that breaks down complex operations into simpler ones, significantly reducing computational costs. Additionally, MobileNetV3 incorporates cutting-edge advancements like squeeze-and-excitation modules to optimize feature extraction. This balance between accuracy and efficiency makes MobileNetV3 a strong choice for real-time medical diagnostic applications on mobile platforms.
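To see where the savings come from, here is a toy comparison (arbitrary shapes, my illustration rather than the paper's) of parameter counts for a regular versus a depthwise separable 3x3 convolution:
from tensorflow.keras import layers, models

inp = layers.Input(shape=(64, 64, 128))
regular = models.Model(inp, layers.Conv2D(128, 3, padding='same')(inp))
separable = models.Model(inp, layers.SeparableConv2D(128, 3, padding='same')(inp))

print(regular.count_params())    # 3*3*128*128 + 128 biases = 147,584
print(separable.count_params())  # depthwise 3*3*128 + pointwise 128*128 + 128 biases = 17,664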
Implementation
I used Google Colab notebooks (referenced below by model name) for each architecture to implement the deep learning models: ResNet152, DenseNet169, VGG19, and MobileNetV3. This approach allowed me to leverage the computational resources provided by Google Colab, such as GPU acceleration, which is essential for training deep learning models efficiently.
To avoid redundancy and keep the code clean, I created a utils.py file that contains all the repeated functions used across the different notebooks. This file includes functions for downloading the Kaggle dataset, preprocessing images, augmenting data, and creating the model architectures. Doing this ensured that only the necessary code was shown in each Colab notebook, making it easier to follow and understand.
Here are some key functions from utils.py:
Data Downloading and Preprocessing:
def upload_kaggle_json():
    from google.colab import files
    files.upload()  # upload kaggle.json through the Colab file picker

def download_dataset():
    import subprocess
    # place the Kaggle API credentials where the CLI expects them
    subprocess.run('mkdir -p ~/.kaggle', shell=True, check=True)
    subprocess.run('mv kaggle.json ~/.kaggle/', shell=True, check=True)
    subprocess.run('chmod 600 ~/.kaggle/kaggle.json', shell=True, check=True)
    subprocess.run('kaggle datasets download -d masoudnickparvar/brain-tumor-mri-dataset', shell=True, check=True)
    import zipfile
    zip_file_path = "brain-tumor-mri-dataset.zip"
    extract_dir = "raw"
    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        zip_ref.extractall(extract_dir)
def preprocess_images():
    import numpy as np
    from tqdm import tqdm
    import cv2
    import os

    IMG_SIZE = 256

    def transform_data(dir_father, dest_data_path_name, src_data_path_name):
        # each subdirectory holds one tumor class
        for dir in dir_father:
            save_path = dest_data_path_name + dir
            path = os.path.join(src_data_path_name, dir)
            image_dir = os.listdir(path)
            for img in image_dir:
                image = cv2.imread(os.path.join(path, img))
                new_img = _crop_img(image)  # _crop_img is a utils.py helper (not shown) that crops the scan
                new_img = cv2.resize(new_img, (IMG_SIZE, IMG_SIZE))
                if not os.path.exists(save_path):
                    os.makedirs(save_path)
                cv2.imwrite(save_path + '/' + img, new_img)

    train_data_path = "raw/Training"
    test_data_path = "raw/Testing"
    training_dir = os.listdir(train_data_path)
    testing_dir = os.listdir(test_data_path)
    transform_data(training_dir, 'processed/TrainingValidation/', train_data_path)
    transform_data(testing_dir, 'processed/Testing/', test_data_path)
Data Augmentation:
def augment_data():
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # augmentation is applied only to the training set
    datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        vertical_flip=True,
    )
    # validation and test images are only rescaled, never distorted
    test_val_datagen = ImageDataGenerator(
        rescale=1./255
    )
    training_batch = datagen.flow_from_directory(
        'processed/Training/',
        save_format='jpg',
        color_mode='rgb',
    )
    val_batch = test_val_datagen.flow_from_directory(
        'processed/Validation/',
        save_format='jpg',
        color_mode='rgb',
    )
    test_batch = test_val_datagen.flow_from_directory(
        'processed/Testing/',
        save_format='jpg',
        color_mode='rgb',
    )
    return training_batch, val_batch, test_batch
Model Creation:
def create_densenet_model(weights='imagenet'):
    from tensorflow.keras import layers, Model
    from tensorflow.keras.applications import DenseNet169

    densenet_base_model = DenseNet169(
        weights=weights,
        include_top=False,  # drop ImageNet's 1000-class classifier head
        input_shape=(256, 256, 3),
        classes=4,  # ignored when include_top=False
    )
    # replace the head with a classifier for the 4 tumor classes
    densenet_base_flat_model = layers.Flatten()(densenet_base_model.output)
    densenet_base_top_model = layers.Dense(1000, activation='relu')(densenet_base_flat_model)
    densenet_output_layer = layers.Dense(4, activation='softmax')(densenet_base_top_model)
    return Model(inputs=densenet_base_model.input, outputs=densenet_output_layer)
def create_resnet_model(weights='imagenet'):
    from tensorflow.keras import layers, Model
    from tensorflow.keras.applications import ResNet152

    resnet_base_model = ResNet152(
        weights=weights,
        include_top=False,
        input_shape=(256, 256, 3),
        classes=4,  # ignored when include_top=False
    )
    resnet_base_flat_model = layers.Flatten()(resnet_base_model.output)
    resnet_base_top_model = layers.Dense(1000, activation='relu')(resnet_base_flat_model)
    resnet_output_layer = layers.Dense(4, activation='softmax')(resnet_base_top_model)
    return Model(inputs=resnet_base_model.input, outputs=resnet_output_layer)
def create_vgg_model(weights='imagenet'):
    from tensorflow.keras import layers, Model
    from tensorflow.keras.applications import VGG19

    vgg19_base_model = VGG19(
        weights=weights,
        include_top=False,
        input_shape=(256, 256, 3)
    )
    vgg19_base_flat_model = layers.Flatten()(vgg19_base_model.output)
    vgg19_base_top_model = layers.Dense(1000, activation='relu')(vgg19_base_flat_model)
    vgg_output_layer = layers.Dense(4, activation='softmax')(vgg19_base_top_model)
    return Model(inputs=vgg19_base_model.input, outputs=vgg_output_layer)
def create_mobilenet_model(weights='imagenet'):
    from tensorflow.keras import layers, Model
    from tensorflow.keras.applications import MobileNetV3Large

    mobilenetv3_base_model = MobileNetV3Large(
        weights=weights,
        include_top=False,
        input_shape=(256, 256, 3)
    )
    # MobileNetV3 uses global average pooling instead of flattening
    mobilenetv3_base_top_model = layers.GlobalAveragePooling2D()(mobilenetv3_base_model.output)
    dense_layer = layers.Dense(1000, activation='relu')(mobilenetv3_base_top_model)
    mobilenet_output_layer = layers.Dense(4, activation='softmax')(dense_layer)
    return Model(inputs=mobilenetv3_base_model.input, outputs=mobilenet_output_layer)
These utility functions ensured that each notebook stayed focused on the specific model architecture being implemented, making it easier to follow and debug.
Training the Models
For each architecture, I followed a similar process:
- Data Preparation: Using the utils.py functions to download, preprocess, and augment the data.
- Model Creation: Creating the model using the architecture-specific function from utils.py.
- Model Compilation: Compiling the model with an appropriate optimizer and loss function.
- Model Training: Training the model on the prepared data and validating it.
- Evaluation: Evaluating the model's performance on the test data.
Here is an example of how I trained the VGG19 model:
from utils import upload_kaggle_json, download_dataset, preprocess_images, separate_training_and_validation, augment_data, create_vgg_model, custom_summary
# separate_training_and_validation and custom_summary are utils.py helpers not shown above

upload_kaggle_json()
download_dataset()
preprocess_images()
separate_training_and_validation()
training_batch, val_batch, test_batch = augment_data()

vgg_model = create_vgg_model()

from tensorflow.keras import losses
from tensorflow.keras.optimizers import Adam

vgg_model.compile(
    optimizer=Adam(learning_rate=0.0001),
    loss=losses.categorical_crossentropy,
    metrics=['accuracy'],
)
custom_summary(vgg_model)

history = vgg_model.fit(
    training_batch,
    steps_per_epoch=len(training_batch),
    epochs=50,
    validation_data=val_batch,
    validation_steps=len(val_batch),
    # the batch size is set by the generators, so it is not passed here
)
import matplotlib.pyplot as plt

# plot training vs. validation accuracy per epoch
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
epochs = range(1, len(acc) + 1)

plt.figure(figsize=(10, 6))
plt.plot(epochs, acc, 'bo-', label='Training accuracy')
plt.plot(epochs, val_acc, 'ro-', label='Validation accuracy')
plt.title('VGG19 - Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

# evaluate on the held-out test set
test_loss, test_accuracy = vgg_model.evaluate(
    test_batch,
    steps=len(test_batch)
)
print(test_loss)
print(test_accuracy)
Results
DenseNet169 was the only architecture that behaved somewhat as expected, showing a volatile training process. This behavior was consistent with the findings in the paper, where DenseNet exhibited high variance in training metrics and never converged. In my implementation, DenseNet reached a minimally reasonable result around epoch 20 before unexpectedly dropping and starting to converge again, probably due to the limited amount of training data. This dropping-and-converging behavior was not observed in the paper's implementation. Its final testing accuracy was 88.94%.
MobileNetV3, which was the best performer in the original paper, unfortunately overfitted the dataset in my implementation. While it achieved high accuracy on the training set, the validation accuracy was significantly lower. This indicates that the model learned to memorize the training data rather than generalize well to new, unseen data. Its final testing accuracy was 23.34%.
VGG19 underperformed, exhibiting signs of underfitting. Both training and validation accuracy remained low and stagnant through all the epochs, suggesting that the model was not able to learn effectively from the dataset. Its final testing accuracy was 30.89%.
ResNet152 behaved similarly to DenseNet169, showing a volatile training process, dropping mid-training, and converging again. This was again unexpected: in the actual paper, ResNet was the second-best performer, losing only to MobileNetV3 because of its slower convergence. Its final testing accuracy was 75.89%.
Discussion
As observed, the behavior described in the original paper could not be reproduced in this implementation. My main hypothesis for this discrepancy lies in the data augmentation step. The authors did not provide specific details about their data augmentation techniques, nor did they publicly share their code. I attempted to reach out to them via the email address provided in the paper; however, I received an automated response indicating that I would need authorization from the domain administrator to send an email to that address.
Given that different training inputs can lead to different outputs, and that the problems observed commonly arise from a lack of data, it is reasonable to assume that the divergence in results stems from differences in data augmentation techniques. All other relevant steps in the model training and evaluation process were adequately detailed in the paper, and assuming those steps were correctly implemented, I could not identify any other potential sources of divergence.
This highlights the importance of transparency and reproducibility in scientific research. Without access to the exact data augmentation techniques used by the authors, replicating their results becomes challenging. Future research should emphasize sharing complete methodologies, including data preprocessing and augmentation steps, to facilitate reproducibility and validation of findings.
Conclusion
In summary, while DenseNet169's behavior was somewhat (if loosely) consistent with the paper's findings, the other models, MobileNetV3, ResNet152, and VGG19, did not perform as expected, probably because of differences in data augmentation. This reinforces the need for comprehensive documentation and open sharing of all experimental procedures in machine learning and computer vision research.