Andrew Ng once said that implementing many scientific papers is one of the best paths to becoming a great Machine Learning Engineer or Data Scientist. If someone is starting in the field right now (like me), they probably won't be capable of understanding and implementing most papers just from a first read. The path will look much more like gradually reducing the level of pain and confusion until one can write some code and see some results.
This is the first machine-learning paper I've ever implemented. To make the most of it and share it with others, I decided to also write about what I've learned. This article is about "Employing deep learning and transfer learning for accurate brain tumor detection", published in Nature. It's a computer vision work in which the authors are looking for a workaround for the lack of publicly available medical images to train deep learning models.
It wasn't that painful to understand at first read. But this was a choice, not a coincidence. As my first machine learning paper implementation, I decided to go for a simpler one. This Nature paper uses a Kaggle dataset and leverages well-known deep learning architectures like ResNet and DenseNet. I also had a deadline, since this article is part of a computer vision college course project. So, let's not make big leaps.
What is deep learning?
Deep learning is a type of artificial intelligence that teaches computers to do what comes naturally to humans: learn from experience. It's a specific machine learning method that uses neural networks with many layers (hence the "deep" in the name). These networks are inspired by our understanding of the biology of the human brain and are designed to gradually identify and understand complex patterns in data. Neural network layers can be broadly summarized as numerous logistic regressions or other similar functions layered together, depending on the programmer's choice.
For instance, when you show a deep learning model thousands of images of cats and dogs (in our case, brain tumors), it learns to differentiate between the two without being explicitly programmed to recognize specific features like whiskers or tails. Instead, it figures out what makes a cat a cat and a dog a dog all by itself, and despite this feeling like witchery, it's just mathematics. To train a deep learning model means to figure out the best parameters for its layered functions. This capability makes deep learning exceptionally good at tasks such as voice recognition, language translation, and yes, even identifying medical conditions from images like brain scans.
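To make "layers of logistic-regression-like functions" concrete, here is a hypothetical toy network in Keras (my illustration, not code from the paper; the layer sizes are arbitrary):
# A hypothetical toy network (not from the paper): each Dense layer computes
# activation(Wx + b), i.e. many logistic-regression-like units side by side.
from tensorflow.keras import layers, models

toy_model = models.Sequential([
    layers.Input(shape=(256, 256, 3)),        # e.g. a 256x256 RGB brain scan
    layers.Flatten(),                         # unroll the pixels into one long vector
    layers.Dense(128, activation='sigmoid'),  # 128 "logistic regressions" over the pixels
    layers.Dense(64, activation='sigmoid'),   # 64 more over the previous layer's outputs
    layers.Dense(4, activation='softmax'),    # probabilities for 4 classes
])
toy_model.summary()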
What is transfer learning?
Before figuring out the parameters of the layered functions so they correctly infer the output value of an input, we need to start somehow, picking initial parameters. One approach would be to initialize everything as zero or as completely random numbers, but this may not be practical. Properly choosing the initial parameters can reduce the training time necessary to reach good results, help escape local optima (if you studied calculus, the same local optima concept as in any other function), and even help solve the vanishing gradient problem.
One strategy is to use a previously trained model as the initial parameters for the new model. This strategy assumes that a deep learning model that already knows how to differentiate a tree from a car will be able to learn faster or better how to distinguish a glioma from a meningioma tumor. That's why this strategy is called transfer learning: we try to leverage some of the model's previous knowledge so it doesn't have to learn everything from scratch. Despite the two contexts being completely different, some capabilities may be useful.
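As a concrete sketch of the idea (my illustration, not code from the paper): with Keras, the difference between training from scratch and transfer learning is simply which initial parameters you load. Freezing the pretrained layers, shown at the end, is one common option, not necessarily what the paper did.
from tensorflow.keras.applications import DenseNet169

# random initial parameters: the model must learn everything from scratch
scratch_model = DenseNet169(weights=None, include_top=False,
                            input_shape=(256, 256, 3))

# initial parameters learned on ImageNet: the starting point for transfer learning
pretrained_model = DenseNet169(weights='imagenet', include_top=False,
                               input_shape=(256, 256, 3))

# optionally freeze the pretrained layers so only newly added layers are trained
for layer in pretrained_model.layers:
    layer.trainable = False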
In "Employing deep learning and transfer learning for accurate brain tumor detection", four models previously trained on the ImageNet dataset were used as initial parameters. ImageNet is a large-scale dataset consisting of over 14 million annotated images, categorized into more than 20,000 classes, which is widely used for training and benchmarking image recognition algorithms in machine learning and computer vision. The model architectures were ResNet152, DenseNet169, VGG19, and MobileNetV3.
My partner on this computer vision course project (shout out to Caproni) has made a video going deep into each one of these architectures. In this article, I'm only going to give a general idea of each architecture and try to express the abstract idea behind it.
{Video Placeholder}
ResNet152
Traditional deep neural networks can struggle with vanishing gradients, where information weakens as it travels through layers. ResNet152 tackles this by introducing "skip connections." These connections act like shortcuts, allowing gradients to flow directly from earlier layers to later ones. This mitigates the vanishing gradient problem and helps the network retain important information throughout the learning process. With 152 layers, ResNet boasts significant depth, making it particularly adept at capturing complex patterns in medical imagery.
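A minimal sketch of a residual block (my simplification; the real ResNet152 blocks also use bottleneck convolutions and batch normalization):
from tensorflow.keras import layers

def residual_block(x, filters=64):
    # assumes x already has `filters` channels; the real ResNet uses a
    # 1x1 projection on the shortcut when the shapes differ
    shortcut = x                                   # the "skip connection"
    y = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.Add()([shortcut, y])                # output = F(x) + x
    return layers.Activation('relu')(y)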
DenseNet169
DenseNet169 champions the concept of feature reuse. Unlike traditional models where each layer connects only to the next, DenseNet connects every layer to all subsequent layers. This fosters collaboration between layers, allowing each layer to benefit from the features learned by all preceding ones. This improves feature extraction and reduces the number of required parameters, making DenseNet169 a more efficient model, especially considering its 169 layers.
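A simplified dense-block sketch (the real DenseNet169 adds batch normalization, bottleneck layers, and transition layers between blocks):
from tensorflow.keras import layers

def dense_block(x, num_layers=3, growth_rate=32):
    features = [x]
    for _ in range(num_layers):
        # each new layer sees the concatenation of every earlier output
        inputs = layers.Concatenate()(features) if len(features) > 1 else features[0]
        y = layers.Conv2D(growth_rate, 3, padding='same', activation='relu')(inputs)
        features.append(y)
    return layers.Concatenate()(features)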
VGG19
VGG19 takes a more conventional approach, relying on a simple architecture with stacked convolutional layers. Each layer extracts progressively intricate features from the input image. While VGG19 lacks the fancy connections of ResNet or DenseNet, its straightforward design (19 layers) makes it easy to understand and implement. However, its sheer number of parameters can make it computationally expensive compared to more modern architectures.
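For contrast with the blocks above, a sketch of one VGG-style stage (the real VGG19 stacks five such stages, with 64 up to 512 filters, before its fully connected layers):
from tensorflow.keras import layers

def vgg_stage(x, filters):
    # plain stacked 3x3 convolutions: no skip connections or concatenations
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return layers.MaxPooling2D(2)(x)   # halve the spatial size before the next stage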
MobileNetV3
MobileNetV3 prioritizes efficiency, making it ideal for resource-constrained environments like mobile devices. It uses depthwise separable convolutions, a technique that breaks down complex operations into simpler ones, significantly reducing computational costs. Additionally, MobileNetV3 incorporates cutting-edge advancements like squeeze-and-excitation modules to optimize feature extraction. This balance between accuracy and efficiency makes MobileNetV3 a strong choice for real-time medical diagnostic applications on mobile platforms.
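To see where the savings come from, here is a toy comparison (arbitrary shapes, my illustration rather than the paper's) of parameter counts for a regular versus a depthwise separable 3x3 convolution:
from tensorflow.keras import layers, models

inp = layers.Input(shape=(64, 64, 128))
regular = models.Model(inp, layers.Conv2D(128, 3, padding='same')(inp))
separable = models.Model(inp, layers.SeparableConv2D(128, 3, padding='same')(inp))

print(regular.count_params())    # 3*3*128*128 + 128 biases = 147,584
print(separable.count_params())  # depthwise 3*3*128 + pointwise 128*128 + 128 biases = 17,664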
Implementation
I used Google Colab notebooks (referenced below by model name) for each architecture to implement the deep learning models: ResNet152, DenseNet169, VGG19, and MobileNetV3. This approach allowed me to leverage the computational resources provided by Google Colab, such as GPU acceleration, which is essential for training deep learning models efficiently.
To avoid redundancy and keep the code clean, I created a utils.py file that contains all the repeated functions used across the different notebooks. This file includes functions for downloading the Kaggle dataset, preprocessing images, augmenting data, and creating the model architectures. Doing this ensured that only the necessary code was shown in each Colab notebook, making it easier to follow and understand.
Here are some key functions from utils.py:
Data Downloading and Preprocessing:
def upload_kaggle_json():
    from google.colab import files
    files.upload()  # upload kaggle.json through the Colab file picker

def download_dataset():
    import subprocess
    # place the Kaggle API credentials where the CLI expects them
    subprocess.run('mkdir -p ~/.kaggle', shell=True, check=True)
    subprocess.run('mv kaggle.json ~/.kaggle/', shell=True, check=True)
    subprocess.run('chmod 600 ~/.kaggle/kaggle.json', shell=True, check=True)
    subprocess.run('kaggle datasets download -d masoudnickparvar/brain-tumor-mri-dataset', shell=True, check=True)
    import zipfile
    zip_file_path = "brain-tumor-mri-dataset.zip"
    extract_dir = "raw"
    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        zip_ref.extractall(extract_dir)
def preprocess_images():
    import numpy as np
    from tqdm import tqdm
    import cv2
    import os

    IMG_SIZE = 256

    def transform_data(dir_father, dest_data_path_name, src_data_path_name):
        # each subdirectory holds one tumor class
        for dir in dir_father:
            save_path = dest_data_path_name + dir
            path = os.path.join(src_data_path_name, dir)
            image_dir = os.listdir(path)
            for img in image_dir:
                image = cv2.imread(os.path.join(path, img))
                new_img = _crop_img(image)  # _crop_img is a utils.py helper (not shown) that crops the scan
                new_img = cv2.resize(new_img, (IMG_SIZE, IMG_SIZE))
                if not os.path.exists(save_path):
                    os.makedirs(save_path)
                cv2.imwrite(save_path + '/' + img, new_img)

    train_data_path = "raw/Training"
    test_data_path = "raw/Testing"
    training_dir = os.listdir(train_data_path)
    testing_dir = os.listdir(test_data_path)
    transform_data(training_dir, 'processed/TrainingValidation/', train_data_path)
    transform_data(testing_dir, 'processed/Testing/', test_data_path)
Data Augmentation:
def augment_data():
    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    # augmentation is applied only to the training set
    datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        vertical_flip=True,
    )
    # validation and test images are only rescaled, never distorted
    test_val_datagen = ImageDataGenerator(
        rescale=1./255
    )
    training_batch = datagen.flow_from_directory(
        'processed/Training/',
        save_format='jpg',
        color_mode='rgb',
    )
    val_batch = test_val_datagen.flow_from_directory(
        'processed/Validation/',
        save_format='jpg',
        color_mode='rgb',
    )
    test_batch = test_val_datagen.flow_from_directory(
        'processed/Testing/',
        save_format='jpg',
        color_mode='rgb',
    )
    return training_batch, val_batch, test_batch
Model Creation:
def create_densenet_model(weights='imagenet'):
    from tensorflow.keras import layers, Model
    from tensorflow.keras.applications import DenseNet169

    densenet_base_model = DenseNet169(
        weights=weights,
        include_top=False,  # drop ImageNet's 1000-class classifier head
        input_shape=(256, 256, 3),
        classes=4,  # ignored when include_top=False
    )
    # replace the head with a classifier for the 4 tumor classes
    densenet_base_flat_model = layers.Flatten()(densenet_base_model.output)
    densenet_base_top_model = layers.Dense(1000, activation='relu')(densenet_base_flat_model)
    densenet_output_layer = layers.Dense(4, activation='softmax')(densenet_base_top_model)
    return Model(inputs=densenet_base_model.input, outputs=densenet_output_layer)
def create_resnet_model(weights='imagenet'):
    from tensorflow.keras import layers, Model
    from tensorflow.keras.applications import ResNet152

    resnet_base_model = ResNet152(
        weights=weights,
        include_top=False,
        input_shape=(256, 256, 3),
        classes=4,  # ignored when include_top=False
    )
    resnet_base_flat_model = layers.Flatten()(resnet_base_model.output)
    resnet_base_top_model = layers.Dense(1000, activation='relu')(resnet_base_flat_model)
    resnet_output_layer = layers.Dense(4, activation='softmax')(resnet_base_top_model)
    return Model(inputs=resnet_base_model.input, outputs=resnet_output_layer)
def create_vgg_model(weights='imagenet'):
    from tensorflow.keras import layers, Model
    from tensorflow.keras.applications import VGG19

    vgg19_base_model = VGG19(
        weights=weights,
        include_top=False,
        input_shape=(256, 256, 3)
    )
    vgg19_base_flat_model = layers.Flatten()(vgg19_base_model.output)
    vgg19_base_top_model = layers.Dense(1000, activation='relu')(vgg19_base_flat_model)
    vgg_output_layer = layers.Dense(4, activation='softmax')(vgg19_base_top_model)
    return Model(inputs=vgg19_base_model.input, outputs=vgg_output_layer)
def create_mobilenet_model(weights='imagenet'):
    from tensorflow.keras import layers, Model
    from tensorflow.keras.applications import MobileNetV3Large

    mobilenetv3_base_model = MobileNetV3Large(
        weights=weights,
        include_top=False,
        input_shape=(256, 256, 3)
    )
    # MobileNetV3 uses global average pooling instead of flattening
    mobilenetv3_base_top_model = layers.GlobalAveragePooling2D()(mobilenetv3_base_model.output)
    dense_layer = layers.Dense(1000, activation='relu')(mobilenetv3_base_top_model)
    mobilenet_output_layer = layers.Dense(4, activation='softmax')(dense_layer)
    return Model(inputs=mobilenetv3_base_model.input, outputs=mobilenet_output_layer)
These utility functions ensured that each notebook stayed focused on the specific model architecture being implemented, making it easier to follow and debug.
Training the Models
For each architecture, I followed a similar process:
- Data Preparation: Using the utils.py functions to download, preprocess, and augment the data.
- Model Creation: Creating the model using the architecture-specific function from utils.py.
- Model Compilation: Compiling the model with an appropriate optimizer and loss function.
- Model Training: Training the model on the prepared data and validating it.
- Evaluation: Evaluating the model's performance on the test data.
Here is an example of how I trained the VGG19 model:
from utils import upload_kaggle_json, download_dataset, preprocess_images, separate_training_and_validation, augment_data, create_vgg_model, custom_summary
# separate_training_and_validation and custom_summary are utils.py helpers not shown above

upload_kaggle_json()
download_dataset()
preprocess_images()
separate_training_and_validation()
training_batch, val_batch, test_batch = augment_data()

vgg_model = create_vgg_model()

from tensorflow.keras import losses
from tensorflow.keras.optimizers import Adam

vgg_model.compile(
    optimizer=Adam(learning_rate=0.0001),
    loss=losses.categorical_crossentropy,
    metrics=['accuracy'],
)
custom_summary(vgg_model)

history = vgg_model.fit(
    training_batch,
    steps_per_epoch=len(training_batch),
    epochs=50,
    validation_data=val_batch,
    validation_steps=len(val_batch),
    # the batch size is set by the generators, so it is not passed here
)
import matplotlib.pyplot as plt

# plot training vs. validation accuracy per epoch
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
epochs = range(1, len(acc) + 1)

plt.figure(figsize=(10, 6))
plt.plot(epochs, acc, 'bo-', label='Training accuracy')
plt.plot(epochs, val_acc, 'ro-', label='Validation accuracy')
plt.title('VGG19 - Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

# evaluate on the held-out test set
test_loss, test_accuracy = vgg_model.evaluate(
    test_batch,
    steps=len(test_batch)
)
print(test_loss)
print(test_accuracy)
Results
DenseNet169 was the only architecture that behaved somewhat as expected, showing a volatile training process. This behavior was consistent with the findings in the paper, where DenseNet exhibited high variance in training metrics and never converged. In my implementation, DenseNet reached a minimally reasonable result around epoch 20 before unexpectedly dropping and starting to converge again, probably due to the limited amount of training data. This dropping-and-converging behavior was not observed in the paper's implementation. Its final testing accuracy was 88.94%.
MobileNetV3, which was the best performer in the original paper, unfortunately overfitted the dataset in my implementation. While it achieved high accuracy on the training set, the validation accuracy was significantly lower. This indicates that the model learned to memorize the training data rather than generalize well to new, unseen data. Its final testing accuracy was 23.34%.
VGG19 underperformed, exhibiting signs of underfitting. Both training and validation accuracy remained low and stagnant through all the epochs, suggesting that the model was not able to learn effectively from the dataset. Its final testing accuracy was 30.89%.
ResNet152 behaved similarly to DenseNet169, showing a volatile training process, dropping mid-training, and converging again. This was again unexpected: in the actual paper, ResNet was the second-best performer, losing only to MobileNetV3 because of its slower convergence. Its final testing accuracy was 75.89%.
Discussion
As observed, the behavior described in the original paper could not be reproduced in this implementation. My main hypothesis for this discrepancy lies in the data augmentation step. The authors did not provide specific details about their data augmentation techniques, nor did they publicly share their code. I attempted to reach out to them via the email address provided in the paper; however, I received an automated response indicating that I would need authorization from the domain administrator to send an email to that address.
Given that different training inputs can lead to different outputs, and that the problems observed commonly arise from a lack of data, it is reasonable to assume that the divergence in results stems from differences in data augmentation techniques. All other relevant steps in the model training and evaluation process were adequately detailed in the paper, and assuming those steps were correctly implemented, I could not identify any other potential sources of divergence.
This highlights the importance of transparency and reproducibility in scientific research. Without access to the exact data augmentation techniques used by the authors, replicating their results becomes challenging. Future research should emphasize sharing complete methodologies, including data preprocessing and augmentation steps, to facilitate reproducibility and validation of findings.
Conclusion
In summary, while DenseNet169's behavior was somewhat (if loosely) consistent with the paper's findings, the other models, MobileNetV3, ResNet152, and VGG19, did not perform as expected, probably because of differences in data augmentation. This reinforces the need for comprehensive documentation and open sharing of all experimental procedures in machine learning and computer vision research.