Andrew Ng once mentioned that implementing many scientific papers is one of the best paths to becoming a great Machine Learning Engineer or Data Scientist. Someone starting in the field right now (like me) probably won't be able to understand and implement most papers just from a first read. The path will look much more like gradually reducing the level of pain and confusion until one can write some code and see some results.
This is the first machine learning paper I have ever implemented. To make the most of it and share it with others, I decided to also write about what I learned. This article is about "Employing deep learning and transfer learning for accurate brain tumor detection", published in Nature. It is a computer vision work in which the authors look for a workaround for the scarcity of publicly available medical images to train deep learning models.
It wasn't all that painful to understand at first read. But this was a choice, not a coincidence. As my first machine learning paper implementation, I decided to go for a simpler one. This Nature paper uses a Kaggle dataset and leverages well-known deep learning architectures like ResNet and DenseNet. I also had a deadline, since this article is part of a computer vision college course project. So, let's not make big leaps.
What is deep learning?
Deep learning is a type of artificial intelligence that teaches computers to do what comes naturally to humans: learn from experience. It is a specific machine learning approach that uses neural networks with many layers (hence the "deep" in the name). These networks are inspired by our understanding of the biology of the human brain and are designed to identify and understand complex patterns in data. Neural network layers can be broadly summarized as several logistic regressions, or other similar functions, layered together depending on the programmer's choice.
For example, if you show a deep learning model thousands of pictures of cats and dogs (in our case, brain tumors), it learns to differentiate between the two without being explicitly programmed to recognize specific features like whiskers or tails. Instead, it figures out what makes a cat a cat and a dog a dog all by itself; and despite feeling like witchery, it is just arithmetic. To train a deep learning model means to find the best parameters for its layered functions. This capability makes deep learning exceptionally good at tasks such as voice recognition, language translation, and yes, even identifying medical conditions from images like brain scans.
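To make the "layered functions" idea concrete, here is a minimal sketch (not the paper's model; the layer sizes are purely illustrative) of a small Keras network where each Dense layer behaves much like a logistic regression stacked on top of the previous one:

from tensorflow.keras import layers, Sequential

# A tiny "deep" model: each Dense layer computes a weighted sum followed
# by a nonlinearity, i.e. roughly a logistic regression stacked on the last
model = Sequential([
    layers.Flatten(input_shape=(256, 256, 3)),  # an input image
    layers.Dense(128, activation='sigmoid'),    # first layered function
    layers.Dense(64, activation='sigmoid'),     # second, built on the first
    layers.Dense(4, activation='softmax'),      # one output per class
])
model.summary()  # training = searching for the best parameters of these layers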
What is transfer learning?
Before figuring out the parameters of the layered functions that correctly infer the output value of an input, we have to start somewhere: we must pick initial parameters. One approach would be to initialize everything as zero or as completely random numbers, but this may not be smart. Choosing parameters well can reduce the training time needed to reach good results, help escape local optima (if you studied calculus, the same local optima concept from any other function), and even help address the vanishing gradient problem.
One method is to use a previously trained model as the initial parameters for the new model. This approach assumes that a deep learning model that already knows how to distinguish a tree from a car will be able to learn faster, or better, how to differentiate a glioma from a meningioma tumor. That is why this method is called transfer learning: we try to leverage some of the model's previous knowledge so that it does not have to learn everything from scratch. Despite the two contexts being completely different, some capabilities may carry over.
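In Keras, the difference between training from scratch and transfer learning is, at its simplest, just where the initial parameters come from. A sketch, using ResNet152 as an example since the paper does:

from tensorflow.keras.applications import ResNet152

# Random initial parameters: the model must learn everything from scratch
scratch = ResNet152(weights=None, include_top=False, input_shape=(256, 256, 3))

# Transfer learning: start from parameters already learned on ImageNet
pretrained = ResNet152(weights='imagenet', include_top=False, input_shape=(256, 256, 3))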
In "Employing deep learning and transfer learning for accurate brain tumor detection", four models previously trained on the ImageNet dataset were used as initial parameters. ImageNet is a large-scale dataset consisting of over 14 million annotated images, categorized into more than 20,000 classes, and widely used for training and benchmarking image recognition algorithms in machine learning and computer vision. The model architectures were ResNet152, DenseNet169, VGG19, and MobileNetV3.
My partner on this project for the computer vision course (shout out to Caproni) has made a video going deep into each of these architectures. In this article, I will only give a general idea of each architecture and try to express the abstract idea behind it.
{Video Placeholder}
ResNet152
Traditional deep neural networks can struggle with vanishing gradients, where information weakens as it travels through layers. ResNet152 tackles this by introducing "skip connections." These connections act like shortcuts, allowing gradients to flow directly from earlier layers to later ones. This mitigates the vanishing gradient problem and helps the network retain important information throughout the training process. With 152 layers, ResNet boasts significant depth, making it particularly adept at capturing complex patterns in medical imagery.
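As an illustration of the idea (not ResNet152's exact block, which also uses 1x1 bottleneck convolutions and batch normalization), a skip connection in Keras looks roughly like this:

from tensorflow.keras import layers

def residual_block(x, filters):
    # Assumes x already has `filters` channels so the addition is valid
    shortcut = x                                                        # the shortcut path
    y = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    y = layers.Conv2D(filters, 3, padding='same')(y)
    y = layers.Add()([shortcut, y])                                     # skip connection: add the input back in
    return layers.Activation('relu')(y)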
DenseNet169
DenseNet169 champions the idea of feature reuse. Unlike traditional models, where each layer connects only to the next one, DenseNet connects every layer to all subsequent layers. This fosters collaboration between layers, allowing each layer to benefit from the features learned by all earlier ones. It improves feature extraction and reduces the number of required parameters, making DenseNet169 a more efficient model, especially considering its 169 layers.
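A rough sketch of that connectivity pattern (simplified; real DenseNet blocks also use batch normalization and 1x1 bottlenecks):

from tensorflow.keras import layers

def dense_block(x, num_layers=3, growth_rate=32):
    for _ in range(num_layers):
        y = layers.Conv2D(growth_rate, 3, padding='same', activation='relu')(x)
        x = layers.Concatenate()([x, y])  # each layer sees all earlier feature maps
    return x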
VGG19
VGG19 takes a more traditional approach, relying on a simple architecture of stacked convolutional layers. Each layer extracts progressively more intricate features from the input image. While VGG19 lacks the fancy connections of ResNet or DenseNet, its straightforward design (19 layers) makes it easy to understand and implement. However, its large number of parameters makes it computationally expensive compared with more modern architectures.
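The whole architecture boils down to repeating a block like the following sketch, with more filters at each stage:

from tensorflow.keras import layers

def vgg_block(x, filters, convs=2):
    # A plain stack of 3x3 convolutions: no shortcuts, no concatenations
    for _ in range(convs):
        x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    return layers.MaxPooling2D(2)(x)  # halve the spatial resolution between blocks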
MobileNetV3
MobileNetV3 prioritizes efficiency, making it ideally suited for resource-constrained environments like mobile devices. It uses depthwise separable convolutions, a technique that breaks a complex operation down into simpler ones, significantly reducing computational cost. Additionally, MobileNetV3 incorporates advancements like squeeze-and-excitation modules to optimize feature extraction. This balance between accuracy and efficiency makes MobileNetV3 a strong choice for real-time medical diagnostic applications on mobile platforms.
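A sketch of the depthwise separable idea (simplified; MobileNetV3's real blocks add batch normalization, hard-swish activations, and squeeze-and-excitation):

from tensorflow.keras import layers

def separable_conv(x, filters):
    x = layers.DepthwiseConv2D(3, padding='same', activation='relu')(x)  # filter each channel on its own
    return layers.Conv2D(filters, 1, activation='relu')(x)               # cheap 1x1 conv mixes the channels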
Implementation
I used a Google Colab notebook (referenced below by model name) for each architecture to implement the deep learning models: ResNet152, DenseNet169, VGG19, and MobileNetV3. This allowed me to leverage the computational resources provided by Google Colab, such as GPU acceleration, which is essential for training deep learning models efficiently.
To avoid redundancy and keep the code clean, I created a utils.py file that contains all the repeated functions used across the different notebooks. This file includes functions for downloading the Kaggle dataset, preprocessing images, augmenting data, and creating the model architectures. Doing this ensured that only the essential code was shown in each Colab notebook, making it easier to follow and understand.
Here are some key functions from utils.py:
Data Downloading and Preprocessing:
def upload_kaggle_json():
    # Prompt the Colab user for their kaggle.json API credentials
    from google.colab import files
    files.upload()  # Upload kaggle.json

def download_dataset():
    import subprocess
    # Move the credentials into place and restrict their permissions
    subprocess.run('mkdir -p ~/.kaggle', shell=True, check=True)
    subprocess.run('mv kaggle.json ~/.kaggle/', shell=True, check=True)
    subprocess.run('chmod 600 ~/.kaggle/kaggle.json', shell=True, check=True)
    # Download the brain tumor MRI dataset from Kaggle and extract it
    subprocess.run('kaggle datasets download -d masoudnickparvar/brain-tumor-mri-dataset', shell=True, check=True)
    import zipfile
    zip_file_path = "brain-tumor-mri-dataset.zip"
    extract_dir = "raw"
    with zipfile.ZipFile(zip_file_path, 'r') as zip_ref:
        zip_ref.extractall(extract_dir)
def preprocess_images():
    import numpy as np
    from tqdm import tqdm
    import cv2
    import os

    IMG_SIZE = 256

    def transform_data(class_dirs, dest_data_path_name, src_data_path_name):
        # Crop and resize every image, writing results per class folder
        for class_dir in class_dirs:
            save_path = dest_data_path_name + class_dir
            path = os.path.join(src_data_path_name, class_dir)
            for img in tqdm(os.listdir(path)):
                image = cv2.imread(os.path.join(path, img))
                new_img = _crop_img(image)  # helper in utils.py that crops the scan to the brain region
                new_img = cv2.resize(new_img, (IMG_SIZE, IMG_SIZE))
                if not os.path.exists(save_path):
                    os.makedirs(save_path)
                cv2.imwrite(save_path + '/' + img, new_img)

    train_data_path = "raw/Training"
    test_data_path = "raw/Testing"
    training_dir = os.listdir(train_data_path)
    testing_dir = os.listdir(test_data_path)
    transform_data(training_dir, 'processed/TrainingValidation/', train_data_path)
    transform_data(testing_dir, 'processed/Testing/', test_data_path)
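The training script further down also imports a separate_training_and_validation function from utils.py, which splits processed/TrainingValidation/ into the processed/Training/ and processed/Validation/ folders that augment_data expects. Its body isn't reproduced in this article; here is a minimal sketch of what such a split could look like, assuming a simple random 80/20 split per class:

def separate_training_and_validation(split=0.8, seed=42):
    # Hypothetical sketch; the actual utils.py implementation may differ
    import os, random, shutil
    src_root = 'processed/TrainingValidation'
    random.seed(seed)
    for class_dir in os.listdir(src_root):
        images = os.listdir(os.path.join(src_root, class_dir))
        random.shuffle(images)
        cut = int(len(images) * split)
        for subset, subset_images in (('Training', images[:cut]), ('Validation', images[cut:])):
            dest = os.path.join('processed', subset, class_dir)
            os.makedirs(dest, exist_ok=True)
            for img in subset_images:
                shutil.copy(os.path.join(src_root, class_dir, img), os.path.join(dest, img))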
Data Augmentation:
def augment_data():
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    # Augmentation is applied to training images only; validation and
    # test images are just rescaled
    datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True,
        vertical_flip=True,
    )
    test_val_datagen = ImageDataGenerator(
        rescale=1./255
    )
    # flow_from_directory defaults to target_size=(256, 256), batch_size=32,
    # and class_mode='categorical', which match the preprocessing above
    training_batch = datagen.flow_from_directory(
        'processed/Training/',
        save_format='jpg',
        color_mode='rgb',
    )
    val_batch = test_val_datagen.flow_from_directory(
        'processed/Validation/',
        save_format='jpg',
        color_mode='rgb',
    )
    test_batch = test_val_datagen.flow_from_directory(
        'processed/Testing/',
        save_format='jpg',
        color_mode='rgb',
    )
    return training_batch, val_batch, test_batch
Model Creation:
def create_densenet_model(weights='imagenet'):
    from tensorflow.keras import layers, Model
    from tensorflow.keras.applications import DenseNet169

    # Load the DenseNet169 base pretrained on ImageNet, without its classifier head
    densenet_base_model = DenseNet169(
        weights=weights,
        include_top=False,
        input_shape=(256, 256, 3),
    )
    # Attach a new classification head for the 4 tumor classes
    densenet_base_flat_model = layers.Flatten()(densenet_base_model.output)
    densenet_base_top_model = layers.Dense(1000, activation='relu')(densenet_base_flat_model)
    densenet_output_layer = layers.Dense(4, activation='softmax')(densenet_base_top_model)
    return Model(inputs=densenet_base_model.input, outputs=densenet_output_layer)
def create_resnet_model(weights='imagenet'):
    from tensorflow.keras import layers, Model
    from tensorflow.keras.applications import ResNet152

    # Load the ResNet152 base pretrained on ImageNet, without its classifier head
    resnet_base_model = ResNet152(
        weights=weights,
        include_top=False,
        input_shape=(256, 256, 3),
    )
    # Attach a new classification head for the 4 tumor classes
    resnet_base_flat_model = layers.Flatten()(resnet_base_model.output)
    resnet_base_top_model = layers.Dense(1000, activation='relu')(resnet_base_flat_model)
    resnet_output_layer = layers.Dense(4, activation='softmax')(resnet_base_top_model)
    return Model(inputs=resnet_base_model.input, outputs=resnet_output_layer)
def create_vgg_model(weights='imagenet'):
    from tensorflow.keras import layers, Model
    from tensorflow.keras.applications import VGG19

    vgg19_base_model = VGG19(
        weights=weights,
        include_top=False,
        input_shape=(256, 256, 3)
    )
    vgg19_base_flat_model = layers.Flatten()(vgg19_base_model.output)
    vgg19_base_top_model = layers.Dense(1000, activation='relu')(vgg19_base_flat_model)
    vgg_output_layer = layers.Dense(4, activation='softmax')(vgg19_base_top_model)
    return Model(inputs=vgg19_base_model.input, outputs=vgg_output_layer)
def create_mobilenet_model(weights='imagenet'):
    from tensorflow.keras import layers, Model
    from tensorflow.keras.applications import MobileNetV3Large

    mobilenetv3_base_model = MobileNetV3Large(
        weights=weights,
        include_top=False,
        input_shape=(256, 256, 3)
    )
    # MobileNetV3 uses global average pooling instead of flattening
    mobilenetv3_base_top_model = layers.GlobalAveragePooling2D()(mobilenetv3_base_model.output)
    dense_layer = layers.Dense(1000, activation='relu')(mobilenetv3_base_top_model)
    mobilenet_output_layer = layers.Dense(4, activation='softmax')(dense_layer)
    return Model(inputs=mobilenetv3_base_model.input, outputs=mobilenet_output_layer)
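The training script below also calls a custom_summary helper from utils.py that isn't shown above; presumably it is a thin wrapper around the built-in Keras summary, something like:

def custom_summary(model):
    # Hypothetical sketch; the actual helper may report more detail
    model.summary()
    print(f"Total layers: {len(model.layers)}")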
These utility functions ensured that each notebook focused on the specific model architecture being implemented, making it easier to follow and debug.
Training the Models
For each architecture, I followed the same process:
- Data Preparation: using the utils.py functions to download, preprocess, and augment the data.
- Model Creation: building the model with the architecture-specific function from utils.py.
- Model Compilation: compiling the model with an appropriate optimizer and loss function.
- Model Training: training the model on the prepared data and validating it.
- Evaluation: evaluating the model's performance on the test data.
Here is an example of how I trained the VGG19 model:
from utils import upload_kaggle_json, download_dataset, preprocess_images, separate_training_and_validation, augment_data, create_vgg_model, custom_summary

# Download and prepare the data
upload_kaggle_json()
download_dataset()
preprocess_images()
separate_training_and_validation()
training_batch, val_batch, test_batch = augment_data()

# Build and compile the model
vgg_model = create_vgg_model()

from tensorflow.keras import losses
from tensorflow.keras.optimizers import Adam

vgg_model.compile(
    optimizer=Adam(learning_rate=0.0001),
    loss=losses.categorical_crossentropy,
    metrics=['accuracy'],
)
custom_summary(vgg_model)

# Train; the batch size (32) is set by flow_from_directory, not by fit()
history = vgg_model.fit(
    training_batch,
    steps_per_epoch=len(training_batch),
    epochs=50,
    validation_data=val_batch,
    validation_steps=len(val_batch),
)

# Plot training and validation accuracy per epoch
import matplotlib.pyplot as plt

acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
epochs = range(1, len(acc) + 1)

plt.figure(figsize=(10, 6))
plt.plot(epochs, acc, 'bo-', label='Training accuracy')
plt.plot(epochs, val_acc, 'ro-', label='Validation accuracy')
plt.title('VGG19 - Training and Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()

# Evaluate on the held-out test set
test_loss, test_accuracy = vgg_model.evaluate(
    test_batch,
    steps=len(test_batch)
)
print(test_loss)
print(test_accuracy)
Results
DenseNet169 was the only architecture that behaved somewhat as expected, displaying an unstable training process. This behavior was consistent with the findings in the paper, where DenseNet exhibited high variance in training metrics and never converged. In my implementation, DenseNet did achieve a minimally reasonable result around epoch 20 before suddenly dropping and starting to converge again, most likely due to the limited amount of training data. This dropping-and-converging behavior was not seen in the paper's implementation. Its final testing accuracy was 88.94%.
MobileNetV3, which was the best performer in the original paper, unfortunately overfitted the dataset in my implementation. While it achieved high accuracy on the training set, the validation accuracy was significantly lower. This means the model learned to memorize the training data rather than generalize well to new, unseen data. Its final testing accuracy was 23.34%.
VGG19 underperformed, showing signs of underfitting. Both training and validation accuracy remained low and stagnant across all epochs, suggesting that the model was not able to effectively learn from the dataset. Its final testing accuracy was 30.89%.
ResNet152 behaved much like DenseNet169, displaying an unstable training process, dropping mid-training, and converging again. This was again unexpected: in the actual paper, ResNet was the second-best performer, losing only to MobileNetV3 because of its slower convergence. Its final testing accuracy was 75.89%.
Discussion
As shown, the behavior described in the original paper could not be reproduced in this implementation. My main hypothesis for this discrepancy lies in the data augmentation step. The authors did not provide specific details about their data augmentation methods, nor did they publicly share their code. I tried to reach out to them through the email address provided in the paper; however, I received an automated response indicating that I would need authorization from the domain administrator to send an email to that address.
Given that different training inputs can lead to different outputs, and that the problems observed commonly arise from a shortage of data, it is reasonable to believe that the divergence in results stems from differences in data augmentation methods. All other relevant steps in the model training and evaluation process were adequately detailed in the paper, and assuming those steps were implemented correctly, I could not identify any other likely sources of divergence.
This highlights the importance of transparency and reproducibility in scientific research. Without access to the exact data augmentation methods used by the authors, replicating their results becomes difficult. Future research should emphasize sharing complete methodologies, including data preprocessing and augmentation steps, to facilitate reproducibility and validation of findings.
Conclusion
In summary, while DenseNet169's behavior was somewhat consistent with the paper's findings, the other models (MobileNetV3, ResNet152, and VGG19) did not perform as expected, most likely because of differences in data augmentation. This reinforces the need for complete documentation and open sharing of all experimental procedures in machine learning and computer vision research.