HEALTH CARE
In healthcare, the most common application of traditional machine learning is precision medicine: predicting which treatment protocols are likely to succeed for a patient based on the patient's attributes and the treatment context. The great majority of machine learning and precision medicine applications require a training dataset for which the outcome variable (e.g. onset of disease) is known; this is called supervised learning.
A more complex form of machine learning is the neural network, a technology that has been available since the 1960s, has been well established in healthcare research for several decades, and has been used for categorization applications such as determining whether a patient will acquire a particular disease. It views problems in terms of inputs, outputs and weights of variables or ‘features’ that associate inputs with outputs. It has been likened to the way that neurons process signals, but the analogy to the brain's function is relatively weak. Below is one of its applications in diagnosing diseases.
PURPOSE
The purpose of this research is to build a classifier that can correctly distinguish between pneumonia and COVID-19. Why lung diseases?
Misdiagnosis of pneumonia accounts for 100,000 deaths per year. A wrong diagnosis can be life-threatening because the untreated illness becomes more severe, especially when the patient actually has a more serious infection such as COVID-19.
Pneumonia accounts for 1 out of 6 childhood deaths, making it the leading cause of death in children under 5 years.
In the United States, the death rate from pneumonia is 10 per 100,000 individuals, which is typical of most developed countries. In Africa, the rate is 100 per 100,000 individuals, which is typical of most developing countries.
Model
The network used is VGG19, which is known for high accuracy on image classification problems and is therefore a reasonable choice for this one. After importing the VGG19 model, we set the appropriate weights for the type of images in the dataset and set the include_top parameter to false. This drops the final classification layers, because we do not want to classify a thousand different categories when this problem has only two. The input layer is likewise replaced, since we can simply provide our own image size, as we did.
The images were then fed to the network in batches of 32, meaning 32 images are used in each training step, with an image size of 64 × 64.
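The setup described above can be sketched with the Keras API roughly as follows. This is a minimal sketch under stated assumptions, not the exact code behind the results: the function name, the frozen base, the single sigmoid output, the optimizer, and the choice of `weights=None` (which avoids downloading pretrained weights) are all illustrative choices that the text does not specify.

```python
import tensorflow as tf

IMAGE_SIZE = (64, 64)  # image size stated in the text
BATCH_SIZE = 32        # batch size stated in the text


def build_vgg19_binary_classifier(weights=None):
    """Build a VGG19-based binary classifier without the original top layers.

    weights=None (random initialisation) is an assumption that keeps the
    sketch runnable offline; pass "imagenet" to load pretrained weights.
    """
    base = tf.keras.applications.VGG19(
        include_top=False,              # drop the 1000-class ImageNet head
        weights=weights,
        input_shape=IMAGE_SIZE + (3,),  # our own input size, 3 colour channels
    )
    base.trainable = False  # assumption: keep the convolutional base frozen

    inputs = tf.keras.Input(shape=IMAGE_SIZE + (3,))
    x = base(inputs, training=False)
    x = tf.keras.layers.Flatten()(x)
    # Hypothetical head: one sigmoid unit giving the probability of the positive class.
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",            # assumption: optimizer is not named in the text
        loss="binary_crossentropy",  # assumption: usual loss for two classes
        metrics=["accuracy", tf.keras.metrics.Recall(), tf.keras.metrics.AUC()],
    )
    return model


model = build_vgg19_binary_classifier()
model.summary()
```

Training would then pass batches of `BATCH_SIZE` images resized to `IMAGE_SIZE` to `model.fit`; how the images are loaded depends on how the dataset is stored, which is not described here.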
COVID-19 Model Results
The accuracy is 99%: this is the share of all predictions that are correct. The recall is 99%: of all the cases that are actually positive, this is the share the model diagnoses as positive. Recall is the best metric in this case because we would rather give a wrong positive diagnosis than a wrong negative diagnosis.
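To make the difference between these metrics concrete, here is a minimal sketch that computes accuracy, recall, and precision from confusion-matrix counts. The counts are hypothetical and chosen only for illustration; they are not the results reported here.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, recall, and precision from confusion-matrix counts.

    tp: sick patients correctly flagged     fp: healthy patients wrongly flagged
    fn: sick patients the model missed      tn: healthy patients correctly cleared
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,  # share of all predictions that are correct
        "recall": tp / (tp + fn),       # share of actual positives that were caught
        "precision": tp / (tp + fp),    # share of positive predictions that were right
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=90, fp=20, fn=10, tn=80)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Recall falls only when sick patients are missed (false negatives), which is why it suits a setting where a missed diagnosis is costlier than a false alarm.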
Pneumonia Model Results
The accuracy is 94%: this is the share of all predictions that are correct.
The recall is 95%: of all the cases that are actually positive, this is the share the model diagnoses as positive. Recall is the best metric in this case because we would rather give a wrong positive diagnosis than a wrong negative diagnosis.
The model loss is 0.17; the loss measures how heavily the model is penalized for incorrect predictions, so lower is better.
The AUC score is 0.90; this is the probability that the model scores a randomly chosen positive X-ray higher than a randomly chosen negative one.
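As a minimal sketch of what these two numbers measure, the snippet below computes a loss and an AUC by hand. It assumes the loss is binary cross-entropy, which the text does not state, and the labels and scores are hypothetical values for illustration, not data from this study.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Average penalty for predicted probabilities; confident mistakes cost the most."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def roc_auc(y_true, y_prob):
    """Probability that a random positive is scored above a random negative."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical labels (1 = disease present) and predicted probabilities.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]

loss = binary_cross_entropy(labels, scores)
auc = roc_auc(labels, scores)
print(f"loss={loss:.3f}, auc={auc:.3f}")
```

In this toy data, one positive case (0.4) is scored below one negative case (0.6), so 8 of the 9 positive-negative pairs are ranked correctly.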
Pneumonia vs COVID-19 Model Results
The pneumonia model has a recall score of 100% for pneumonia, the COVID-19 model has a recall score of 93% for COVID-19, and the pneumonia vs COVID-19 multi-class model has a recall score of 100% for COVID-19. These scores could likely be improved by trying different parameters, but they are already strong, so doctors and radiologists are welcome to integrate these models into their medical applications to help diagnose lung diseases correctly, after thorough verification.
The model loss is 0.02; the loss measures how heavily the model is penalized for incorrect predictions, so lower is better.
The AUC score is 0.93; this is the probability that the model scores a randomly chosen positive X-ray higher than a randomly chosen negative one.
Recommendation
Use the VGG19 model, since it is 26% better at correctly diagnosing a COVID-19 case in the binary classification model and 15% better at correctly diagnosing a COVID-19 case in the multi-class classification model.
Add a dropout layer before the final dense layer to drop half of the outputs of the preceding 512-node dense layer, in order to reduce overfitting when using the VGG19 model.
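One way this recommendation could look in Keras is sketched below. The input size of 2048 is an assumption: it is what a flattened VGG19 output would be for 64 × 64 images (2 × 2 × 512), and the ReLU activation and single sigmoid output are likewise illustrative choices rather than details taken from the text.

```python
import numpy as np
import tensorflow as tf

# Hypothetical classification head placed on top of the VGG19 features.
head = tf.keras.Sequential([
    tf.keras.Input(shape=(2048,)),                   # assumed flattened feature size
    tf.keras.layers.Dense(512, activation="relu"),   # the 512-node dense layer
    tf.keras.layers.Dropout(0.5),                    # drop half of its outputs in training
    tf.keras.layers.Dense(1, activation="sigmoid"),  # final dense layer
])

# Dropout is only active during training; at inference it passes values through.
dropout = tf.keras.layers.Dropout(0.5)
ones = tf.ones((1, 512))
train_out = dropout(ones, training=True).numpy()   # about half zeroed, the rest scaled by 2
infer_out = dropout(ones, training=False).numpy()  # unchanged

print("fraction zeroed in training:", float(np.mean(train_out == 0.0)))
print("unchanged at inference:", bool(np.array_equal(infer_out, np.ones((1, 512)))))
```

Because the surviving activations are scaled up during training, the expected output of the layer stays the same, and no adjustment is needed at prediction time.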
Future Work
Other Lung Diseases: Create a classifier to differentiate pneumonia X-rays from those of other lung infections such as tuberculosis.
Target Detection: Create a classifier to detect which section of the lungs the infection is located in.
Model Improvement: Collect more data and fine-tune more layers of the transfer learning model to improve its performance.