In this project, I aim to work with 3D images and U-Net models. I never had a chance to work with them before, so I have no idea what they are yet, but this project will be very educational for me. I also aim to practice algorithms.
Dataset Link: https://ipp.cbica.upenn.edu/categories/brats2020
In order to download the dataset, you must first sign up. After signing up, your account will be reviewed within 3 days. Then you can send a request to download the data; after this step you will receive an email (if I am not wrong) that contains REGISTRATION_STATUS.txt. In this txt file, you can find the training and validation dataset links.
First I started loading the data with glob, which is one of Python's awesome libraries. With it I can collect file paths. When calling glob, I used recursive=True, which means it also reaches into subfolders.
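A minimal sketch of this path-collecting step. The folder and file names below are made-up stand-ins for the BraTS layout, created in a temp directory just so the snippet is runnable:

```python
import glob
import os
import tempfile

# Build a tiny hypothetical folder layout (names are assumptions,
# only there to make the example self-contained).
root = tempfile.mkdtemp()
for sub in ("BraTS20_Training_001", "BraTS20_Training_002"):
    os.makedirs(os.path.join(root, sub))
    open(os.path.join(root, sub, sub + "_flair.nii"), "w").close()

# recursive=True lets the ** pattern descend into subfolders
paths = sorted(glob.glob(os.path.join(root, "**", "*_flair.nii"), recursive=True))
print(len(paths))  # 2
```

With the real dataset, the pattern would point at the downloaded BraTS root instead of a temp directory.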
After reading the data, I need to convert it to arrays. For that I will use skimage.io, but first it must be installed. Note that the PyPI package name is scikit-image, not skimage:
pip install scikit-image
Now I can read the data, but first I will go step by step: I will show the picture stage by stage, and afterwards I will define a function to read the data. I import skimage.io as io; with io.imread() I can easily read the picture. This function has a really important parameter called plugin, which selects how the picture is read. I chose simpleitk because I am working on medical images, and SimpleITK is made for medical images. But first I need to install it with this short magical command:
pip install SimpleITK
After installation, I read the picture:
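Since the actual scan file isn't available here, this sketch uses a random volume as a stand-in for the loaded image; the commented lines show the real call described above (the path is a placeholder):

```python
import numpy as np

# With a real BraTS file you would read it like this:
#   from skimage import io
#   img = io.imread(path_to_flair_file, plugin='simpleitk')
# Here a random volume stands in for the loaded scan:
# 155 axial slices, each 240 x 240.
img = np.random.rand(155, 240, 240).astype(np.float32)
print(img.shape, img.ndim)  # (155, 240, 240) 3
```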
As you can see, the image is 3D. That's great, because I am working with 3D images for the first time.
Now I will visualize this 3D image axis by axis in the traditional way (using matplotlib).
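A matplotlib sketch of this axis-by-axis view, with a random volume standing in for the scan (the Agg backend is only there so the snippet runs without a display):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt

vol = np.random.rand(155, 240, 240)  # stand-in for the loaded 3D scan
mid = [s // 2 for s in vol.shape]    # middle slice index on each axis

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
axes[0].imshow(vol[mid[0], :, :], cmap="gray"); axes[0].set_title("axis 0")
axes[1].imshow(vol[:, mid[1], :], cmap="gray"); axes[1].set_title("axis 1")
axes[2].imshow(vol[:, :, mid[2]], cmap="gray"); axes[2].set_title("axis 2")
plt.close(fig)
```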
The image reviewed above is not segmented. This time I will read and visualize the segmented image, again using skimage.io.
Having read my data, I will now visualize this segmented image along its 3 axes:
I need to convert all the images to arrays. To do that, I will define a function and then explain what it does.
This function may make you pause. No worries, I will explain it now. The second loop is probably the confusing part, so I will start there.
In the data, there are no significant images before slice 60 or after slice 130; all the significant ones lie between 60 and 130, which is why I keep only that range. Here are some examples from before 60 and after 130:
As you can see above, the tumor is not clear in them, so there is no point in keeping all the data. Keeping every image would also strain my PC, and unfortunately it doesn't have enough space. Anyway...
I also want to explain how np.expand_dims() works.
I am working on 3D images, and therefore with U-Net. U-Net expects 3 input dimensions: width, height, and channel. But my slices come without a channel dimension.
Using np.expand_dims(), I can add that extra dimension at axis=0.
As you can see, now the image has a channel.
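The effect on a single slice looks like this (a random array stands in for a real slice):

```python
import numpy as np

slice_2d = np.random.rand(240, 240)          # one axial slice: height x width
with_channel = np.expand_dims(slice_2d, 0)   # add a channel axis at position 0
print(slice_2d.shape, "->", with_channel.shape)  # (240, 240) -> (1, 240, 240)
```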
To give a short summary:
- First I define a function.
- I define an empty list called img_list that will collect the images at the end.
- Using glob, I gather the file locations.
- Then, using the random library, I shuffle them.
- I define a loop to open the images.
- After opening the images, I standardize them and convert them to float32.
- Then I create another for loop to work on a limited slice range.
- In the axial (transverse) plane, 155 slices are available in total, but I only want the ones between 60 and 130, which leaves 70 slices.
- I apply np.expand_dims() to add one more dimension, the channel.
- Finally, I append them to the image list defined before, and it is returned as an np.array()!
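The per-volume part of the summary above can be sketched like this (volume_to_slices is a hypothetical helper name of mine, and a random array stands in for a loaded scan):

```python
import numpy as np

def volume_to_slices(vol, lo=60, hi=130):
    """Standardize one 3D scan and keep only slices lo..hi,
    each with a channel axis added at position 0."""
    vol = vol.astype(np.float32)
    vol = (vol - vol.mean()) / (vol.std() + 1e-8)   # standardization
    return [np.expand_dims(vol[i], 0) for i in range(lo, hi)]

demo = np.random.rand(155, 240, 240)  # stand-in for one loaded scan
slices = volume_to_slices(demo)
print(len(slices), slices[0].shape)  # 70 (1, 240, 240)
```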
In the segmentation masks, the tumor regions are labeled as:
1 - Non-enhancing Tumor
2 - Edema
4 - Enhancing Tumor
To understand clearly, I will visualize the tumor areas step by step. But first, I will show the segmented image again.
In the image above, you can see, layer by layer, the Non-enhancing Tumor, the Edema, and the Enhancing Tumor.
First I will start with the whole tumor area: for that, I will set every value that is not 0 to 1, so the entire tumor area appears clearly.
After that, I will move on to the Non-enhancing Tumor, which is marked by 1. To make it show up, I will set every area to 0 except those that equal 1.
Now I can visualize the Edema. To do that, I need to cover (set to zero) the Non-enhancing Tumor (1) and the Enhancing Tumor (4): if a value equals 1 or 4, I set it to 0, and any other non-zero value I set to 1.
Lastly, I can visualize the Enhancing Tumor. For that, I cover everything that does not equal 4; in other words, I only show the values that equal 4.
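All four masking steps above can be sketched with NumPy boolean indexing on a toy label array:

```python
import numpy as np

seg = np.array([[0, 1, 2],
                [4, 2, 0],
                [1, 4, 0]])                 # toy segmentation labels

whole = (seg != 0).astype(np.uint8)         # whole tumor: anything non-zero -> 1
non_enh = (seg == 1).astype(np.uint8)       # non-enhancing tumor only (label 1)
# edema: with labels {0, 1, 2, 4}, zeroing 1 and 4 leaves exactly label 2
edema = (seg == 2).astype(np.uint8)
enhancing = (seg == 4).astype(np.uint8)     # enhancing tumor only (label 4)
```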
Now it is time to define a function that turns these images into arrays.
We also need the original segmented image, which I labeled as 0; it will be used for comparison.
But I can't actually use these functions yet because my computer runs out of memory: when I try, I get a MemoryError. Maybe I can figure it out in the next few days. I hope so.
Finally, I figured it out! Instead of Jupyter Notebook, I am using Google Colab. That's actually the best way; I will probably use it in all my projects :)
Anyway, I want to talk about how to upload data to Colab.
First, you must sign in with your Gmail account and go to Drive, then upload your data there. That's all! After that, open a Colab notebook and start coding! I don't know whether there is another way; I didn't actually search for one because I was annoyed with my PC.
But this time I faced another problem. When I ran the code that converts the training images to arrays, Google Colab threw a runtime error; yet when I ran the same conversion for the segmentation dataset, there was no problem at all.
I fixed these errors too :) What I did was resize all the images: their size was 240x240, and after resizing it is 120x120.
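One way to do that resize, assuming scikit-image's transform module (which the post already relies on); the array here is a stand-in slice:

```python
import numpy as np
from skimage.transform import resize  # scikit-image

img = np.random.rand(240, 240).astype(np.float32)  # stand-in 240x240 slice
# anti_aliasing smooths before downsampling to avoid artifacts
small = resize(img, (120, 120), anti_aliasing=True)
print(small.shape)  # (120, 120)
```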
Before defining the model, I will define the Flair, T2, and seg images. During training, I will use Flair + T2 together: the two modalities will be stacked into one input. The reason for combining them is that they cover each other's deficiencies; for example, Edema shows up clearly in Flair, while T2 shows the center of the tumor clearly.
The shape of the stacked images is (2, 120, 120);
if the images were shaped (120, 120, 2) instead, I would use this code:
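Both layouts can be produced with np.stack; only the axis argument differs:

```python
import numpy as np

flair = np.random.rand(120, 120)  # stand-in Flair slice
t2 = np.random.rand(120, 120)     # stand-in T2 slice

channels_first = np.stack([flair, t2], axis=0)   # shape (2, 120, 120)
channels_last = np.stack([flair, t2], axis=-1)   # shape (120, 120, 2)
```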
When we use accuracy with U-Net models, we cannot really tell whether the model has learned anything, so we use the Dice Coefficient instead. When we overlay the prediction on the ground truth, the Dice Coefficient measures their pixel-wise overlap. In U-Net models, both input and output are images!! So we should evaluate them at the pixel level.
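A NumPy sketch of the Dice Coefficient (the smooth term is a common addition to avoid division by zero; the function name is my own):

```python
import numpy as np

def dice_coefficient(y_true, y_pred, smooth=1.0):
    """Dice coefficient between two binary masks:
    2 * |A intersect B| / (|A| + |B|), smoothed against 0/0."""
    y_true_f = y_true.flatten()
    y_pred_f = y_pred.flatten()
    intersection = np.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (np.sum(y_true_f) + np.sum(y_pred_f) + smooth)

a = np.array([[1, 1, 0], [0, 1, 0]], dtype=float)
print(dice_coefficient(a, a))  # 1.0 for a perfect match
```

In a Keras training loop the same formula would be written with backend tensor ops and used as a metric or loss.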
Unfortunately, I have spent the last 2 days trying to solve some U-Net errors.
The first of them was about the TensorFlow version: in my local environment I had been using TensorFlow 1.13.1 (installed back in 2017). My solution was to use Colab again, because Colab comes with TensorFlow 2.4.1.
The other, and last, error was this:
Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 512, 14, 14), (None, 512, 15, 15)]
When I googled it, what I found was related to the image size: when you work with an encoder-decoder network like U-Net, your image sizes should be divisible by 32, so that the feature-map shapes still match up after the repeated halvings from pooling. So I changed the size to (128, 128).
Then, using Colab's GPU, I trained my model for 15 epochs.
To test my model, I will randomly choose an image and have the model predict on it.
BUT first I need to expand the dimensions of my image, because during training the model worked with 4 dimensions, so I will use np.expand_dims() again.
At first its shape was (2, 128, 128); after expanding, it is (1, 2, 128, 128).
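Adding that batch axis is one more np.expand_dims call (a random array stands in for the Flair+T2 pair):

```python
import numpy as np

sample = np.random.rand(2, 128, 128)    # one Flair+T2 pair, channels first
batch = np.expand_dims(sample, axis=0)  # add the batch axis the model expects
print(batch.shape)  # (1, 2, 128, 128)
```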
Actually, all of that was the easy part. Now it's time to switch to the hard part!
Now I will work out a solution for segmenting the Enhancing Tumor and the center of the tumor separately.
In the images above, as you can see, the Non-enhancing Tumor and Enhancing Tumor regions look really small. So my aim is to crop away the useless areas of these images.
For that I will apply these steps:
I will get the center coordinates of the Non-enhancing Tumor and the Enhancing Tumor using NumPy.
Then I will crop the images using these coordinates as the center point.
After these cropping steps, all the cropped images will be the input for training the U-Net model.
Finally, I will add the new output back onto the original image using the coordinates of the cropped region.
So how can I find the center point of the tumors? The answer: np.where().
np.where() gives the coordinates of the matching pixels as one list per axis.
From these results, calculating the center point is simple: (min_value + max_value) / 2 gives the center along each axis.
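Those two steps, np.where and then (min + max) / 2, look like this on a toy mask:

```python
import numpy as np

seg = np.zeros((16, 16), dtype=np.uint8)
seg[5:9, 6:12] = 4                 # toy enhancing-tumor region (label 4)

ys, xs = np.where(seg == 4)        # coordinates of every labeled pixel
cy = (ys.min() + ys.max()) // 2    # (min + max) / 2 gives the center
cx = (xs.min() + xs.max()) // 2
print(cy, cx)  # 6 8
```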
Now I can crop. First I will create a matrix of size (64, 64) filled with zeros, and then I will paste the T1ce image into it. The important point is that, when pasting, I add 64/2 to and subtract 64/2 from the center coordinates; these operations put the tumor in the middle of the matrix.
After the T1ce image, I will do the same steps for the segmentation image.
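A hypothetical crop_around helper sketching the pasting step above (the boundary clamping is my own addition for tumors near the image edge, and it simply shifts the patch rather than keeping it centered in that case):

```python
import numpy as np

def crop_around(img, cy, cx, size=64):
    """Paste the size x size window around (cy, cx) into a zero matrix,
    so the tumor sits in the middle of the frame."""
    half = size // 2
    out = np.zeros((size, size), dtype=img.dtype)
    y0, y1 = max(cy - half, 0), min(cy + half, img.shape[0])  # clamp to image
    x0, x1 = max(cx - half, 0), min(cx + half, img.shape[1])
    out[: y1 - y0, : x1 - x0] = img[y0:y1, x0:x1]
    return out

img = np.random.rand(128, 128)     # stand-in T1ce slice
print(crop_around(img, 64, 64).shape)  # (64, 64)
```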
To explain my steps, I first applied all of them to just a single image, image 280 of the dataset.
It's time to define a function that applies all these steps to every image!
As you can see in the image above, I defined my function. I don't think I need to explain it again, because I believe I already covered the steps nicely.
After applying these functions, you can see that all the images have become bigger and clearer.
In the next step, all the images will be turned into an np.array with a for loop.
BUT unfortunately, when I use this loop on all the images, I get an error again. I checked the code meticulously and realized there is no error in it: to verify, I created a subset of about 217 images (36 files of HGG), and with that subset the code works! So the problem is in the images, not the code, though I don't know why yet. In the next few days I will probably dig into the reason.
The error is:
After using the subset (a part of the whole HGG dataset):
As you can see, the code is unchanged, but it works for the images in the subset.
After this loop, all the segmentation images sit in the center of the frame, and they also look bigger and clearer!
Now I will train a model again on the cropped images.
The model doesn't get great results because I trained on a subset (less data), but in the coming days, if I solve the error, I will update this again.
In this great project, I've learned tons of useful things about algorithms, computer vision, and, above all, 3D images and U-Net models. The other important thing I learned is definitely how to fix errors :) They took a few days, but I solved all of them! Projects like this are the best way to improve: facing new errors and solving them.
But I can't say the project ends here, because I have some ideas for updating it. Unfortunately, my exams are approaching, so I need to study for now.
Some ideas I still want to add to this project:
- Improving the crop function. I fixed the size at (64, 64), but what happens if the tumor is bigger than that? It would not fit in the frame. So I should define a new function with some simple conditions based on the tumor size.
Maybe I can improve this function further, too.