Analyzing Breast Cancer Imaging Data
The CBIS-DDSM (Curated Breast Imaging Subset of DDSM) is an updated and standardized version of the Digital Database for Screening Mammography (DDSM). It contains a subset of normal, benign, and malignant cases with verified pathology information, selected and curated by a trained mammographer.
The images are organized into the following groups:
Mass-Training full mammogram images.
Mass-Training ROI (region of interest) and cropped images.
Calcified-Training full mammogram images.
Calcified-Training ROI and cropped images.
Along with the images, there are two CSV files covering the Mass and Calcified training and test images. I used a sample of the training CSV files to visualize and understand some patterns in the data using Tableau.
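For readers who want to draw a similar sample outside Tableau, here is a minimal sketch in Python using only the standard library. It is not the procedure I used; the column names and the rows in TOY_CSV are invented stand-ins, and the helper names read_rows and sample_rows are hypothetical. The real CSV files may use different headers.

```python
import csv
import io
import random

# Toy stand-in for a training CSV. These rows are invented for illustration,
# and the column names are assumptions, not copied from the real files.
TOY_CSV = """patient_id,left or right breast,pathology
P_00001,LEFT,MALIGNANT
P_00002,RIGHT,BENIGN
P_00003,LEFT,BENIGN_WITHOUT_CALLBACK
P_00004,RIGHT,MALIGNANT
P_00005,LEFT,MALIGNANT
"""


def read_rows(handle):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle))


def sample_rows(rows, k, seed=0):
    """Draw up to k rows without replacement; a fixed seed keeps it repeatable."""
    rng = random.Random(seed)
    return rng.sample(rows, min(k, len(rows)))


rows = read_rows(io.StringIO(TOY_CSV))
sample = sample_rows(rows, k=3)
print(len(rows), "rows read,", len(sample), "rows sampled")
```

To use it on a downloaded file, pass an open file handle to read_rows instead of the io.StringIO wrapper.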
My observations are as follows:
In the first bar graph, irregular-shaped masses clearly account for more malignant cases than any other mass shape, and malignant cases are marginally more frequent in the left breast than in the right.
Among the Calcified cases, pleomorphic malignant calcifications in the left breast have the highest count of any malignant type.
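The counts behind a bar graph like this can be reproduced with a simple group-and-count. The sketch below is only an illustration: the records are invented (they are not CBIS-DDSM values), the field names are assumptions, and count_by is a hypothetical helper.

```python
from collections import Counter

# Invented records for illustration only; field names are assumptions.
records = [
    {"mass shape": "IRREGULAR", "side": "LEFT", "pathology": "MALIGNANT"},
    {"mass shape": "IRREGULAR", "side": "LEFT", "pathology": "MALIGNANT"},
    {"mass shape": "IRREGULAR", "side": "RIGHT", "pathology": "MALIGNANT"},
    {"mass shape": "OVAL", "side": "LEFT", "pathology": "BENIGN"},
    {"mass shape": "ROUND", "side": "RIGHT", "pathology": "BENIGN"},
]


def count_by(records, *fields):
    """Count records for each distinct combination of the given fields."""
    return Counter(tuple(r[f] for f in fields) for r in records)


counts = count_by(records, "mass shape", "side", "pathology")

# Print combinations from most to least frequent, one bar per line.
for key, n in counts.most_common():
    print(key, n)
```

Each printed line corresponds to one bar: a shape, a breast side, a pathology label, and the number of cases in that group.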
Mass Abnormality Type
In the tree map, the combination of abnormality 1, breast density 2, and subtlety 5 has the most cases among malignant Mass abnormalities.
Calcified Abnormality Type
For the Calcified type, benign cases with abnormality type 1 and breast density 4 are the most numerous.
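A tree map sizes each tile by its share of the total, so the same reading can be checked numerically. The sketch below assumes each row is a tuple of (abnormality, breast density, subtlety, pathology); the rows are invented for illustration and tile_shares is a hypothetical helper, not part of any library.

```python
from collections import Counter

# Invented rows for illustration only:
# (abnormality, breast density, subtlety, pathology)
rows = [
    (1, 2, 5, "MALIGNANT"),
    (1, 2, 5, "MALIGNANT"),
    (1, 3, 4, "MALIGNANT"),
    (2, 2, 5, "BENIGN"),
]


def tile_shares(rows, pathology):
    """Return each (abnormality, density, subtlety) combination's share
    of the rows that carry the given pathology label."""
    combos = Counter(r[:3] for r in rows if r[3] == pathology)
    total = sum(combos.values())
    if total == 0:
        return {}
    return {combo: n / total for combo, n in combos.items()}


shares = tile_shares(rows, "MALIGNANT")

# The combination with the largest share is the largest tile.
largest = max(shares, key=shares.get)
print(largest, round(shares[largest], 3))
```

The largest share plays the role of the largest tile in the tree map for the chosen pathology.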
Overall, the analysis found more malignant cases than either benign or benign-without-callback cases in both the Mass and Calcified data. Another evident outcome is that malignancy is more common in the left breast than in the right.
There is a lot more to understand about this data set; this is just the tip of the iceberg. For further analysis and model building, the data can be downloaded from the CBIS-DDSM database.
This work is part of the AI Without Borders Initiative.