CTScan dataset
torchmil.datasets.CTScanDataset
Bases: ProcessedMILDataset
This class represents a dataset of Computed Tomography (CT) scans for Multiple Instance Learning (MIL).
MIL and CT scans. Computed Tomography (CT) is a medical imaging technique that uses X-rays to obtain detailed images of the body. A CT scan is usually a 3D volume composed of a sequence of slices, where each slice is a 2D image that represents a cross-section of the body. In the context of MIL, a CT scan is treated as a bag and its slices as instances.
Directory structure.
It is assumed that the bags have been processed and saved as numpy files.
For more information on the processing of the bags, refer to the ProcessedMILDataset class.
This dataset expects the following directory structure:
features_path
├── ctscan1.npy
├── ctscan2.npy
└── ...
labels_path
├── ctscan1.npy
├── ctscan2.npy
└── ...
slice_labels_path
├── ctscan1.npy
├── ctscan2.npy
└── ...
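Before loading, it can help to confirm that every scan in features_path has a matching file in the label directories. The following is a minimal sketch using only the standard library; the helper name find_missing_files and the temporary demo layout are hypothetical and not part of torchmil.

```python
from pathlib import Path
import tempfile


def find_missing_files(features_path, labels_path, slice_labels_path=None):
    """Return {directory: [bag names lacking a .npy file there]}.

    Bag names are taken from the .npy files found in features_path.
    """
    names = sorted(p.stem for p in Path(features_path).glob("*.npy"))
    missing = {}
    for directory in (labels_path, slice_labels_path):
        if directory is None:  # slice labels are optional
            continue
        absent = [n for n in names if not (Path(directory) / f"{n}.npy").is_file()]
        if absent:
            missing[str(directory)] = absent
    return missing


# Demo on a throwaway layout; empty files stand in for real arrays.
with tempfile.TemporaryDirectory() as root:
    features_dir = Path(root) / "features"
    labels_dir = Path(root) / "labels"
    for directory in (features_dir, labels_dir):
        directory.mkdir()
    for name in ("ctscan1", "ctscan2"):
        (features_dir / f"{name}.npy").touch()
    (labels_dir / "ctscan1.npy").touch()  # ctscan2 label deliberately absent

    demo_missing = find_missing_files(features_dir, labels_dir)
    demo_missing_names = list(demo_missing.values())
```

In the demo, the label for ctscan2 is left out on purpose, so the helper reports it as missing from the labels directory.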
Order of the slices and the adjacency matrix. This dataset assumes that the slices of the CT scans are ordered. An adjacency matrix \(\mathbf{A} = \left[ A_{ij} \right]\) is built from this ordering (and, when adj_with_dist=True, from the Euclidean distance between slice features), where \(\mathbf{x}_i \in \mathbb{R}^d\) and \(\mathbf{x}_j \in \mathbb{R}^d\) are the features of instances \(i\) and \(j\), respectively.
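To make the role of the slice order concrete, here is a rough sketch of one way such a matrix could be built. It rests on several assumptions that this page does not confirm: only consecutive slices are connected, the distance-based weight is exp(-||x_i - x_j|| / d), and normalization is the symmetric form D^{-1/2} A D^{-1/2}. The function chain_adjacency is hypothetical and returns a dense list of lists rather than the sparse tensor the dataset produces.

```python
import math


def chain_adjacency(features, adj_with_dist=False, norm_adj=True):
    """Dense adjacency matrix for an ordered bag of slice feature vectors."""
    n = len(features)
    d = len(features[0]) if n else 0
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        j = i + 1  # assumption: only consecutive slices are neighbours
        if adj_with_dist:
            # Assumed weighting: exp(-||x_i - x_j|| / d)
            weight = math.exp(-math.dist(features[i], features[j]) / d)
        else:
            weight = 1.0  # binary adjacency
        adj[i][j] = adj[j][i] = weight
    if norm_adj:
        # Assumed normalization: D^{-1/2} A D^{-1/2}
        degrees = [sum(row) for row in adj]
        inv_sqrt = [1.0 / math.sqrt(x) if x > 0 else 0.0 for x in degrees]
        adj = [
            [inv_sqrt[i] * adj[i][j] * inv_sqrt[j] for j in range(n)]
            for i in range(n)
        ]
    return adj


slices = [[0.0, 0.0], [3.0, 4.0], [3.0, 4.0]]  # three slices, d = 2
binary_adj = chain_adjacency(slices, norm_adj=False)
weighted_adj = chain_adjacency(slices, adj_with_dist=True, norm_adj=False)
normalized_adj = chain_adjacency(slices)
```

With these inputs, the binary matrix links slice 0 to slice 1 and slice 1 to slice 2, while the weighted variant gives the identical pair of slices (1 and 2) a weight of 1 and the more distant pair a smaller one.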
__init__(features_path, labels_path, slice_labels_path=None, ctscan_names=None, bag_keys=['X', 'Y', 'y_inst', 'adj', 'coords'], adj_with_dist=False, norm_adj=True, load_at_init=True)
Class constructor.
Parameters:
- features_path (str) – Path to the directory containing the feature matrices of the CT scans.
- labels_path (str) – Path to the directory containing the labels of the CT scans.
- slice_labels_path (str, default: None) – Path to the directory containing the labels of the slices.
- ctscan_names (list, default: None) – List of the names of the CT scans to load. If None, all CT scans in the features_path directory are loaded.
- bag_keys (list, default: ['X', 'Y', 'y_inst', 'adj', 'coords']) – List of keys to use for the bags. Must be in ['X', 'Y', 'y_inst', 'adj', 'coords'].
- adj_with_dist (bool, default: False) – If True, the adjacency matrix is built using the Euclidean distance between the slice features. If False, the adjacency matrix is binary.
- norm_adj (bool, default: True) – If True, normalize the adjacency matrix.
- load_at_init (bool, default: True) – If True, load the bags at initialization. If False, load the bags on demand.
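The constructor arguments above might be assembled as in the following sketch. The directory names under root, the path "data/ctscans", and both helper functions are hypothetical; only the keyword names come from the signature documented here. Dropping 'y_inst' when no slice labels are supplied is a choice made for this sketch, not documented behavior. The torchmil import is deferred so that the argument-building part runs without the library installed.

```python
def ctscan_dataset_kwargs(root, with_slice_labels=True):
    """Collect constructor arguments for a dataset stored under `root`."""
    bag_keys = ["X", "Y", "adj"]
    if with_slice_labels:
        bag_keys.append("y_inst")  # sketch choice: request only when available
    return {
        "features_path": f"{root}/features",  # hypothetical layout
        "labels_path": f"{root}/labels",
        "slice_labels_path": f"{root}/slice_labels" if with_slice_labels else None,
        "bag_keys": bag_keys,
        "adj_with_dist": False,  # binary adjacency
        "norm_adj": True,
        "load_at_init": False,  # load bags on demand
    }


def build_ctscan_dataset(root, **overrides):
    """Instantiate the dataset; requires torchmil to be installed."""
    # Imported lazily so the helper above works without torchmil.
    from torchmil.datasets import CTScanDataset

    kwargs = ctscan_dataset_kwargs(root)
    kwargs.update(overrides)
    return CTScanDataset(**kwargs)


example_kwargs = ctscan_dataset_kwargs("data/ctscans", with_slice_labels=False)
```

Setting load_at_init=False here defers reading the files until a bag is requested, which may be preferable when the scans do not fit in memory at once.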
__getitem__(index)
Parameters:
- index (int) – Index of the bag to retrieve.
Returns:
- bag_dict (TensorDict) – Dictionary containing the keys defined in bag_keys and their corresponding values.
    - X: Features of the bag, of shape (bag_size, ...).
    - Y: Label of the bag.
    - y_inst: Instance labels of the bag, of shape (bag_size, ...).
    - adj: Adjacency matrix of the bag. It is a sparse COO tensor of shape (bag_size, bag_size). If norm_adj=True, the adjacency matrix is normalized.
    - coords: Coordinates of the bag, of shape (bag_size, coords_dim).
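For readers unfamiliar with the COO layout used for adj, the sketch below shows what it stores: the row and column indices of the non-zero entries, their values, and the overall shape. The helper dense_to_coo is hypothetical and written in plain Python; in PyTorch, the same three pieces are what torch.sparse_coo_tensor(indices, values, size) takes.

```python
def dense_to_coo(dense):
    """Split a dense square matrix into COO indices, values, and shape."""
    rows, cols, values = [], [], []
    for i, row in enumerate(dense):
        for j, value in enumerate(row):
            if value != 0:  # only non-zero entries are stored
                rows.append(i)
                cols.append(j)
                values.append(value)
    shape = (len(dense), len(dense[0]) if dense else 0)
    return [rows, cols], values, shape


# A 3-slice bag in which consecutive slices are connected.
dense_adj = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
coo_indices, coo_values, coo_shape = dense_to_coo(dense_adj)
```

For a bag of ordered slices the number of stored entries grows linearly with bag_size, whereas a dense matrix grows quadratically.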