RSNA dataset
torchmil.datasets.RSNAMILDataset
Bases: BinaryClassificationDataset, CTScanDataset
RSNA Intracranial Hemorrhage Detection dataset for Multiple Instance Learning (MIL). Download it from Hugging Face Datasets.
About the original RSNA Dataset. The original RSNA-ICH dataset contains head CT scans. The task is to identify whether a CT scan contains acute intracranial hemorrhage and its subtypes. The dataset includes a label for each slice.
Dataset description. We have preprocessed the CT scans by computing features for each slice using various feature extractors.
- A slice is labeled as positive (slice_label=1) if it contains evidence of hemorrhage.
- A CT scan is labeled as positive (label=1) if it contains at least one positive slice.
This means a CT scan is considered positive if there is any evidence of hemorrhage.
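The labeling rule above can be sketched in a few lines. This is only an illustration of the rule as stated, not the library's preprocessing code; the function name bag_label is hypothetical.

```python
import numpy as np


def bag_label(slice_labels):
    """Return 1 if any slice is positive (slice_label=1), otherwise 0."""
    slice_labels = np.asarray(slice_labels)
    # A CT scan (bag) is positive as soon as one slice (instance) is positive.
    return int(np.any(slice_labels == 1))


# Hypothetical slice labels for two CT scans.
positive_scan = [0, 0, 1, 0]
negative_scan = [0, 0, 0, 0]
print(bag_label(positive_scan), bag_label(negative_scan))
```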
Directory structure.
After extracting the contents of the .tar.gz archives, the following directory structure is expected:
root
├── features
│ ├── features_{features}
│ │ ├── ctscan_name1.npy
│ │ ├── ctscan_name2.npy
│ │ └── ...
├── labels
│ ├── ctscan_name1.npy
│ ├── ctscan_name2.npy
│ └── ...
├── slice_labels
│ ├── ctscan_name1.npy
│ ├── ctscan_name2.npy
│ └── ...
└── splits.csv
Each .npy file corresponds to a single CT scan. The splits.csv file defines train/test splits for standardized experimentation.
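As a rough illustration of the layout above, the files belonging to one CT scan could be located and read directly with NumPy. The helper names scan_paths and load_scan are hypothetical and are not part of torchmil; the sketch only assumes the directory tree shown above.

```python
from pathlib import Path

import numpy as np


def scan_paths(root, scan_name, features="resnet50"):
    """Build the expected .npy paths for one CT scan under the layout above."""
    root = Path(root)
    return {
        "features": root / "features" / f"features_{features}" / f"{scan_name}.npy",
        "label": root / "labels" / f"{scan_name}.npy",
        "slice_labels": root / "slice_labels" / f"{scan_name}.npy",
    }


def load_scan(root, scan_name, features="resnet50"):
    """Load the feature, label, and slice-label arrays of one CT scan."""
    paths = scan_paths(root, scan_name, features)
    return {key: np.load(path) for key, path in paths.items()}
```

For example, load_scan("root", "ctscan_name1") would read the three arrays for ctscan_name1, provided the archives were extracted into root.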
__init__(root, features='resnet50', partition='train', bag_keys=['X', 'Y', 'y_inst', 'adj', 'coords'], adj_with_dist=False, norm_adj=True, load_at_init=True)
Parameters:
- root (str) – Path to the root directory of the dataset.
- features (str, default: 'resnet50') – Type of features to use. Must be one of ['resnet18', 'resnet50', 'vit_b_32'].
- partition (str, default: 'train') – Partition of the dataset. Must be one of ['train', 'test'].
- bag_keys (list, default: ['X', 'Y', 'y_inst', 'adj', 'coords']) – List of keys to use for the bags. Must be in ['X', 'Y', 'y_inst', 'adj', 'coords'].
- adj_with_dist (bool, default: False) – If True, the adjacency matrix is built using the Euclidean distance between the features of the slices. If False, the adjacency matrix is binary.
- norm_adj (bool, default: True) – If True, normalize the adjacency matrix.
- load_at_init (bool, default: True) – If True, load the bags at initialization. If False, load the bags on demand.
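To make the adj_with_dist and norm_adj options more concrete, here is a small NumPy sketch of one way such an adjacency matrix could be built. It rests on assumptions the documentation does not confirm: that consecutive slices are treated as neighbors, that distances become weights via exp(-distance / feature_dim), and that normalization is the symmetric form D^{-1/2} A D^{-1/2}. The function build_adjacency is hypothetical, and it returns a dense array, whereas the dataset returns a sparse COO tensor.

```python
import numpy as np


def build_adjacency(features, adj_with_dist=False, norm_adj=True):
    """Illustrative adjacency for a bag of slices with shape (bag_size, dim)."""
    bag_size, dim = features.shape
    adj = np.zeros((bag_size, bag_size), dtype=np.float64)
    for i in range(bag_size - 1):
        if adj_with_dist:
            # Assumed weighting: closer feature vectors get larger weights.
            dist = np.linalg.norm(features[i] - features[i + 1])
            weight = np.exp(-dist / dim)
        else:
            # Binary adjacency: neighbors are connected with weight 1.
            weight = 1.0
        # Assumption: slice i is connected to slice i + 1 (symmetric edge).
        adj[i, i + 1] = weight
        adj[i + 1, i] = weight
    if norm_adj:
        # Assumed symmetric normalization D^{-1/2} A D^{-1/2}.
        degree = adj.sum(axis=1)
        inv_sqrt = np.zeros_like(degree)
        nonzero = degree > 0
        inv_sqrt[nonzero] = degree[nonzero] ** -0.5
        adj = inv_sqrt[:, None] * adj * inv_sqrt[None, :]
    return adj
```

The actual construction in torchmil may differ; the sketch is only meant to show how the two flags interact.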
__getitem__(index)
Parameters:
- index (int) – Index of the bag to retrieve.
Returns:
- bag_dict (TensorDict) – Dictionary containing the keys defined in bag_keys and their corresponding values.
  - X: Features of the bag, of shape (bag_size, ...).
  - Y: Label of the bag.
  - y_inst: Instance labels of the bag, of shape (bag_size, ...).
  - adj: Adjacency matrix of the bag. It is a sparse COO tensor of shape (bag_size, bag_size). If norm_adj=True, the adjacency matrix is normalized.
  - coords: Coordinates of the bag, of shape (bag_size, coords_dim).
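A minimal usage sketch follows, assuming torchmil is installed and the dataset has been extracted to a local directory. The constructor arguments are those documented above; the helper names load_first_bag and summarize_bag are hypothetical, and the root path must be supplied by the caller.

```python
def load_first_bag(root):
    """Build the training partition and return its first bag."""
    # Imported lazily so this sketch can be read without torchmil installed.
    from torchmil.datasets import RSNAMILDataset

    dataset = RSNAMILDataset(
        root=root,
        features="resnet50",
        partition="train",
        load_at_init=False,  # load bags on demand instead of at initialization
    )
    return dataset[0]  # calls __getitem__(0)


def summarize_bag(bag):
    """Map each key of a bag to the shape of its value."""
    return {key: tuple(bag[key].shape) for key in bag.keys()}
```

Calling summarize_bag(load_first_bag(root)) should list one shape per entry in bag_keys, with X, y_inst, adj, and coords sharing bag_size as their first dimension.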