RSNA dataset

`torchmil.datasets.RSNAMILDataset`

Bases: BinaryClassificationDataset, CTScanDataset

RSNA Intracranial Hemorrhage Detection dataset for Multiple Instance Learning (MIL). Download it from Hugging Face Datasets.

About the original RSNA Dataset. The original RSNA-ICH dataset contains head CT scans. The task is to detect acute intracranial hemorrhage in a CT scan and to identify its subtypes. The dataset provides a label for each slice.

Dataset description. We have preprocessed the CT scans by computing features for each slice using several feature extractors (the available options are listed under the features parameter).

- A slice is labeled as positive (slice_label=1) if it contains evidence of hemorrhage.
- A CT scan is labeled as positive (label=1) if it contains at least one positive slice.

This means a CT scan is considered positive if there is any evidence of hemorrhage.
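
The labeling rule above is the standard MIL assumption: the bag label is the logical OR of the instance labels. A minimal sketch in NumPy follows. The function name `bag_label_from_slices` is hypothetical and is not part of torchmil.

```python
import numpy as np


def bag_label_from_slices(slice_labels):
    """Return 1 if at least one slice label equals 1, otherwise 0.

    Hypothetical helper for illustration; not a torchmil function.
    """
    slice_labels = np.asarray(slice_labels)
    # A scan is positive as soon as any single slice is positive.
    return int(np.any(slice_labels == 1))


# One positive slice makes the whole scan positive.
print(bag_label_from_slices([0, 0, 1, 0]))
# A scan with only negative slices is negative.
print(bag_label_from_slices([0, 0, 0]))
```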

Directory structure.

After extracting the contents of the .tar.gz archives, the following directory structure is expected:

root
├── features
│   ├── features_{features}
│   │   ├── ctscan_name1.npy
│   │   ├── ctscan_name2.npy
│   │   └── ...
├── labels
│   ├── ctscan_name1.npy
│   ├── ctscan_name2.npy
│   └── ...
├── slice_labels
│   ├── ctscan_name1.npy
│   ├── ctscan_name2.npy
│   └── ...
└── splits.csv

Each .npy file corresponds to a single CT scan. The splits.csv file defines train/test splits for standardized experimentation.
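
Before constructing the dataset, it can help to confirm that the extracted archives match the layout shown above. The following is a small sketch using only the standard library. The helper `missing_paths` is hypothetical (not provided by torchmil) and checks only the top-level entries from the tree, not the individual .npy files or the contents of splits.csv.

```python
from pathlib import Path


def missing_paths(root, features="resnet50"):
    """List the expected top-level entries that are absent under root.

    Hypothetical helper for illustration; not a torchmil function.
    """
    root = Path(root)
    expected = [
        root / "features" / f"features_{features}",  # per-scan feature files
        root / "labels",                             # per-scan bag labels
        root / "slice_labels",                       # per-scan slice labels
        root / "splits.csv",                         # train/test split definition
    ]
    return [str(p) for p in expected if not p.exists()]
```

An empty list means the top-level layout is in place for the chosen feature type, for example `missing_paths("path/to/root", features="resnet18")`, where the path is a placeholder.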

`__init__(root, features='resnet50', partition='train', bag_keys=['X', 'Y', 'y_inst', 'adj', 'coords'], adj_with_dist=False, norm_adj=True, load_at_init=True)`

Parameters:

root (str) –

Path to the root directory of the dataset.
features (str, default: 'resnet50' ) –

Type of features to use. Must be one of ['resnet18', 'resnet50', 'vit_b_32'].
partition (str, default: 'train' ) –

Partition of the dataset. Must be one of ['train', 'test'].
bag_keys (list, default: ['X', 'Y', 'y_inst', 'adj', 'coords'] ) –

List of keys to use for the bags. Each key must be one of ['X', 'Y', 'y_inst', 'adj', 'coords'].
adj_with_dist (bool, default: False ) –

If True, the adjacency matrix is built using the Euclidean distance between the features of the slices. If False, the adjacency matrix is binary.
norm_adj (bool, default: True ) –

If True, normalize the adjacency matrix.
load_at_init (bool, default: True ) –

If True, load the bags at initialization. If False, load the bags on demand.
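
To make the adj_with_dist and norm_adj options more concrete, here is a rough NumPy sketch of one way such an adjacency matrix could be built for a stack of slices. It is not torchmil's implementation and rests on three assumptions that the documentation does not confirm: each slice is connected only to its immediate neighbors in the scan, distance-based weights take the form exp(-d) for Euclidean distance d, and normalization is the symmetric D^{-1/2} A D^{-1/2}. The function name `chain_adjacency` is hypothetical, and the result is a dense array, whereas the dataset returns a sparse COO tensor.

```python
import numpy as np


def chain_adjacency(X, adj_with_dist=False, norm_adj=True):
    """Dense adjacency over consecutive slices (illustrative sketch only).

    ASSUMPTIONS (not confirmed by the torchmil docs): neighbors are
    consecutive slices, distance weights are exp(-d), and normalization
    is D^{-1/2} A D^{-1/2}.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    A = np.zeros((n, n))
    for i in range(n - 1):
        if adj_with_dist:
            # Assumed weighting: closer features give a larger edge weight.
            w = np.exp(-np.linalg.norm(X[i] - X[i + 1]))
        else:
            # Binary adjacency: every existing edge has weight 1.
            w = 1.0
        A[i, i + 1] = A[i + 1, i] = w
    if norm_adj:
        deg = A.sum(axis=1)
        inv_sqrt = np.zeros_like(deg)
        nonzero = deg > 0
        inv_sqrt[nonzero] = deg[nonzero] ** -0.5  # skip isolated nodes
        A = inv_sqrt[:, None] * A * inv_sqrt[None, :]
    return A


# Three slices with 2-dimensional features, binary and unnormalized.
print(chain_adjacency(np.zeros((3, 2)), norm_adj=False))
```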

`__getitem__(index)`

Parameters:

index (int) –

Index of the bag to retrieve.

Returns:

bag_dict ( TensorDict ) –
Dictionary containing the keys defined in bag_keys and their corresponding values.
- X: Features of the bag, of shape (bag_size, ...).
- Y: Label of the bag.
- y_inst: Instance labels of the bag, of shape (bag_size, ...).
- adj: Adjacency matrix of the bag. It is a sparse COO tensor of shape (bag_size, bag_size). If norm_adj=True, the adjacency matrix is normalized.
- coords: Coordinates of the bag, of shape (bag_size, coords_dim).
