ISMRM 2015 Tractography challenge - Evaluation
The original goal of the Challenge was to evaluate all submissions using the exact same metrics and techniques as in the original Tractometer system.
However, during the initial evaluation phase, it became evident that the classical techniques were too restrictive when used with datasets that realistically simulate real-world conditions. For example, the classical approach uses masks of the bundle endpoints to determine whether streamlines are valid connections. With many bundles lying close together and endpoint masks only 1 voxel thick, most streamlines were not classified as valid, even when visual inspection showed them to be quite close to the groundtruth bundles.
In order to have evaluation results that more closely match the observed reality of the submitted datasets, we developed an improved "relaxed" scoring technique. All results that are now viewable on this website were obtained with that improved technique. Details of the technique will be discussed later on, and will be presented in an upcoming paper.
Global connectivity metrics definitions
Global connectivity metrics were used. All definitions and descriptions are given in the Cote et al. Tractometer paper. We detail them here, updated for the relaxed scoring technique.
Valid bundles (VB)
The number of bundles in the contestant's submission that were correctly reconstructed and that exist in the groundtruth data. In the context of the challenge, there were 25 groundtruth bundles.
Valid connections (VC)
The percentage of streamlines that were part of the Valid Bundles.
Invalid bundles (IB)
The number of bundles that seemed realistic but were not matched to a known groundtruth bundle. These are bundles that can be extracted from the submitted dataset but do not match any groundtruth bundle.
Invalid connections (IC)
The percentage of streamlines that were part of the Invalid Bundles.
No connections (NC)
The percentage of streamlines that were not assigned to VC or IC. These are normally very short streamlines, or streamlines that are unique in shape and position, meaning that they remain singletons even after clustering.
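As an illustration of how the three streamline percentages above relate to one another, here is a minimal sketch. It assumes that every submitted streamline has already been assigned to a valid bundle, to an invalid bundle, or to neither; the function name, its parameters, and the `nc` label for the unassigned remainder are hypothetical and not part of the challenge's scoring code.

```python
def connectivity_percentages(n_valid, n_invalid, n_total):
    """Return (vc, ic, nc) as percentages of all submitted streamlines.

    n_valid   -- streamlines that belong to valid bundles
    n_invalid -- streamlines that belong to invalid bundles
    n_total   -- all streamlines in the submission
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if n_valid < 0 or n_invalid < 0 or n_valid + n_invalid > n_total:
        raise ValueError("inconsistent streamline counts")

    # Streamlines assigned to neither a valid nor an invalid bundle.
    n_unassigned = n_total - n_valid - n_invalid

    vc = 100.0 * n_valid / n_total
    ic = 100.0 * n_invalid / n_total
    nc = 100.0 * n_unassigned / n_total
    return vc, ic, nc


# Example: 1000 streamlines, 600 in valid bundles, 300 in invalid bundles.
vc, ic, nc = connectivity_percentages(600, 300, 1000)
```

By construction the three values sum to 100, so a submission with many unassigned streamlines necessarily has lower valid and invalid connection percentages.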
Fidelity metrics definitions
Additional metrics were implemented for the challenge evaluation. The so-called Fidelity metrics aim to give a general overview of how well the VB cover their groundtruth counterparts. They complement the global connectivity metrics: a submission can find a specific valid bundle yet cover it very poorly, for instance if it found only a few streamlines to represent that bundle.
Bundle overlap (OL)
Proportion of the voxels within the volume of a ground truth bundle that are traversed by at least one valid streamline associated with the bundle. This value shows how well the tractography result recovers the original volume of the bundle.
Bundle overreach (OR)
Number of voxels outside the volume of a ground truth bundle that are traversed by at least one valid streamline associated with the bundle, divided by the total number of voxels within the ground truth bundle. This value shows how much the valid connections extend beyond the ground truth bundle volume.
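The two fidelity metrics above can be sketched with plain set operations. This is only an illustration under stated assumptions: both the ground truth bundle volume and the voxels traversed by the valid streamlines are assumed to be available as collections of integer voxel indices, and the function name and arguments are hypothetical.

```python
def bundle_overlap_overreach(gt_voxels, traversed_voxels):
    """Return (overlap, overreach) as fractions of the ground truth volume.

    gt_voxels        -- voxel indices (i, j, k) inside the ground truth bundle
    traversed_voxels -- voxel indices crossed by at least one valid
                        streamline associated with that bundle
    """
    gt = set(gt_voxels)
    traversed = set(traversed_voxels)
    if not gt:
        raise ValueError("ground truth bundle has no voxels")

    # Ground truth voxels that the submission recovered.
    overlap = len(gt & traversed) / len(gt)
    # Traversed voxels that fall outside the ground truth volume,
    # normalised by the ground truth volume (so it can exceed 1).
    overreach = len(traversed - gt) / len(gt)
    return overlap, overreach


# Example: a 4-voxel ground truth bundle, half of it recovered,
# with one traversed voxel lying outside the bundle.
gt = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
found = [(2, 0, 0), (3, 0, 0), (4, 0, 0)]
overlap, overreach = bundle_overlap_overreach(gt, found)
```

Because overreach is normalised by the ground truth volume rather than by the traversed volume, the two values are not complementary and do not need to sum to 1.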
Angular error scores
Local Angular Error
The mean voxel-wise angular error between the main local tractogram fiber directions and the respective ground truth fiber directions. Missing directions are penalized with the maximum 90 degree error.
Values are reported in two separate categories: voxels containing a single fiber population, and voxels with crossing fiber populations.
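A minimal sketch of the mean angular error for the single-fiber case follows. It assumes one direction vector per voxel, stored in dictionaries keyed by voxel index, and it treats directions as axes (a vector and its negation count as the same direction), which is what caps the error at 90 degrees. The function names and the data layout are hypothetical, and matching multiple directions in crossing voxels is not covered here.

```python
import math


def angular_error_deg(u, v):
    """Angle in degrees between two fiber axes, in the range [0, 90]."""
    dot = abs(sum(a * b for a, b in zip(u, v)))  # abs(): sign is irrelevant
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("direction vectors must be non-zero")
    # Clamp to guard against rounding slightly above 1.
    return math.degrees(math.acos(min(1.0, dot / norm)))


def mean_local_angular_error(gt_dirs, est_dirs):
    """Mean voxel-wise error; voxels missing from est_dirs count as 90."""
    if not gt_dirs:
        raise ValueError("no ground truth voxels")
    errors = []
    for voxel, gt_dir in gt_dirs.items():
        est_dir = est_dirs.get(voxel)
        if est_dir is None:
            errors.append(90.0)  # missing direction: maximum penalty
        else:
            errors.append(angular_error_deg(gt_dir, est_dir))
    return sum(errors) / len(errors)


# Example: one exact match (flipped sign), one orthogonal estimate,
# and one voxel with no estimated direction.
gt_dirs = {(0, 0, 0): (1, 0, 0), (1, 0, 0): (0, 1, 0), (2, 0, 0): (0, 0, 1)}
est_dirs = {(0, 0, 0): (-1, 0, 0), (1, 0, 0): (1, 0, 0)}
mean_error = mean_local_angular_error(gt_dirs, est_dirs)
```

In this example the per-voxel errors are 0, 90 and 90 degrees, so the missing voxel is penalised exactly as heavily as a fully orthogonal estimate.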