
harmonization_clinicalcombat

ComBAT harmonization of clinical MRI data. This module provides ComBAT implementations that adapt a moving site to a reference site, along with ready-to-run scripts to prepare datasets, fit a model, apply the harmonization and analyze the outputs. Two harmonization methods are available: pairwise (linear) and clinic (non-linear).
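As a rough illustration of the fit-then-apply idea behind the linear (pairwise-style) method, the sketch below fits a degree-1 age trend on reference-site values for one bundle, estimates an additive shift and a multiplicative scale for the moving site from the residuals, and removes them. It is a simplified location-scale adjustment written for this page, not the module's implementation: it ignores the sex and handedness covariates, skips the empirical Bayes step, and the function names are hypothetical.

```python
from statistics import fmean, pstdev


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def fit_location_scale(ref_age, ref_y, mov_age, mov_y):
    """Fit the age trend on the reference site, then a shift and scale for the moving site."""
    slope, intercept = fit_line(ref_age, ref_y)
    ref_res = [y - (intercept + slope * a) for a, y in zip(ref_age, ref_y)]
    mov_res = [y - (intercept + slope * a) for a, y in zip(mov_age, mov_y)]
    ref_sd = pstdev(ref_res)
    return {
        "slope": slope,
        "intercept": intercept,
        "gamma": fmean(mov_res),             # additive site effect (OLS ref residuals have mean ~0)
        "delta": pstdev(mov_res) / ref_sd,   # multiplicative site effect
        "ref_sd": ref_sd,
    }


def apply_location_scale(model, mov_age, mov_y):
    """Remove the moving-site shift and rescale its residuals to the reference spread."""
    out = []
    for a, y in zip(mov_age, mov_y):
        trend = model["intercept"] + model["slope"] * a
        out.append(trend + (y - trend - model["gamma"]) / model["delta"])
    return out
```

Fitting on the healthy-control rows of one bundle and then applying the model to every moving-site row mirrors, in miniature, the split between the fitted model (`*.model.csv`) and the harmonized data (`*.harmonized.csv.gz`); that the fit uses HC subjects is an assumption suggested by the HC requirement on the inputs.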

Keywords : Harmonization, Pairwise, Clinical, ComBAT


Format : tuple path(ref_site), path(move_site)

Name | Type | Description | Mandatory | Pattern
ref_site | file | CSV with the reference site data for one metric, all bundles and all subjects. Must include the columns sid, site, bundle, metric, mean, age, sex, handedness and disease. The disease column must contain at least the label HC (healthy controls). | True | *.{csv,csv.gz}
move_site | file | CSV with the moving site data for one metric, all bundles and all subjects. Must include the columns sid, site, bundle, metric, mean, age, sex, handedness and disease. The disease column must contain at least the label HC (healthy controls). | True | *.{csv,csv.gz}
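A quick pre-flight check of an input file against the requirements above can catch schema errors before the pipeline runs. The sketch below is hypothetical and not part of the module: it checks for the required columns, a single metric per file and at least one HC row, and it opens `.csv.gz` files through `gzip`.

```python
import csv
import gzip

# Columns listed as mandatory for ref_site and move_site.
REQUIRED_COLUMNS = ("sid", "site", "bundle", "metric", "mean",
                    "age", "sex", "handedness", "disease")


def open_site_csv(path):
    """Open a plain or gzip-compressed site CSV in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, newline="")


def validate_site_csv(lines):
    """Parse CSV lines and check the schema expected for ref_site / move_site."""
    reader = csv.DictReader(lines)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing columns: {', '.join(missing)}")
    rows = list(reader)
    metrics = {r["metric"] for r in rows}
    if len(metrics) != 1:
        raise ValueError(f"expected exactly one metric, found {sorted(metrics)}")
    if not any(r["disease"] == "HC" for r in rows):
        raise ValueError("the disease column contains no HC rows")
    return rows
```

Usage, with a hypothetical file name: `with open_site_csv("ref_site.csv.gz") as f: rows = validate_site_csv(f)`.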

Format : path(*.model.csv)

Name | Type | Description | Mandatory | Pattern
*.model.csv | file | Fitted harmonization model. | True | *.model.csv

Format : path(*.harmonized.csv.gz)

Name | Type | Description | Mandatory | Pattern
*.harmonized.csv.gz | file | Harmonized moving site data. | True | *.harmonized.csv.gz

Format : path(*bhattacharrya.txt)

Name | Type | Description | Mandatory | Pattern
*bhattacharrya.txt | file | Bhattacharyya distance QC reports for the harmonized data. Two files are produced, one before and one after harmonization, each giving the Bhattacharyya distance between the reference and moving site distributions; lower values indicate better alignment between the two distributions. The first row contains the bundle names. In the second row, the first column holds the number of HC subjects used to compute the distances and each remaining column holds the distance for the corresponding bundle. | True | *bhattacharrya.txt
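The page does not say how the two distributions are estimated. As an illustration only, the sketch below approximates the HC values of each site for a bundle by a normal distribution and uses the closed-form Bhattacharyya distance between two univariate normals, D_B = (μ1 - μ2)² / (4(σ1² + σ2²)) + ½ ln((σ1² + σ2²) / (2σ1σ2)). The module may estimate the distributions differently (for example from histograms), and the function names are hypothetical.

```python
from collections import defaultdict
from math import log, sqrt
from statistics import fmean, variance


def gaussian_bhattacharyya(x, y):
    """Bhattacharyya distance between two samples, each approximated by a normal distribution."""
    m1, m2 = fmean(x), fmean(y)
    v1, v2 = variance(x), variance(y)  # sample variances; need >= 2 values per sample
    return 0.25 * (m1 - m2) ** 2 / (v1 + v2) + 0.5 * log((v1 + v2) / (2 * sqrt(v1 * v2)))


def hc_distances_by_bundle(ref_rows, mov_rows):
    """Distance per bundle, using only HC rows (rows follow the input CSV columns)."""
    def group(rows):
        out = defaultdict(list)
        for r in rows:
            if r["disease"] == "HC":
                out[r["bundle"]].append(float(r["mean"]))
        return out

    ref, mov = group(ref_rows), group(mov_rows)
    return {b: gaussian_bhattacharyya(ref[b], mov[b]) for b in ref if b in mov}
```

Identical samples give a distance of 0, and the value grows as the site means drift apart or the spreads differ, which matches the reading that lower values indicate better alignment.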

Format : path(*.png)

Name | Type | Description | Mandatory | Pattern
*.png | file | Figures visualizing the harmonization results. | True | *.png

Format : path(*.json)

Name | Type | Description | Mandatory | Pattern
*.json | file | JSON files used to plot the harmonization results in a downstream MultiQC report. They contain the regression curves and the percentiles computed before and after harmonization for all bundles of a given metric, with the structure below. | True | *.json

    {
      "bundle_name_1": {
        "metric1": {
          "plot_1": { … },
          "plot_2": { … }
        }
      },
      "bundle_name_2": {
        "metric1": {
          "plot_1": { … },
          "plot_2": { … }
        }
      }
    }
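Because the files nest plots under bundle and then metric, a consumer can flatten them with three loops. The helper names below are hypothetical, and the plot payloads are treated as opaque since their content is elided in the structure above.

```python
import json


def iter_plots(data):
    """Yield (bundle, metric, plot_name, plot) from the nested QC JSON."""
    for bundle, metrics in data.items():
        for metric, plots in metrics.items():
            for plot_name, plot in plots.items():
                yield bundle, metric, plot_name, plot


def load_plots(path):
    """Read one *.json file and return its plots as a flat list."""
    with open(path) as f:
        return list(iter_plots(json.load(f)))
```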

Format : path(versions.yml)

Name | Type | Description | Mandatory | Pattern
versions.yml | file | File containing software versions. | True | versions.yml

Name | Type | Description | Default | Choices
method | string | Harmonization strategy to use: clinic (non-linear) or pairwise (linear). | clinic |
bundles | list | Subset of bundles to plot. All bundles in the input CSV are used to fit the harmonization model. | all |
limit_age_range | boolean | Drop reference subjects outside the age range of the moving site. | disabled |
ignore_sex | boolean | Remove the sex covariate when estimating the model. Applied automatically when all subjects have the same value. | disabled |
ignore_handedness | boolean | Remove the handedness covariate when estimating the model. Applied automatically when all subjects have the same value. | disabled |
no_empiral_bayes | boolean | Skip the empirical Bayes estimation. | disabled |
regul_ref | integer | Ridge penalty applied to the reference site regression. Used by the clinic method only. | 0 |
regul_mov | string | Penalty applied to the moving site, or auto-tuning. Used by the clinic method only. | -1 |
degree | integer | Polynomial degree of the age term. When unset, the method default is used (1 for pairwise, 2 for clinic). | None | 1, 2
nu | integer | Variance hyperparameter for the moving site. Used by the clinic method only. | 5 |
tau | integer | Covariate hyperparameter for the moving site. Used by the clinic method only. | 2 |
degree_qc | integer | Polynomial degree override for the QC model (0 reuses the harmonization degree). | 0 |
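The interplay of method, degree, degree_qc, limit_age_range and the automatic covariate dropping can be summarized with the hypothetical helpers below. They encode one reading of the argument table (for instance, that the age bounds are inclusive and that an unset degree falls back to the method default) and are not taken from the module's code.

```python
# Method defaults taken from the `degree` row of the argument table.
DEFAULT_DEGREE = {"pairwise": 1, "clinic": 2}


def resolve_degrees(method, degree=None, degree_qc=0):
    """Return (harmonization degree, QC degree) for a method."""
    if method not in DEFAULT_DEGREE:
        raise ValueError(f"unknown method: {method!r}")
    deg = DEFAULT_DEGREE[method] if degree is None else degree
    qc = deg if degree_qc == 0 else degree_qc  # 0 reuses the harmonization degree
    return deg, qc


def covariates_to_use(rows, ignore_sex=False, ignore_handedness=False):
    """Keep age, plus sex/handedness unless ignored or constant across all subjects."""
    covariates = ["age"]
    for name, ignore in (("sex", ignore_sex), ("handedness", ignore_handedness)):
        if not ignore and len({r[name] for r in rows}) > 1:
            covariates.append(name)
    return covariates


def limit_age_range(ref_rows, mov_rows):
    """Drop reference rows whose age lies outside the moving site's [min, max] age (assumed inclusive)."""
    ages = [float(r["age"]) for r in mov_rows]
    lo, hi = min(ages), max(ages)
    return [r for r in ref_rows if lo <= float(r["age"]) <= hi]
```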

Name | Description | DOI
clinical-ComBAT | Method for harmonizing MRI data across different sites. | 10.48550/arXiv.2511.04871


Last updated : 2026-02-12