harmonization_clinicalcombat

ComBAT harmonization of clinical MRI data. This module provides ComBAT implementations that adapt a moving site to a reference site, along with ready-to-run scripts to prepare datasets, fit a model, apply the harmonization and analyze the outputs. Two harmonization methods are available: pairwise (linear) and clinic (non-linear).
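To illustrate what adapting a moving site to a reference site means, here is a minimal location/scale sketch in Python for one bundle and one metric. It is a simplification, not the clinical-ComBAT algorithm: it assumes age is the only covariate, fits a linear age trend on the reference site, and skips empirical Bayes. The function names `fit_line` and `harmonize_to_reference` are hypothetical.

```python
from statistics import mean, pstdev


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def harmonize_to_reference(ref_age, ref_val, mov_age, mov_val):
    """Map moving-site values onto the reference site (hypothetical sketch).

    The age trend is estimated on the reference site only. Moving-site
    residuals around that trend are re-centred and re-scaled so that their
    mean and spread match the reference-site residuals.
    """
    a, b = fit_line(ref_age, ref_val)
    ref_res = [v - (a + b * t) for t, v in zip(ref_age, ref_val)]
    mov_res = [v - (a + b * t) for t, v in zip(mov_age, mov_val)]
    # Multiplicative site effect: ratio of residual spreads.
    scale = pstdev(mov_res) / pstdev(ref_res)
    # Remove the moving-site shift and scale, then add back the reference trend.
    return [a + b * t + mean(ref_res) + (r - mean(mov_res)) / scale
            for t, r in zip(mov_age, mov_res)]
```

After this mapping, the harmonized moving-site values share the reference site's age trend, residual mean and residual spread. The actual methods additionally model sex and handedness, and the clinic method uses a non-linear age model with regularization.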

Keywords : Harmonization, Pairwise, Clinical, ComBAT

Inputs

`Input 1`

Format : tuple path(ref_site), path(move_site)

	Type	Description	Mandatory	Pattern
ref_site	file	CSV file with reference-site data for one metric, covering all bundles and subjects. Must include the columns sid, site, bundle, metric, mean, age, sex, handedness and disease. The disease column must contain at least the HC label.	True	`*.{csv,csv.gz}`
move_site	file	CSV file with moving-site data for one metric, covering all bundles and subjects. Must include the columns sid, site, bundle, metric, mean, age, sex, handedness and disease. The disease column must contain at least the HC label.	True	`*.{csv,csv.gz}`
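The input requirements above can be checked before launching the pipeline. The following Python sketch is a hypothetical pre-flight validator, not part of the module: it assumes a comma-separated header row, reads plain or gzip-compressed CSV files, and checks the required columns, the presence of HC rows and the single-metric constraint.

```python
import csv
import gzip

# Columns listed as mandatory for both ref_site and move_site.
REQUIRED_COLUMNS = {"sid", "site", "bundle", "metric", "mean",
                    "age", "sex", "handedness", "disease"}


def open_table(path):
    """Open a .csv or .csv.gz file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, newline="")


def validate_rows(rows, header):
    """Raise ValueError if the table does not meet the input requirements."""
    missing = REQUIRED_COLUMNS - set(header)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if not any(row["disease"] == "HC" for row in rows):
        raise ValueError("no HC rows found in the disease column")
    metrics = {row["metric"] for row in rows}
    if len(metrics) != 1:
        raise ValueError(f"expected one metric per file, found {sorted(metrics)}")


def validate_site_csv(path):
    """Load a site CSV (optionally gzipped) and validate it."""
    with open_table(path) as fh:
        reader = csv.DictReader(fh)
        rows = list(reader)
        validate_rows(rows, reader.fieldnames or [])
    return rows
```

For example, `validate_site_csv("ref_site.csv.gz")` returns the parsed rows or raises a `ValueError` describing the first problem found; the file name is illustrative.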

Outputs

`model`

Format : path(*.model.csv)

	Type	Description	Mandatory	Pattern
*.model.csv	file	Fitted harmonization model.	True	`*.model.csv`

`harmonizedsite`

Format : path(*.harmonized.csv.gz)

	Type	Description	Mandatory	Pattern
*.harmonized.csv.gz	file	Harmonized moving site data.	True	`*.harmonized.csv.gz`

`bdqc`

Format : path(*bhattacharrya.txt)

	Type	Description	Mandatory	Pattern
*bhattacharrya.txt	file	Bhattacharyya distance QC reports for the harmonized data. Each file contains the Bhattacharyya distance between the reference and moving site distributions; one file is produced before harmonization and one after. Lower values indicate better alignment between the two distributions. The first column gives the number of HC subjects used to compute the distance, and the remaining columns correspond to the bundles. The first row contains the bundle names; the second row contains the HC subject count followed by the Bhattacharyya distance for each bundle.	True	`*bhattacharrya.txt`
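To make the QC metric concrete, the sketch below computes the Bhattacharyya distance between two univariate normal distributions fitted to reference and moving values. It assumes a Gaussian approximation for each site's distribution; the module may estimate the distributions differently (for example from histograms or model residuals), so treat this as an illustration of the quantity rather than the module's exact computation. Function names are hypothetical.

```python
import math
from statistics import mean, stdev


def bhattacharyya_gaussian(mu_p, sd_p, mu_q, sd_q):
    """Bhattacharyya distance between N(mu_p, sd_p^2) and N(mu_q, sd_q^2)."""
    var_p, var_q = sd_p ** 2, sd_q ** 2
    # Term driven by the difference in means (location mismatch).
    mean_term = 0.25 * (mu_p - mu_q) ** 2 / (var_p + var_q)
    # Term driven by the difference in spreads (scale mismatch).
    scale_term = 0.5 * math.log((var_p + var_q) / (2.0 * sd_p * sd_q))
    return mean_term + scale_term


def bhattacharyya_from_samples(ref_values, mov_values):
    """Fit one Gaussian per site and return their Bhattacharyya distance."""
    return bhattacharyya_gaussian(mean(ref_values), stdev(ref_values),
                                  mean(mov_values), stdev(mov_values))
```

The distance is zero for identical distributions and grows with both mean and variance mismatch, which is why a successful harmonization should lower the post-harmonization values relative to the pre-harmonization report.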

`figures`

Format : path(*.png)

	Type	Description	Mandatory	Pattern
*.png	file	Figures generated to visualize harmonization results.	True	`*.png`

`plot_data_json`

Format : path(*.json)

	Type	Description	Mandatory	Pattern
*.json	file	JSON files used to plot the harmonization results in a downstream MultiQC report. These files contain the regression curves and the percentiles computed before and after harmonization for all bundles of a given metric. They have the following structure: `{ "bundle_name_1": { "metric1": { "plot_1": { … }, "plot_2": { … } } }, "bundle_name_2": { "metric1": { "plot_1": { … }, "plot_2": { … } } } }`	True	`*.json`
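The nested bundle → metric → plot layout above can be traversed with a few lines of Python, for example to feed a custom plot. This is a hypothetical helper: the contents of each plot object are not documented here, so the sketch treats them as opaque payloads, and the bundle names in the usage example are illustrative.

```python
import json


def load_plot_data(path):
    """Load one plot-data JSON file produced by the module."""
    with open(path) as fh:
        return json.load(fh)


def iter_plots(doc):
    """Yield (bundle, metric, plot_name, payload) for every plot in the file."""
    for bundle, metrics in doc.items():
        for metric, plots in metrics.items():
            for plot_name, payload in plots.items():
                yield bundle, metric, plot_name, payload
```

For instance, `list(iter_plots(load_plot_data("fa.json")))` flattens one file into tuples that can be filtered by bundle or plot name before plotting.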

`versions`

Format : path(versions.yml)

	Type	Description	Mandatory	Pattern
versions.yml	file	File containing software versions	True	`versions.yml`

Arguments (see process.ext)

	Type	Description	Default	Choices
method	string	Harmonization strategy to use: clinic (non-linear) or pairwise (linear).	`clinic`	- clinic - pairwise
bundles	list	Subset of bundles used for plotting. By default, all bundles in the input CSV file are used to fit the harmonization model.	`all`
limit_age_range	boolean	Drop reference-site subjects whose age lies outside the age range of the moving site.	`disabled`
ignore_sex	boolean	Exclude the sex covariate when estimating the model. This is applied automatically when all subjects share the same value.	`disabled`
ignore_handedness	boolean	Exclude the handedness covariate when estimating the model. This is applied automatically when all subjects share the same value.	`disabled`
no_empiral_bayes	boolean	Disable the empirical Bayes estimation of the site effects.	`disabled`
regul_ref	integer	Ridge penalty applied to the reference-site regression. Used by the clinic method only.	`0`
regul_mov	string	Penalty applied to the moving-site regression, or auto-tuning of this penalty. Used by the clinic method only.	`-1`
degree	integer	Polynomial degree of the age term. When unset, it depends on the method: 1 for pairwise, 2 for clinic.	`None`	- 1 - 2
nu	integer	Variance hyperparameter for the moving site. Used by the clinic method only.	`5`
tau	integer	Covariate hyperparameter for the moving site. Used by the clinic method only.	`2`
degree_qc	integer	Polynomial degree used by the QC model; 0 reuses the harmonization degree.	`0`
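Several of these arguments shape the covariates entering the model. The Python sketch below shows one plausible way `limit_age_range`, `degree`, `ignore_sex` and `ignore_handedness` could translate into a design matrix; it is an assumption-laden illustration, not the module's code. The helper names are hypothetical, categorical covariates are one-hot encoded with the first sorted level as baseline, and a covariate with a single observed value contributes no column, mirroring the automatic removal described above.

```python
# Default age-polynomial degree per method when `degree` is unset.
DEFAULT_DEGREE = {"pairwise": 1, "clinic": 2}


def filter_reference_by_age(ref_rows, mov_rows):
    """limit_age_range: keep reference rows within the moving-site age range."""
    ages = [float(r["age"]) for r in mov_rows]
    lo, hi = min(ages), max(ages)
    return [r for r in ref_rows if lo <= float(r["age"]) <= hi]


def dummy_columns(rows, key):
    """One-hot encode a categorical covariate, dropping the first level.

    A covariate with a single level yields no columns, i.e. it is
    removed automatically because it would be collinear with the intercept.
    """
    levels = sorted({r[key] for r in rows})[1:]
    names = [f"{key}={lv}" for lv in levels]
    cols = [[1.0 if r[key] == lv else 0.0 for lv in levels] for r in rows]
    return names, cols


def design_matrix(rows, method="clinic", degree=None,
                  ignore_sex=False, ignore_handedness=False):
    """Build (column_names, matrix) with intercept, age polynomial and covariates."""
    if degree is None:
        degree = DEFAULT_DEGREE[method]
    names = ["intercept"] + [f"age^{k}" for k in range(1, degree + 1)]
    matrix = [[1.0] + [float(r["age"]) ** k for k in range(1, degree + 1)]
              for r in rows]
    for key, ignored in (("sex", ignore_sex), ("handedness", ignore_handedness)):
        if ignored:
            continue
        extra_names, extra_cols = dummy_columns(rows, key)
        names += extra_names
        matrix = [row + extra for row, extra in zip(matrix, extra_cols)]
    return names, matrix
```

With two subjects of different sex but identical handedness, the pairwise default yields the columns `intercept`, `age^1` and `sex=M`; the handedness column disappears without setting `ignore_handedness`.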

Tools

	Description	DOI
clinical-ComBAT	Method for harmonizing MRI data across different sites.	10.48550/arXiv.2511.04871

Authors

@manonedde

Last updated : 2026-02-12