immuneML is a software platform for machine learning analysis of adaptive immune receptors and repertoires (AIRR).
This dataset contains the original specification files and complete results for immuneML use case 3: Benchmarking machine learning methods for AIRR classification on ground-truth synthetic data.
For more information about immuneML, see the documentation: https://docs.immuneml.uio.no/

The immuneML specification files in this dataset (full_simulation_001.yaml and full_benchmarking_001.yaml) are compatible with immuneML version 0.0.91. 
Results (airr.zip, simulation_html_output.zip, benchmarking_instruction_output.zip, benchmarking_html_output.zip) were generated with immuneML version 0.0.91.
For detailed information about this use case, and versions of these specification files compatible with the latest version of immuneML, see the documentation for this use case: https://docs.immuneml.uio.no/usecases/benchmarking_use_case.html


The use case consists of two steps:


- First, immuneML was used to implant 5 synthetic immune signals of different complexity into an existing dataset 
  (for details about this dataset, see: https://docs.immuneml.uio.no/usecases/benchmarking_use_case.html). 
  This was done using the configuration file full_simulation_001.yaml. 
  The resulting AIRR dataset can be found in airr.zip, and a summary HTML page in simulation_html_output.zip.


- Second, immuneML was used to benchmark machine learning methods (logistic regression, random forest and support vector machine) 
  with k-mer frequency encoding (k-mer lengths of 3 and 4) on the synthetic AIRR dataset generated in step 1. 
  This was done using the configuration file full_benchmarking_001.yaml. 
  The resulting HTML file, including all plots, can be found in benchmarking_html_output.zip. All raw output files produced by the instruction are available in benchmarking_instruction_output.zip.
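

To illustrate what k-mer frequency encoding means in step 2, here is a minimal Python sketch. It is not the immuneML implementation and does not use the immuneML API. The function name kmer_frequencies and the toy sequences are hypothetical, and the sketch assumes overlapping k-mers normalized to relative frequencies, which may differ from the settings in full_benchmarking_001.yaml.

```python
from collections import Counter


def kmer_frequencies(sequences, k):
    """Return relative frequencies of overlapping k-mers over all sequences.

    Illustrative sketch only: assumes overlapping k-mers and normalization
    by the total k-mer count; immuneML's own encoder may be configured
    differently.
    """
    counts = Counter()
    for seq in sequences:
        # Slide a window of width k over the sequence; sequences shorter
        # than k contribute no k-mers.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kmer: n / total for kmer, n in counts.items()}


# Hypothetical toy repertoire of two receptor sequences (not from airr.zip).
repertoire = ["CASSL", "CASSF"]

# Encode with both k-mer lengths mentioned in step 2.
for k in (3, 4):
    print(k, kmer_frequencies(repertoire, k))
```

Each repertoire is thereby turned into a vector of k-mer frequencies, which a classifier such as logistic regression can take as input features.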