# README for 2021-modle-paper-001/data/output/gw_param_optimization

This folder contains the output files obtained by running the [gw_param_optimization](https://github.com/paulsengroup/2021-modle-paper-001-data-analysis/blob/v2.0.1/workflows/gw_param_optimization.nf) Nextflow workflow.

Refer to [paulsengroup/2021-modle-paper-001-data-analysis/README.md](https://github.com/paulsengroup/2021-modle-paper-001-data-analysis/blob/v2.0.1/README.md) for instructions on how to run the workflow.

The workflow produces a folder with hundreds of files with very long and repetitive names.
With the intent of making the data easier to navigate, these files have been renamed and moved into two folders.
For example, the file `optimization/modle_sim_param_optimization_tad_plus_loop_000_modle_sim_param_optimization_tad_plus_loop.tar` was renamed to `000_modle_sim_param_optimization_tad_plus_loop.tar` and moved into the folder `optimization/modle_sim_param_optimization_tad_plus_loop`.
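
The renaming rule in the example above can be sketched in Python as follows. This is only an illustration of the mapping, not the script that was actually used to reorganize the files; the function name `shorten` and its arguments are hypothetical.

```python
from pathlib import PurePosixPath


def shorten(original: str, run_name: str) -> str:
    """Map a path produced by the workflow to its renamed location.

    `run_name` is the optimization run the file belongs to, e.g.
    "modle_sim_param_optimization_tad_plus_loop" (hypothetical argument).
    """
    path = PurePosixPath(original)
    prefix = run_name + "_"
    if not path.name.startswith(prefix):
        raise ValueError(f"{path.name!r} does not start with {prefix!r}")
    # Drop the redundant run-name prefix and place the file in a per-run folder.
    return str(path.parent / run_name / path.name[len(prefix):])


print(shorten(
    "optimization/modle_sim_param_optimization_tad_plus_loop_"
    "000_modle_sim_param_optimization_tad_plus_loop.tar",
    "modle_sim_param_optimization_tad_plus_loop",
))
```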


## File description
- `checksums.sha256`: SHA256 checksums. Use `shasum -c checksums.sha256` to check file integrity.
- `GRCh38_H1_hESC_microC_4DNFI9GMP2J8_fixed_5000_transformed.cool`: Reference Micro-C contact matrix (bin size: 5 kbp) transformed using the difference of Gaussians and discretization procedure described in the paper. This matrix is used as the reference during the parameter optimization.
- `GRCh38_H1_hESC_microC_*.tsv`: Extrusion barrier sites used for parameter optimization and validation.
- `mcools/`: Multi-resolution cooler files used to generate Fig. 4D and Supplementary Fig. 11.
- `optimization/`: Folder containing the result of the parameter optimization:
  - `modle_sim_param_optimization_loop_only/`: Result of the parameter optimization using file `modle_sim_param_optimization_search_space1.tsv` as input (see dataset `2021-modle-paper-001-data-input.tar.zst`).
  - `modle_sim_param_optimization_tad_plus_loop/`: Result of the parameter optimization using file `modle_sim_param_optimization_search_space2.tsv` as input (see dataset `2021-modle-paper-001-data-input.tar.zst`).
- `README.md`: this file.
- `stripenn/`: Folder containing the architectural stripes identified by Stripenn using `4DNFI9GMP2J8_fixed.mcool` as input matrix (see dataset `2021-modle-paper-001-data-output-preprocessing.tar.zst`).
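
If `shasum` is not available, file integrity can also be checked with a few lines of Python. The sketch below assumes that `checksums.sha256` uses the usual `<hex digest>  <relative path>` layout written by `shasum` and `sha256sum`, and that paths are relative to the folder containing the checksum file; the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest: Path) -> dict[str, bool]:
    """Return {relative path: True if the file exists and its digest matches}."""
    results = {}
    for line in manifest.read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # a leading '*' marks binary mode in some tools
        target = manifest.parent / name  # assumption: paths relative to manifest
        results[name] = target.is_file() and sha256_of(target) == expected.lower()
    return results
```

Calling `verify_manifest(Path("checksums.sha256"))` from this folder returns one entry per listed file; any entry mapped to `False` is missing or corrupted.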


Folders `modle_sim_param_optimization_loop_only` and `modle_sim_param_optimization_tad_plus_loop` have the following structure:
- `*.tar`: Each folder contains 400 `tar` archives. Each `tar` archive contains the files produced in a given parameter optimization epoch. The three-digit prefix identifies the epoch that generated the archived files.
- `*.pickle`: Serialized optimizer state at the end of the optimization. We used this file to generate some of the plots shown in the paper.
- `*.tsv`: TSV report of the parameter optimization.
- `*.png`, `*.svg`: Plots summarizing the optimization run.
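
As a convenience, the `tar` archives in one of these folders can be indexed by epoch using the three-digit prefix. The following is a minimal sketch using only the Python standard library; the function names and the regular expression are assumptions based on the naming scheme described above, and the archive contents are listed without being extracted.

```python
import re
import tarfile
from pathlib import Path

# Three-digit epoch prefix followed by an underscore, e.g. "000_<run name>.tar".
EPOCH_ARCHIVE = re.compile(r"^(\d{3})_.+\.tar$")


def archives_by_epoch(folder: Path) -> dict[int, Path]:
    """Map each epoch number to its tar archive inside `folder`."""
    found = {}
    for path in sorted(folder.glob("*.tar")):
        match = EPOCH_ARCHIVE.match(path.name)
        if match:  # skip archives that do not follow the naming scheme
            found[int(match.group(1))] = path
    return found


def list_members(archive: Path) -> list[str]:
    """Return the names of the files stored in a tar archive."""
    with tarfile.open(archive) as tar:
        return tar.getnames()
```

For example, `archives_by_epoch(Path("optimization/modle_sim_param_optimization_tad_plus_loop"))` returns a dictionary whose keys are the epoch numbers, and `list_members` can then be used to inspect a single epoch.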
