# README - Data analysis results for: "MoDLE: High-performance stochastic modeling of DNA loop extrusion interactions"

The data in this archive was generated as part of the following publication: "MoDLE: High-performance stochastic modeling of DNA loop extrusion interactions" ([preprint](https://www.biorxiv.org/content/10.1101/2022.04.13.488157v2)).

The data analysis pipeline used to generate the data is hosted on [github.com/paulsengroup/2021-modle-paper-001-data-analysis](https://github.com/paulsengroup/2021-modle-paper-001-data-analysis/tree/v2.0.1) and archived on Zenodo [10.5281/zenodo.7072939](https://doi.org/10.5281/zenodo.7072939).

## Testing file integrity after download
Archives have been checksummed using SHA256.
To compare checksums, run the following command:
```bash
shasum -c checksums.sha256
```
NOTE 1: verify checksums before extracting the archives.

NOTE 2: use option `--ignore-missing` when verifying checksums for only a subset of the TARs (for example, when not all archives were downloaded).
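
On systems where `shasum` is not installed, GNU `sha256sum` can read the same checksum file. The following is a minimal sketch that wraps both tools; the function name `verify_archives` is hypothetical and is not shipped with the archive, and it assumes a version of either tool that supports `--ignore-missing`.

```bash
# Hypothetical helper (not part of the archive): verify whichever archives
# are present, using sha256sum when available and shasum otherwise.
verify_archives() {
  checksum_file="${1:-checksums.sha256}"
  if command -v sha256sum >/dev/null 2>&1; then
    # --ignore-missing skips archives that were not downloaded
    sha256sum -c --ignore-missing "$checksum_file"
  else
    shasum -a 256 -c --ignore-missing "$checksum_file"
  fi
}
```

Run `verify_archives` from the folder containing the downloaded archives. A non-zero exit status means the check did not succeed and the archives should not be extracted.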


## Extracting TAR files
Archived data consists of several compressed TAR files.
Extracting all the TAR files produces the file and folder layout listed in file `2021-modle-paper-001.tree`.

TAR archives are compressed using the [Zstandard (ZSTD)](https://facebook.github.io/zstd/) compression algorithm.

TARs can be extracted as follows:
```bash
zstd -dc --long=31 2021-modle-paper-001-data-containers.tar.zst | tar -xf -
```

This creates a folder named `2021-modle-paper-001` and extracts the Docker image files contained in the archive into `2021-modle-paper-001/data/containers/`.

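NOTE: extracting archives directly with `tar` will not work, because the TARs were compressed using custom compression options (a long-distance matching window, `--long=31`) that must also be passed to `zstd` when decompressing.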
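
The extraction command above can be applied to every downloaded archive in turn. The following is a minimal sketch; the function name `extract_all` is hypothetical and is not shipped with the archive, and it assumes that all `*.tar.zst` files in the given folder belong to this dataset.

```bash
# Hypothetical helper (not part of the archive): extract every *.tar.zst
# archive found in the given folder into the current working directory.
extract_all() {
  archive_dir="${1:-.}"
  for archive in "$archive_dir"/*.tar.zst; do
    # Skip the unexpanded glob pattern when no archive is present
    [ -e "$archive" ] || continue
    echo "Extracting $archive"
    # --long=31 must be given explicitly; stop at the first failed extraction
    zstd -dc --long=31 "$archive" | tar -xf - || return 1
  done
}
```

Calling `extract_all` without arguments processes the archives in the current folder; pass a path (for example `extract_all ~/Downloads`) to read archives from elsewhere while still extracting into the current folder.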

## Navigating archived data
Each TAR archive contains a `README.md` file describing the archive content.
Archive `2021-modle-paper-001-readmes.tar` contains a copy of all README files (note: this archive is not compressed).

Archives also contain a `checksums.sha256` file which can be used to check file integrity after extraction (this is usually not necessary).

## Contact information
Inquiries regarding this dataset should be addressed to the corresponding author for "MoDLE: High-performance stochastic modeling of DNA loop extrusion interactions" (Jonas Paulsen).
