First Pipeline
==============

:::{note}
Running AMPLFI out of the box requires access to one or more enterprise-grade GPUs (e.g. P100, V100, T4, A30, A40, A100, H100, or H200). Several nodes on the LIGO Data Grid meet these requirements.
:::

After [installing](./installation.md) `AMPLFI`, you will have access to the `amplfi-init` command for initializing experiment directories:

```console
> amplfi-init --help
usage: amplfi-init [-h] [--mode {flow,similarity}] [--pipeline {tune,train}] [-n NAME] [-d DIRECTORY] [--s3-bucket S3_BUCKET]

Initialize a directory with configuration files for running end-to-end amplfi training or tuning pipelines

options:
  -h, --help            Show this help message and exit.
  --mode {flow,similarity}
                        Either 'flow' or 'similarity'. Whether to setup a flow or similarity training (default: flow)
  --pipeline {tune,train}
                        Either 'train' or 'tune'. Whether to setup a tune or train pipeline (default: train)
  -n NAME, --name NAME  The name of the run. This will be used to create the run subdirectory. (required, type: str)
  -d DIRECTORY, --directory DIRECTORY
                        The parent directory where the data and subdirectories for runs will be stored. If not provided, the environment variable AMPLFI_RUNDIR will be used. (type: <class 'Path'>, default: null)
  --s3-bucket S3_BUCKET
                        (default: null)
```

For example, let's initialize a run directory at `~/amplfi/my-runs` for training a normalizing flow, and name it `first-flow-run`:

```console
amplfi-init --mode flow --pipeline train --directory ~/amplfi/my-runs --name first-flow-run
```

Alternatively, the `--directory` argument can be omitted by defining the `AMPLFI_RUNDIR` environment variable, which is then used as the parent directory for all runs.

```console
export AMPLFI_RUNDIR=~/amplfi/my-runs
amplfi-init --mode flow --pipeline train --name first-flow-run
```
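The precedence between the two options can be sketched with a small shell helper. `resolve_run_dir` is hypothetical and not part of AMPLFI; it only computes where the run subdirectory would go, assuming (as in the examples above) that the run directory is `<parent>/<name>`, with an explicit `--directory` taking priority over `AMPLFI_RUNDIR`:

```bash
# Hypothetical helper (not part of AMPLFI): mirror how the parent directory
# is chosen. An explicit --directory value wins; otherwise AMPLFI_RUNDIR is used.
resolve_run_dir() {
    local directory="$1"  # value you would pass to --directory (may be empty)
    local name="$2"       # value you would pass to --name
    local parent="${directory:-${AMPLFI_RUNDIR:-}}"
    if [ -z "$parent" ]; then
        echo "error: pass --directory or export AMPLFI_RUNDIR" >&2
        return 1
    fi
    # Strip a trailing slash so the joined path has a single separator
    echo "${parent%/}/${name}"
}
```

With `AMPLFI_RUNDIR` exported as above, `resolve_run_dir "" first-flow-run` prints `~/amplfi/my-runs/first-flow-run` with `~` expanded to your home directory.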

`amplfi-init` creates a `run.sh` script in the run directory that looks like this:

```bash
#!/bin/bash
# Export environment variables
export AMPLFI_DATADIR=/home/albert.einstein/amplfi/my-runs/data/
export AMPLFI_OUTDIR=/home/albert.einstein/amplfi/my-runs/first-flow-run/
export AMPLFI_CONDORDIR=/home/albert.einstein/amplfi/my-runs/data/condor

# launch the data generation pipeline
LAW_CONFIG_FILE=/home/albert.einstein/amplfi/my-runs/first-flow-run/datagen.cfg law run amplfi.data.DataGeneration --workers 5

# launch training pipeline
amplfi-flow-cli fit --config cbc.yaml
```
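Since the training command in `run.sh` refers to `cbc.yaml` by a relative path, which is resolved against the current working directory, the script should be launched from the run directory. A minimal sketch, using a hypothetical `launch_run` helper that is not part of AMPLFI, runs it from there in a subshell and captures its output in a `run.log` file (a file name chosen here for illustration):

```bash
# Hypothetical helper (not part of AMPLFI): run run.sh from inside its
# run directory so that relative paths such as cbc.yaml resolve correctly.
launch_run() {
    local run_dir="$1"  # e.g. ~/amplfi/my-runs/first-flow-run
    # The subshell leaves the caller's working directory unchanged;
    # stdout and stderr from both pipeline steps go to run.log
    ( cd "$run_dir" && bash run.sh > run.log 2>&1 )
}
```

For example, `launch_run ~/amplfi/my-runs/first-flow-run` runs both steps and leaves the combined output in that directory's `run.log`.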

This bash script consists of two steps:
1. Querying gravitational wave strain data using a [law](https://github.com/riga/law) workflow
2. Training a normalizing flow using [PyTorch Lightning](https://lightning.ai/docs/pytorch/stable/)

The data querying step is configured by the `datagen.cfg` file. It queries segments of science-mode strain data
and saves them in the directory specified by the `AMPLFI_DATADIR` environment variable. This step uses HTCondor for parallelization
and saves any HTCondor log files to `AMPLFI_CONDORDIR`.
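If you run either step by hand instead of through `run.sh` (for example, in a fresh shell), these variables must be exported first. A hypothetical pre-flight check such as `check_amplfi_env` below (not part of AMPLFI) reports each variable and fails if any of them is unset:

```bash
# Hypothetical pre-flight check (not part of AMPLFI): confirm that the
# directory variables exported by run.sh are set in the current shell.
check_amplfi_env() {
    local var missing=0
    for var in AMPLFI_DATADIR AMPLFI_OUTDIR AMPLFI_CONDORDIR; do
        # ${!var} is bash indirect expansion: the value of the variable named by $var
        if [ -z "${!var:-}" ]; then
            echo "missing: $var" >&2
            missing=1
        else
            echo "$var=${!var}"
        fi
    done
    # Non-zero exit status if any variable was unset or empty
    return "$missing"
}
```

Running `check_amplfi_env && amplfi-flow-cli fit --config cbc.yaml` would then only start training when all three variables are defined.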

:::{note}
If you already have a data directory consistent with the settings in `datagen.cfg`, you can point `AMPLFI_DATADIR` to it and the data generation
step will automatically be skipped.
:::
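Because `run.sh` exports `AMPLFI_DATADIR` itself, a value set in your shell before running the script is overwritten; the export line inside `run.sh` is what needs to change. A minimal sketch, using a hypothetical `set_datadir` helper (not part of AMPLFI), rewrites that line with `sed` and keeps a `.bak` backup:

```bash
# Hypothetical helper (not part of AMPLFI): point the AMPLFI_DATADIR export
# in a run.sh at an existing data directory.
set_datadir() {
    local run_script="$1"  # e.g. ~/amplfi/my-runs/first-flow-run/run.sh
    local data_dir="$2"    # existing data directory (your own path)
    # '|' is the sed delimiter because paths contain '/';
    # assumes data_dir contains no '|' or '&' characters.
    # -i.bak edits in place and saves the original as run.sh.bak
    sed -i.bak "s|^export AMPLFI_DATADIR=.*|export AMPLFI_DATADIR=${data_dir}|" "$run_script"
}
```

Only the `AMPLFI_DATADIR` line is touched; `AMPLFI_OUTDIR` and `AMPLFI_CONDORDIR` keep the values written by `amplfi-init`.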

Once data querying is complete, training begins. Training is configured by the YAML file passed to `--config` (`cbc.yaml` in the `run.sh` above). It's important to become familiar with the training parameters, but the defaults should suffice for your first run. The training job looks in `AMPLFI_DATADIR` for strain data and saves checkpoints and other training artifacts in `AMPLFI_OUTDIR`.

Once training is complete, corner plots of posterior samples, skymaps, probability-probability (P-P) plots, and searched-area plots can be generated by running the `test` subcommand.
Remember to pass your trained model weights, which are saved in the `AMPLFI_OUTDIR` directory. Here,
we pass the weights corresponding to the best validation score, which are automatically saved at `$AMPLFI_OUTDIR/train_logs/best.ckpt`:

```console
amplfi-flow-cli test --config /path/to/config.yaml --model.checkpoint=$AMPLFI_OUTDIR/train_logs/best.ckpt
```
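Training can take a long time, so it may be worth confirming that the checkpoint exists before launching the test step. The sketch below wraps the command above in a hypothetical `run_amplfi_test` function (not part of AMPLFI); it relies only on the checkpoint location given above and passes the config path through unchanged:

```bash
# Hypothetical wrapper (not part of AMPLFI): only launch testing if the
# best-validation checkpoint written during training is present.
run_amplfi_test() {
    local config="$1"  # e.g. /path/to/config.yaml
    local outdir="${AMPLFI_OUTDIR:-}"
    if [ -z "$outdir" ]; then
        echo "error: AMPLFI_OUTDIR is not set" >&2
        return 1
    fi
    # Strip a trailing slash (run.sh exports AMPLFI_OUTDIR with one)
    local ckpt="${outdir%/}/train_logs/best.ckpt"
    if [ ! -f "$ckpt" ]; then
        echo "error: no checkpoint at $ckpt; has training finished?" >&2
        return 1
    fi
    amplfi-flow-cli test --config "$config" --model.checkpoint="$ckpt"
}
```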

Plots will be saved in `$AMPLFI_OUTDIR`.
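One way to find the newly generated files, assuming they are written after `best.ckpt` was last updated, is to compare modification times with `find -newer`. The `list_test_outputs` helper below is hypothetical, and its output may also include other artifacts written after the best checkpoint (for example, later training logs):

```bash
# Hypothetical helper (not part of AMPLFI): list files under the output
# directory whose modification time is newer than the best checkpoint.
list_test_outputs() {
    local outdir="${1%/}"  # usually "$AMPLFI_OUTDIR"
    # best.ckpt itself is excluded because it is not newer than itself
    find "$outdir" -type f -newer "$outdir/train_logs/best.ckpt" | sort
}
```

For example, `list_test_outputs "$AMPLFI_OUTDIR"` run right after the test step should show the plots it produced.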