First Pipeline

Note

Running AMPLFI out of the box requires access to one or more enterprise-grade GPUs (e.g. P100, V100, T4, A30, A40, A100, H100 or H200). Several nodes on the LIGO Data Grid meet these requirements.

After installing AMPLFI, you will have access to the amplfi-init command for initializing experiment directories:

> amplfi-init --help
usage: amplfi-init [-h] [--mode {flow,similarity}] [--pipeline {tune,train}] [-n NAME] [-d DIRECTORY] [--s3-bucket S3_BUCKET]

Initialize a directory with configuration files for running end-to-end amplfi training or tuning pipelines

options:
  -h, --help            Show this help message and exit.
  --mode {flow,similarity}
                        Either 'flow' or 'similarity'. Whether to setup a flow or similarity training (default: flow)
  --pipeline {tune,train}
                        Either 'train' or 'tune'. Whether to setup a tune or train pipeline (default: train)
  -n NAME, --name NAME  The name of the run. This will be used to create the run subdirectory. (required, type: str)
  -d DIRECTORY, --directory DIRECTORY
                        The parent directory where the data and subdirectories for runs will be stored. If not provided, the environment variable AMPLFI_RUNDIR will be used. (type: <class 'Path'>, default: null)
  --s3-bucket S3_BUCKET
                        (default: null)

For example, let’s initialize a run named first-flow-run for training a normalizing flow, using ~/amplfi/my-runs as the parent directory:

amplfi-init --mode flow --pipeline train --directory ~/amplfi/my-runs --name first-flow-run

Alternatively, the --directory argument can be omitted by defining the AMPLFI_RUNDIR environment variable, which will then be used as the parent directory for all runs:

export AMPLFI_RUNDIR=~/amplfi/my-runs
amplfi-init --mode flow --pipeline train --name first-flow-run
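The help text above implies a simple precedence rule: --directory is used when given, otherwise AMPLFI_RUNDIR is used, and the run itself lives in a subdirectory named after --name. The function below is a hypothetical sketch of that rule (resolve_run_dir is our own name, not part of AMPLFI), which can be handy for checking where a run will land before calling amplfi-init.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT part of AMPLFI: mirrors the rule in the
# amplfi-init help text for choosing where a run directory is created.
resolve_run_dir() {
    local directory="$1"  # value you would pass to --directory (may be empty)
    local name="$2"       # value you would pass to --name
    # Fall back to AMPLFI_RUNDIR only when no directory was given
    local parent="${directory:-${AMPLFI_RUNDIR:-}}"
    if [ -z "$parent" ]; then
        echo "error: pass --directory or set AMPLFI_RUNDIR" >&2
        return 1
    fi
    # Drop one trailing slash so the joined path has no "//"
    printf '%s/%s\n' "${parent%/}" "$name"
}

# Example: prints <AMPLFI_RUNDIR>/first-flow-run
# resolve_run_dir "" first-flow-run
```

Passing an explicit directory always wins, matching the help text's statement that AMPLFI_RUNDIR is only consulted when --directory is not provided.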

A run.sh script will be created in the run directory. It will look something like this:

#!/bin/bash
# Export environment variables
export AMPLFI_DATADIR=/home/albert.einstein/amplfi/my-runs/data/
export AMPLFI_OUTDIR=/home/albert.einstein/amplfi/my-runs/first-flow-run/
export AMPLFI_CONDORDIR=/home/albert.einstein/amplfi/my-runs/data/condor

# launch the data generation pipeline
LAW_CONFIG_FILE=/home/albert.einstein/amplfi/my-runs/first-flow-run/datagen.cfg law run amplfi.data.DataGeneration --workers 5

# launch training pipeline
amplfi-flow-cli fit --config cbc.yaml

This bash script consists of two steps:

  1. Querying gravitational wave strain data using a law workflow

  2. Training a normalizing flow using PyTorch Lightning

The data querying step is configured by the datagen.cfg file. It queries segments of science-mode strain data and saves them in the directory specified by the AMPLFI_DATADIR environment variable. This step uses HTCondor for parallelization and saves any Condor log files to AMPLFI_CONDORDIR.
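If you launch the two stages by hand rather than through run.sh, both read their locations from environment variables, so a missing one is easy to overlook. A minimal pre-flight check could look like the following; require_vars is a hypothetical helper, not something AMPLFI provides.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper, NOT part of AMPLFI: report every named
# environment variable that is unset or empty, and fail if any are.
require_vars() {
    local var value missing=0
    for var in "$@"; do
        # Indirect lookup: read the variable whose *name* is stored in $var
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "error: $var is not set" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example: stop before launching anything if a location is missing
# require_vars AMPLFI_DATADIR AMPLFI_OUTDIR AMPLFI_CONDORDIR || exit 1
```

The helper checks all variables before returning, so a single run reports every missing location at once rather than stopping at the first.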

Note

If you already have a data directory consistent with the settings in datagen.cfg, you can point AMPLFI_DATADIR to it (for example, by editing its export line in run.sh), and the data generation step will automatically be skipped.

Once data querying is complete, training will begin. Training is configured by the YAML file passed to amplfi-flow-cli fit (cbc.yaml in the run.sh above). It’s important to become familiar with the training parameters, but the defaults should suffice for a first run. The training job looks in AMPLFI_DATADIR for strain data and saves checkpoints and other training artifacts in AMPLFI_OUTDIR.

Once training has completed, corner plots of the posterior samples, skymaps, probability-probability (P-P) plots and searched-area plots can be generated by running the test subcommand. Remember to pass your trained model weights, which are saved in the AMPLFI_OUTDIR directory. In this case, we pass the weights corresponding to the best validation score, which are automatically saved at $AMPLFI_OUTDIR/train_logs/best.ckpt.

amplfi-flow-cli test --config /path/to/config.yaml --model.checkpoint=$AMPLFI_OUTDIR/train_logs/best.ckpt

Plots will be saved in $AMPLFI_OUTDIR.
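Because the test subcommand needs a checkpoint that only exists once training has finished, it can be worth failing early with a clear message if best.ckpt is missing. The helper below is a hypothetical sketch (best_checkpoint is our own name, not an AMPLFI command); it assumes only the $AMPLFI_OUTDIR/train_logs/best.ckpt location described above.

```shell
#!/usr/bin/env bash
# Hypothetical helper, NOT part of AMPLFI: print the path of the best
# checkpoint under a given output directory, or fail if it is missing.
best_checkpoint() {
    local outdir="${1%/}"  # drop a trailing slash, e.g. from AMPLFI_OUTDIR
    local ckpt="$outdir/train_logs/best.ckpt"
    if [ ! -f "$ckpt" ]; then
        echo "error: no checkpoint at $ckpt; has training finished?" >&2
        return 1
    fi
    printf '%s\n' "$ckpt"
}

# Example usage (the config path is a placeholder, as above):
#   ckpt="$(best_checkpoint "$AMPLFI_OUTDIR")" || exit 1
#   amplfi-flow-cli test --config /path/to/config.yaml --model.checkpoint="$ckpt"
```

Stripping the trailing slash matters here because the AMPLFI_OUTDIR exported in run.sh ends in "/".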