Short tutorial on running flepiMoP on your personal computer using a Docker container.
See the Before any run section to ensure you have access to the correct files needed to run. On your local machine, determine the file paths to:
the directory containing the flepimop code (likely the folder you cloned from Github), which we'll call <dir1>
the directory containing your project code including input configuration file and population structure (again likely from Github), which we'll call <dir2>
For example, if you clone your GitHub repositories into a local folder called Github and are using flepimop_sample as a project repository, your directory names could be:
On Mac:
<dir1> = /Users/YourName/Github/flepiMoP
<dir2> = /Users/YourName/Github/flepimop_sample
On Windows:
<dir1> = C:\Users\YourName\Github\flepiMoP
<dir2> = C:\Users\YourName\Github\flepimop_sample
(Hint: if you navigate to a directory like C:\Users\YourName\Github using cd C:\Users\YourName\Github, you can shorten the above <dir1> and <dir2> paths to .\flepiMoP and .\flepimop_sample)
Note that Docker file and directory names are case-sensitive.
Docker is a software platform that allows you to build, test, and deploy applications quickly. Docker packages software into standardized units called containers that have everything the software needs to run including libraries, system tools, code, and runtime. This means you can run and install software without installing the dependencies in the local operating system.
A Docker container is an environment isolated from the rest of the operating system: you can create, modify, and delete files and programs inside it without affecting your OS. It is essentially a local virtual OS within your OS.
For flepiMoP, we provide a Docker container that will help you get running quickly.
Make sure you have the Docker software installed, then open your command prompt or terminal application.
Helpful tools
To understand the basics of Docker, refer to Docker Basics. The following Docker Tutorial may also be helpful.
To install Docker for Mac, refer to the following link: Installing Docker for Mac. Pay special attention to the specific chip your Mac has (Apple Silicon vs Intel), as installation files and directions differ
To install Docker for Windows, refer to the following link: Installing Docker for Windows
To find the Windows Command Prompt, type "Command Prompt" in the search bar and open it. This Command Prompt Video Tutorial may be helpful for new users.
To find the Apple Terminal, type "Terminal" in the search bar or go to Applications -> Utilities -> Terminal.
First, make sure you have the latest version of the flepimop Docker image (hopkinsidd/flepimop) downloaded on your machine by opening your terminal application and entering:
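The pull command looks like the following (the latest-dev tag is an assumption; check the flepiMoP documentation for the tag currently in use):

```shell
# Download (or update) the flepimop image from Docker Hub
# NOTE: the "latest-dev" tag is an assumption; check the repo docs
docker pull hopkinsidd/flepimop:latest-dev
```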
Next, run the Docker image by entering the following, replacing <dir1> and <dir2> with the path names for your machine (no quotes or brackets, just the path text):
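As a sketch, the run command could look like this (the container-side mount points /home/app/flepimop and /home/app/drp are assumptions based on the path names mentioned below; verify them against the flepiMoP documentation):

```shell
# Start an interactive container, mounting code and project directories
docker run -it \
  -v <dir1>:/home/app/flepimop \
  -v <dir2>:/home/app/drp \
  hopkinsidd/flepimop:latest-dev
```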
On Windows: If you get an error, you may need to delete the "\" line breaks and submit as a single continuous line of code.
In this command, we run the Docker container, creating volumes and mounting (-v) your code and project directories into the container. Creating a volume and mounting it to a container allocates space in Docker that mirrors, with read and write access, files on your local machine.
The folder with the flepiMoP code (<dir1>) will be at the path flepimop within the Docker environment, while the project folder (<dir2>) will be at the path drp.
You now have a local Docker container installed, which includes the R and Python versions required to run flepiMoP, with all the required packages already installed.
You don't need to re-run the above steps every time you want to run the model. When you're done using Docker for the day, you can simply "detach" from the container and pause it, without deleting it from your machine. Then you can re-attach to it when you next want to run the model.
Create environmental variables for the paths to the flepimop code folder and the project folder:
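For example, assuming the mount points used when starting the container, this could be:

```shell
# Paths inside the container (assumed mount points; adjust if yours differ)
export FLEPI_PATH=/home/app/flepimop
export DATA_PATH=/home/app/drp
```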
Go into the code directory and install the R and Python packages:
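A sketch of the installation commands (the script and package paths are assumptions; check the flepiMoP repository for the exact steps):

```shell
cd $FLEPI_PATH

# Install the flepiMoP R packages (script path is an assumption)
Rscript build/local_install.R

# Install gempyor, the forward-simulation Python package
pip install --no-deps -e flepimop/gempyor_pkg/
```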
Each installation step may take a few minutes to run.
Note: These installations take place in the Docker container, not the local operating system. They need to be done once when the container is first created, and not every time you run a model. You will need an active internet connection to pull the Docker image and install the R packages (since some are hosted online), but not for the other steps of running the model.
Everything is now ready 🎉 The next step depends on what sort of simulation you want to run: one that includes inference (fitting the model to data) or only a forward simulation (non-inference). Inference is run from R, while forward-only simulations are run directly from the Python package gempyor.
In either case, navigate to the project folder and make sure to delete any old model output files that are there.
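For example:

```shell
# Move to the project folder and clear old outputs
cd $DATA_PATH
rm -rf model_output
```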
An inference run requires a configuration file that has the inference section. Stay in the $DATA_PATH folder, and run the inference script, providing the name of the configuration file you want to run (ex. config.yml):
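A sketch of the inference invocation (the script path inference_main.R is an assumption; check the flepiMoP repository for the exact entry point):

```shell
# Run the R inference script against your config (path is an assumption)
Rscript $FLEPI_PATH/flepimop/main_scripts/inference_main.R -c config.yml
```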
This will run the model and create a lot of output files in $DATA_PATH/model_output/.
The last few lines visible on the command prompt should be:
[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
NULL
If you want to quickly do runs with options different from those encoded in the configuration file, you can do that from the command line, for example:
where:
n is the number of parallel inference slots,
j is the number of CPU cores to use on your machine (if j > n, only n cores will actually be used; if j < n, some cores will run multiple slots in sequence),
k is the number of iterations per slot.
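For example, a quick test run with a single slot, core, and iteration might look like this (the -n/-j/-k flags mirror the options described above; the script path is an assumption):

```shell
# 1 slot (-n), 1 core (-j), 1 iteration per slot (-k)
Rscript $FLEPI_PATH/flepimop/main_scripts/inference_main.R -c config.yml -n 1 -j 1 -k 1
```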
You can put all of this together into a single script that can be run all at once.
To run a forward simulation directly from the Python package gempyor, stay in the $DATA_PATH folder and call gempyor-simulate, providing the name of the configuration file you want to run (ex. config.yml):
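For example:

```shell
# Forward simulation only; no inference
gempyor-simulate -c config.yml
```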
It is currently required that all configuration files have an interventions section. There is currently no way to simulate a model with no interventions, though this functionality is expected soon. For now, simply create an intervention that has value zero.
You can put all of this together into a single script that can be run all at once.
You can avoid repeating all the above steps every time you want to run the code. When the docker run command creates a container, it is stored locally on your computer with all the installed packages/variables/etc. you created. You can leave this container and come back to it whenever you want, without having to redo all this setup.
When you're in the Docker container, figure out the name Docker has given to the container you created by typing
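The container list can be shown with the following (run from a local terminal outside the container, or check Docker Desktop):

```shell
# Show running containers; the NAMES column holds the auto-generated name
docker ps
```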
the output will be something silly like
Write this down for later reference. You can also see the container name in the Containers tab of the Docker Desktop app.
To "detach" from the Docker container and stop it, type CTRL + c.
The command prompt for your terminal application is now just running locally, not in the Docker container.
Next time you want to re-start and "attach" the container, type
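The start command is a sketch like:

```shell
docker start container_name
```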
at the command line, or hit the play button ▶️ beside the container's name in the Docker app. Replace container_name with the name of your old container.
Then "attach" to the container by typing
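For example:

```shell
docker attach container_name
```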
The reason that stopping/starting a container is separate from detaching/attaching is that you can technically leave a container (and any processes within it) running in the background and exit it. If you want to do that, detach and leave it running by typing CTRL + p then quickly CTRL + q. Then, when you want to attach to it again, you don't need to start the container first.
If the core model code within the flepiMoP repository (flepimop/flepimop/gempyor_pkg/ or flepimop/flepimop/R_packages) has been edited since you created the container, or if the R or Python package requirements have changed, you'll have to re-run the package installation steps; otherwise, you can just start running model code!
Short tutorial on running locally using an "Anaconda" environment.
As is the case for any run, first see the Before any run section to ensure you have access to the correct files needed to run. On your local machine, determine the file paths to:
the directory containing the flepimop code (likely the folder you cloned from Github), which we'll call FLEPI_PATH
the directory containing your project code including input configuration file and population structure (again likely from Github), which we'll call DATA_PATH
For example, if you clone your GitHub repositories into a local folder called Github and are using flepimop_sample as a project repository, your directory names could be:
On Mac:
FLEPI_PATH = /Users/YourName/Github/flepiMoP
DATA_PATH = /Users/YourName/Github/flepimop_sample
On Windows:
FLEPI_PATH = C:\Users\YourName\Github\flepiMoP
DATA_PATH = C:\Users\YourName\Github\flepimop_sample
(Hint: if you navigate to a directory like C:\Users\YourName\Github using cd C:\Users\YourName\Github, you can shorten the above paths to .\flepiMoP and .\flepimop_sample)
Note again that these are best cloned flat.
Creating the conda environment
One of the simplest ways to get everything to work is to build an Anaconda environment. Install (or update) Anaconda on your computer. We find it easiest to create your conda environment by installing the required Python packages first, then installing the R packages separately once your conda environment has been built, as not all R packages can be found on conda.
You can either use the command line (here) or the graphical user interface (you just tick the packages you want). With the command line it's this one-liner:
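A sketch of the one-liner (the package list here is illustrative only, not the full list; consult the flepiMoP documentation for the complete set of required packages):

```shell
# Create a conda env with (some of) the required Python packages
# NOTE: the package list below is illustrative, not complete
conda create -n flepimop-env -c conda-forge python numpy pandas matplotlib pyarrow
```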
Anaconda will take some time to come up with a proposal that works with all dependencies. This creates a conda environment named flepimop-env that has all the necessary Python packages.
The next step in preparing your environment is to install the necessary R packages. First, activate your environment, launch R and then install the following packages.
If you'd like, you can install rstudio
as a package as well.
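A sketch of these steps (the R package list is illustrative; see the flepiMoP documentation for the packages actually required):

```shell
conda activate flepimop-env

# Launch R and install packages from within it, e.g.:
Rscript -e 'install.packages(c("tidyverse", "devtools"), repos = "https://cloud.r-project.org")'
```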
Activate your conda environment, which we built above.
In this conda environment, the R and python commands will use this environment's R and Python.
First, you'll need to fill in some variables that are used by the model. This can be done in a script (an example is provided at the end of this page). For your first time, it's better to run each command individually to be sure it exits successfully.
First, in myparentfolder
populate the folder name variables for the paths to the flepimop code folder and the project folder:
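For example, run from myparentfolder (this assumes flepiMoP and flepimop_sample are cloned flat inside it):

```shell
# Folder name variables for the code and project directories
export FLEPI_PATH=$(pwd)/flepiMoP
export DATA_PATH=$(pwd)/flepimop_sample
```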
Go into the code directory (making sure it is up to date on your favorite branch) and do the installation required of the repository:
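A sketch of the installation (the script and package paths are assumptions; check the repository for the exact steps):

```shell
cd $FLEPI_PATH

# Install the flepiMoP R packages (script path is an assumption)
Rscript build/local_install.R

# Install the gempyor Python package into the conda environment
pip install --no-deps -e flepimop/gempyor_pkg/
```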
Each installation step may take a few minutes to run.
Note: These installations take place in your conda environment, not the local operating system. They need to be done once while in your environment, and not every time you run a model. You will need an active internet connection to install the R packages (since some are hosted online), but not for the other steps of running the model.
Other environmental variables can be set at any point in process of setting up your model run. These options are listed in ... ADD ENVAR PAGE
For example, some frequently used environmental variables which we recommend setting are:
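For instance (the values shown are placeholders only; see the environment-variable page for the full list and meanings):

```shell
# Placeholder values; set ones appropriate to your run
export FLEPI_RUN_INDEX=my_test_run
export VALIDATION_DATE=2023-01-01
```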
Everything is now ready. 🎉
The next step depends on what sort of simulation you want to run: one that includes inference (fitting the model to data) or only a forward simulation (non-inference). Inference is run from R, while forward-only simulations are run directly from the Python package gempyor.
In either case, navigate to the project folder and make sure to delete any old model output files that are there.
An inference run requires a configuration file that has an inference section. Stay in the $DATA_PATH folder, and run the inference script, providing the name of the configuration file you want to run (ex. config.yml). In the example data folder (flepimop_sample), try out the example config XXX.
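As in the Docker section, a sketch of the inference invocation (the script path is an assumption; check the repository for the exact entry point):

```shell
Rscript $FLEPI_PATH/flepimop/main_scripts/inference_main.R -c config.yml
```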
The last few lines visible on the command prompt should be:
[[1]]
[[1]][[1]]
[[1]][[1]][[1]]
NULL
If you want to quickly do runs with options different from those encoded in the configuration file, you can do that from the command line, for example:
where:
n is the number of parallel inference slots,
j is the number of CPU cores to use on your machine (if j > n, only n cores will actually be used; if j < n, some cores will run multiple slots in sequence),
k is the number of iterations per slot.
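For example (the -n/-j/-k flags mirror the options described above; the script path is an assumption):

```shell
# 1 slot (-n), 1 core (-j), 1 iteration per slot (-k)
Rscript $FLEPI_PATH/flepimop/main_scripts/inference_main.R -c config.yml -n 1 -j 1 -k 1
```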
Stay in the $DATA_PATH folder, and run a simulation directly from the forward-simulation Python package gempyor. To do this, call gempyor-simulate, providing the name of the configuration file you want to run (ex. config.yml). An example config is provided in flepimop_sample/config_sample_2pop_interventions.yml.
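For example, using the sample config named above:

```shell
gempyor-simulate -c config_sample_2pop_interventions.yml
```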
It is currently required that all configuration files have an interventions
section. There is currently no way to simulate a model with no interventions, though this functionality is expected soon. For now, simply create an intervention that has value zero.
You can also try to knit the Rmd file in flepiMoP/flepimop/gempyor_pkg/docs
which will show you how to analyze these files.
The following script does all the above commands in one easy script. Save it in myparentfolder as quick_setup.sh. Then, just go to myparentfolder, type source quick_setup.sh, and it'll do everything for you!
Running on AWS using a Docker container
Spin up an Ubuntu submission box if not already running. To do this, log onto AWS Console and start the EC2 instance.
Update IP address in .ssh/config file. To do this, open a terminal and type the command below. This will open your config file where you can change the IP to the IP4 assigned to the AWS EC2 instance (see AWS Console for this):
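For example, using any terminal editor (nano is just one choice):

```shell
# Open the SSH config to update the HostName (IP) for the instance
nano ~/.ssh/config
```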
SSH into the box. In the terminal, SSH into your box. Typically we name these instances "staging", so usually the command is:
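That is:

```shell
ssh staging
```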
Now you should be logged onto the AWS submission box. If you haven't yet, set up your directory structure.
Type the following commands:
Note that the repository is cloned nested, i.e the flepiMoP
repository is INSIDE the data repository.
Have your Github ssh key passphrase handy so you can paste it when prompted (possibly multiple times) with the git pull command. Alternatively, you can add your github key to your batch box so you don't have to enter your token 6 times per day.
Start up and log into the docker container, and run setup scripts to setup the environment. This setup code links the docker directories to the existing directories on your box. As this is the case, you should not run job submission simultaneously using this setup, as one job submission might modify the data for another job submission.
To set up the environment for your run, run the following commands. These are specific to your run, i.e., change VALIDATION_DATE, FLEPI_RUN_INDEX, and RESUME_LOCATION as required. If submitting multiple jobs, it is recommended to split jobs between two queues: Compartment-JQ-1588569569 and Compartment-JQ-1588569574.
NOTE: If you are not running a resume run, DO NOT export the environmental variable RESUME_LOCATION.
Additionally, if you want to profile how the model is using your memory resources during the run, run the following commands
Then prepare the pipeline directory (you can skip this if you have already done it and the pipeline hasn't been updated, i.e., git pull says it's up to date). You need to set $DATA_PATH to your data folder. For a COVID-19 run, do:
for Flu, do:
Now for any type of run:
For now, just in case: update the arrow package from 8.0.0 in the docker to 11.0.3.
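A sketch of the update, run inside the container (this installs the current CRAN release; pinning exactly to 11.0.3 may require additional tooling such as the remotes package):

```shell
# Update the R arrow package inside the container
Rscript -e 'install.packages("arrow", repos = "https://cloud.r-project.org")'
```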
Now flepiMoP is ready 🎉
Do some clean-up before your run. The fast way is to restore the $DATA_PATH git repository to its blank state (⚠️ removes everything that does not come from git):
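One common way to do this (a sketch; the project may provide its own clean-up script) is:

```shell
# ⚠️ discards ALL local changes and untracked files in $DATA_PATH
cd $DATA_PATH
git reset --hard
git clean -f -d
```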
Then run the preparatory data building scripts and you are good.
Now you may want to test that it works:
If this fails, you may want to investigate the error. If it succeeds, proceed by first deleting the model_output:
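For example:

```shell
rm -rf $DATA_PATH/model_output
```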
Assuming that the initial test simulation finishes successfully, you will now enter credentials and submit your job onto AWS batch. Enter the following command into the terminal:
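The prompts listed below come from the AWS CLI configuration command (assuming the AWS CLI is installed in the environment):

```shell
aws configure
```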
You will be prompted to enter the following items. These can be found in a file you received from Shaun called new_user_credentials.csv
.
Access key ID when prompted
Secret access key when prompted
Default region name: us-west-2
Default output: leave blank when prompted and press enter. (The Access Key ID and Secret Access Key will be given to you once, in a file.)
Now you're fully set to go 🎉
To launch the whole inference batch job, type the following command:
This command infers everything from your environment variables: whether there is a resume or not, what the run_id is, etc. The default is to carry seeding if it is a resume (see below for alternative options).
If you'd like to have more control, you can specify the arguments manually:
We allow for a number of different jobs, with different setups; e.g., you may not want to carry seeding. Some examples of appropriate setups are given below. No modification of these code chunks should be required.
NOTE: Resume and Continuation Resume runs are currently submitted the same way, resuming from an S3 bucket that was generated manually. Typically we will also submit any Continuation Resume run specifying --resume-carry-seeding, as the starting seeding conditions will be manually constructed and put in the S3 bucket.
Carrying seeding (do this to use seeding fits from resumed run):
Discarding seeding (do this to refit seeding again):
Single Iteration + Carry seeding (do this to produce additional scenarios where no fitting is required):
After the job is successfully submitted, you will now be in a new branch of the data repo. Commit the ground truth data files to the branch on github and then return to the main branch:
Send the submission information to slack so we can identify the job later. Example output:
This will run the model and create output files in $DATA_PATH/model_output/.
If you still want to use git to clean the repo, but want finer control or to understand how dangerous the command is, .
Tutorial on how to install and run flepiMoP on a supported HPC with Slurm.
These details cover how to install and initialize flepiMoP in an HPC environment and submit a job with Slurm.
Currently only JHU's Rockfish and UNC's Longleaf HPC clusters are supported. If you need support for a new HPC cluster please file an issue in the flepiMoP
GitHub repository.
Installing flepiMoP
This task needs to be run once to do the initial install of flepiMoP.
On JHU's Rockfish you'll need to run these steps in a slurm interactive job. This can be launched with /data/apps/helpers/interact -n 4 -m 12GB -t 4:00:00
, but please consult the Rockfish user guide for up to date information.
Obtain a temporary clone of the flepiMoP repository. The install script will place a permanent clone in the correct location once run. You may need to set up git on the HPC cluster being used before running this step.
Run the hpc_install_or_update.sh script, substituting <cluster-name> with either rockfish or longleaf. This script will prompt you for the location to place the flepiMoP clone and the name of the conda environment that it will create. If this is your first time using this script, accepting the defaults is the quickest way to get started. Also, expect this script to take a while the first time you run it.
Remove the temporary clone of the flepiMoP repository created earlier. This step is not required, but does help alleviate confusion later.
Updating flepiMoP
Updating flepiMoP is designed to work just the same as installing it. Make sure that your clone of the flepiMoP repository is set to the branch you're working with (if doing development or operations work), and then run the hpc_install_or_update.sh script, substituting <cluster-name> with either rockfish or longleaf.
Initializing the flepiMoP Environment
These steps to initialize the environment need to be run on a per-run or as-needed basis.
Change directory to where the full clone of the flepiMoP repository was placed (the script above states the location in its output), and then run the hpc_init.sh script, substituting <cluster-name> with either rockfish or longleaf. This script assumes the same defaults as the script before for the location of the flepiMoP clone and the name of the conda environment. It will also ask about a project directory and config; if this is your first time initializing flepiMoP, it might be helpful to clone the flepimop_sample GitHub repository to the same directory to use as a test.
Upon completing this script it will output a sample set of commands to run to quickly test if the installation/initialization has gone okay.
When an inference batch job is launched, a few postprocessing scripts (postprocessing-scripts.sh) are called to run automatically. You can manually change what you want to run by editing this script.
A batch job can be submitted after this by running the following:
This launches a batch job to your HPC, with each slot on a separate node. This command attempts to infer the required arguments from your environment variables (i.e. if there is a resume or not, what is the run_id, etc.). The part after the "2" makes sure this file output is redirected to a script for logging, but has no impact on your submission.
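A sketch of the submission (the script location, flags, and log file name are assumptions; consult the script's --help output):

```shell
# Submit the batch job and also redirect output to a log file
python $FLEPI_PATH/batch/inference_job_launcher.py --slurm 2>&1 | tee submission.log
```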
If you'd like to have more control, you can specify the arguments manually:
For more detailed arguments and advanced usage of the inference_job_launcher.py script, please refer to its --help output.
After the job is successfully submitted, you will now be in a new branch of the project repository. For documentation purposes, we recommend committing the ground truth data files to the branch on GitHub substituting <your-commit-message>
with a description of the contents:
During an inference batch run, log files will show the progress of each array/slot. These log files will show up in your project directory and have the file name structure:
To view these as they are being written, type:
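For example (replace the placeholder with one of the log files matching the structure above):

```shell
# Follow the log file as it is written
tail -f <log-file-name>
```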
or your file viewing command of choice. Other commands that are helpful for monitoring the status of your runs (note that <Job ID> here is the SLURM job ID, not the JOB_NAME set by flepiMoP):
squeue -u $USER
Displays the names and statuses of all jobs submitted by the user. Job status will typically be R (running) or PD (pending).
seff <Job ID>
Displays information related to the efficiency of resource usage by the job
sacct
Displays accounting data for all jobs and job steps
scancel <Job ID>
This cancels a job. If you want to cancel/kill all jobs submitted by a user, you can type scancel -u $USER
Often you'll need to move files back and forth between your HPC and your local computer. To do this, your HPC might suggest Filezilla or Globus file manager. You can also use the commands scp or rsync (check what works for your HPC).
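For example, to pull model output down to your local machine with scp (the host and paths are placeholders):

```shell
# Run from your LOCAL machine, not the HPC
scp -r your_username@hpc.example.edu:/path/to/project/model_output ./model_output
```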
If your system is approaching a file number quota, you can find subfolders that contain a large number of files by typing:
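One way to do this (a sketch; adapt the starting directory to your system) is:

```shell
# Count files in each immediate subdirectory, largest counts first
for d in */; do
  printf '%s\t%s\n' "$(find "$d" -type f | wc -l)" "$d"
done | sort -rn | head
```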
For running the model locally, especially for testing, non-inference runs, and short chains, we provide a guide for setting up and running in a conda environment, and we provide a Docker container for use. A Docker container is an environment isolated from the rest of the operating system: you can create and delete files and programs inside it without affecting your OS. It is a local virtual OS within your OS. We recommend Docker for users who are not familiar with setting up environments and want a containerized environment to quickly launch jobs.
For longer inference runs across multiple slots, we provide instructions and scripts for two methods to launch on SLURM HPC and on AWS using Docker. These methods are best for launching large jobs (long inference chains, multi-core and computationally expensive model runs), but not the best methods for debugging model setups.