All are welcome to contribute to flepiMoP! The easiest way is to open an issue on GitHub if you encounter a bug or if you have an issue with the framework. We would be very happy to help you out.
If you want to contribute code, fork the flepiMoP repository, modify it, and submit your Pull Request (PR). In order to be merged, a pull request needs:
the approval of two reviewers AND
the continuous integration (CI) tests passing.
The "heart" of the pipeline, gempyor, is written in Python taking advantage of just-in-time compilation (via numba
) and existing optimized libraries (numpy
, pandas
). If you would like to help us build gempyor, here is some useful information.
We make extensive use of the following packages:
click for managing the command-line arguments
confuse for accessing the configuration file
numba to just-in-time compile the core model
sympy to parse the model equations
pyarrow as parquet is our main data storage format
xarray, which provides labels in the form of dimensions, coordinates, and attributes on top of raw NumPy multidimensional arrays, for performance and convenience
emcee for inference, as an option
graphviz to export the transition graph between compartments
pandas, numpy, scipy, seaborn, matplotlib, and tqdm, like many Python projects
One of the current focuses is to switch internal data types from DataFrames and NumPy arrays to xarray objects!
To run the test suite locally, you'll need to install the gempyor package with build dependencies, which installs the pytest and mock packages in addition to all other gempyor dependencies so that you can run the tests.
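A sketch of such an install (the path and extra name are assumptions; check the package metadata for the exact command):

```bash
# Editable install of gempyor plus its test dependencies (extra name assumed)
pip install -e "flepimop/gempyor_pkg[test]"
```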
If you are running from a conda environment and installing with `--no-deps`, then you should make sure that these two packages are installed.
Now you can try to run the gempyor test suite by running, from the flepimop/gempyor_pkg folder:
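A sketch, assuming the install above put pytest on your path:

```bash
# Run the full test suite from flepimop/gempyor_pkg
pytest
```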
If that works, then you are ready to develop gempyor. Feel free to open your first pull request.
If you want more output on tests, e.g. capturing standard output (print), you can use:
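One possibility (pytest's -s flag disables output capture so print statements show up, and extra -v flags increase verbosity):

```bash
# Show print output and verbose test names
pytest -s -vvv
```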
and to run just a subset of the tests (e.g. here just the outcome tests), use:
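For example, pytest's -k option selects tests whose names match a keyword expression (the keyword here is an assumption about how the outcome tests are named):

```bash
# Run only tests whose names contain "outcomes"
pytest -vvv -k outcomes
```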
For more details on how to use pytest, please refer to their usage guide.
We try to remain close to Python conventions and to follow the updated rules and best practices. For formatting, we use black, the Uncompromising Code Formatter, before submitting pull requests. It provides a consistent style, which is useful when diffing. To get started with black, please refer to their Getting Started guide. We use a custom line length of 92 characters, as the default is short for scientific code. Here is the line to use to format your code:
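A sketch, run from the repository root and using the 92-character setting mentioned above:

```bash
# Format the code base with black at the project's 92-character line length
black --line-length 92 .
```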
For those using a Mac or Linux system for development, this command is also available by calling ./bin/lint. Similarly, you can take advantage of the formatting pre-commit hook found at bin/pre-commit. To start using it, copy this file to your git hooks folder:
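A sketch, assuming the default .git/hooks location and running from the repository root:

```bash
# Install the formatting pre-commit hook and make sure it is executable
cp bin/pre-commit .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit
```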
The code is structured so that each of the main classes owns a config segment, and only this class should parse and build the related object. To access this information, other classes first need that object to be built.
Below this point, the page is still under construction.
The main classes are:
Coordinates: a light class that stores all the coordinates needed by every other class (e.g. the time series)
Parameters
Compartments
Modifiers
Seeding, InitialConditions
a writeDF function to plot
(TODO: detail pipeline internal API)
Here are some notes useful for improving batch submission:
Set up a site-wide Rprofile.
SLURM copies your environment variables by default. You don't need to tell it to set a variable on the command line for sbatch. Just set the variable in your environment before calling sbatch.
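A sketch (the variable and script names are purely illustrative):

```bash
# The exported variable is inherited by the job; no --export needed on the sbatch line
export MY_RUN_INDEX="test_run"   # hypothetical variable name
sbatch my_job.sbatch             # hypothetical batch script
```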
There are two useful environment variables that SLURM sets up when you use job arrays:
SLURM_ARRAY_JOB_ID specifies the array's master job ID number, and SLURM_ARRAY_TASK_ID specifies the job array index number (see https://help.rc.ufl.edu/doc/Using_Variables_in_SLURM_Jobs).
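A minimal job-array sketch using these variables (the array range and echo line are illustrative):

```bash
#!/bin/bash
#SBATCH --array=1-10
# SLURM_ARRAY_JOB_ID is shared across the array; SLURM_ARRAY_TASK_ID differs per task
echo "master job ${SLURM_ARRAY_JOB_ID}, task ${SLURM_ARRAY_TASK_ID}"
```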
SLURM does not support using variables in the #SBATCH lines within a job script (for example, #SBATCH -N=$REPS will NOT work). Only a very limited number of symbols are available in #SBATCH lines, such as %j for the job ID. However, values passed from the command line take precedence over values defined in the job script, and you can use variables on the command line. For example, the job name and output/error files can be passed on the sbatch command line:
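One illustrative way to do this (the job and file names are hypothetical):

```bash
JOBNAME="my_flepimop_run"        # hypothetical job name
sbatch --job-name="$JOBNAME" --output="${JOBNAME}.out" --error="${JOBNAME}.err" my_job.sbatch
```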
Note, however, that in this example the output file name does not include the job ID, which is not available from the command line, only inside the sbatch shell script.
launch_job.py and runner.py for non-inference jobs
inference_job.py launches a SLURM or AWS job, where it uses:
`inference_runner.sh` and `inference_copy.sh` for AWS;
`batch/inference_job.run` for SLURM