Pysano is a custom service that runs on our HCI interactive linux analysis servers and simplifies the task of running heavy-duty analysis jobs on our compute nodes at CHPC. After submitting your job to pysano, it will transfer your data files and scripts to a cluster at CHPC, submit them to the slurm job manager for execution, watch for the jobs to finish, and transfer the results (new and changed files) back to your originating job directory on our local server. It will notify you of your job submission and completion by email, and some simple monitoring is available through a web URL provided in the submission email.
Pysano is designed as a simple, relatively easy-to-use service for researchers who just want to execute their jobs without dealing with file transfers, slurm commands, software and resource management, and CHPC user accounts. Advanced bioinformaticians familiar with cluster computing environments who need more control may opt to execute jobs directly at CHPC. Contact the Core if you need direct access to our CHPC nodes.
Pysano requires HCI network credentials to log into our analysis servers. For University of Utah investigators not in HCI, network access may be obtained; contact the Core for details.
Simple Start Guide
The following is a simple guide to get you started with pysano. It assumes that you are familiar with the Linux command line, that you can connect to the HCI interactive analysis servers through a terminal using SSH, and that you can execute applications. Resources for learning these basics are available elsewhere.
Pysano job directories must contain at least two items:
- File(s) to be worked on. Often this will be a Fastq file (or two for paired-end) for an alignment job.
- A cmd.txt pysano command file.
These are explained in detail below. For routine RNASeq differential expression analysis, see the RNASeq page for instructions on creating these job directories and command files with a simple script.
The job directory contains all the data files needed for execution, one directory per job. Use one job per sample; do not add multiple samples to the same job directory. Each job is executed on a single cluster node; multiple nodes cannot be used by one job. Users may execute multiple jobs simultaneously up to a maximum; additional jobs are held in the queue until a running job finishes. Job directories must be writable by the pysano user (group read and write bits set). Create a directory for yourself in /scratch and place your job directories there. You can also use /tomato/dev/job (a historical location).
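The directory setup described above can be sketched as follows. The names myname and sample1_job are placeholders, and a temporary directory stands in for /scratch so the sketch is self-contained; on the analysis servers you would create your directory under /scratch directly.

```shell
# Sketch: one job directory per sample, group-writable for the pysano user.
# "myname" and "sample1_job" are placeholder names.
SCRATCH=$(mktemp -d)                      # stand-in for /scratch on the server
mkdir -p "$SCRATCH/myname/sample1_job"    # one directory per sample/job
chmod g+rw "$SCRATCH/myname/sample1_job"  # pysano user needs group read/write
```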
Place your data files, for example Fastq files, in your job directories, one sample per job directory. For Fastq files from GNomEx, use the FDT command line application to download them.
The cmd.txt command file is essentially a bash script with special directives at the top. Be sure to save the file as plain text with Unix line endings. It requires an email directive line at the top.
You may also specify a specific cluster with the #c directive, for example to direct jobs to the kingspeak nodes.
The remaining lines are interpreted as a standard shell script. Keep in mind that comment lines should begin with the # character immediately followed by a space; otherwise, the line may be interpreted as a directive. Do not mix comments and commands on the same line. An example adapter-trimming line may look something like this:
/tomato/dev/app/modulesoftware/cutadapt -j 8 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -o trimmed2.fastq.gz test.fastq.gz
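Putting these pieces together, a complete cmd.txt might look like the following sketch. The exact syntax of the email directive (#e here) is an assumption; check the user manual or your submission email for the precise form. The email address and Fastq file names are placeholders, while the #c directive and the cutadapt line come from the examples in this section.

```shell
#e my.name@hci.utah.edu
#c kingspeak

# trim sequencing adapters before alignment (note the space after "#")
/tomato/dev/app/modulesoftware/cutadapt -j 8 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -o trimmed.fastq.gz test.fastq.gz
```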
Applications and Resources
We have a number of software packages installed on both our local servers and at CHPC. For historical reasons, these are stored in /tomato/dev/app. Please specify these commands in your cmd.txt file.
Genomic resources, such as genomic Fasta files, annotation files, and alignment indexes, are stored in the directory /tomato/dev/data. Please specify these files in your cmd.txt file.
We also have applications installed in /home/BioApps and resources in /home/Genomes. These are not shared with our CHPC nodes and cannot be used in pysano jobs.
Once your job directory is ready, it may be started using the pstart command; provide the path to your job directory. If the job is accepted, you should receive an email at the address specified in your cmd.txt file. The email will contain a URL to monitor the progress of the job (current state and elapsed time only). Pysano will generate a slurm script, pbs.sh, based on your cmd.txt file for execution.
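A typical submission might look like the line below; the job directory path is a placeholder following the /scratch layout described above.

```shell
pstart /scratch/myname/sample1_job
```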
When the job finishes, you will receive a second email announcing the completion. In addition to the result files, there will be two files, stdout.txt and stderr.txt, representing the standard output and standard error, respectively.
If you need to cancel a job for some reason, use the pstop command. Please wait until you receive confirmation before deleting the job files or directory.
If you need to check the command file, you can perform a dry run first.
You may find some advanced information in our user manual. You may also contact the Core with questions.