Pysano is a custom service that runs on our HCI interactive Linux analysis servers and simplifies the task of running heavy-duty analysis jobs on our compute nodes at CHPC. After you submit a job to pysano, it transfers your data files and scripts to a cluster at CHPC, submits them to the slurm job manager for execution, watches for the job to finish, and transfers the results (new and changed files) back to your originating job directory on our local server. It notifies you of job submission and completion by email, and simple monitoring is available through a web URL provided in the submission email.
Pysano is designed as a simple, relatively easy-to-use service for researchers who just want to execute their jobs without dealing with file transfers, slurm commands, software and resource management, or CHPC user accounts. Advanced bioinformaticians familiar with cluster computing environments who need more control may opt to execute jobs directly at CHPC. Email CBI if you need direct access to our CHPC nodes.
Pysano requires HCI network credentials to log into our analysis servers. For University of Utah investigators not in HCI, network access may be obtained; email CBI for details.
Simple Start Guide
The following is a simple guide to get you started with pysano. It assumes that you are familiar with the Linux command line, that you can connect to the HCI interactive analysis servers through a terminal using SSH, and that you can execute applications there. Resources for learning these skills are available elsewhere.
Pysano job directories must contain at least two items:
- File(s) to be worked on. Often this will be a Fastq file (or two for paired-end) for an alignment job.
- A cmd.txt pysano command file.
These are explained in detail below. For routine RNASeq differential expression analysis, see the RNASeq page for links on how to create these job directories and command scripts with a simple script.
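For example, a minimal job directory for a single paired-end sample might contain the following (the sample and file names here are hypothetical):
12345X1/
    12345X1_R1.fastq.gz
    12345X1_R2.fastq.gz
    cmd.txt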
Job Directory
The job directory contains all the data files needed for execution, one directory per job. Use one job directory per sample; do not add multiple samples to the same job directory. Each job is executed on a single cluster node; multiple nodes cannot be used for one job. You may execute multiple jobs simultaneously up to a maximum; additional jobs are held in the queue until a running job finishes. Job directories must be writable by the pysano user (group read and write bits set). Create a directory for yourself in /scratch, and place your job directories there. You can also use /tomato/dev/job (a historical location).
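For example, the following creates a personal scratch directory and a job directory, then sets the group read and write bits (the user and job directory names are placeholders):
mkdir -p /scratch/your.name/12345X1
chmod -R g+rw /scratch/your.name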
Job Files
Place your data files (for example, Fastq files) in your job directories, one sample per directory. For Fastq files from GNomEx, use the FDT command line application to download your files.
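GNomEx typically displays the exact FDT command to run for each download request; it generally takes a form like the following, where the jar file is the one distributed by GNomEx and the server name and remote path are placeholders that GNomEx supplies:
java -jar fdtCommandLine.jar -pull -r -c <gnomex.server> -d /scratch/your.name/12345X1 <remote/path/from/gnomex>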
Command File
The cmd.txt command file is essentially a bash script with special directives at the top. Be sure to save the file as plain text with Unix line endings. It requires an email directive line at the top, like so:
#e your.name@hci.utah.edu
You may also specify a particular cluster with the #c directive, for example to direct jobs to the kingspeak nodes:
#c kingspeak
The remaining lines are interpreted as a standard shell script. Keep in mind that comment lines should begin with a # character immediately followed by a space; otherwise pysano may try to interpret the line as a directive. Do not mix comments and commands on the same line. An example adapter-trimming command might look like this:
/tomato/dev/app/modulesoftware/cutadapt -j 8 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -o trimmed2.fastq.gz test.fastq.gz
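Putting these pieces together, a minimal complete cmd.txt might look like this (the email address, adapter sequence, and file names are examples only):
#e your.name@hci.utah.edu
#c kingspeak
# trim adapters from the raw reads
/tomato/dev/app/modulesoftware/cutadapt -j 8 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -o trimmed.fastq.gz test.fastq.gz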
Applications and Resources
We have a number of software packages installed on both our local servers and at CHPC. For historical reasons, these are stored in /tomato/dev/app. Specify these applications by their full paths in your cmd.txt file.
Genomic resources, such as genomic fasta files, annotation files, and alignment indexes, are stored in /tomato/dev/data. Please specify these files in your cmd.txt file.
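For example, an alignment command might reference both trees. The application and index paths below are hypothetical; browse /tomato/dev/app and /tomato/dev/data for the actual layout before using them:
/tomato/dev/app/bwa/bwa mem -t 8 /tomato/dev/data/Human/GRCh38/genome.fa trimmed.fastq.gz > aligned.sam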
We also have applications installed in /home/BioApps and resources in /home/Genomes. These are not shared with our CHPC nodes and cannot be used in pysano jobs.
Job Control
Once your job directory is ready, start the job with the pstart command, giving it the path to your job directory. If the job is accepted, you should receive an email at the address specified in your cmd.txt file. The email will contain a URL for monitoring the progress of the job (current state and elapsed time only). Pysano will generate a slurm script, pbs.sh, from your cmd.txt file for execution.
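For example, using the job directory from earlier:
pstart /scratch/your.name/12345X1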
When the job finishes, you will receive a second email announcing the completion. In addition to the result files, the job directory will contain two files, stdout.txt and stderr.txt, representing the standard output and standard error, respectively.
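If a job did not produce the expected results, stderr.txt is usually the first place to look, for example:
tail -n 50 stderr.txt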
If you need to cancel a job, use the pstop command. Please wait until you receive confirmation before deleting the job files or directory.
If you need to check the command file, you can perform a dry run with the pdryrun command.
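Assuming pstop and pdryrun are given the path to the job directory in the same manner as pstart (an assumption based on the pstart usage above; the directory name is a placeholder):
pdryrun /scratch/your.name/12345X1
pstop /scratch/your.name/12345X1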
Help
You may find some advanced information in our user manual. You may also email CBI with questions.