Pysano

Pysano is a custom service that runs on our HCI interactive Linux analysis servers and simplifies running heavy-duty analysis jobs on our compute nodes at CHPC. After you submit a job, pysano transfers your data files and scripts to a cluster at CHPC, submits them to the Slurm job manager for execution, watches for the jobs to finish, and transfers the results (new and changed files) back to your originating job directory on our local server. It notifies you of job submission and completion by email, and simple monitoring is available through a web URL provided in the submission email.

Pysano is designed as a simple, relatively easy-to-use service for researchers who just want to execute their jobs without dealing with file transfers, Slurm commands, software and resource management, and CHPC user accounts. Advanced bioinformaticians who are familiar with cluster computing environments and need more control may opt to execute jobs directly at CHPC. Contact the Core if you need direct access to our CHPC nodes.

Pysano requires HCI network credentials to log into our analysis servers. For University of Utah investigators not in HCI, network access may be obtained; contact the Core for details.

Simple Start Guide

The following is a simple guide to get you started with pysano. It assumes that you are familiar with the Linux command line, that you can connect to the HCI interactive analysis servers through a terminal using SSH, and that you are able to execute applications. Resources covering these basics are available elsewhere.

Pysano job directories must contain at least two items:

  1. File(s) to be worked on. Often this will be a Fastq file (or two for paired-end) for an alignment job.
  2. A cmd.txt pysano command file.

These are explained in detail below. For routine RNASeq differential expression analysis, see the RNASeq page for links on how to create these job directories and command scripts with a simple script.

Job Directory

The job directory contains all the data files needed for execution, with one directory per job. Use one job per sample; do not add multiple samples to the same job directory. Each job runs on a single cluster node; a job cannot span multiple nodes. You may run multiple jobs simultaneously up to a maximum; additional jobs are held in the queue until a running job finishes. Job directories must be writable by the pysano user (group read and write bits set). Create a directory for yourself in /scratch, and place your job directories in there. You can also use /tomato/dev/job (a historical location).
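As a minimal sketch of the setup described above, the following bash function creates one job directory per sample and sets the group permission bits. The function name make_job_dir and the example values /scratch/yourname and sample1 are placeholders, not part of pysano. The sketch also sets the group execute bit, on the assumption that the pysano user must be able to enter the directory; whether your directories also need a particular group ownership is not covered here, so check with the Core if in doubt.

```shell
#!/usr/bin/env bash
# Create a job directory that is readable and writable through group permissions.
# Both arguments are hypothetical example values chosen by the user.
make_job_dir() {
    local base="$1"     # e.g. /scratch/yourname (placeholder)
    local sample="$2"   # e.g. sample1 (placeholder); one sample per job directory
    local job_dir="$base/$sample"

    mkdir -p "$job_dir" || return 1
    # g+rw is the stated requirement; g+x is an added assumption so that
    # group members can enter (search) the directory.
    chmod g+rwx "$job_dir" || return 1
    # Print the resulting path so callers can capture it.
    printf '%s\n' "$job_dir"
}
```

For example, make_job_dir /scratch/yourname sample1 would create /scratch/yourname/sample1 and print that path.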

Job Files

Place your files, for example Fastq files, in your job directories, one sample per job directory. For Fastq files from GNomEx, use the FDT command line application to download your files.

Command File

The cmd.txt command file is essentially a bash script with special directives at the top. Be sure to save the file as plain text with Unix line endings. It requires an email directive line at the top, like so:

#e your.name@hci.utah.edu

You may also select a particular cluster with the #c directive, for example to direct jobs to the kingspeak nodes:

#c kingspeak

The remaining lines are interpreted as a standard shell script. Keep in mind that comment lines should begin with a # character immediately followed by a space; otherwise pysano may try to interpret the line as a directive. Do not mix comments with commands on the same line. An example alignment line may look something like this:

/tomato/dev/app/modulesoftware/cutadapt -j 8 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -o trimmed2.fastq.gz test.fastq.gz
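Putting the pieces above together, a complete cmd.txt might be written as in the sketch below. The function name write_cmd_file is hypothetical; the email address, cluster name, and cutadapt line are the examples from this page, and the Fastq file names are placeholders for your own data. The sketch assumes the #e directive comes first, followed by the optional #c directive, and it avoids blank lines because this page does not say how pysano treats them.

```shell
#!/usr/bin/env bash
# Write an example cmd.txt into the given job directory.
# The quoted 'EOF' delimiter keeps the shell from expanding anything
# inside the here-document, so the file is written exactly as shown.
write_cmd_file() {
    local job_dir="$1"   # path to an existing job directory
    cat > "$job_dir/cmd.txt" <<'EOF'
#e your.name@hci.utah.edu
#c kingspeak
# Trim the adapter sequence from the reads, using 8 cores.
/tomato/dev/app/modulesoftware/cutadapt -j 8 -a AGATCGGAAGAGCACACGTCTGAACTCCAGTCA -o trimmed2.fastq.gz test.fastq.gz
EOF
}
```

Note that the comment line starts with # followed by a space, and that the comment and the command are on separate lines, as required above.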

Applications and Resources

A number of software packages are installed and available on both our local servers and at CHPC. For historical reasons, these are stored in /tomato/dev/app. Please use these paths for the commands in your cmd.txt file.

Genomic resources, such as genomic fasta files, annotation files, and alignment indexes, are stored in the directory /tomato/dev/data. Please specify these files in your cmd.txt file.

We also have applications installed in /home/BioApps and resources in /home/Genomes. These are not shared with our CHPC nodes and cannot be used in pysano jobs.
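Because a cmd.txt that points at the local-only locations will fail once it runs at CHPC, it can help to scan the file before submitting. The following is a small sketch of such a check; the function name find_unshared_paths is hypothetical and the check is not part of pysano. It uses a plain text search, so it also flags these paths when they appear in comment lines.

```shell
#!/usr/bin/env bash
# Report cmd.txt lines that reference locations not shared with the CHPC nodes.
# Prints matching lines with their line numbers.
# Returns 0 when no such reference is found, 1 otherwise.
find_unshared_paths() {
    local cmd_file="$1"
    if grep -nE '/home/(BioApps|Genomes)' "$cmd_file"; then
        return 1
    fi
    return 0
}
```

Any line it reports should be changed to use the equivalent application under /tomato/dev/app or resource under /tomato/dev/data, where one exists.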

Job Control

Once your job directory is ready, start the job with the pstart command, providing the path to your job directory. If the job is accepted, you should receive an email at the address specified in your cmd.txt file. The email will contain a URL to monitor the progress of the job (current state and elapsed time only). Pysano will generate a Slurm script, pbs.sh, based on your cmd.txt file for execution.

When the job finishes, you will receive a second email announcing the completion. In addition to the result files, there will be two files, stdout.txt and stderr.txt, representing the standard output and standard error, respectively.

If you need to cancel a job for some reason, use the pstop command. Please wait until you receive confirmation before deleting the job files or directory.

If you need to check the command file, you can perform a dry run with the pdryrun command.
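Before handing a job to pstart, the requirements listed on this page can be checked locally. The function below is a rough sketch of such a pre-flight check and is not part of pysano; the name check_job_dir is hypothetical. It assumes that "at the top" means the #e directive is the very first line of cmd.txt, and that any regular file other than cmd.txt counts as job input.

```shell
#!/usr/bin/env bash
# Pre-flight check for a pysano job directory (a local sketch, not part of pysano).
# Prints each problem found and returns non-zero if there is any.
check_job_dir() {
    local job_dir="$1"
    local cmd_file="$job_dir/cmd.txt"
    local problems=0

    if [ ! -d "$job_dir" ]; then
        echo "not a directory: $job_dir"
        return 1
    fi
    if [ ! -f "$cmd_file" ]; then
        echo "missing cmd.txt in $job_dir"
        return 1
    fi
    # Assumption: the email directive must be the first line of the file.
    if ! head -n 1 "$cmd_file" | grep -q '^#e [^ ]*@'; then
        echo "first line of cmd.txt is not an '#e address' directive"
        problems=$((problems + 1))
    fi
    # Carriage returns indicate non-Unix (Windows) line endings.
    if grep -q "$(printf '\r')" "$cmd_file"; then
        echo "cmd.txt contains carriage returns; convert to Unix line endings"
        problems=$((problems + 1))
    fi
    # The directory needs both group read and group write permission.
    if [ -z "$(find "$job_dir" -maxdepth 0 -perm -060)" ]; then
        echo "job directory lacks group read/write permission"
        problems=$((problems + 1))
    fi
    # At least one file besides cmd.txt should be present as job input.
    if [ -z "$(find "$job_dir" -maxdepth 1 -type f ! -name cmd.txt | head -n 1)" ]; then
        echo "no data files found next to cmd.txt"
        problems=$((problems + 1))
    fi
    [ "$problems" -eq 0 ]
}
```

A possible use, with /scratch/yourname/sample1 as a placeholder path, is check_job_dir /scratch/yourname/sample1 && pstart /scratch/yourname/sample1, so that pstart only runs when the check passes. The arguments for pstop and pdryrun are not described on this page, so they are not shown here.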

Help

You may find some advanced information in our user manual. You may also contact the Core with questions.

Contact Us

Bioinformatic Analysis Director
David Nix, PhD
david.nix@hci.utah.edu
801-587-4611

Bioinformatic Analysis Associate Director
Timothy Parnell, PhD
timothy.parnell@hci.utah.edu
801-587-4312

Governance

HCI Senior Director Oversight
Alana Welm, PhD

Faculty Advisory Committee Chair
Katherine Varley, PhD

Faculty Advisory Committee Members
Richard Clark, PhD
Jason Gertz, PhD
Christopher Gregg, PhD
Philip Moos, PhD
Sean Tavtigian, PhD
Katherine Varley, PhD
Joseph Yost, PhD