# Troubleshooting

## The Nextflow Process

You can monitor the Nextflow process by watching what is printed to stdout. Additionally, the Nextflow Java process writes a `.nextflow.log` file in the run directory where warnings and errors are logged.

Note

  • The last executed Nextflow call in the run directory appears in the first few lines of `.nextflow.log`.
  • Repeated calls to `nextflow run` in the same directory rename the older log files; for example, `.nextflow.log.3` belongs to the run three invocations before the current one (see the sketch below).
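
Most log triage reduces to a few shell one-liners. A minimal sketch, assuming the default log file names (the `grep` pattern is just a suggestion):

```bash
# List the current and rotated Nextflow logs, newest first
ls -lt .nextflow.log*

# The exact nextflow command of the most recent run is logged near the top
head -n 5 .nextflow.log

# Surface warnings and errors from the current run
grep -E 'WARN|ERROR' .nextflow.log
```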

## Individual Jobs

A job running a single process inside the pipeline can fail due to inadequate resources, which triggers a re-run with increased resources. For other failures, you need to look inside the work directory. The `.html` reports generated by default (or the `trace.txt` file) in the run directory show, for each process run, its status (`COMPLETED`; `CACHED` if resumed and completed in a prior run; `FAILED` if an error occurred) and a hash. The hash identifies the subdirectory in which the process was run (for example, `a4/00365e` points to `work/a4/00365e9190eca55907746edeb58f77`). In this directory you will find the following files, which are useful for troubleshooting:

  • `.command.run`: The actual script that sets up environment variables and runs `.command.sh`; Nextflow submits this to LSF using `bsub`. You can manually resubmit it by running `bsub < .command.run` (see the sketch below).
  • `.command.sh`: Contains the command-line calls defined in the corresponding process in `pipeline.nf`.
  • `.command.log`: Contains stdout from both the process itself and `bsub`.
  • `.command.out`: stdout from the process alone.
  • `.command.err`: stderr from the process alone.

Additionally, any input files used by the process are symlinked into the work directory, and any intermediate and final output files are left there as well.
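
Putting this together, a typical triage session looks something like the sketch below; the hash `a4/00365e` reuses the example from above, and the glob assumes the default `work` directory layout:

```bash
# Resolve the full work directory from the short hash in the report
cd $(ls -d work/a4/00365e*)

# The process's stderr usually holds the most direct clue
tail -n 50 .command.err

# After fixing the problem, resubmit the job by hand
bsub < .command.run
```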

## Standard LSF Errors

When debugging pipeline runs, there are some common errors you will encounter. An incomplete list of LSF job exit codes is provided below; a sketch for looking up a failed job's exit code follows the list:

  • exit code 0 --- the job was considered successfully run
  • exit code 1 --- a generic error code, which usually means something is wrong with the code itself
  • exit code 130 --- there was not enough memory for the process to complete
  • exit code 140 --- not enough runtime was requested via LSF; Nextflow translates the process time limit into `bsub -W`
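
Nextflow records each task's exit status in the `exit` column of `trace.txt` and also writes it to a hidden `.exitcode` file inside the task's work directory. A minimal sketch for looking it up (the hash glob reuses the example from above):

```bash
# List failed tasks, including their exit codes, from the trace file
grep FAILED trace.txt

# Or read the exit code directly from a task's work directory
cat work/a4/00365e*/.exitcode
```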

## Singularity Errors

The most common error one sees with Singularity is `Failed to pull singularity image`, e.g.:

```
ERROR ~ Error executing process > 'VariantCaller'

Caused by:
  Failed to pull singularity image
  command: singularity pull  --name cmopipeline-variantcaller-1.0.0.img docker://cmopipeline/variantcaller:1.0.0 > /dev/null
  status : 255
  message:
    WARNING: Authentication token file not found : Only pulls of public images will succeed
    INFO:    Starting build...
    Getting image source signatures
    Skipping fetch of repeat blob sha256:g2wi99s7f5ij1buyrr0ep5tf8xjfk05lwc9vr3adlbgxbw1zvxlmx8053n4lvmsm
    Skipping fetch of repeat blob sha256:z1k6kz1157i0vy8r9eutu3cmfzp48wyeoxyusha8r681x725o4vwb468952vaao3
    Skipping fetch of repeat blob sha256:twc7y14h8qub3vi5i8vxvp0qxhtw01mee7nc5j7qjyhol5nx4e22fjl5kawlzf53
    Skipping fetch of repeat blob sha256:ex0gy93dwd19y35433v8n4kcozowo964jx8zt088ltd9edw8a5gob94qc9coyhc6
    Skipping fetch of repeat blob sha256:n54clmgy1tep6l409gdnkf980nvm1607oa6jr34po8q2v7u2l82o3z4rq9k6ctvg
    Copying config sha256:4s26zd3jojt2mch7t41ek56uabitwg91p7b530upbeeoyjmql6uw24wgxr5x6fel

     0 B / 2.55 KiB [--------------------------------------------------------------]
     2.55 KiB / 2.55 KiB [======================================================] 0s
    Writing manifest to image destination
    Storing signatures
    FATAL:   Unable to pull docker://cmopipeline/variantcaller:1.0.0: conveyor failed to get: no descriptor found for reference "3sw08cr0yd460ygwyjn2p29y40lakjnw9y2nj5w20za960059fij5okthwc87l66"
```

This error occurs when users pull pre-built images from Dockerhub via `singularity pull` for the first time. You can avoid it by setting the variable `NXF_SINGULARITY_CACHEDIR` to the subdirectory containing these images, which have already been downloaded on site. (Please read Juno Setup and Working with Containers for more details on this topic.)
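
For example, you could add something like the following to your shell startup file; the path shown is a placeholder, and the actual on-site location is given in Juno Setup:

```bash
# Point Nextflow at the shared, pre-downloaded Singularity images
# (replace the placeholder path with the actual on-site cache directory)
export NXF_SINGULARITY_CACHEDIR=/path/to/shared/singularity_images
```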

Another option is to simply execute the failing `singularity pull` command yourself, e.g.

```
singularity pull --name cmopipeline-fastp-1.0.0.img docker://cmopipeline/fastp:1.0.0 > /dev/null
```

Be aware

The command `singularity pull` tends to take quite some time to run, and it often slows down the login server for everyone, so avoid running it there whenever possible. Setting the variable `NXF_SINGULARITY_CACHEDIR` to a location with already-downloaded images is far more efficient. If you must pull an image yourself, consider submitting the pull as an LSF job, as sketched below.
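
One way to spare the login server is to run the pull on a compute node via `bsub`; the resource requests below are illustrative guesses, not site-mandated values:

```bash
# Submit the pull as a batch job so it runs on a compute node
# (walltime and memory values are illustrative; adjust to your site's limits)
bsub -W 1:00 -R "rusage[mem=8]" \
    "singularity pull --name cmopipeline-fastp-1.0.0.img docker://cmopipeline/fastp:1.0.0"
```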