# Troubleshooting
## The Nextflow Process
You can follow the Nextflow process by watching what is printed to `stdout`. Additionally, the Nextflow Java process writes a `.nextflow.log` file in the run directory, where warnings and errors are logged.
Note

- The last executed Nextflow call in the run directory is recorded in the first few lines of `.nextflow.log`.
- Repeated calls to `nextflow run` in the same directory rename the older output files from the Nextflow process; for example, `.nextflow.log.3` is from three runs prior to the current one.
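For a quick look at a run's logs, a few shell one-liners go a long way. A minimal sketch, assuming you are in the run directory:

```bash
# The first few lines record the exact nextflow command that was run
head -n 20 .nextflow.log

# Scan the current log for warnings and errors
grep -iE 'warn|error' .nextflow.log

# Rotated logs from earlier runs: .nextflow.log.1, .nextflow.log.2, ...
ls .nextflow.log*
```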
## Individual Jobs
A job running a single process inside the pipeline can fail due to inadequate resources, which will trigger a re-run with increased resources. For other failures, you need to look inside the `work` directory. In the `.html` reports generated by default (or in the `trace.txt` file) in the run directory, you will find, for each process run, its status (`COMPLETED`; `CACHED` if resumed and completed in a prior run; `FAILED` if an error occurred) and a hash. The hash indicates the subdirectory in which the process was run (for example, `a4/00365e` points to `work/a4/00365e9190eca55907746edeb58f77`; a worked example follows the list below). In this directory you will find the following files, which are useful for troubleshooting:
- `.command.run`: The actual script that sets environment variables and runs `.command.sh`. Nextflow submits this to LSF using `bsub`; you can manually resubmit it by running `bsub < .command.run`.
- `.command.sh`: Contains the command-line calls that are defined in the corresponding process in `pipeline.nf`.
- `.command.log`: Contains `stdout` from the process itself and from `bsub`.
- `.command.out`: `stdout` from the process.
- `.command.err`: `stderr` from the process.
Additionally, any files used by the process are symlinked in the work directory, and any intermediate and final output files are also left here.
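Putting the pieces together, a typical debugging session might look like the following sketch (using the example hash from above; tab completion expands the truncated directory name):

```bash
# Jump into the work subdirectory indicated by the hash a4/00365e
cd work/a4/00365e*    # expands to work/a4/00365e9190eca55907746edeb58f77

# Inspect what was run and what went wrong
cat .command.sh       # the exact command-line calls for this process
cat .command.err      # stderr from the process

# Resubmit the job to LSF by hand after fixing the problem
bsub < .command.run
```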
## Standard LSF Errors
When debugging pipeline runs, there are common errors one encounters. An incomplete list of LSF job exit codes is provided below:
- Exit code `0`: the job was considered to have run successfully.
- Exit code `1`: a generic error code, which usually means something is wrong with the code itself.
- Exit code `130`: there was not enough memory for the process to complete.
- Exit code `140`: not enough runtime was requested via LSF; Nextflow translates the process time limit into `bsub -W`, as detailed here.
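To find failed tasks and their exit codes without opening the HTML report, you can filter `trace.txt` directly. A minimal sketch, assuming the default trace columns (task name in column 4, status in column 5, exit code in column 6; adjust the field numbers if your pipeline configures different trace fields):

```bash
# Print the name, status, and exit code of every failed task
awk -F'\t' '$5 == "FAILED" {print $4, $5, $6}' trace.txt
```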
## Singularity Errors
The most common error one sees with Singularity is `Failed to pull singularity image`, e.g.
```
ERROR ~ Error executing process > 'VariantCaller'

Caused by:
  Failed to pull singularity image
  command: singularity pull --name cmopipeline-variantcaller-1.0.0.img docker://cmopipeline/variantcaller:1.0.0 > /dev/null
  status : 255
  message:
    WARNING: Authentication token file not found : Only pulls of public images will succeed
    INFO: Starting build...
    Getting image source signatures
    Skipping fetch of repeat blob sha256:g2wi99s7f5ij1buyrr0ep5tf8xjfk05lwc9vr3adlbgxbw1zvxlmx8053n4lvmsm
    Skipping fetch of repeat blob sha256:z1k6kz1157i0vy8r9eutu3cmfzp48wyeoxyusha8r681x725o4vwb468952vaao3
    Skipping fetch of repeat blob sha256:twc7y14h8qub3vi5i8vxvp0qxhtw01mee7nc5j7qjyhol5nx4e22fjl5kawlzf53
    Skipping fetch of repeat blob sha256:ex0gy93dwd19y35433v8n4kcozowo964jx8zt088ltd9edw8a5gob94qc9coyhc6
    Skipping fetch of repeat blob sha256:n54clmgy1tep6l409gdnkf980nvm1607oa6jr34po8q2v7u2l82o3z4rq9k6ctvg
    Copying config sha256:4s26zd3jojt2mch7t41ek56uabitwg91p7b530upbeeoyjmql6uw24wgxr5x6fel
    0 B / 2.55 KiB [--------------------------------------------------------------]
    2.55 KiB / 2.55 KiB [======================================================] 0s
    Writing manifest to image destination
    Storing signatures
    FATAL: Unable to pull docker://cmopipeline/variantcaller:1.0.0: conveyor failed to get: no descriptor found for reference "3sw08cr0yd460ygwyjn2p29y40lakjnw9y2nj5w20za960059fij5okthwc87l66"
```
This error occurs when users download pre-built images from Dockerhub via `singularity pull` for the first time, i.e. when "pulling" Singularity images for the first time. It can be avoided by setting the variable `NXF_SINGULARITY_CACHEDIR` to the subdirectory containing these images, which have already been downloaded on site. (Please read Juno Setup and Working with Containers for more details on this topic.)
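A minimal sketch of that setup (the cache path below is hypothetical; substitute the on-site directory of pre-downloaded images described in Juno Setup):

```bash
# Point Nextflow at the shared cache of already-downloaded images
export NXF_SINGULARITY_CACHEDIR=/path/to/shared/singularity_images

# Subsequent runs will reuse the cached images instead of pulling
nextflow run pipeline.nf -resume
```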
Another option would be to simply execute the failing `singularity pull` command yourself, for example:

```
singularity pull --name cmopipeline-fastp-1.0.0.img docker://cmopipeline/fastp:1.0.0 > /dev/null
```
Be aware

The command `singularity pull` tends to take quite some time to run, and often slows down the login server for everyone. We recommend you don't do this often; setting the variable `NXF_SINGULARITY_CACHEDIR` to a location with already-downloaded images is far more efficient.