Nextflow cache and resume
Nextflow maintains a cache directory where it stores the intermediate results and metadata from workflow runs. Workflows executed in Seqera Platform use this caching mechanism to enable users to relaunch or resume failed or otherwise interrupted runs as needed. This eliminates the need to re-execute successfully completed tasks when a workflow is executed again due to task failures or other interruptions.
Cache directory
Nextflow stores all task executions to the task cache automatically, whether or not the resume or relaunch option is used. This makes it possible to resume or relaunch runs later if needed. Platform HPC and local compute environments use the default Nextflow cache directory (.nextflow/cache
) to store the task cache. Cloud compute environments use the cloud cache mechanism to store the task cache in a sub-folder of the pipeline work directory.
To override the default cloud cache location in cloud compute environments, specify an alternate directory with the cache directive in your Nextflow configuration file (either in the Advanced options > Nextflow config file field on the launch form, or in the nextflow.config
file in your pipeline repository).
- AWS Batch and Amazon EKS
- Azure Batch
- Google Cloud Batch and Google GKE
- Kubernetes
To customize the cache location used in your AWS Batch and Amazon EKS compute environments, specify an alternate cache directory in your Nextflow configuration:
cloudcache {
enabled = true
path = 's3://your-bucket/.cache'
}
The new cache directory must be accessible with the credentials associated with your compute environment. An alternate cloud storage location can be specified if you include the necessary credentials for that location in your Nextflow configuration. This is not recommended for production environments.
To customize the cache location used in your Azure Batch compute environments, specify an alternate cache directory in your Nextflow configuration:
cloudcache {
enabled = true
path = 'az://your-container/.cache'
}
The new cache directory must be accessible with the credentials associated with your compute environment. An alternate cloud storage location can be specified if you include the necessary credentials for that location in your Nextflow configuration. This is not recommended for production environments.
To customize the cache location used in your Google Cloud Batch and Google GKE compute environments, specify an alternate cache directory in your Nextflow configuration:
cloudcache {
enabled = true
path = 'gs://your-bucket/.cache'
}
The new cache directory must be accessible with the credentials associated with your compute environment. An alternate cloud storage location can be specified if you include the necessary credentials for that location in your Nextflow configuration. This is not recommended for production environments.
Kubernetes compute environments do not use cloud cache by default. To specify a cloud storage cache directory, include the cloud cache path and necessary credentials for that location in your Nextflow configuration. This is not recommended for production environments.
AWS S3
// Specify cloud storage credentials
aws {
accessKey = '<YOUR S3 ACCESS KEY>'
secretKey = '<YOUR S3 SECRET KEY>'
region = '<YOUR REGION>'
}
// Set the cloud cache path
cloudcache {
enabled = true
path = 's3://your-bucket/.cache'
}
Azure Blob Storage
// Specify cloud storage credentials
azure {
storage {
accountName = '<YOUR AZURE BLOB STORAGE ACCOUNT NAME>'
accountKey = '<YOUR AZURE BLOB STORAGE ACCOUNT KEY>'
}
}
// Set the cloud cache path
cloudcache {
enabled = true
path = 'az://your-container/.cache'
}
Google Cloud Storage
- See these instructions to set up IAM and create a JSON key file for the custom service account with permissions to your Google Cloud storage account.
- If you run the gcloud CLI authentication flow with
gcloud auth application-default login
, your Application Default Credentials are written to$HOME/.config/gcloud/application_default_credentials.json
and picked up by Nextflow automatically. Otherwise, declare theGOOGLE_APPLICATION_CREDENTIALS
environment variable explicitly with the local path to your service account credentials file created in the previous step. - Add the following to the Nextflow Config file field when you launch your pipeline:
// Specify cloud storage credentials
google {
location = '<YOUR BUCKET LOCATION>'
project = '<YOUR PROJECT ID>'
batch.serviceAccountEmail = '<YOUR SERVICE ACCOUNT EMAIL>'
}
// Set the cloud cache path
cloudcache {
enabled = true
path = 'gs://your-bucket/.cache'
}
Relaunch a workflow run
An effective way to troubleshoot a workflow execution is to Relaunch it with different parameters. Select the Runs tab, open the options menu to the right of the run, and select Relaunch. You can edit parameters, such as Pipeline to launch and Revision number before launch. Select Launch to execute the run from scratch.
The Relaunch option is only available for runs launched from the Seqera Platform interface.
Resume a workflow run
Seqera uses Nextflow's resume functionality to resume a workflow run with the same parameters, using the cached results of previously completed tasks and only executing failed and pending tasks. Select Resume from the options menu to the right of the run of your choice to launch a resumed run of the same workflow, with the option to edit some parameters before launch. Unlike a relaunch, you cannot edit the pipeline to launch or the work directory during a run resume.
The Resume option is only available for runs launched from the Seqera Platform interface.
Change compute environment during run resume
Users with appropriate permissions can change the compute environment when resuming a run. The new compute environment must have access to the original run work directory. This means that the new compute environment must have a work directory that matches the root path of the original pipeline work directory. For example, if the original pipeline work directory is s3://foo/work/12345
, the new compute environment must have access to s3://foo/work
.