AlphaFold Parameter Reference

Introduction

This reference describes the parameters that can be passed to AlphaFold. Most of the information comes from the AlphaFold GitHub page and the contents of the execution scripts, and is duplicated here for convenience.

Parameters

fasta_paths

Description

Paths to FASTA files, each containing a prediction target. The targets are folded one after another. If a FASTA file contains multiple sequences, it will be folded as a multimer. Paths should be separated by commas.

Values

All FASTA paths must have a unique basename, as this basename is used to name the output directories for each prediction.
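
The uniqueness rule above can be checked before a job is submitted. The following is a minimal sketch, not part of AlphaFold or the submit script: the function names are hypothetical, and it assumes that "basename" means the file name without its extension.

```python
from collections import Counter
from pathlib import PurePosixPath


def split_fasta_paths(fasta_paths: str) -> list[str]:
    """Split the comma-separated fasta_paths value into individual paths."""
    return [p.strip() for p in fasta_paths.split(",") if p.strip()]


def duplicate_basenames(fasta_paths: str) -> list[str]:
    """Return every basename that is shared by more than one FASTA path."""
    # Assumption: the basename is the file name without its extension.
    names = [PurePosixPath(p).stem for p in split_fasta_paths(fasta_paths)]
    counts = Counter(names)
    return sorted(name for name, count in counts.items() if count > 1)


# Two different directories, same file name: the basenames collide.
value = "/data/run1/target.fasta,/data/run2/target.fasta,/data/run2/other.fasta"
print(duplicate_basenames(value))  # prints ['target']
```

An empty list means every path has a unique basename, so each prediction would get its own output directory.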

data_dir

Description

Path to directory of supporting data. This directory is roughly 2.2 TB in size.

Values

The value set in the submit script points to the current version of the datasets available on the UFS HPC. In most cases, users will not need to change this.

If a historical dataset is required, please see the max_template_date parameter first.

output_dir

Description

Path to a directory that will store the results.

Values

This directory will be created and will contain a directory for each FASTA file provided.
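
To anticipate where results will land, the per-file directories can be listed in advance. This is a sketch under an assumption: it supposes each result directory is named after the FASTA file name without its extension, and the function name and example paths are hypothetical.

```python
from pathlib import PurePosixPath


def expected_result_dirs(output_dir: str, fasta_paths: str) -> list[str]:
    """List the result directory expected for each FASTA path."""
    paths = [p.strip() for p in fasta_paths.split(",") if p.strip()]
    # Assumption: one directory per FASTA file, named after the file stem.
    return [str(PurePosixPath(output_dir) / PurePosixPath(p).stem) for p in paths]


for directory in expected_result_dirs("/scratch/results", "/data/a.fasta,/data/b.fasta"):
    print(directory)
```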

max_template_date

Description

Maximum template release date to consider. Any template with a release date after this date will be ignored. This parameter is important when folding historical test sets.

Values

The submit script automatically sets the date to the current date unless explicitly changed.

If another date is required, enter the new date in the format yyyy-mm-dd. For example: 2022-01-25
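
The date handling described above can be sketched as follows. This illustrates the yyyy-mm-dd rule and the "default to today" behaviour only; it is not the submit script's actual code, and the function name is hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def resolve_max_template_date(value: Optional[str] = None,
                              today: Optional[date] = None) -> str:
    """Return a yyyy-mm-dd string, defaulting to the current date."""
    if value is None:
        # No explicit date given: fall back to the current date.
        return (today or date.today()).isoformat()
    # strptime raises ValueError if the text is not a valid year-month-day date.
    return datetime.strptime(value, "%Y-%m-%d").date().isoformat()


print(resolve_max_template_date("2022-01-25"))  # prints 2022-01-25
```

A value such as 25-01-2022 or 2022-13-01 raises ValueError, which makes a mistyped date visible before the job starts.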

db_preset

Description

Preset for MSA database configuration which can be used to optimize for speed and lower hardware requirements.

Values

The two accepted values are:

  • full_dbs : Uses all genetic databases. This is the default value in the submit script and is the value used in CASP14.

  • reduced_dbs : Runs with a reduced version of the BFD. This preset is not recommended for the UFS HPC.

model_preset

Description

Selects the AlphaFold model to run. The available models are:

  • The monomer model (monomer)
  • The monomer model with extra ensembling (monomer_casp14)
  • The monomer model with pTM head (monomer_ptm)
  • The multimer model (multimer)

Values

The accepted values are:

  • monomer
  • monomer_casp14 (Recommended for monomer runs)
  • monomer_ptm
  • multimer

Note that when selecting multimer, a multi-sequence FASTA file must be provided and the is_prokaryote_list parameter needs to be set.
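
The multimer requirements above lend themselves to a quick pre-submission check. The sketch below is an illustration, not AlphaFold's own validation: the function names and the is_prokaryote_set flag are hypothetical, and sequences are counted by their ">" header lines.

```python
MODEL_PRESETS = ("monomer", "monomer_casp14", "monomer_ptm", "multimer")


def count_fasta_sequences(fasta_text: str) -> int:
    """Count sequence records; each record starts with a '>' header line."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def check_model_preset(model_preset: str, fasta_text: str,
                       is_prokaryote_set: bool) -> list[str]:
    """Return a list of problems; an empty list means the check passed."""
    problems = []
    if model_preset not in MODEL_PRESETS:
        problems.append(f"unknown model_preset: {model_preset}")
    if model_preset == "multimer":
        if count_fasta_sequences(fasta_text) < 2:
            problems.append("multimer requires a multi-sequence FASTA file")
        if not is_prokaryote_set:
            problems.append("multimer requires is_prokaryote_list to be set")
    return problems


single_chain = ">chain_A\nMKTAYIAKQR\n"
for problem in check_model_preset("multimer", single_chain, is_prokaryote_set=False):
    print(problem)
```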

is_prokaryote_list

Description

Note: This is an option for the multimer model and is not used by the single chain system.

These values determine the pairing method for the MSA.

Values

The two accepted values are:

  • true : The target complex is from a prokaryote

  • false : The target complex is not from a prokaryote, or the origin is not known.
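
The rule for choosing between the two values can be written down as a small helper. This is a sketch only: the function name and the domain labels are hypothetical inputs chosen for illustration, with bacteria and archaea treated as the prokaryotic domains and an unknown origin mapped to false as described above.

```python
from typing import Optional

# Bacteria and archaea are the two prokaryotic domains of life.
PROKARYOTIC_DOMAINS = {"bacteria", "archaea"}


def is_prokaryote_value(domain: Optional[str]) -> str:
    """Map the organism's domain to 'true' or 'false'; unknown origin gives 'false'."""
    if domain is None:
        # Origin not known: the documented choice is false.
        return "false"
    return "true" if domain.strip().lower() in PROKARYOTIC_DOMAINS else "false"


print(is_prokaryote_value("Bacteria"))  # prints true
print(is_prokaryote_value(None))        # prints false
```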