Schedule Schema

SIMBA Scheduler

The schema describes which modules are executed during each tick of the driver, and how they are executed.

https://raw.githubusercontent.com/NSSAC/SIMBA_driver/master/schema/schedule.json

type

object

properties

  • schedule

type

array

items

allOf

Schedule

Module

  • commonData

Common Data

Common data provided to all modules upon execution.

type

object

Integer

An integer

integer

type

integer

minimum

0

Integer Tuple

A tuple of 2 integers

integerTuple

type

array

items

Integer

maxItems

2

minItems

2
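As a minimal sketch, the Integer and Integer Tuple definitions above can be checked in plain Python. The helper names is_integer and is_integer_tuple are hypothetical and not part of SIMBA. In practice one would more likely validate against the schema file with a JSON Schema library.

```python
def is_integer(value):
    """True if value satisfies the Integer definition (type integer, minimum 0)."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value >= 0


def is_integer_tuple(value):
    """True if value is an array of exactly 2 items that each satisfy Integer."""
    return (
        isinstance(value, (list, tuple))
        and len(value) == 2
        and all(is_integer(item) for item in value)
    )


print(is_integer_tuple([54000, 55000]))  # a two-item port range
print(is_integer_tuple([54000]))         # too few items
```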

Executor

executor

type

object

properties

  • type

Executor: https://parsl.readthedocs.io/en/stable/stubs/parsl.executors.HighThroughputExecutor.html#parsl.executors.HighThroughputExecutor

const

HighThroughputExecutor

  • label

Label for this executor instance.

type

string

default

HighThroughputExecutor

  • provider

Provider

  • launch_cmd

Command line string to launch the process_worker_pool from the provider. The string will be formatted with appropriate values for the following fields: debug, task_url, result_url, cores_per_worker, nodes_per_block, heartbeat_period, heartbeat_threshold, logdir.

type

string / null

default

null

  • address

An address for connecting to the main Parsl process, reachable from the network in which workers will be running. This field expects an IPv4 address (xxx.xxx.xxx.xxx). Most login nodes on clusters have several network interfaces available, only some of which can be reached from the compute nodes. This field can be used to make the executor listen only on a specific interface and to limit connections to the internal network. By default, the executor will attempt to enumerate and connect through all possible addresses. Setting an address here overrides the default behavior.

type

string / null

default

null

  • worker_ports

Specify the ports to be used by workers to connect to Parsl. If this option is specified, worker_port_range will not be honored.

default

null

oneOf

Integer

type

null

  • worker_port_range

Worker ports will be chosen between the two integers provided.

default

[54000, 55000]

oneOf

Integer Tuple

type

null

  • interchange_port_range

Port range used by Parsl to communicate with the Interchange.

default

[55000, 56000]

oneOf

Integer Tuple

type

null

  • storage_access

default

null

oneOf

type

array

items

Storage Access

type

null

  • working_dir

Working dir to be used by the executor.

type

string / null

default

null

  • worker_debug

Enables worker debug logging.

type

boolean

default

False

  • cores_per_worker

Cores to be assigned to each worker. Oversubscription is possible by setting cores_per_worker < 1.0.

type

number

default

1.0

  • mem_per_worker

GB of memory required per worker. If this option is specified, the node manager will check the available memory at startup and limit the number of workers such that there is sufficient memory for each worker.

type

number / null

default

null

  • max_workers

Caps the number of workers launched per node.

default

infinity

oneOf

type

number

minimum

0

const

infinity

  • cpu_affinity

Whether or how each worker process sets thread affinity. Options include “none” to forgo any CPU affinity configuration, “block” to assign adjacent cores to workers (e.g., assign 0-1 to worker 0, 2-3 to worker 1), and “alternating” to assign cores to workers in round-robin order (e.g., assign 0,2 to worker 0, 1,3 to worker 1). The “block-reverse” option assigns adjacent cores to workers, but assigns the CPUs with large indices to low-index workers (e.g., assign 2-3 to worker 0, 0-1 to worker 1).

type

string

default

none

  • available_accelerators

Accelerators available for workers to use. Each worker will be pinned to exactly one of the provided accelerators, and no more workers will be launched than the number of accelerators. Either provide the list of accelerator names or the number available. If a number is provided, Parsl will create names as integers starting with 0.

default

[]

oneOf

Integer

type

array

items

type

string

  • prefetch_capacity

Number of tasks that could be prefetched over available worker capacity. When there are a few tasks (<100) or when tasks are long running, this option should be set to 0 for better load balancing.

default

0

Integer

  • heartbeat_threshold

Seconds since the last message from the counterpart in the communication pair (interchange, manager) after which the counterpart is assumed to be unavailable.

default

120

Integer

  • heartbeat_period

Number of seconds after which a heartbeat message indicating liveness is sent to the counterpart (interchange, manager).

default

30

Integer

  • poll_period

Timeout period, in milliseconds, to be used by the executor components. Increasing poll_period trades performance for CPU efficiency.

default

10

Integer

  • address_probe_timeout

Managers attempt connecting over many different addresses to determine a viable address. This option sets a time limit in seconds on the connection attempt. Default of None implies 30s timeout set on worker.

default

null

oneOf

Integer

type

null

  • worker_logdir_root

In case of a remote file system, specify the path to where logs will be kept.

type

string / null

default

null
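The cpu_affinity options of the executor can be illustrated with a small sketch. The function assign_cores is hypothetical. It mirrors only the examples given in the option's description, not Parsl's actual implementation, and it assumes 0-based worker indices and a core count divisible by the number of workers.

```python
def assign_cores(strategy, n_cores, n_workers):
    """Return a list whose entry i holds the core indices pinned to worker i.

    Illustrative only: assumes n_cores is divisible by n_workers.
    """
    per_worker = n_cores // n_workers
    cores = list(range(n_cores))
    # Adjacent blocks of cores, in ascending order.
    blocks = [cores[i * per_worker:(i + 1) * per_worker] for i in range(n_workers)]

    if strategy == "none":
        return [[] for _ in range(n_workers)]  # no pinning at all
    if strategy == "block":
        return blocks
    if strategy == "alternating":
        # Round-robin: worker i takes every n_workers-th core starting at i.
        return [cores[i::n_workers] for i in range(n_workers)]
    if strategy == "block-reverse":
        # Same blocks, but high-index cores go to low-index workers.
        return blocks[::-1]
    raise ValueError(f"unknown cpu_affinity strategy: {strategy}")


print(assign_cores("block", 4, 2))          # [[0, 1], [2, 3]]
print(assign_cores("alternating", 4, 2))    # [[0, 2], [1, 3]]
print(assign_cores("block-reverse", 4, 2))  # [[2, 3], [0, 1]]
```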

Slurm Provider

slurmProvider

type

object

properties

  • type

Slurm Provider: https://parsl.readthedocs.io/en/stable/stubs/parsl.providers.SlurmProvider.html#parsl.providers.SlurmProvider

const

SlurmProvider

  • partition

Slurm partition to request blocks from. If unspecified or None, no partition slurm directive will be specified.

type

string

  • account

Slurm account to which to charge resources used by the job. If unspecified or None, the job will use the user’s default account.

type

string

  • qos

Slurm queue to place job in. If unspecified or None, no queue slurm directive will be specified.

type

string

  • constraint

Slurm job constraint, often used to choose cpu or gpu type. If unspecified or None, no constraint slurm directive will be added.

type

string

  • nodes_per_block

Nodes to provision per block.

Integer

  • cores_per_node

Specify the number of cores to provision per node. If set to None, executors will assume all cores on the node are available for computation. Default is None.

Integer

  • mem_per_node

Specify the real memory to provision per node in GB. If set to None, no explicit request to the scheduler will be made. Default is None.

Integer

  • init_blocks

Initial number of blocks.

default

1

Integer

  • min_blocks

Minimum number of blocks to maintain.

default

0

Integer

  • max_blocks

Maximum number of blocks to maintain.

Integer

  • parallelism

Ratio of provisioned task slots to active tasks. A parallelism value of 1 represents aggressive scaling where as many resources as possible are used; parallelism close to 0 represents the opposite situation in which as few resources as possible (i.e., min_blocks) are used.

type

number

  • walltime

Walltime requested per block in HH:MM:SS.

type

string

default

00:10:00

  • scheduler_options

String to prepend to the #SBATCH blocks in the submit script to the scheduler.

type

string

default

  • regex_job_id

The regular expression used to extract the job ID from the sbatch standard output. The default is r'Submitted batch job (?P<id>\S*)', where id is the regular expression symbolic group for the job ID.

type

string

default

Submitted batch job (?P<id>\S*)

  • worker_init

Command to be run before starting a worker, such as ‘module load Anaconda; source activate env’.

type

string

  • exclusive

Requests nodes which are not shared with other running jobs.

type

boolean

default

True

  • move_files

Should files be moved? By default, Parsl will try to move files.

type

boolean

default

True

  • launcher

Launcher for this provider. Possible launchers include SingleNodeLauncher (the default), SrunLauncher, or AprunLauncher.

Launcher
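To show how the default regex_job_id of the Slurm Provider behaves, the following sketch applies it with Python's re module. The helper extract_job_id and the job ID 4242 are made up for illustration. Only the pattern itself comes from the schema default.

```python
import re

# Default value of regex_job_id from the schema.
REGEX_JOB_ID = r"Submitted batch job (?P<id>\S*)"


def extract_job_id(sbatch_stdout):
    """Return the job ID captured by the symbolic group 'id', or None."""
    match = re.search(REGEX_JOB_ID, sbatch_stdout)
    return match.group("id") if match else None


print(extract_job_id("Submitted batch job 4242\n"))  # prints 4242
```

A site whose sbatch wrapper prints a different message would need a custom pattern that still defines the id group.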

Local Provider

localProvider

type

object

properties

  • type

Local Provider: https://parsl.readthedocs.io/en/stable/stubs/parsl.providers.LocalProvider.html#parsl.providers.LocalProvider

const

LocalProvider

  • nodes_per_block

Nodes to provision per block.

Integer

  • init_blocks

Initial number of blocks.

default

1

Integer

  • min_blocks

Minimum number of blocks to maintain.

default

0

Integer

  • max_blocks

Maximum number of blocks to maintain.

default

1

Integer

  • parallelism

Ratio of provisioned task slots to active tasks. A parallelism value of 1 represents aggressive scaling where as many resources as possible are used; parallelism close to 0 represents the opposite situation in which as few resources as possible (i.e., min_blocks) are used.

type

number

  • worker_init

Command to be run before starting a worker, such as ‘module load Anaconda; source activate env’.

type

string

  • move_files

Should files be moved? By default, Parsl will try to move files.

type

boolean

default

True

  • launcher

Launcher for this provider. Possible launchers include SingleNodeLauncher

Single Node Launcher

Provider

Provider: https://parsl.readthedocs.io/en/stable/reference.html#providers

provider

oneOf

Slurm Provider

Local Provider

Srun Launcher

srunLauncher

type

object

properties

  • type

srun Launcher: https://parsl.readthedocs.io/en/stable/stubs/parsl.launchers.SrunLauncher.html#parsl.launchers.SrunLauncher

const

SrunLauncher

  • debug

type

boolean

default

True

  • overrides

This string will be passed to the srun launcher. Default: ‘’

type

string

default

Single Node Launcher

singleNodeLauncher

type

object

properties

  • type

Single Node Launcher: https://parsl.readthedocs.io/en/stable/stubs/parsl.launchers.SingleNodeLauncher.html#parsl.launchers.SingleNodeLauncher

const

SingleNodeLauncher

  • debug

type

boolean

default

True

  • fail_on_any

type

boolean

default

False

Launcher

Launcher: https://parsl.readthedocs.io/en/stable/reference.html#launchers

launcher

oneOf

Srun Launcher

Single Node Launcher

Storage Access

storageAccess

type

object

Schedule

The execution schedule of a module.

schedule

type

object

properties

  • priority

Priority of Module Execution

Higher priority is executed first. Priorities must be unique.

anyOf

type

number

enum

-Infinity, Infinity

  • startTick

Start Tick

The tick at which the module is executed first (default: -infinity).

Integer

  • endTick

End Tick

The targetTick for which the module is executed last (default: infinity).

Integer

  • tickIncrement

Tick Increment

The tick increment in which the module is executed (default: 1).

minimum

1

Integer
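The interplay of priority, startTick, endTick, and tickIncrement described above can be sketched as follows. This is not the SIMBA driver's code: the functions runs_at and modules_for_tick are hypothetical, and the sketch assumes that increments are counted from startTick when it is given and from tick 0 otherwise.

```python
import math


def runs_at(tick, start_tick=-math.inf, end_tick=math.inf, tick_increment=1):
    """Decide whether a module with the given schedule runs at `tick`."""
    if not (start_tick <= tick <= end_tick):
        return False
    # Assumption: count increments from startTick, or from 0 if it is unset.
    origin = start_tick if math.isfinite(start_tick) else 0
    return (tick - origin) % tick_increment == 0


def modules_for_tick(modules, tick):
    """Return the modules due at `tick`, highest priority first."""
    due = [
        m for m in modules
        if runs_at(
            tick,
            m.get("startTick", -math.inf),
            m.get("endTick", math.inf),
            m.get("tickIncrement", 1),
        )
    ]
    return sorted(due, key=lambda m: m["priority"], reverse=True)


# Made-up schedule entries for illustration.
modules = [
    {"name": "a", "priority": 1, "startTick": 0, "tickIncrement": 2},
    {"name": "b", "priority": 5},
    {"name": "c", "priority": 3, "endTick": 1},
]
print([m["name"] for m in modules_for_tick(modules, 0)])  # ['b', 'c', 'a']
print([m["name"] for m in modules_for_tick(modules, 1)])  # ['b', 'c']
print([m["name"] for m in modules_for_tick(modules, 2)])  # ['b', 'a']
```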

Module

Description of module-specific information.

module

allOf

type

object

properties

  • type

type

string

  • name

Module Name

The name of the module.

type

string

  • command

Module Command

The command to execute the module.

type

string

  • updateCommonData

Update Common Data

Specifies whether this module may update common data.

type

boolean

default

False

  • moduleData

Module-specific Data

Module-specific data provided to the module upon execution.

type

object

oneOf

Parsl Module

Local Module

Local Module

This describes module details required to execute a module locally (without Parsl).

localModule

type

object

properties

  • type

const

local

Parsl Module

This describes module details required to execute a module through parsl (https://parsl.readthedocs.io/en/stable/index.html)

parslModule

type

object

properties

  • type

const

parsl

  • executor

Executor

  • provider

Provider

  • launcher

Launcher

  • storageAccess

Storage Access
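Putting the definitions together, a schedule document might look like the sketch below, built as a Python dictionary and serialized to JSON. All module names, commands, and data values are invented for illustration. Which properties are required is not shown in this description, so the real schema may demand more or fewer fields than appear here.

```python
import json

# Hypothetical schedule: one local module and one module run through Parsl.
schedule_document = {
    "schedule": [
        {
            "type": "local",
            "name": "initialize",
            "command": "./initialize.sh",
            "priority": 2,
            "startTick": 0,
            "endTick": 0,
            "updateCommonData": True,
        },
        {
            "type": "parsl",
            "name": "simulate",
            "command": "./simulate.sh",
            "priority": 1,
            "tickIncrement": 1,
            "moduleData": {"seed": 42},
            "executor": {
                "type": "HighThroughputExecutor",
                "label": "simulate_htex",
                "cores_per_worker": 1.0,
                "worker_port_range": [54000, 55000],
            },
            "provider": {
                "type": "SlurmProvider",
                "partition": "standard",
                "nodes_per_block": 1,
                "walltime": "00:10:00",
                "launcher": {"type": "SrunLauncher"},
            },
        },
    ],
    "commonData": {"runId": "example"},
}

# The schema text states that priorities must be unique.
priorities = [entry["priority"] for entry in schedule_document["schedule"]]
assert len(priorities) == len(set(priorities)), "priorities must be unique"

print(json.dumps(schedule_document, indent=2))
```

The resulting JSON could then be validated against the schema file referenced at the top of this page before being handed to the driver.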