Job Scheduling With Slurm
The user had two accounts with us and all his jobs were running OK. We have just discovered yet another account with the same problem. All accounts are created through the same procedure, no exceptions, yet only two demonstrate the problem.
Previously, restarting the Slurm daemon helped, but not this time. I assume they have a default account set, and that you have been running with AccountingStorageEnforce set to assoc or limits for a while? Have there been any changes recently aside from adding this third account for that user?
Can you attach your current slurm.conf? And can you try increasing the debug level on slurmctld and then submitting a job under that problematic account? There should be some hints as to what's happening that we can use to track down the issue.
The relevant command is: "scontrol setdebug debug3" You'll want to reset this with "scontrol setdebug info" afterwards - the log file can be rather verbose at that level.
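Putting the two commands together, the debugging sequence described above might look like this on the slurmctld host (the log path shown is an assumption; check the SlurmctldLogFile setting in your slurm.conf):

```shell
scontrol setdebug debug3                 # raise slurmctld logging verbosity
# ...submit a job under the problematic account, then watch the log:
tail -f /var/log/slurm/slurmctld.log     # assumed log location
scontrol setdebug info                   # restore the normal level afterwards
```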
Meanwhile, to answer your account question - yes, there is a default account and it works fine. We have just discovered another user with a similar problem. Put this file on all nodes of your cluster; see the slurm.conf man page for details. I doubt that's the problem, but it may be worth checking whether things clear up without it enabled.
Have there been any connectivity issues with that lately? I usually expect to see a username rather than a numeric userid in that debug log slot, although that may be something we changed in a recent version, so it may or may not be a symptom of the problem.
If you don't mind, can you attach (or email me directly if you don't want it public on the bug report) the output of "scontrol show assoc" and "sacctmgr show assoc"? Comment 7 (Gene Soudlenkov): Hi Tim, I checked the filter code and found nothing there that could result in this behaviour - every time the filter declines a request we notify the user with a message. Also, this is the only account that does not work, although we discovered one more user with the same problem.
This is the output of the assoc command for this project:

    pancluster  uoa         normal
    pancluster  uoa  brob   normal
    pancluster  uoa  fsuz

brob also has the same problem with submitting through this account. I already tried deleting it and re-creating it, but the problem persists. Gene Comment 8 (Tim Wickberg): Can you do "scontrol show assoc" as well? Can you provide a longer chunk of the log? There may be some slurmdbd communication errors or something else affecting the slurmctld process.
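For reference, the two commands requested can be run directly on a node that can reach slurmctld and slurmdbd; adding a format string to sacctmgr (the fields listed are an illustrative selection) makes the columns explicit:

```shell
scontrol show assoc                                   # associations cached by slurmctld
sacctmgr show assoc format=cluster,account,user,qos   # associations stored in slurmdbd
```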
Have there been any other accounts created since this one, or is this the most recent account? Can you add users to other existing accounts without issue and have them run there? I think I found why this happened. We have been having a problem with Slurm picking up new accounts, and the only way to resolve it was to restart Slurm - there was a bug filed about it a little while ago.
For some reason Slurm refused to restart on the login and build nodes from where the submit requests were sent. I just forced the restart of slurmd everywhere, killing everything to force it, and it finally picked up the new accounts and started working again.

Each allocation is associated with a scheduler account.
The latter allows short-term borrowing of unused resources from other groups' accounts. In turn, each user in a group has a scheduler account association. In the end, it is this association which determines which QOSes are available to a particular user. Users with secondary Linux group membership will have associations with QOSes from their secondary groups.
In summary, each HPG user has scheduler associations with group-account-based QOSes that determine what resources are available to the user's jobs. These QOSes can be thought of as pools of computational resources - CPU cores, memory (RAM), and maximum run time (time limit) - with associated starting priority levels, which can be consumed by jobs to run applications according to QOS levels, which we will review below.
The output shows that the user magitz has four account associations and eight different QOSes. By convention, a user's default account is always the account of their primary group. If a user does not explicitly request a specific account and QOS, the user's default account and QOS will be assigned to the job.
If the user magitz wanted to use the borum group's account - to which he has access by virtue of the borum account association - he would specify the account and the chosen QOS in his batch script. Note that both must be specified; otherwise the scheduler will assume the default ufhpc account is intended, and neither the borum nor borum-b QOSes will be available to the job.
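A minimal sketch of such a batch script, using the borum account and its burst QOS named in the text (the resource requests and workload are placeholders):

```shell
#!/bin/bash
#SBATCH --account=borum       # group account instead of the default ufhpc
#SBATCH --qos=borum-b         # QOS belonging to that account; both must be given
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# placeholder workload
srun hostname
```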
Consequently, the scheduler would deny the submission. These sbatch directives can also be given as command-line arguments to srun. This pool of resources is shared among all members of the borum group. There are additional base priority and run time limits associated with QOSes.
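As noted above, the same account and QOS options can be passed on the command line instead of in the script (the application and script names here are hypothetical):

```shell
srun --account=borum --qos=borum-b ./my_app
# or on the sbatch command line, overriding anything in the script:
sbatch --account=borum --qos=borum-b job.sh
```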
To display them, use sacctmgr. The investment and burst QOS jobs are limited to 31-day and 4-day run times, respectively. It is important to remember that the base priority is only one component of the job's overall priority, and that the priority will change over time as the job waits in the queue. The burst QOS CPU and memory limits are nine times (9x) those of the investment QOS, up to a certain limit, and are intended to allow groups to take advantage of unused resources for short periods of time by borrowing resources from other groups.
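The per-QOS limits discussed above can be displayed with sacctmgr; the format fields here are one reasonable selection, not the only one:

```shell
sacctmgr show qos format=name,priority,maxwall,grptres%30
```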
Thus, if the application allows saving and resuming the analysis, it is recommended that instead of running jobs for extremely long times, you use checkpointing so that you can restart your jobs and run shorter ones instead. Considerations for selecting how many CPU cores and how much memory to request for a job must take into account the QOS limits based on the group investment, the limitations of the hardware (compute nodes), and the desire to be a good neighbor on a shared resource like HiPerGator - to ensure that system resources are allocated efficiently, used fairly, and everyone has a chance to get their work done without negatively impacting work performed by other researchers.
HiPerGator consists of many interconnected servers (compute nodes). Users may request to reserve nodes for special circumstances. A reservation may be shared by multiple users. The maximum number of nodes allowed for a reservation is half the number of general nodes in the cluster being requested. The maximum duration for reservations is two weeks. Jobs run in this manner are preemptable.
To do so, you will need to specify a special account. To run as an owner-guest, use the cluster-guest partition (substituting the appropriate cluster name) and the account owner-guest; on ash, the account for guest jobs is smithp-guest.
Your job will be preempted if a job comes in from a user in the group of the owner whose node(s) your job received. The use of these is described on the Slurm documentation page. The majority of a job's priority is based on a quality of service definition, or QOS.

Interactive nodes. Access is via lonepeak. There are also owner interactive nodes that are restricted to the owner group. Each cluster has a set of two general login nodes associated with it. For example, if you ssh'd into "kingspeak", you would be connected to one of these general login nodes.
The interactive nodes for guests (the general CHPC users) are ash5 and ash6, and they can be accessed either by specifying the specific node or by using "ash-guest". In addition to the cluster front-end nodes, CHPC provides a set of additional login nodes, the frisco nodes, for limited interactive work. For intensive interactive work you must use the batch system. The login nodes are your interface with the computational nodes and are where you interact with the batch system. The policies for acceptable usage of the frisco nodes are more relaxed than for the cluster interactive nodes, but usage is still limited.
Arbiter works with a set of maximum usage limits and threshold levels. Users will receive an email message from the Arbiter script when they enter and exit penalty status; the email will include a listing of their high-impact processes and a graph showing their CPU and memory usage over a period of time. The limit applies to all processes that a user has on a given node.
Arbiter maintains a history, such that once a user returns to normal status and is no longer exceeding the thresholds, the penalty status is decreased by one level every 3 hours.
What you could do is use the -vvv option to sbatch to see exactly what Slurm sees. What you can also try is removing the array specification from the submission command line and inserting it in the submission script. The next step is to contact the system administrators with the information you get from running the above tests and ask for help.
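As a sketch, the two suggestions above might look like this (the script name and array range are hypothetical):

```shell
# Move the array specification from the command line into the script itself:
cat > job.sh <<'EOF'
#!/bin/bash
#SBATCH --array=0-9
echo "running array task ${SLURM_ARRAY_TASK_ID}"
EOF

# Submit verbosely to see exactly which options Slurm parsed:
sbatch -vvv job.sh
```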
This post is a bit old, but in case it happens to other people: I have had the same issue, but the accepted answer did not identify the problem in my case. This error (sbatch: error: Batch job submission failed: Invalid job array specification) can also be raised when the array size is too large, exceeding the maximum job array size. The maximum job array task index value will be one less than MaxArraySize, to allow for an index value of zero.
Configure MaxArraySize to 0 in order to disable job array use. To check the value, consult the slurm.conf file.
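If you have access to the cluster, the effective value can also be checked without reading slurm.conf directly:

```shell
scontrol show config | grep -i MaxArraySize
```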
2.1 General HPC Cluster Policies
Familiarity with Slurm's Accounting web page is strongly recommended before use of this document. Note: If limits are defined at multiple points in this hierarchy, the point in this list where the limit is first defined will be used. The default for these three types of limits is that they are upper-bounded by the partition limit.
Scheduling policy information must be stored in a database as specified by the AccountingStorageType configuration parameter in the slurm. For security and performance reasons, the use of SlurmDBD Slurm Database Daemon as a front-end to the database is strongly recommended.
SlurmDBD uses a Slurm authentication plugin, and also uses an existing Slurm accounting storage plugin to maximize code reuse. SlurmDBD uses data caching and prioritization of pending requests in order to optimize performance. Both accounting and scheduling policies are configured based upon an association. An association is a 4-tuple consisting of the cluster name, bank account, user, and (optionally) the Slurm partition. In order to enforce scheduling policy, set the value of AccountingStorageEnforce.
This option contains a comma-separated list of options you may want to enforce. The valid options include:

associations - This will prevent users from running jobs if their association is not in the database, and will prevent users from accessing invalid accounts.

limits - This will enforce the limits defined for associations. By setting this option, the 'associations' option is also set.

qos - QOS values are defined for each association in the database, and this requires jobs to run under a valid QOS. By using this option, the 'associations' option is also set.

safe - Without this option set, jobs will be launched as long as their usage hasn't reached the cpu-minutes limit, which can lead to jobs being launched but then killed when the limit is reached. By setting this option, both the 'associations' option and the 'limits' option are set automatically.

wckeys - The 'TrackWCKey' option is also set to true.

NOTE: The association is a combination of cluster, account, and user names, plus an optional partition name. Without AccountingStorageEnforce being set (the default behavior), jobs will be executed based upon policies configured in Slurm on each cluster. The tool used to manage accounting policy is sacctmgr. It can be used to create and delete cluster, user, bank account, and partition records, plus their combined association records. See man sacctmgr for details on this tool and examples of its use.
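As a sketch, enforcement is switched on in slurm.conf and the association hierarchy is then populated with sacctmgr; all names below (mycluster, research_grp, alice) are hypothetical:

```shell
# slurm.conf (the same file on every node):
#   AccountingStorageEnforce=associations,limits,qos

# Create the association records with sacctmgr:
sacctmgr add cluster mycluster
sacctmgr add account research_grp Description="research group" Organization=science
sacctmgr add user alice Account=research_grp
sacctmgr modify user alice set MaxJobs=50    # a per-association limit
```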
Changes made to the scheduling policy are uploaded to the Slurm control daemons on the various clusters and take effect immediately. When an association is deleted, all running or pending jobs which belong to that association are immediately canceled.
When limits are lowered, running jobs will not be canceled to satisfy the new limits, but the new lower limits will be enforced. When dealing with Associations, most of these limits are available not only for a user association, but also for each cluster and account.
When Grp limits are considered with respect to this flag the Grp limit is treated as a Max limit. These represent the scheduling policies unique to associations. Shared policies and limits a QOS has in common are listed above. The MaxNodes and MaxWall options already exist in Slurm's configuration on a per-partition basis, but the above options provide the ability to impose limits on a per-user basis.
The MaxJobs option provides an entirely new mechanism for Slurm to control the workload any individual may place on a cluster in order to achieve some balance between users. Fair-share scheduling is based upon the hierarchical bank account data maintained in the Slurm database.

This document explains how to fix a Slurm job submission script that either does not run or runs incorrectly, and lists common mistakes as well as how to identify and fix them.
To debug job scripts that generate errors, look up the error message in the section below to identify the most likely reason your script received that error message. Once you have identified the mistake in your script, edit your script to correct it and re-submit your job. If you receive the same error message again, examine the error message and the mistake in your script more closely.
Sometimes the same error message can be generated by two different mistakes in the same script, meaning it's possible that you may resolve the first mistake but need to correct a second mistake to clear that particular error message.
When you re-submit your job you may receive a new error message. This means the mistake that generated the first error message has been resolved, and now you need to fix a second mistake. Slurm returns up to two distinct error messages at a time. If your submission script has more than two mistakes, you will need to re-submit your job multiple times to identify and fix all of them.
When Slurm encounters a mistake in your job submission script, it does not read the rest of your script that comes after the mistake. If the mistake generates an error, you can fix it and resubmit your job, however not all mistakes generate errors.
If your script's required elements (account, partition, nodes, cores, and wall time) have been read successfully before Slurm encounters your mistake, your job will still be accepted by the scheduler and run, just not the way you expect it to. Scripts with mistakes that don't generate errors still need to be debugged, since the scheduler has ignored some of your SBATCH lines.
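One common mistake of this kind: Slurm stops reading #SBATCH directives at the first executable command, so any directive placed after that point is silently ignored. A hypothetical example (allocation name made up):

```shell
#!/bin/bash
#SBATCH --account=myallocation   # read by the scheduler
#SBATCH --nodes=1                # read by the scheduler
echo "starting"                  # first executable command
#SBATCH --time=04:00:00          # silently IGNORED: comes after the first command
```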
You can identify a script with mistakes if the output from your job is unexpected or incorrect. To use this reference: search for the exact error message generated by your job.
Some error messages appear to be similar but are generated by different mistakes.
Note that the errors listed in this document may also be generated by interactive job submissions using srun or salloc. In those cases, the error messages will begin with srun error or salloc error. The information about resolving these error messages is the same. With certain combinations of GUI editors and character sets on your personal computer, copying and pasting into Quest job submission scripts may bring in specific hidden characters that interfere with the scheduler's ability to interpret the script.
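One frequent culprit is Windows-style CRLF line endings introduced by a GUI editor. A quick, self-contained way to check for them (the file name is made up; GNU cat is assumed):

```shell
# Fabricate a script with CRLF line endings to show what to look for:
printf '#!/bin/bash\r\necho hello\r\n' > /tmp/job.sh

cat -A /tmp/job.sh            # a ^M before each line-end $ reveals carriage returns
CR=$(printf '\r')             # a literal carriage-return character
grep -c "$CR" /tmp/job.sh     # counts the affected lines
```

If carriage returns are present, dos2unix (or tr -d '\r') will strip them.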
To resolve this, you may need to type your submission script into a native unix editor like vi and not use copy and paste. Possible mistake: you are not a member of the allocation specified in your job submission script Fix: confirm you are a member of the allocation by typing groups at the command line on Quest.
If the allocation you have specified in your job submission script is not listed, you are not a member of this allocation. Use an allocation that you are a member of in your job submission script. If this generates a new error referencing a different line of your script, the account line is correct and the mistake is elsewhere in your submission script.

As with most HPC services, Cirrus uses a scheduler to manage access to resources and to ensure that the thousands of different users of the system are able to share the system and all get access to the resources they require.
Cirrus uses the Slurm software to schedule jobs. Writing a submission script is typically the most convenient way to submit your job to the scheduler.
Example submission scripts with explanations for the most common job types are provided below. Interactive jobs are also available and can be particularly useful for developing and debugging applications. More details are available below. If you have any questions on how to run jobs on Cirrus, do not hesitate to contact the Cirrus Service Desk. You typically interact with Slurm by issuing Slurm commands from the login nodes (to submit, check, and cancel jobs), and by specifying Slurm directives that describe the resources required for your jobs in job submission scripts.
Without any options, sinfo lists the status of all resources and partitions, e. The script will typically contain one or more srun commands to launch parallel tasks.
When you submit the job, the scheduler provides the job ID, which is used to identify this job in other Slurm commands and when looking at resource usage in SAFE. The output of this is often overwhelmingly large.
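A sketch of that workflow (the job ID 123456 is made up; yours will differ):

```shell
sbatch job.sh            # prints e.g. "Submitted batch job 123456"
squeue --job 123456      # check the job's state in the queue
sacct --job 123456       # resource usage after it has run
scancel 123456           # cancel it if needed
```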
Debugging your Slurm submission script
If the job is waiting to run, it is simply cancelled; if it is a running job, it is stopped immediately. If you have requirements which do not fit within the current QoS, please contact the Service Desk and we can discuss how to accommodate them. There are different resource limits on Cirrus for different purposes.
There are three different things you need to specify for each job. Other node resources (memory on the standard compute nodes; memory and CPU cores on the GPU nodes) are assigned pro rata based on the primary resource that you request.
On Cirrus, you cannot specify the memory for a job using the --mem option to Slurm. The amount of memory you are assigned is calculated from the amount of primary resource you request. The primary resource you request on standard compute nodes is CPU cores. Using the --exclusive option in jobs will give you access to the full node memory even if you do not explicitly request all of the CPU cores on the node.
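A sketch of the two ways to obtain memory on the standard compute nodes, assuming (hypothetically) 36-core nodes; the partition name and application are also made up:

```shell
#!/bin/bash
#SBATCH --partition=standard   # hypothetical partition name
#SBATCH --ntasks=18            # half the cores -> roughly half the node memory, pro rata

# Alternatively, keep a small core count but claim the whole node's memory
# by uncommenting the following directive:
##SBATCH --exclusive

srun ./my_app                  # placeholder application
```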
You will not generally have access to the full amount of memory resource on the node, as some is retained for running the operating system and other system processes. The primary resource you request on the GPU nodes is GPU cards.
Using the --exclusive option in jobs will give you access to all of the CPU cores and the full node memory even if you do not explicitly request all of the GPU cards on the node.
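On the GPU nodes the same logic applies with GPU cards as the primary resource; a sketch (the partition name and application are hypothetical):

```shell
#!/bin/bash
#SBATCH --partition=gpu        # hypothetical GPU partition name
#SBATCH --gres=gpu:2           # two GPU cards; CPU cores and memory follow pro rata
#SBATCH --time=01:00:00

srun ./my_gpu_app              # placeholder application
```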