Configuring compute nodes¶
There are two requirements for the compute nodes:
A user must be able to SSH to the node where their job is running. See the SLURM documentation (https://slurm.schedmd.com/pam_slurm_adopt.html).
When the application is executed on the node, it has to be associated with the same context as the placeholder job.
To satisfy the second requirement, a special PAM module is needed to associate the SSH session with the running job. The following sections describe one way to configure this. It should also be possible to achieve the same result with the pam_slurm_adopt module, but that has not been thoroughly tested yet.
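For reference, the (untested) pam_slurm_adopt alternative is typically enabled with a single line in /etc/pam.d/sshd, as described in the SLURM documentation linked above; this is only a sketch, and the exact module arguments depend on the site configuration:

account    required     pam_slurm_adopt.so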
Adding pam_exec to /etc/pam.d/sshd¶
Add the following line at the end of the /etc/pam.d/sshd configuration file:
session optional pam_exec.so /sbin/pam_ssh_exec.sh
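For example, the line can be appended as root on each node (this is just one way to edit the file; any editor works):

echo 'session optional pam_exec.so /sbin/pam_ssh_exec.sh' >> /etc/pam.d/sshd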
Adding pam_ssh_exec.sh script to /sbin¶
The pam_ssh_exec.sh script can be found in the system directory of the distribution. Copy this script to /sbin on the compute nodes (a deployment sketch follows the listing below). The script contains the following code:
#!/bin/sh
# Executed by pam_exec.so when an SSH session is opened: attach the session
# to the cgroups of the user's most recently started SLURM job on this node.
SLURMDIR=/usr/bin

# Never touch root sessions, and only act when a session is being opened.
[ "$PAM_USER" = "root" ] && exit 0
[ "$PAM_TYPE" = "open_session" ] || exit 0

# Bail out silently if squeue is not available on this node.
squeue=$SLURMDIR/squeue
if [ ! -x "$squeue" ]; then
    exit 0
fi

uidnumber=$(id -u "$PAM_USER")
echo "uidnumber = $uidnumber" >> /var/log/pam_exec.log
host=$(hostname -s)

# The last job the user started on this node is where these tasks will go.
jobid=$("$squeue" --noheader --format=%i --user="$PAM_USER" --node=localhost | tail -1)
[ -z "$jobid" ] && exit 0

for system in freezer cpuset; do
    cgdir=/sys/fs/cgroup/$system/slurm
    # Skip this cgroup controller if SLURM has not created a hierarchy for it.
    [ -d "$cgdir" ] || continue
    # The first job step found is where we'll put these tasks.
    cgtasks=$(find "$cgdir/uid_$uidnumber/job_$jobid" -mindepth 2 -type f -name tasks -print -quit)
    # Move the sshd session process (the parent of this script) into the job's cgroup.
    [ -f "$cgtasks" ] && echo "$PPID" > "$cgtasks"
    [ -f "$cgtasks" ] && echo "$PPID" >> /var/log/pam_exec.log
done
exit 0
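The script has to be owned by root and executable on every compute node. A minimal deployment sketch, assuming the nodes are reachable as node01 and node02 (placeholder hostnames):

for node in node01 node02; do
    scp pam_ssh_exec.sh root@$node:/sbin/pam_ssh_exec.sh
    ssh root@$node chmod 755 /sbin/pam_ssh_exec.sh
done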
This script adds the SSH session to the cgroup task group of the user's running job, so the session is subject to the same resource limits as the job's other tasks on the node. A log file /var/log/pam_exec.log is created on the node for debugging purposes.
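To verify the setup, SSH to a node where one of your jobs is running and check that the session was placed in the job's cgroup (node01 is a placeholder; cgroup paths may differ between SLURM versions):

ssh node01
grep slurm /proc/self/cgroup
tail /var/log/pam_exec.log    # requires root; shows the uid and PID that were adopted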