Problem running a multi-node MPI application on Cascade
Today I tried to run an MPI job (Open MPI v3.1.3) on Cascade, requesting 96 cores in the hope that it would run on a single node. SLURM instead spread the job across several nodes, and the program crashed with a message starting with:
PSM2 was unable to open an endpoint. Please make sure that the network link is
active on the node and the hardware is functioning.
(see the attached file slurm-5435). If I request fewer cores (say 48) and get a single node, there are no problems. I also use the same script and program on KNL, and again I don't have any problems there.
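
For reference, the single-node intent described above could be stated explicitly in the batch script roughly as follows. This is only a sketch, not my actual script: it assumes a Cascade node really has at least 96 cores, and ./my_app is a placeholder name for the application.

```shell
#!/bin/bash
# Ask SLURM to keep the whole job on one node.
#SBATCH --nodes=1
# Request 96 MPI ranks (assumption: a single node has at least 96 cores).
#SBATCH --ntasks=96

# Report what SLURM actually allocated; these variables are set by SLURM
# inside a job, so "unknown" is printed when run outside of one.
echo "nodes allocated: ${SLURM_JOB_NUM_NODES:-unknown}"
echo "node list: ${SLURM_JOB_NODELIST:-unknown}"

# Placeholder launch line; ./my_app is hypothetical.
# mpirun ./my_app
```

The two echo lines make it easy to confirm from the slurm-*.out file whether the job landed on one node or several before the MPI program starts.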