Channel: Azure Virtual Machines forum

DAPL errors when benchmarking RDMA-enabled SLES cluster


I set up two A8 VMs in an availability set, both running SLES-HPC 12, following the tutorial here: https://azure.microsoft.com/en-us/documentation/articles/virtual-machines-linux-cluster-rdma/

When I run the Intel MPI PingPong benchmark (IMB-MPI1 pingpong), I get DAPL errors:

azureUser@sshvm0:~> /opt/intel/impi/5.0.3.048/bin64/mpirun -hosts 10.0.0.4,10.0.0.5 -ppn 1 -n 2 -env I_MPI_FABRICS=shm:dapl -env I_MPI_DYNAMIC_CONNECTION=0 -env I_MPI_DAPL_PROVIDER=ofa-v2-ib0 /opt/intel/impi/5.0.3.048/bin64/IMB-MPI1 pingpong
sshvm1:d28:bef0eb40: 12930 us(12930 us):  dapl_rdma_accept: ERR -1 Input/output error
sshvm1:d28:bef0eb40: 12946 us(16 us):  DAPL ERR accept Input/output error
[1:10.0.0.5][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_conn_rc.c:622] error(0x40000): ofa-v2-ib0: could not accept DAPL connection request: DAT_INTERNAL_ERROR()
Assertion failed in file ../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_conn_rc.c at line 622: 0
internal ABORT - process 0

I get similar errors when running one of the OSU MPI benchmarks:

azureUser@sshvm0:~> /opt/intel/impi/5.0.3.048/bin64/mpirun -hosts 10.0.0.4,10.0.0.5 -ppn 1 -n 2 -env I_MPI_FABRICS=shm:dapl -env I_MPI_DYNAMIC_CONNECTION=0 -env I_MPI_DAPL_PROVIDER=ofa-v2-ib0 /opt/intel/impi/5.0.3.048/bin64/IMB-MPI1 pingpong
sshvm1:d28:bef0eb40: 12930 us(12930 us):  dapl_rdma_accept: ERR -1 Input/output error
sshvm1:d28:bef0eb40: 12946 us(16 us):  DAPL ERR accept Input/output error
[1:10.0.0.5][../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_conn_rc.c:622] error(0x40000): ofa-v2-ib0: could not accept DAPL connection request: DAT_INTERNAL_ERROR()
Assertion failed in file ../../src/mpid/ch3/channels/nemesis/netmod/dapl/dapl_conn_rc.c at line 622: 0
internal ABORT - process 0

Any tips on how to start debugging this? Thanks

