This section describes how to verify that SuperSockets are working correctly on a cluster.
The general status of SuperSockets can be retrieved via the SuperSockets init script that controls the service dis_supersockets. On Red Hat systems, this can be done like this:
# service dis_supersockets status
which should report a status of running. If the status shown here is loaded, but not configured, the SuperSockets configuration failed for some reason; typically, a configuration file could not be parsed correctly. The configuration can be performed manually like this:
# /opt/DIS/sbin/dis_ssocks_cfg
If this indicates that a configuration file is corrupted, you can verify the files against the reference in Appendix C, Configuration Files, Section 2, “SuperSockets Configuration”. At any time, you can re-create dishosts.conf using the GUI dis_netconfig or the command line tool dis_mkconf, and restore modified SuperSockets configuration files (supersockets_ports.conf and supersockets_profiles.conf) from the default versions that have been installed in /opt/DIS/etc/dis.
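For example, a modified supersockets_profiles.conf could be restored from the installed default and the service restarted afterwards (a sketch; the active configuration directory is assumed to be /etc/dis and may differ on your installation):
# cp /opt/DIS/etc/dis/supersockets_profiles.conf /etc/dis/
# service dis_supersockets restart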
Once the status of SuperSockets is running, you can verify their actual configuration via the command dis_ssocks_adm. The output of dis_ssocks_adm -m shows which IP addresses (or network masks) the local Cluster Node's SuperSockets know about. The output should be non-empty and identical on all Cluster Nodes.
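A quick way to compare this across the cluster is a small loop over ssh (a sketch; the node names are placeholders, and dis_ssocks_adm is assumed to be found in the PATH of non-interactive ssh sessions, otherwise use its full installation path):
$ for n in n1 n2; do echo "== $n =="; ssh $n dis_ssocks_adm -m; done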
Several benchmarks that can be used to validate the functionality and performance of SuperSockets are installed in /opt/DIS/bin/socket.
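To see which benchmark binaries are available on a node, simply list that directory:
$ ls /opt/DIS/bin/socket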
One such benchmark is latency_bench. The basic usage requires two machines (n1 and n2). Start the server process on Cluster Node n1:
$ dis_ssocks_run latency_bench -server
On Cluster Node n2, run the client side of the benchmark like:
$ dis_ssocks_run latency_bench -client n1
The latency reported by latency_bench depends on your system, but should start around 1µs on a modern machine where the PCIe slot is directly attached to the CPU. On older machines, the latency may be higher. Latencies above 5µs indicate a problem; typical Ethernet latencies start at 20µs and above. The output for a working setup should look like this:
LD_PRELOAD=libksupersockets.so
Using address family 2
Testing PINGPONG transfers (back-and-forth)
#  Size   Total time (usec)   RTT (usec)   Latency (usec)
      1       52058      5.21      2.60  Min: 2.00
      2       48922      4.89      2.45  Min: 2.00
      3       49019      4.90      2.45  Min: 2.00
      4       50641      5.06      2.53  Min: 2.07
      5       51371      5.14      2.57  Min: 2.08
      6       51339      5.13      2.57  Min: 2.12
      7       51432      5.14      2.57  Min: 2.09
      8       50005      5.00      2.50  Min: 2.03
      9       50060      5.01      2.50  Min: 2.03
     10       50047      5.00      2.50  Min: 2.04
     11       50128      5.01      2.51  Min: 2.03
     12       51977      5.20      2.60  Min: 2.13
     13       52072      5.21      2.60  Min: 2.13
     14       52159      5.22      2.61  Min: 2.11
     15       52164      5.22      2.61  Min: 2.13
     16       50142      5.01      2.51  Min: 2.05
     24       50199      5.02      2.51  Min: 2.04
     32       50316      5.03      2.52  Min: 2.06
     48       50666      5.07      2.53  Min: 2.07
     64       52134      5.21      2.61  Min: 2.16
     96       53127      5.31      2.66  Min: 2.19
    128       59349      5.93      2.97  Min: 2.27
    256       63195      6.32      3.16  Min: 2.98
    512       70478      7.05      3.52  Min: 3.34
   1024       86276      8.63      4.31  Min: 4.11
   2048      114890     11.49      5.74  Min: 5.52
   4096      150259     15.03      7.51  Min: 7.23
   8192      588301     58.83     29.42  Min: 15.34
  16384      645013     64.50     32.25  Min: 18.30
  32768     1094996    109.50     54.75  Min: 41.25
  65536     1480612    148.06     74.03  Min: 60.67
Another benchmark that can be used to validate the functionality and performance of SuperSockets is installed as /opt/DIS/bin/socket/sockperf. The basic usage requires two machines (n1 and n2). Start the server process on Cluster Node n1 without any parameters:
$ dis_ssocks_run sockperf
On Cluster Node n2, run the client side of the benchmark like:
$ dis_ssocks_run sockperf -h n1
The output for a working setup should look like this:
# sockperf $Revision: 4036 $ - test stream socket performance and system impact
# LD_PRELOAD: libksupersockets.so
# address family: 2  protocol: 0
# client node: ixchyx-6  server nodes: ixchyx-5
# sockets per process: 1 (total 1) - pattern: sequential
# wait for data: blocking recv()
# send mode: blocking
# buffer type: contiguous  iovec config: 0
# client/server pairs: 1 (running on -1 cores)
# socket options: nodelay 1
# communication pattern: PINGPONG (back-and-forth)
# test loops: 10000
#  bytes     loops  avg_RTT/2[us]  min_RTT/2[us]      msg/s      MB/s
       1     10000           2.68           2.03     373365      0.36
       4     10000           2.73           2.09     365998      1.40
       8     10000           2.59           2.07     386743      2.95
      12     10000           2.71           2.15     368751      4.22
      16     10000           2.76           2.07     362476      5.53
      24     10000           2.53           2.06     394658      9.03
      32     10000           2.54           2.08     393329     12.00
      48     10000           2.58           2.09     387995     17.76
      64     10000           2.69           2.16     371789     22.69
      80     10000           2.73           2.18     366275     27.94
      96     10000           2.85           2.19     350324     32.07
     112     10000           2.91           2.23     343926     36.74
     128     10000           3.01           2.38     331691     40.49
     160     10000           3.02           2.86     330737     50.47
     192     10000           3.34           2.93     299845     54.90
     224     10000           3.29           2.96     303767     64.89
     256     10000           3.23           3.04     310074     75.70
     512     10000           3.66           3.37     273587    133.59
    1024     10000           4.35           4.14     229758    224.37
    2048     10000           5.81           5.58     172185    336.30
    4096     10000           7.52           7.25     133012    519.58
    8192     10000          26.22          17.89      38138    297.95
   16384     10000          28.89          23.26      34616    540.88
   32768     10000          53.24          38.29      18783    586.97
   65536     10000          70.69          52.52      14146    884.13
The latency in this example starts around 2µs. On older machines, the latency may be higher. Latencies above 10µs indicate a problem; typical Ethernet latencies start at 20µs and above.
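As a sanity check, the same benchmark can be run without the dis_ssocks_run wrapper (and with LD_PRELOAD unset), so that it uses the regular TCP/IP path; comparing the result with the numbers above shows the effect of SuperSockets. A sketch, starting the benchmark from its installed location on Cluster Node n1:
$ /opt/DIS/bin/socket/sockperf
and on Cluster Node n2:
$ /opt/DIS/bin/socket/sockperf -h n1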
In case latencies are too high, please verify that SuperSockets are running and configured as described in the previous section. Also verify that the environment variable LD_PRELOAD is set to libksupersockets.so. This is reported for the client in the second line of the output (see above), but LD_PRELOAD also needs to be set correctly on the server side. See Chapter 4, Initial Installation, Section 3.10, “Making Cluster Application use PCI Express” for more information on how to make generic socket applications (like sockperf) use SuperSockets.
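If the dis_ssocks_run wrapper cannot be used, LD_PRELOAD can also be exported explicitly before starting the application, on both the server and the client node (a sketch; it assumes that libksupersockets.so is found by the dynamic loader, for example via ldconfig or LD_LIBRARY_PATH):
$ export LD_PRELOAD=libksupersockets.so
$ /opt/DIS/bin/socket/sockperf -h n1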