3. SuperSockets Functionality and Performance

This section describes how to verify that SuperSockets are working correctly on a cluster.

3.1. SuperSockets Status

The general status of SuperSockets can be retrieved via the SuperSockets init script that controls the service dis_supersockets. On Red Hat systems, run:

# service dis_supersockets status

The reported status should be "running". A status of "loaded, but not configured" means that the SuperSockets configuration failed, typically because a configuration file could not be parsed correctly. In that case, the configuration can be performed manually:

# /opt/DIS/sbin/dis_ssocks_cfg

If this reports that a configuration file is corrupted, check the file against the reference in Appendix C, Configuration Files, Section 2, “SuperSockets Configuration”. At any time, you can re-create dishosts.conf using the GUI dis_netconfig or the command-line tool dis_mkconf, and restore modified SuperSockets configuration files (supersockets_ports.conf and supersockets_profiles.conf) from the default versions installed in /opt/DIS/etc/dis.
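
The status check can be scripted so that a failed configuration is detected automatically. The following is a minimal sketch, not part of the SuperSockets distribution: the function name classify_status is hypothetical, and it assumes that the status text contains the strings "running" or "not configured" as described above. The exact output format of the init script may differ on your system.

```shell
# Hypothetical helper: map the init script's status text to an action.
# Assumption: the text contains "running" or "not configured".
classify_status() {
    case "$1" in
        *"not configured"*) echo "reconfigure" ;;  # parsing of a config file failed
        *running*)          echo "ok" ;;           # SuperSockets are operational
        *)                  echo "unknown" ;;      # anything else needs a manual look
    esac
}
```

A possible use is to pass the status output to the function and run /opt/DIS/sbin/dis_ssocks_cfg when it prints "reconfigure":

# classify_status "$(service dis_supersockets status 2>&1)"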

Once the status of SuperSockets is "running", you can verify the actual configuration with the command dis_ssocks_adm. The output of dis_ssocks_adm -m shows which IP addresses (or network masks) the local Cluster Node's SuperSockets know about. The output should be non-empty and identical on all Cluster Nodes.
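
The "non-empty and identical on all Cluster Nodes" check can be automated. The sketch below is an illustration under stated assumptions, not a tool shipped with SuperSockets: the function names compare_configs and check_nodes are hypothetical, and check_nodes assumes passwordless ssh access to every node and that dis_ssocks_adm is found in the remote PATH.

```shell
# Print "ok" if every file given is non-empty and identical to the first one,
# otherwise print "mismatch" and return non-zero.
compare_configs() {
    ref="$1"
    [ -s "$ref" ] || { echo "mismatch"; return 1; }
    for f in "$@"; do
        cmp -s "$ref" "$f" || { echo "mismatch"; return 1; }
    done
    echo "ok"
}

# Collect the output of "dis_ssocks_adm -m" from each node (assumption:
# passwordless ssh, tool in the remote PATH) and compare the results.
check_nodes() {
    tmpdir=$(mktemp -d) || return 1
    for node in "$@"; do
        ssh "$node" dis_ssocks_adm -m > "$tmpdir/$node.out" 2>/dev/null
    done
    compare_configs "$tmpdir"/*.out
    rc=$?
    rm -rf "$tmpdir"
    return "$rc"
}
```

With two nodes named n1 and n2, the check would be started as:

$ check_nodes n1 n2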

3.2. SuperSockets Benchmarks

Several benchmarks that can be used to validate the functionality and performance of SuperSockets are installed in /opt/DIS/bin/socket.

3.2.1. latency_bench

One benchmark that can be used to validate the functionality and performance of SuperSockets is latency_bench. The basic usage requires two machines (n1 and n2). Start the server process on Cluster Node n1:

$ dis_ssocks_run latency_bench -server

On Cluster Node n2, run the client side of the benchmark:

$ dis_ssocks_run latency_bench -client n1

The latency reported by latency_bench depends on your system, but should start around 1µs on a modern machine where the PCIe slot is directly attached to the CPU. On older machines, the latency may be higher. Latencies above 5µs indicate a problem; typical Ethernet latencies start at 20µs. A sample run looks like this:

            LD_PRELOAD=libksupersockets.so
            Using address family 2
            Testing PINGPONG transfers (back-and-forth)
            #  Size    Total time (usec)    RTT (usec)      Latency (usec)
                  1          52058          5.21            2.60    Min: 2.00
                  2          48922          4.89            2.45    Min: 2.00
                  3          49019          4.90            2.45    Min: 2.00
                  4          50641          5.06            2.53    Min: 2.07
                  5          51371          5.14            2.57    Min: 2.08
                  6          51339          5.13            2.57    Min: 2.12
                  7          51432          5.14            2.57    Min: 2.09
                  8          50005          5.00            2.50    Min: 2.03
                  9          50060          5.01            2.50    Min: 2.03
                 10          50047          5.00            2.50    Min: 2.04
                 11          50128          5.01            2.51    Min: 2.03
                 12          51977          5.20            2.60    Min: 2.13
                 13          52072          5.21            2.60    Min: 2.13
                 14          52159          5.22            2.61    Min: 2.11
                 15          52164          5.22            2.61    Min: 2.13
                 16          50142          5.01            2.51    Min: 2.05
                 24          50199          5.02            2.51    Min: 2.04
                 32          50316          5.03            2.52    Min: 2.06
                 48          50666          5.07            2.53    Min: 2.07
                 64          52134          5.21            2.61    Min: 2.16
                 96          53127          5.31            2.66    Min: 2.19
                128          59349          5.93            2.97    Min: 2.27
                256          63195          6.32            3.16    Min: 2.98
                512          70478          7.05            3.52    Min: 3.34
               1024          86276          8.63            4.31    Min: 4.11
               2048         114890          11.49           5.74    Min: 5.52
               4096         150259          15.03           7.51    Min: 7.23
               8192         588301          58.83           29.42   Min: 15.34
              16384         645013          64.50           32.25   Min: 18.30
              32768        1094996          109.50          54.75   Min: 41.25
              65536        1480612          148.06          74.03   Min: 60.67
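
The latency threshold can be checked automatically against output in the format shown above. The following sketch is an assumption-laden illustration: the function name check_latency is hypothetical, it reads the benchmark output on standard input, and it interprets the threshold as applying to the average latency of the smallest message size, which is one reasonable reading of the 5µs guideline.

```shell
# Read latency_bench output on stdin and compare the average latency of the
# smallest message size against a limit in usec (default: 5).
check_latency() {
    awk -v limit="${1:-5}" '
        # Data rows start with a numeric size and carry "Min:" in field 5.
        $1 ~ /^[0-9]+$/ && $5 == "Min:" {
            if (!seen || $1 + 0 < size) { size = $1 + 0; lat = $4 + 0; seen = 1 }
        }
        END {
            if (!seen)       { print "no data"; exit 2 }
            if (lat > limit) { printf "FAIL %.2f\n", lat; exit 1 }
            printf "OK %.2f\n", lat
        }'
}
```

On Cluster Node n2, the client output could be piped directly into the check:

$ dis_ssocks_run latency_bench -client n1 | check_latency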
          

3.2.2. sockperf

Another benchmark that can be used to validate the functionality and performance of SuperSockets is installed as /opt/DIS/bin/socket/sockperf. The basic usage requires two machines (n1 and n2). Start the server process on Cluster Node n1 without any parameters:

$ dis_ssocks_run sockperf

On Cluster Node n2, run the client side of the benchmark:

$ dis_ssocks_run sockperf -h n1

The output for a working setup should look like this:

            # sockperf $Revision: 4036 $ - test stream socket performance and system impact
            # LD_PRELOAD: libksupersockets.so
            # address family: 2 protocol: 0
            # client node: ixchyx-6 server nodes: ixchyx-5
            # sockets per process: 1 (total 1) - pattern: sequential
            # wait for data: blocking recv()
            # send mode: blocking
            # buffer type: contiguous    iovec config: 0
            # client/server pairs: 1 (running on -1 cores)
            # socket options: nodelay 1
            # communication pattern: PINGPONG (back-and-forth)
            # test loops: 10000
            # bytes   loops avg_RTT/2[us] min_RTT/2[us]   msg/s     MB/s
                  1   10000          2.68          2.03  373365     0.36
                  4   10000          2.73          2.09  365998     1.40
                  8   10000          2.59          2.07  386743     2.95
                 12   10000          2.71          2.15  368751     4.22
                 16   10000          2.76          2.07  362476     5.53
                 24   10000          2.53          2.06  394658     9.03
                 32   10000          2.54          2.08  393329    12.00
                 48   10000          2.58          2.09  387995    17.76
                 64   10000          2.69          2.16  371789    22.69
                 80   10000          2.73          2.18  366275    27.94
                 96   10000          2.85          2.19  350324    32.07
                112   10000          2.91          2.23  343926    36.74
                128   10000          3.01          2.38  331691    40.49
                160   10000          3.02          2.86  330737    50.47
                192   10000          3.34          2.93  299845    54.90
                224   10000          3.29          2.96  303767    64.89
                256   10000          3.23          3.04  310074    75.70
                512   10000          3.66          3.37  273587   133.59
               1024   10000          4.35          4.14  229758   224.37
               2048   10000          5.81          5.58  172185   336.30
               4096   10000          7.52          7.25  133012   519.58
               8192   10000         26.22         17.89   38138   297.95
              16384   10000         28.89         23.26   34616   540.88
              32768   10000         53.24         38.29   18783   586.97
              65536   10000         70.69         52.52   14146   884.13
          

The latency in this example starts around 2µs. On older machines, the latency may be higher. Latencies above 10µs indicate a problem; typical Ethernet latencies start at 20µs and more.

If the latencies are too high, verify that SuperSockets are running and configured as described in Section 3.1, “SuperSockets Status”. Also verify that the environment variable LD_PRELOAD is set to libksupersockets.so. This is reported for the client in the second line of the output (see above), but LD_PRELOAD also needs to be set correctly on the server side. See Chapter 4, Initial Installation, Section 3.10, “Making Cluster Application use PCI Express” for more information on how to make generic socket applications (like sockperf) use SuperSockets.
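
Both client-side checks (the LD_PRELOAD header line and the small-message latency) can be applied to sockperf output in one pass. This is a minimal sketch under stated assumptions: the function name check_sockperf is hypothetical, the output format is assumed to match the example above, and the first data row is taken as the smallest message size. It does not cover the server side, which must be checked separately.

```shell
# Read sockperf client output on stdin. Fail if the LD_PRELOAD header does not
# name libksupersockets.so, or if avg_RTT/2 of the first data row exceeds the
# limit in usec (default: 10).
check_sockperf() {
    awk -v limit="${1:-10}" '
        /# LD_PRELOAD:/ { preload = ($0 ~ /libksupersockets\.so/) }
        # Data rows have six fields and start with a numeric message size.
        $1 ~ /^[0-9]+$/ && NF == 6 && !seen { lat = $3 + 0; seen = 1 }
        END {
            if (!preload)    { print "LD_PRELOAD not set to libksupersockets.so"; exit 1 }
            if (!seen)       { print "no data rows"; exit 2 }
            if (lat > limit) { printf "FAIL %.2f\n", lat; exit 1 }
            printf "OK %.2f\n", lat
        }'
}
```

On Cluster Node n2, the client output could be piped into the check:

$ dis_ssocks_run sockperf -h n1 | check_sockperf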