GPFS

GPFS Token Management Tuning

I experienced the following error when starting up a GPFS node in a cluster:

unexpected token conflict in recovery: majType 1 minType 7 tokType Inode key F657ED5089ABDE89:00000000000FC191:0000000000000000 node 1 mode xw flags 0x0 seqNum 1141739

This resulted in the node in question asserting, and other nodes remote asserting, making the cluster unstable.

IBM states this is related to token memory (TM) management, which can lead to unexpected results when it is exhausted. One recommendation is to make sure the cluster can handle TM properly by keeping the following condition true:

number of nodes (local and remote) * (maxFilesToCache + maxStatCache) < (number of manager nodes - 1) * 1.2M * (512M / tokenMemLimit)

As well as the best-case scenario, where all manager nodes are online, we have to consider the case where manager nodes are offline, either for maintenance or due to failure. This could leave as few as N/2+1 manager nodes online if every node is a manager node.
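To make the arithmetic concrete, the condition boils down to a one-line comparison. The following is my own minimal sketch in Python (the function name and the assumption that losing nodes only affects the manager-node term are mine), not the script referred to below:

    def tm_ratio(nodes, mftc, msc, managers_online, token_mem_limit=536870912):
        """Left-hand side of IBM's condition divided by the right-hand side.
        A result below 1.0 means the recommendation is met."""
        lhs = nodes * (mftc + msc)
        rhs = (managers_online - 1) * 1.2e6 * (536870912 / token_mem_limit)
        return lhs / rhs

    # 16 nodes, maxFilesToCache=1000000, maxStatCache=50000, all 8 manager nodes online
    print(round(tm_ratio(16, 1000000, 50000, 8), 2))   # 2.0 -- well over 1.0, so it fails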

I wrote a small script that can be used to visualise whether the recommended condition is met with varying numbers of local nodes online.

./gpfs-token-mgmt-tuning.py
    #nodes (local and remote): 16
              maxFilesToCache: 1000000
                 maxStatCache: 50000
               #manager nodes: 8
                tokenMemLimit: 536870912


Checking nodes (local and remote) * (MFTC + MSC) < (#manager nodes -1) * 1.2M * (512M/TML)...

9/9 nodes: 2.1 (FAIL)
8/9 nodes: 2.29 (FAIL)
7/9 nodes: 2.57 (FAIL)
6/9 nodes: 2.98 (FAIL)
5/9 nodes: 3.67 (FAIL)

As you can see in the above example, we are not meeting the recommendation even when all nodes are online. Any parameter can be overridden to visualise how it changes the condition:

USAGE: ./gpfs-token-mgmt-tuning.py [-n <n>] [-f <n>] [-s <n>] [-m <n>] [-t <n>] [-l <n>]
        -n Override number of nodes (local and remote)
        -f Override maxFilesToCache
        -s Override maxStatCache
        -m Override number of manager nodes
        -t Override tokenMemLimit
        -l Override number of local nodes

For example, lowering the maxFilesToCache to 300000:

# ./gpfs-token-mgmt-tuning.py -f 300000
    #nodes (local and remote): 16
              maxFilesToCache: 300000
                 maxStatCache: 50000
               #manager nodes: 8
                tokenMemLimit: 536870912

Checking nodes (local and remote) * (MFTC + MSC) < (#manager nodes -1) * 1.2M * (512M/TML)...

9/9 nodes: 0.7 (OK)
8/9 nodes: 0.76 (OK)
7/9 nodes: 0.86 (OK)
6/9 nodes: 0.99 (OK)
5/9 nodes: 1.22 (FAIL)

The above shows that the recommended tuning condition will be met most of the time, but will fail if only 5 of the 9 nodes are left online (which would still leave the cluster in quorum). Instead, we should consider lowering maxFilesToCache further.
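Rather than relying on trial and error, the condition can be rearranged to give an upper bound on maxFilesToCache for a chosen worst case. The sketch below is my own back-of-the-envelope version; it keeps the total node count fixed at 16, so it will not reproduce the script's per-row figures exactly:

    def max_mftc(nodes, msc, managers_online, token_mem_limit=536870912):
        """Largest maxFilesToCache still satisfying
        nodes * (MFTC + MSC) < (managers online - 1) * 1.2M * (512M / TML)."""
        rhs = (managers_online - 1) * 1.2e6 * (536870912 / token_mem_limit)
        return int(rhs / nodes - msc)

    # 16 nodes, maxStatCache=50000, only 5 manager nodes left online
    print(max_mftc(16, 50000, 5))   # 250000

With roughly 250000 as the ceiling in that scenario, 230000 leaves a little headroom. Re-running the script confirms this: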

# ./gpfs-token-mgmt-tuning.py -f 230000
    #nodes (local and remote): 16
              maxFilesToCache: 230000
                 maxStatCache: 50000
               #manager nodes: 8
                tokenMemLimit: 536870912


Checking nodes (local and remote) * (MFTC + MSC) < (#manager nodes -1) * 1.2M * (512M/TML)...

9/9 nodes: 0.56 (OK)
8/9 nodes: 0.61 (OK)
7/9 nodes: 0.69 (OK)
6/9 nodes: 0.8 (OK)
5/9 nodes: 0.98 (OK)

Now that we have found a suitable value, it can be set using # mmchconfig maxFilesToCache=230000 and restarting GPFS.

GPFS License Designation - Incorrect required license field

GPFS 3.3 introduced license designations for both client and server nodes, so after upgrading a cluster from GPFS 3.2 you are required to designate licenses with the mmchlicense command.

I recently upgraded a GPFS cluster containing 6 servers and 393 clients from 3.2 to 3.5. Unfortunately, mmlslicense does not agree with me and has determined that it requires 396 server licenses and 7 client licenses.

 Summary information
 ---------------------
 Number of nodes defined in the cluster:                        403
 Number of nodes with server license designation:                 0
 Number of nodes with client license designation:                 0
 Number of nodes still requiring server license designation:    396
 Number of nodes still requiring client license designation:      7

Even using mmchnode --client did not demote the node to a client.

The GPFS 3.5: Concepts, Planning, and Installation Guide states:

The GPFS server license permits the licensed node to perform GPFS management functions such as cluster configuration manager, quorum node, manager node, and Network Shared Disk (NSD) server. In addition, the GPFS Server license permits the licensed node to share GPFS data directly through any application, service protocol or method such as NFS, CIFS, FTP, or HTTP.

But no details are provided on how GPFS determines the “Required License”, or what to do if it is reported incorrectly. Digging into the depths of GPFS a bit more, we can see that getServerLicenseClass in /usr/lpp/mmfs/bin/mmglobfuncs uses the following piece of awk to determine whether a node is classed as a server or a client.

  $awk -F: -v ignoreNsdServers="$ignoreNsdServers" '                 \
    $'$LINE_TYPE_Field' == "'$VERSION_LINE'" {                       \
      { print $'$PRIMARY_SERVER_Field' >> "'$outfile'" }             \
      if ( $'$BACKUP_SERVER_Field' != "" ) {                         \
        { print $'$BACKUP_SERVER_Field' >> "'$outfile'" }            \
      }                                                              \
      { next }                                                       \
    }                                                                \
    $'$LINE_TYPE_Field' == "'$MEMBER_NODE'" {                        \
      if ( $'$DESIGNATION_Field' == "'$MANAGER'"     ||              \
           $'$CORE_QUORUM_Field' == "'$quorumNode'"  ||              \
           $'$OTHER_NODE_ROLES_Field' != ""          ||              \
           $'$CNFS_IPLIST_Field'      != ""           ) {            \
        { print $'$REL_HOSTNAME_Field' >> "'$outfile'" }             \
      }                                                              \
      { next }                                                       \
    }                                                                \
    $'$LINE_TYPE_Field' == "'$SG_DISKS'" && ! ignoreNsdServers {     \
      { n = split($'$NSD_PRIMARY_NODE_Field', nsdServer, ",") }      \
      { for (i=1; i <= n; i++) print nsdServer[i] >> "'$outfile'" }  \
      { n = split($'$NSD_BACKUP_NODE_Field', nsdServer, ",") }       \
      { for (i=1; i <= n; i++) print nsdServer[i] >> "'$outfile'" }  \
      { next }                                                       \
    }                                                                \
  ' $sdrfs

It turns out that the CNFS_IPLIST_Field (field 23) wasn’t blank, which led GPFS to believe the nodes were servers. Looking at the data in /var/mmfs/gen/mmsdrfs, I can see the culprit: one space!

%%home%%:20_MEMBER_NODE::141:134:u02n002:10.143.2.2:u02n002.data.cluster:client::::::u02n002.data.cluster:u02n002:1350:3.5.0.21:Linux:N::: :::::
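With 400+ nodes it is worth finding every record with the same problem before fixing anything. The snippet below is my own quick check; it assumes the colon-separated field positions shown in the record above (short node name in field 6, CNFS IP list in field 23), which I have not verified against any formal description of the mmsdrfs format:

    # Print member nodes whose CNFS IP list field (field 23) is not empty --
    # a lone space is enough to trip the server-license check above.
    with open("/var/mmfs/gen/mmsdrfs") as sdrfs:
        for line in sdrfs:
            fields = line.rstrip("\n").split(":")
            if len(fields) > 22 and fields[1] == "20_MEMBER_NODE" and fields[22] != "":
                print(fields[5])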

This can easily be changed using the mmchnode command:

mmchnode --cnfs-interface=DEFAULT -N {Node[,Node...] | NodeFile | NodeClass}

Once run against the nodes that had been classed incorrectly, mmlslicense reports correctly:

 Summary information
 ---------------------
 Number of nodes defined in the cluster:                        403
 Number of nodes with server license designation:                 0
 Number of nodes with client license designation:                 0
 Number of nodes still requiring server license designation:      6
 Number of nodes still requiring client license designation:    397