Server Clusters
Chapter 13-2
Introduction
A cluster is a group of up to six Domino servers interconnected to form a team that cooperates to provide services or resources to clients. Some advantages of clusters are high data availability, tightly synchronized databases, and scalability. Clusters provide failover protection for business-critical databases and servers, including passthru server failover to other servers in the cluster. With failover, users can still access a database when a server goes down. A workload balancing feature lets heavily used servers pass requests to other cluster servers, so that work is distributed evenly across the cluster.
Typically, each cluster contains multiple database replicas that are kept tightly synchronized in the cluster by the Cluster Replicator. There are six other cluster components, such as the Cluster Database Directory Manager server task (CLDBDIR), that manage and monitor servers and databases in a cluster. For details about these components, see the Domino Administration Help documentation.
The HCL C API for Domino and Notes provides functions that you can use to programmatically access a server cluster. Any administrative application that lets you view or control a clustered environment can use these functions. The following table describes the types of information available when you use the HCL C API for Domino and Notes:
Information type | Description | API Routine |
Server cluster name | Name of the cluster to which a Domino server belongs | NSPingServer |
Server availability | Whether a server is reachable and, optionally, its server availability index. The index value (0-100) reflects the current workload of the server. When the index is less than the configured availability threshold (the notes.ini Server_Availability_Threshold setting), the server enters a BUSY state. | NSPingServer |
Server cluster members | Names of servers that belong to a specific cluster | NSPingServer, NSGetServerClusterMates |
Database availability attributes | Database attributes that indicate availability status. You can programmatically modify a database to be available (marked in service), unavailable (marked out of service), or ready for permanent deletion (marked for delete). | NSFDbMarkForDelete, NSFDbMarkInService, NSFDbMarkOutOfService, NSFDbGetOptions |
Clustered database failover | Support for database open failover | NSFDbOpenExtended |
The remainder of this chapter describes how you use the HCL C API for Domino and Notes toolkit to obtain cluster information and services. It includes code segments from the CLUMON sample program.
Server Cluster Name
The C API routine NSPingServer allows a remote application to retrieve the name of the cluster to which a server belongs. In addition to determining if a specified Domino server is reachable, this function optionally retrieves the specified server's availability index and a list (of type TEXT_LIST) of all members in the server cluster. If the server is in a BUSY state (that is, the server is too busy to accept incoming requests), the return status is set to ERR_SERVER_UNAVAILABLE. If the system administrator has restricted the server, the return status is set to ERR_SERVER_RESTRICTED. In either state, the server continues to return its availability index and list of cluster members if the pdwIndex and phList parameters are not NULL. The function does not return the cluster information if the server is unreachable or not running. For more information about NSPingServer, see the Reference.
The CLUMON sample code below uses NSPingServer in the following algorithm to get the cluster name for a specified server name:
- Call NSPingServer to retrieve the TEXT_LIST handle, of which the first item is the server cluster name. The pdwIndex parameter is set to NULL because the availability index value is not desired.
- Use OSLock/OSLockObject to lock down the text list.
- Use ListGetNumEntries to validate the number of entries and ListGetText to get the cluster name (first item).
- Use OSUnlock/OSUnlockObject to unlock the text list.
Server Cluster Name Retrieval Routine for Sample Program CLUMON (clfunc.c)
STATUS GetServerCluster ( char FAR *pServerName,    /* server name */
                          char FAR *pClusterName    /* returned cluster name */
                        )
{
    WORD      wNumListEntries = 0;
    WORD      wBufferLen = 0;
    void FAR *lpList;
    char FAR *pBuffer;
    char      achClusterName[MAXUSERNAME];
    HANDLE    hList = NULLHANDLE;
    char      szCanonServerName[MAXUSERNAME];   /* Canonicalized Name of Server */
    STATUS    nError;

    /* Canonicalize the servername if it isn't already done.
     * The NSPingServer should use a canonicalized servername as input
     */
    nError = DNCanonicalize( 0L, NULL, pServerName, (char FAR *)szCanonServerName,
                             MAXUSERNAME, NULL);
    if (nError != NOERROR)
        return (nError);

    /* and call NSPingServer - only interested in cluster list */
    nError = NSPingServer( (char FAR *)szCanonServerName, NULL, &hList);

    /* If the server is unavailable, proceed - can still get its cluster name */
    if (!nError || (ERR(nError) == ERR_SERVER_UNAVAILABLE))
    {
        /* if the list handle is NULL, that indicates that this server doesn't
         * belong to a cluster
         */
        if (hList == NULLHANDLE)
            return(nError = NPNERR_NOT_CLUSTER_MEMBER);
        else
        {
            /* Lock down the list so we can use it */
            lpList = OSLock( void, hList);
            wNumListEntries = ListGetNumEntries( lpList, FALSE);
            if (wNumListEntries > 0)
            {
                /* The first entry in the list is the cluster name */
                nError = ListGetText( lpList, FALSE, 0, &pBuffer, &wBufferLen);
                if (!nError)
                {
                    /* The list text is not NUL-terminated; clamp the length
                     * so the copy and terminator fit in the local buffer
                     */
                    if (wBufferLen >= MAXUSERNAME)
                        wBufferLen = MAXUSERNAME - 1;
                    strncpy( achClusterName, pBuffer, wBufferLen);
                    achClusterName[wBufferLen] = '\0';
                    OSUnlock( hList);
                    lstrcpy( pClusterName, (char FAR *)achClusterName);
                }
                else
                {
                    nError = NPNERR_GETTING_CLUSTER_NAME;
                    OSUnlock( hList);
                    goto Cleanup;
                }
            }
            else
                OSUnlock( hList);   /* empty list: unlock before freeing */
        }
    }
    else
        return(nError);

Cleanup:
    if (hList != NULLHANDLE)
        OSMemFree( hList);
    return( nError);
}
NOTE: The NPNERR_xxx symbols are defined in the sample header file clumon.h. They are specific to the sample and are not status codes that the HCL C API for Domino and Notes returns or can process.
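The copy step in the algorithm above can be studied apart from the API calls. The following is a minimal sketch, not part of CLUMON: the ListEntry type and the copy_cluster_name function are hypothetical stand-ins for what ListGetText reports (a pointer to text that is not NUL-terminated, plus a length). It shows one way to copy the first entry into a fixed-size buffer without overrunning it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one text list entry as ListGetText reports it:
 * a pointer to the text plus its length. The text is NOT NUL-terminated. */
typedef struct {
    const char    *text;
    unsigned short len;
} ListEntry;

/* Copy entry 0 (the cluster name) into dst, truncating if it does not fit.
 * Returns 0 on success, -1 if the list is empty (no cluster name present). */
int copy_cluster_name(const ListEntry *entries, size_t count,
                      char *dst, size_t dst_size)
{
    size_t n;

    assert(dst != NULL && dst_size > 0);
    if (entries == NULL || count == 0) {
        dst[0] = '\0';
        return -1;
    }
    n = entries[0].len;
    if (n > dst_size - 1)
        n = dst_size - 1;           /* clamp so the terminator always fits */
    memcpy(dst, entries[0].text, n);
    dst[n] = '\0';
    return 0;
}
```

In a real application the entry pointer and length would come from ListGetText while the list handle is locked, and the copy would be completed before the handle is unlocked.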
Server Availability
You can also use NSPingServer to retrieve the availability index of a clustered server. For a description of NSPingServer, refer to the Reference.
The following code illustrates how the CLUMON sample uses NSPingServer to get the server availability index for a specified server name. The routine calls NSPingServer to retrieve and return the availability (load) index value. The phList parameter is set to NULL because the TEXT_LIST handle is not desired.
Server Cluster Availability Retrieval Routine for Sample Program CLUMON (clfunc.c)
STATUS GetServerLoad ( char FAR *pServerName,    /* server name */
                       DWORD FAR *dwLoadIndex    /* returned availability */
                     )
{
    char   szCanonServerName[MAXUSERNAME];   /* Canonicalized Name of Server */
    STATUS nError;

    /* Canonicalize the servername if it isn't already done.
     * The NSPingServer should use a canonicalized servername as input
     */
    nError = DNCanonicalize( 0L, NULL, pServerName, (char FAR *)szCanonServerName,
                             MAXUSERNAME, NULL);
    if (nError != NOERROR)
        return (nError);

    /* Call NSPingServer - only interested in load index */
    nError = NSPingServer( (char FAR *)szCanonServerName, dwLoadIndex, NULL);

    /* And return the status */
    return( nError);
}
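An administrative application may want to interpret the returned index itself, for example to display a server's state. The following is a minimal sketch of the comparison described in the table at the start of this chapter (an index below the configured threshold means BUSY). The ServerState type and classify_server function are hypothetical names, and the threshold is assumed to have been read elsewhere, for example from the Server_Availability_Threshold setting.

```c
#include <assert.h>

typedef enum { SERVER_AVAILABLE, SERVER_BUSY } ServerState;

/* Classify a server from its availability index and the configured
 * threshold, both on the 0-100 scale. An index below the threshold
 * means BUSY, so a threshold of 0 never yields BUSY in this sketch. */
ServerState classify_server(unsigned long index, unsigned long threshold)
{
    assert(index <= 100 && threshold <= 100);
    return (index < threshold) ? SERVER_BUSY : SERVER_AVAILABLE;
}
```

For example, classify_server(40, 50) yields SERVER_BUSY, while classify_server(50, 50) yields SERVER_AVAILABLE because the index is not below the threshold.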
Server Cluster Members
The text list returned by NSPingServer contains the names of all servers that are members of the cluster, including the specified input server. Alternatively, the API function NSGetServerClusterMates provides a dedicated, flexible method of retrieving the server cluster members from a remote application.
NSGetServerClusterMates retrieves a handle to a list (of type TEXT_LIST) of server names that belong to the same cluster as the server specified by pServerName. If the pServerName parameter is NULL, the function retrieves the cluster members of the user's home server. The dwFlags parameter controls how the information is retrieved. If you specify the CLUSTER_LOOKUP_NOCACHE flag, the information is retrieved using a NameLookup on the server only. If you specify the CLUSTER_LOOKUP_CACHEONLY flag, the information is only retrieved through the client's cluster name cache.
Both NSPingServer and NSGetServerClusterMates have special advantages. If your application requires that a particular Domino server always be running and that it report all cluster information, use NSPingServer, since it returns the information in a single call. If your application is more administrative and needs to dynamically locate servers in a cluster, NSGetServerClusterMates is a better choice due to its local caching capabilities. For descriptions of NSPingServer and NSGetServerClusterMates, refer to the Reference.
The following code illustrates how the CLUMON sample uses NSGetServerClusterMates to get cluster information for a specified server name. The method calls NSGetServerClusterMates with the passed lookup flag parameter to retrieve the TEXT_LIST handle of the server's cluster members.
Server Cluster Members Retrieval Routine for Sample Program CLUMON (clfunc.c)
STATUS GetServerClusterMates ( char FAR *pServerName,    /* server name */
                               DWORD dwLookupFlags,      /* lookup flags */
                               HANDLE *hRetList          /* returned clustermates */
                             )
{
    char   szCanonServerName[MAXUSERNAME];   /* Canonicalized Name of Server */
    STATUS nError;

    /* Canonicalize the servername if it isn't already done.
     * The NSGetServerClusterMates requires a fully canonicalized servername
     * as input.
     */
    nError = DNCanonicalize( 0L, NULL, pServerName, (char FAR *)szCanonServerName,
                             MAXUSERNAME, NULL);
    if (nError != NOERROR)
        return (nError);

    /* And call NSGetServerClusterMates */
    nError = NSGetServerClusterMates( (char FAR *)szCanonServerName,
                                      dwLookupFlags, hRetList);

    /* cleanup if error */
    if (nError != NOERROR)
    {
        if (*hRetList != NULLHANDLE)
        {
            OSMemFree( *hRetList);
            *hRetList = NULLHANDLE;
        }
    }
    return( nError);
}
NOTE: If there is a successful return, the calling routine is responsible for freeing the allocated text list handle returned by NSGetServerClusterMates. As with the text list returned by NSPingServer, the calling routine should use the OS Memory routines to lock and unlock the text list handle and the Text List Manipulation routines to retrieve the list count and each list item.
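Once the member names have been read out of the text list, an application that locates servers dynamically might choose the member with the highest availability index. The following sketch illustrates that selection only. The PingFn callback type, pick_most_available, and demo_ping are all hypothetical: in a real program the callback would wrap NSPingServer, and demo_ping simply returns canned values so the sketch can run on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: returns 0 and stores the availability index
 * (0-100) when the server answers, nonzero otherwise. */
typedef int (*PingFn)(const char *server, unsigned long *index);

/* Return the position of the reachable server with the highest
 * availability index, or -1 if no server responded. */
int pick_most_available(const char *const *servers, size_t count, PingFn ping)
{
    int best = -1;
    unsigned long best_index = 0;
    size_t i;

    assert(ping != NULL);
    for (i = 0; i < count; i++) {
        unsigned long index = 0;
        if (ping(servers[i], &index) != 0)
            continue;                   /* unreachable: skip this member */
        if (best < 0 || index > best_index) {
            best = (int)i;
            best_index = index;
        }
    }
    return best;
}

/* Illustration-only stub with canned data, keyed on the character that
 * follows "CN=" in the server name. */
int demo_ping(const char *server, unsigned long *index)
{
    switch (server[3]) {
    case 'A': *index = 30; return 0;
    case 'B': *index = 80; return 0;
    default:  return 1;                 /* pretend the server is down */
    }
}
```

With the member names "CN=A/O=Acme", "CN=B/O=Acme", and "CN=C/O=Acme" and the demo_ping stub, the function selects the second entry, because it reports the highest index and the third entry does not respond.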
Database Availability Attributes
The Domino server cluster environment allows you to manage database access without restricting server level access. You do this by modifying the database option flags to mark whether the file is in service for user open access.
Domino provides three database attributes that let you specify whether a database is available for user access:
Attribute | Description | API Routine |
Out of service | Users cannot open the database. New open database requests fail over to a replica if possible. | NSFDbMarkOutOfService |
In service | Use this attribute if you have marked a database out of service and now want to restore user access to the database. | NSFDbMarkInService |
Pending delete | After all users terminate their connection to the database, Domino pushes any changes to another replica and deletes the database. | NSFDbMarkForDelete |
If a database is marked in service, a database open request is permitted. If a database is marked out of service, open requests are restricted until the database is marked in service again. This way, an administrative task can be performed against an out of service database without restricting access to other databases on the server. The Cluster Database Directory Manager (CLDBDIR) server task manages the databases on the clustered server to enforce the appropriate restrictions based on the current database option marks.
The HCL C API for Domino and Notes provides the following three functions that allow administrative users (with Designer access) to mark a database that the CLDBDIR server task will manage. You can call these functions only against databases that reside on clustered servers.
NSFDbMarkInService - Marks a database file in service. The CLDBDIR task removes any database open access restrictions from a database file that is marked in service.
NSFDbMarkOutOfService - Marks a database file out of service. The CLDBDIR task restricts open access to a database file that is marked out of service.
NSFDbMarkForDelete - Marks a database file for pending deletion. The CLDBDIR task restricts access to a database file that is marked for deletion (the database is also automatically marked out of service) and permanently deletes the file after all active database sessions terminate. Marking a database file for deletion is irreversible, since you cannot programmatically mark a database file back in service. You must have Manager access to the database file to mark it for deletion.
The following database option flags support these functions:
DBOPTION_OUT_OF_SERVICE - Set by a successful NSFDbMarkOutOfService and a successful NSFDbMarkForDelete. Unset by a successful NSFDbMarkInService.
DBOPTION_MARKED_FOR_DELETE - Set by a successful NSFDbMarkForDelete.
NSFDbGetOptions retrieves database option flags. Since you must supply an open database handle when calling NSFDbGetOptions, the function returns a database access error for databases with the DBOPTION_OUT_OF_SERVICE option flag set.
For more information about NSFDbMarkInService, NSFDbMarkOutOfService, NSFDbMarkForDelete, and NSFDbGetOptions, and for DBOPTION_xxx symbol definitions, refer to the Reference.
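The relationship between the two option flags and the three availability attributes can be sketched as a small decoding function. This is an illustration only: DbAvailability and db_availability are hypothetical names, and the numeric flag values below are placeholders used only when the real DBOPTION_xxx definitions are not already in scope. Take the actual values from the Reference.

```c
#include <assert.h>

/* Placeholder bit values for illustration; skipped when the real
 * definitions have already been included. */
#ifndef DBOPTION_OUT_OF_SERVICE
#define DBOPTION_OUT_OF_SERVICE    0x00000400UL
#endif
#ifndef DBOPTION_MARKED_FOR_DELETE
#define DBOPTION_MARKED_FOR_DELETE 0x00001000UL
#endif

typedef enum {
    DB_IN_SERVICE,
    DB_OUT_OF_SERVICE,
    DB_PENDING_DELETE
} DbAvailability;

/* Decode an option mask, such as one returned by NSFDbGetOptions. */
DbAvailability db_availability(unsigned long options)
{
    /* The decoding below relies on the two flags being distinct bits. */
    assert((DBOPTION_OUT_OF_SERVICE & DBOPTION_MARKED_FOR_DELETE) == 0);

    /* Marking for delete also sets the out-of-service flag, so test the
     * more specific delete flag first. */
    if (options & DBOPTION_MARKED_FOR_DELETE)
        return DB_PENDING_DELETE;
    if (options & DBOPTION_OUT_OF_SERVICE)
        return DB_OUT_OF_SERVICE;
    return DB_IN_SERVICE;
}
```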
The following code illustrates how the CLUMON sample calls NSFDbMarkInService, NSFDbMarkOutOfService, and NSFDbMarkForDelete. The routine makes the appropriate function call based on the checkbox control input set from the sample dialog box interface.
Routine to Mark Clustered Databases for Sample Program CLUMON (clfunc.c)
        char FAR *pDBName,      /* database file name */
        WORD wMarkFlag          /* mark flag */
      )
{
    STATUS nError;
    char   szCanonServerName[MAXUSERNAME];   /* Canonicalized Name of Server */
    char   szNetPathName[MAXPATH];           /* Network Path of Database */

    /* Canonicalize the Servername */
    nError = DNCanonicalize( 0L, NULL, pServerName,
                             (char FAR *)szCanonServerName, MAXUSERNAME, NULL);
    if (nError != NOERROR)
        return nError;

    /* Construct NetPath */
    nError = OSPathNetConstruct (NULL, (char FAR *)szCanonServerName,
                                 pDBName, (char FAR *)szNetPathName);
    if (nError != NOERROR)
        return nError;

    /* And call relevant NSFDbMark functions */

    /* NSFDbMarkInService() */
    if ( wMarkFlag & MARK_IN_SERVICE )
        nError = NSFDbMarkInService ( (char FAR *)szNetPathName );
    if (nError != NOERROR )
        return nError;

    /* NSFDbMarkOutOfService() */
    if ( wMarkFlag & MARK_OUT_SERVICE )
        nError = NSFDbMarkOutOfService ( (char FAR *)szNetPathName );
    if (nError != NOERROR )
        return nError;

    /* NSFDbMarkForDelete() */
    if ( wMarkFlag & MARK_DELETE )
        nError = NSFDbMarkForDelete ( (char FAR *)szNetPathName );
    return nError;
}
NOTE: This routine is called by the Windows message handler routine for the particular dialog box. The calling routine contains the logic that prevents users from having the "in service" checkbox checked at the same time as the "out of service" or "for delete" checkboxes. Also note that the MARK_xxx flags are defined in the sample header file clumon.h.
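The consistency rule described in the note can also be enforced in code before the marking routine is called. The following is a minimal sketch under stated assumptions: mark_flags_valid is a hypothetical helper, and the MARK_xxx values shown are placeholders, since the real definitions live in clumon.h and are not reproduced in this chapter.

```c
#include <assert.h>

/* Placeholder values for illustration; the real MARK_xxx symbols are
 * defined in the sample header file clumon.h. */
#ifndef MARK_IN_SERVICE
#define MARK_IN_SERVICE  0x0001
#define MARK_OUT_SERVICE 0x0002
#define MARK_DELETE      0x0004
#endif

/* Return 1 if the flag combination is consistent, 0 otherwise.
 * "In service" may not be combined with "out of service" or "delete". */
int mark_flags_valid(unsigned short flags)
{
    unsigned short known = MARK_IN_SERVICE | MARK_OUT_SERVICE | MARK_DELETE;

    if (flags & ~known)
        return 0;                       /* unknown bits are rejected */
    if ((flags & MARK_IN_SERVICE) &&
        (flags & (MARK_OUT_SERVICE | MARK_DELETE)))
        return 0;                       /* contradictory request */
    return 1;
}
```

A caller could run this check on the checkbox state and skip the marking call when it returns 0.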
Clustered Database Failover
Failover is a cluster's ability to redirect requests from one server to another. When a user tries to access a database on a server that is unavailable or in heavy use, Domino can connect the user to a replica of the database on another server in the cluster. For information about user activities that can trigger failover, refer to the Domino Administration Help documentation.
To take advantage of the fault resiliency and workload balancing features of a Domino server cluster programmatically, call the API function NSFDbOpenExtended with the DBOPEN_CLUSTER_FAILOVER Options flag set. If the target server is unavailable or in heavy use, the database open request automatically fails over to another server that is a member of the same cluster. This failover process is transparent to the caller. To determine which server and database were actually opened, call the API function NSFDbPathGet. To determine whether failover occurred, compare the PathName string specified in NSFDbOpenExtended with the retCanonicalPathName string returned by NSFDbPathGet.
For details about NSFDbOpenExtended and NSFDbPathGet, refer to the Reference.
Note that the same clustered database failover behavior described for NSFDbOpenExtended with the DBOPEN_CLUSTER_FAILOVER Option flag also occurs when a user opens the targeted database from a Notes client (Release 4.5 and later).
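The comparison that detects failover can be sketched without the API. The example below assumes network paths of the form "server!!path" and compares only the server part, ignoring case. The functions net_path_server, names_equal, and failover_occurred are hypothetical helpers written for this illustration. A real application would normally use OSPathNetParse to split the path, and the ASCII-only comparison here does not handle the full character set that Notes names can contain.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copy the server part of a "server!!path" network path into dst.
 * A path without "!!" is treated as local and yields an empty name. */
void net_path_server(const char *net_path, char *dst, size_t dst_size)
{
    const char *sep;
    size_t n = 0;

    assert(net_path != NULL && dst != NULL && dst_size > 0);
    sep = strstr(net_path, "!!");
    if (sep != NULL) {
        n = (size_t)(sep - net_path);
        if (n > dst_size - 1)
            n = dst_size - 1;           /* truncate overly long names */
        memcpy(dst, net_path, n);
    }
    dst[n] = '\0';
}

/* Case-insensitive equality, ASCII only in this sketch. */
static int names_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;                    /* equal only if both ended */
}

/* Return 1 if the opened path names a different server than requested. */
int failover_occurred(const char *requested_path, const char *opened_path)
{
    char req[256];
    char got[256];

    net_path_server(requested_path, req, sizeof req);
    net_path_server(opened_path, got, sizeof got);
    return !names_equal(req, got);
}
```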
The following code illustrates how the CLUMON sample opens a database with failover support, determines whether database open failover has occurred and to which clustered server, and uses the database handle to retrieve the mark option flags.
Routine to Support Database Open Failover for Sample Program CLUMON (clfunc.c)
        char FAR *pDBName,          /* database file name */
        DWORD *dwOptionMask,        /* open option flags */
        BOOL *bFailover             /* TRUE if server failover */
      )
{
    STATUS   nError;
    HANDLE   hDb;                               /* NSFDbOpenExtended parameters */
    TIMEDATE dataNoteMod;
    TIMEDATE nonDataNoteMod;
    char     szCanonServerName[MAXUSERNAME];    /* Canonicalized Name of Server */
    char     szNetPathName[MAXPATH];            /* Network Path of Database */
    char     szFailoverServerName[MAXUSERNAME]; /* Failover Server Name */
    char     szFailoverDBName[MAXPATH];         /* Failover DB Name (a path, so MAXPATH) */
    char     szFailoverPathName[MAXPATH];       /* Expanded Failover Path Name */

    /* Canonicalize the Servername */
    nError = DNCanonicalize( 0L, NULL, pServerName,
                             (char FAR *)szCanonServerName, MAXUSERNAME, NULL);
    if (nError != NOERROR)
        return (nError);

    /* Get the DB Options */
    /* 1) Construct NetPath */
    nError = OSPathNetConstruct (NULL, (char FAR *)szCanonServerName,
                                 pDBName, (char FAR *)szNetPathName);
    if (nError != NOERROR)
        return nError;

    /* 2) Open the Database via NSFDbOpenExtended (support server failover) */
    nError = NSFDbOpenExtended ((char FAR *)szNetPathName,
                                DBOPEN_CLUSTER_FAILOVER, NULLHANDLE, NULL,
                                &hDb, &dataNoteMod, &nonDataNoteMod );
    if (nError != NOERROR )
        return nError;

    /* 3) Check for clustered server failover by getting the path name of the
          opened database */
    nError = NSFDbPathGet(hDb, (char FAR *)szFailoverPathName, NULL);
    if (nError != NOERROR )
        goto CloseDb;

    /* parsing out the server and database names */
    nError = OSPathNetParse ((char FAR *)szFailoverPathName, NULL,
                             (char FAR *)szFailoverServerName,
                             (char FAR *)szFailoverDBName);
    if (nError != NOERROR )
        goto CloseDb;

    /* and comparing them to the specified parameters */
    if (lstrcmpi((char FAR *)szCanonServerName, (char FAR *)szFailoverServerName))
    {
        /* Failover occurred -> return new abbreviated server/db name */
        *bFailover = TRUE;
        nError = DNAbbreviate (0L, NULL, (char FAR *)szFailoverServerName,
                               pServerName, MAXUSERNAME, NULL);
        if (nError != NOERROR )
            goto CloseDb;
        lstrcpy (pDBName, (char FAR *)szFailoverDBName);
    }
    else
        *bFailover = FALSE;

    /* 4) Get the Database options via NSFDbGetOptions */
    nError = NSFDbGetOptions (hDb, dwOptionMask);
    if (nError != NOERROR )
        goto CloseDb;

    /* 5) Close the Database and return */
    nError = NSFDbClose (hDb);
    return nError;

CloseDb:
    /* An error occurred after the open succeeded: close the database so the
       handle is not leaked, and return the original error */
    NSFDbClose (hDb);
    return nError;
}