A
Understanding Clusters and Queues
A cluster is a set of hosts working together to balance the job load. Each cluster is controlled by a daemon process called cdsqmgr. Jobs from different applications are submitted to the cdsqmgr, which sends the jobs to the hosts in the cluster.
Following are the best practices for farm software:
- All involved machines must use a common path to access file systems. For example, the Cadence software should be available on all machines using the same file path.
- All involved machines must share common user-account information. For a given account name, the userId, groupId, and home directory should not vary between machines.
- All machines must be able to access the user's home directory using the same file path.
- It is recommended that the farm machines use common file servers for data, rather than being dependent upon each other. In terms of reliability, this becomes more important as the number of farm machines increases.
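The account-consistency requirement above can be spot-checked from a shell. The following is a minimal sketch, assuming a POSIX shell and working ssh access to each host. The function names `account_signature` and `signatures_match` and the host names `hostA` and `hostB` are hypothetical and are not part of the Cadence software.

```shell
#!/bin/sh
# Print "uid:gid:home" for the current account on this machine.
account_signature() {
    printf '%s:%s:%s\n' "$(id -u)" "$(id -g)" "$HOME"
}

# Succeed only if every signature passed as an argument is identical.
signatures_match() {
    first=$1
    for sig in "$@"; do
        [ "$sig" = "$first" ] || return 1
    done
    return 0
}

# Example usage (hostA and hostB are placeholder host names):
#   a=$(ssh hostA 'printf "%s:%s:%s\n" "$(id -u)" "$(id -g)" "$HOME"')
#   b=$(ssh hostB 'printf "%s:%s:%s\n" "$(id -u)" "$(id -g)" "$HOME"')
#   signatures_match "$a" "$b" || echo "account details differ between hosts"
```

A mismatch in any of the three fields means the account does not satisfy the requirement on that pair of hosts.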
If you use LSF, cdsqmgr is not used. For more information about the daemon process involved in LSF, see the LSF documentation available at www.platform.com.
In a terminal window, type
cdsqmgr configPath
where configPath is the path to a configuration file that lists the queues and the hosts (available for each queue) on which you want to run the jobs.
Typically, you can start cdsqmgr on a machine and all the applications can use this cdsqmgr.
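Before starting the daemon, it can help to confirm that the configuration file is readable. The following is a minimal sketch, assuming a POSIX shell. The function name `start_cdsqmgr` and the `DRY_RUN` variable are hypothetical; the only real command used is cdsqmgr with the configuration file path as its argument, as described above.

```shell
#!/bin/sh
# Start cdsqmgr with a configuration file, after checking the file is readable.
# With DRY_RUN=1 the command is printed instead of being executed.
start_cdsqmgr() {
    config=$1
    if [ ! -r "$config" ]; then
        echo "cannot read configuration file: $config" >&2
        return 1
    fi
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "cdsqmgr $config"
    else
        cdsqmgr "$config"
    fi
}
```

Running it with `DRY_RUN=1` first shows the exact command line without starting anything.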
If you use ssh to start cdsqmgr, access must be enabled using the fully qualified domain name for each host; otherwise, ssh will not be able to resolve the name to the correct IP address.
How Applications Connect to cdsqmgr
You can set the LBS_CLUSTER_MASTER environment variable to control the cdsqmgr to which your application connects. This variable should be set to the name of the host on which the cdsqmgr resides. This host is also known as the cluster master. The default cluster master is the local host.
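As an illustration, the variable can be set in a shell startup file. The following is a minimal sketch, assuming a Bourne-style shell (csh users would use `setenv` instead). The host name `farm-master.example.com` is a placeholder, and the helper function `cluster_master` is hypothetical; it only mirrors the documented default of the local host.

```shell
#!/bin/sh
# Point applications at the cdsqmgr on a shared cluster master.
# farm-master.example.com is a placeholder; use your site's host name.
LBS_CLUSTER_MASTER=farm-master.example.com
export LBS_CLUSTER_MASTER

# Report the cluster master that will be used: the value of
# LBS_CLUSTER_MASTER if it is set, otherwise the local host.
cluster_master() {
    echo "${LBS_CLUSTER_MASTER:-$(uname -n)}"
}
```

Calling `cluster_master` before launching an application is a quick way to confirm which host it will try to reach.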
The following logic determines how applications connect to the cdsqmgr.
- The application can connect to cdsqmgr residing on the cluster master using login_name. login_name is the login name of the person who attempts to launch the application.
- If there is no instance of cdsqmgr running as login_name, the application attempts to connect to cdsqmgr running as root on the cluster master.
- If there is no instance of cdsqmgr running as root, an instance of cdsqmgr is automatically started up on the cluster master, and the application connects to it. Because cdsqmgr was started using the login name of the person who attempts to launch the application, it continues to run as login_name.
Because cdsqmgr is started automatically, a configuration file cannot be specified. In this case, only the DEFAULT queue is considered to have been configured in the cluster.
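The connection order described above can be modelled as a small decision function. This is an illustrative sketch of the documented logic, not the applications' actual implementation. The function name `choose_cdsqmgr` and its inputs (a login name and a space-separated list of the accounts that currently own a cdsqmgr process on the cluster master) are assumptions.

```shell
#!/bin/sh
# Decide which cdsqmgr an application would use.
#   $1: login name of the person launching the application
#   $2: space-separated owners of cdsqmgr instances on the cluster master
# Prints "<owner> existing" or "<owner> autostart".
choose_cdsqmgr() {
    login_name=$1
    owners=" $2 "
    # First choice: an instance already running as login_name.
    case "$owners" in
        *" $login_name "*) echo "$login_name existing"; return ;;
    esac
    # Second choice: an instance running as root.
    case "$owners" in
        *" root "*) echo "root existing"; return ;;
    esac
    # No suitable instance: one is started as login_name, with
    # only the DEFAULT queue configured.
    echo "$login_name autostart"
}
```

On the cluster master, the owner list might be gathered with something like `ps -eo user,comm` filtered for cdsqmgr, although the exact ps options vary between systems.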
To balance loads across all the users in a cluster, the person who is logged in as root must start up cdsqmgr on a known cluster master. The users should set the LBS_CLUSTER_MASTER environment variable to this cluster master. They will then connect to the same cdsqmgr, which will balance the load across all users’ jobs.
If each user were to connect to a separate cdsqmgr, the load would be balanced only across each user’s jobs.