pbs_gpureset(3B)    Local    pbs_gpureset(3B)
NAME
pbs_gpureset - reset GPU error counts
SYNOPSIS
#include <pbs_error.h>
#include <pbs_ifl.h>
int pbs_gpureset(int connect, char *mom_node, int gpu_id, int ecc_perm, int ecc_vol)
DESCRIPTION
Issue a batch request asking pbs_mom to reset the ECC error counts on one of its NVIDIA GPUs. The reset is performed by sending a GPU Control batch request to the batch server.
The argument, connect, is the connection handle returned by pbs_connect().
The argument, mom_node, names the host on which the GPU is located. It must be a member of the cluster of hosts managed by the server.
The argument, gpu_id, specifies the ID of the GPU on the MOM node.
The argument, ecc_perm, specifies whether to reset the GPU's permanent ECC error count. A value of 1 resets the count; a value of 0 leaves it unchanged.
The argument, ecc_vol, specifies whether to reset the GPU's volatile ECC error count. A value of 1 resets the count; a value of 0 leaves it unchanged.
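Because ecc_perm and ecc_vol are independent 0/1 flags, a caller usually picks one of three combinations. The C sketch below shows that mapping. The enum ecc_target and the helper ecc_flags() are hypothetical illustrations, not part of the Torque API; only the meaning of the two flags comes from this page.

```c
/* Hypothetical selector (not a Torque type): which ECC count(s) to reset. */
enum ecc_target {
    ECC_VOLATILE,   /* volatile count only:  ecc_perm = 0, ecc_vol = 1 */
    ECC_PERMANENT,  /* permanent count only: ecc_perm = 1, ecc_vol = 0 */
    ECC_BOTH        /* both counts:          ecc_perm = 1, ecc_vol = 1 */
};

/*
 * Hypothetical helper: fill in the ecc_perm and ecc_vol arguments of
 * pbs_gpureset(). Each flag is set to 1 (reset) or 0 (leave unchanged).
 */
static void ecc_flags(enum ecc_target target, int *ecc_perm, int *ecc_vol)
{
    *ecc_perm = (target == ECC_PERMANENT || target == ECC_BOTH);
    *ecc_vol  = (target == ECC_VOLATILE  || target == ECC_BOTH);
}
```

A caller might write ecc_flags(ECC_VOLATILE, &perm, &vol) and then pass perm and vol to pbs_gpureset(). Going through an enum rules out the combination where both flags are 0, which would request no reset at all.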
This call requires PBS Operator or Manager privilege. It also requires that Torque be configured with --enable-nvidia-gpu.
SEE ALSO
qgpureset(1B)
DIAGNOSTICS
When the batch request generated by pbs_gpureset() has been completed successfully by a batch server, the routine returns 0 (zero). Otherwise, it returns a nonzero error number, which is also stored in pbs_errno.
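The C sketch below shows one way to act on the return value described above. The helper report_gpureset() and the host name "node01" are hypothetical. The comment shows the intended call sequence using pbs_connect(), pbs_gpureset() and pbs_disconnect(), but that part is not compiled here because it needs the Torque client library and a running server. The helper itself relies only on the documented contract: 0 means success, and any other value is the PBS error number.

```c
#include <stdio.h>

/*
 * Hypothetical helper: report the outcome of a pbs_gpureset() call.
 * `rc` is the value the call returned: 0 on success, otherwise the
 * PBS error number (the same value is also placed in pbs_errno).
 *
 * Intended use (not compiled here; requires the Torque client library):
 *
 *     int conn = pbs_connect(NULL);          // NULL selects the default server
 *     int rc = pbs_gpureset(conn, "node01", 0, 0, 1);
 *     report_gpureset(stderr, rc, "node01", 0);
 *     pbs_disconnect(conn);
 *
 * Returns 1 if the request succeeded, 0 if it failed.
 */
static int report_gpureset(FILE *out, int rc, const char *mom_node, int gpu_id)
{
    if (rc == 0) {
        fprintf(out, "ECC reset request for %s GPU %d completed\n",
                mom_node, gpu_id);
        return 1;
    }
    fprintf(out, "pbs_gpureset on %s GPU %d failed: PBS error %d\n",
            mom_node, gpu_id, rc);
    return 0;
}
```

Reading the error from the return value keeps the helper independent of the pbs_errno global, so it can be exercised without a live server.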
3B