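In most cases, updating the NVIDIA GPU driver does not require rebooting the machine. If the driver updated successfully but nvidia-smi reports Failed to initialize NVML: Driver/library version mismatch, the usual cause is that the updated driver was not loaded. Check whether the current NVIDIA driver is in use. Running the second command lists the programs that are currently using the GPU. For example, nv-hosten is the server side of DCGM; either kill it directly or shut it down with nv-hostengine -t.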
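The commands the post refers to are not included in this excerpt. As a stand-in, here is a minimal Python sketch of one way to find which processes still hold the NVIDIA device files open, which is what keeps the old kernel module from being unloaded. It is not the author's method: the function name `find_gpu_holders` and its parameters are hypothetical, and it assumes a Linux `/proc` filesystem with device nodes named `/dev/nvidia*`.

```python
import os


def find_gpu_holders(proc_root="/proc", dev_prefix="/dev/nvidia"):
    """Map PID -> (command name, sorted device paths) for processes
    that have an NVIDIA device file open.

    Hypothetical helper for illustration; assumes a Linux-style /proc layout.
    """
    holders = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories correspond to processes
        fd_dir = os.path.join(proc_root, entry, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission to inspect it
        devices = set()
        for fd in fds:
            try:
                target = os.readlink(os.path.join(fd_dir, fd))
            except OSError:
                continue  # descriptor was closed while we were scanning
            if target.startswith(dev_prefix):
                devices.add(target)
        if not devices:
            continue
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:
            name = "?"
        holders[int(entry)] = (name, sorted(devices))
    return holders


if __name__ == "__main__":
    for pid, (name, devices) in sorted(find_gpu_holders().items()):
        print(pid, name, ", ".join(devices))
```

Run it as root to see processes owned by other users. Any PID it prints needs to exit before the old module can be unloaded and the new one loaded.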
more ...
Introduction NVIDIA Data Center GPU Manager (DCGM) is a suite of tools for managing and monitoring NVIDIA datacenter GPUs in cluster environments. It also provides APIs to let developers integrate it into their own GPU profiling/monitoring tools. Installation If you have
more ...