Can't enable NVIDIA Persistence Mode due to nvidia-smi memory bug (RTX 2070S, Kubuntu 25.04)


I'm trying to enable NVIDIA persistence mode on my Kubuntu system, but the command fails, apparently because `nvidia-smi` tries to allocate a huge amount of memory.

    Code:
    OS: Kubuntu
    Distributor ID: Ubuntu
    Description: Ubuntu 25.04
    Release: 25.04
    Codename: plucky
    GPU: NVIDIA GeForce RTX 2070 SUPER

    Code:
    ➜ ~ lspci -k | grep -A 2 -i vga
    01:00.0 VGA compatible controller: NVIDIA Corporation TU104 [GeForce RTX 2070 SUPER] (rev a1)
    Subsystem: ASUSTeK Computer Inc. Device 8708
Kernel driver in use: nvidia

My goal is to have NVIDIA persistence mode enabled (Persistence-M: On in nvidia-smi).

    Initially, `nvidia-smi` shows persistence is Off:

Screenshot: https://i.imgur.com/B1V2u3G.png

    When I try to enable persistence mode manually, it fails:

    Code:
    ➜ ~ sudo nvidia-smi -pm 1
    Unable to set persistence mode for GPU 00000000:01:00.0: Unknown Error
    Terminating early due to previous errors.

While monitoring `sudo journalctl -f`, I see the following errors appear immediately after running `sudo nvidia-smi -pm 1`:

    Code:
    Apr 18 17:15:42 pc sudo[5426]: george : TTY=pts/1 ; PWD=/home/george ; USER=root ; COMMAND=/usr/bin/nvidia-smi -pm 1
    Apr 18 17:15:42 pc sudo[5426]: pam_unix(sudo:session): session opened for user root(uid=0) by george(uid=1000)
    Apr 18 17:15:44 pc kernel: __vm_enough_memory: pid: 5428, comm: nvidia-smi, bytes: 51539607552 not enough memory for the allocation
    Apr 18 17:15:44 pc kernel: __vm_enough_memory: pid: 5428, comm: nvidia-smi, bytes: 51539709952 not enough memory for the allocation
    Apr 18 17:15:44 pc kernel: __vm_enough_memory: pid: 5428, comm: nvidia-smi, bytes: 51539742720 not enough memory for the allocation
    Apr 18 17:15:44 pc kernel: __vm_enough_memory: pid: 5428, comm: nvidia-smi, bytes: 51539607552 not enough memory for the allocation
    Apr 18 17:15:45 pc kernel: __vm_enough_memory: pid: 5428, comm: nvidia-smi, bytes: 51539607552 not enough memory for the allocation
    Apr 18 17:15:45 pc kernel: __vm_enough_memory: pid: 5428, comm: nvidia-smi, bytes: 51539607552 not enough memory for the allocation
    Apr 18 17:15:45 pc kernel: __vm_enough_memory: pid: 5428, comm: nvidia-smi, bytes: 51539709952 not enough memory for the allocation
    Apr 18 17:15:45 pc kernel: __vm_enough_memory: pid: 5428, comm: nvidia-smi, bytes: 51539742720 not enough memory for the allocation
    Apr 18 17:15:45 pc sudo[5426]: pam_unix(sudo:session): session closed for user root

This shows `nvidia-smi` is attempting to allocate ~48 GiB (about 51.5 GB) of memory, which looks like a bug and is likely why the `-pm 1` command fails.
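
To sanity-check the numbers, here's a small shell snippet that pulls the distinct request sizes out of the journal lines above (pasted verbatim into a heredoc) and converts the smallest one to GiB. The overcommit remark in the comments is my assumption about why the kernel rejects the requests, not something I've confirmed:

```shell
#!/bin/sh
# Extract the distinct allocation sizes (in bytes) from the journal excerpt above.
sizes=$(grep -oE 'bytes: [0-9]+' <<'EOF' | awk '{print $2}' | sort -u -n
kernel: __vm_enough_memory: pid: 5428, comm: nvidia-smi, bytes: 51539607552 not enough memory for the allocation
kernel: __vm_enough_memory: pid: 5428, comm: nvidia-smi, bytes: 51539709952 not enough memory for the allocation
kernel: __vm_enough_memory: pid: 5428, comm: nvidia-smi, bytes: 51539742720 not enough memory for the allocation
EOF
)
echo "$sizes"
# The smallest request is exactly 48 GiB:
echo "$(( 51539607552 / 1024 / 1024 / 1024 )) GiB"
# __vm_enough_memory is the kernel's overcommit check; vm.overcommit_memory=1
# would bypass it, but that would only mask whatever makes nvidia-smi ask for 48 GiB.
```

All three failed requests sit within a few hundred KiB of each other, just above 48 GiB, which looks more like a fixed-size virtual mapping than a genuine memory need.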

I'm running `nvidia-driver-570-open`, as recommended by `ubuntu-drivers devices`:

    Code:
    ➜ ~ ubuntu-drivers devices
    == /sys/devices/pci0000:00/0000:00:01.0/0000:01:00.0 ==
modalias : pci:v000010DEd00001E84sv00001043sd00008708bc03sc00i00
    vendor : NVIDIA Corporation
    model : TU104 [GeForce RTX 2070 SUPER]
    driver : nvidia-driver-570 - distro non-free
    driver : nvidia-driver-535-server-open - distro non-free
    driver : nvidia-driver-570-server-open - distro non-free
    driver : nvidia-driver-570-open - distro non-free recommended
    driver : nvidia-driver-535-server - distro non-free
    driver : nvidia-driver-570-server - distro non-free
    driver : xserver-xorg-video-nouveau - distro free builtin
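
One experiment I'm considering, in case the open kernel modules are involved: switching to the proprietary 570 build and retesting. This is just a sketch (package names taken from the `ubuntu-drivers devices` output above), not a confirmed fix:

```shell
# Hypothetical workaround sketch: try the proprietary 570 kernel modules
# instead of the -open variant, then retest persistence mode.
sudo apt install nvidia-driver-570   # proprietary build, from the list above
sudo reboot
# after reboot:
#   sudo nvidia-smi -pm 1
# to switch back:
#   sudo apt install nvidia-driver-570-open
```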

The command `ps auxww | grep [n]vidia-persistenced` returns the following:

    Code:
    nvidia-+ 991 0.0 0.0 5448 2068 ? Ss 16:58 0:00 /usr/bin/nvidia-persistenced --user nvidia-persistenced --persistence-mode --verbose

And `systemctl status nvidia-persistenced.service` returns the following:

    Code:
    nvidia-persistenced.service - NVIDIA Persistence Daemon
    Loaded: loaded (/etc/systemd/system/nvidia-persistenced.service; enabled; preset: enabled)
    Active: active (running) since Fri 2025-04-18 16:58:12 EEST; 2h 39min ago
    Invocation: 7e2cc0656f4b4364998167d9e89b5da2
    Main PID: 991 (nvidia-persiste)
    Tasks: 1 (limit: 38278)
    Memory: 1M (peak: 1.7M)
    CPU: 6ms
    CGroup: /system.slice/nvidia-persistenced.service
    └─991 /usr/bin/nvidia-persistenced --user nvidia-persistenced --persistence-mode --verbose
    Apr 18 16:58:12 pc systemd[1]: Starting nvidia-persistenced.service - NVIDIA Persistence Daemon...
    Apr 18 16:58:12 pc nvidia-persistenced[991]: Verbose syslog connection opened
    Apr 18 16:58:12 pc nvidia-persistenced[991]: Now running with user ID 117 and group ID 122
    Apr 18 16:58:12 pc nvidia-persistenced[991]: Started (991)
    Apr 18 16:58:12 pc nvidia-persistenced[991]: device 0000:01:00.0 - registered
    Apr 18 16:58:12 pc nvidia-persistenced[991]: device 0000:01:00.0 - persistence mode enabled.
    Apr 18 16:58:12 pc nvidia-persistenced[991]: device 0000:01:00.0 - NUMA memory onlined.
    Apr 18 16:58:12 pc nvidia-persistenced[991]: Local RPC services initialized
Apr 18 16:58:12 pc systemd[1]: Started nvidia-persistenced.service - NVIDIA Persistence Daemon.