I have an Optimus laptop, and after the update to KDE 6, optimus-manager stopped working. I needed a second display, and all my display outputs are wired to the Nvidia GPU, so I needed to switch to it. I tried many different X11 configs, then EnvyControl, then more X11 configs, but I couldn't get it working right: it would drive either the internal display or the external one, never both. After a few hours I gave up and tried optimus-manager again. This time I checked the error log, and it was failing to load the nvidia module. I tried loading it manually but got a “No such device” error, which is where the title of the post comes in. My GPU has disappeared from Linux: it won't show up in lspci, lshw, nvidia-smi, or anywhere else it should. The only references to it that I can find in dmesg are:

[    0.216410] pci 0000:01:00.0: [10de:1ba1] type 00 class 0x030000
[    0.216419] pci 0000:01:00.0: reg 0x10: [mem 0xde000000-0xdeffffff]
[    0.216427] pci 0000:01:00.0: reg 0x14: [mem 0xc0000000-0xcfffffff 64bit pref]
[    0.216435] pci 0000:01:00.0: reg 0x1c: [mem 0xd0000000-0xd1ffffff 64bit pref]
[    0.216440] pci 0000:01:00.0: reg 0x24: [io  0xe000-0xe07f]
[    0.216445] pci 0000:01:00.0: reg 0x30: [mem 0xdf000000-0xdf07ffff pref]
[    0.216460] pci 0000:01:00.0: Enabling HDA controller
[    0.257300] pci 0000:01:00.0: vgaarb: bridge control possible
[    0.257300] pci 0000:01:00.0: vgaarb: VGA device added: decodes=io+mem,owns=none,locks=none
[    0.270521] pci 0000:01:00.1: D0 power state depends on 0000:01:00.0

And then nothing; it doesn't even seem to try to load the nvidia module. I tried booting into Windows and the GPU shows up there fine, so it didn't randomly die.
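To confirm that the card is really gone from the PCI bus (rather than just missing a driver), `lspci -d 10de:` filters on NVIDIA's vendor ID. The same check can be sketched directly against sysfs, which lspci normally reads on Linux. `SYSFS_ROOT` is a hypothetical override added only so the function can be tested against a fake tree; on a real system it falls back to /sys.

```shell
#!/bin/sh
# Sketch: print every PCI function whose vendor ID is NVIDIA's (0x10de).
# SYSFS_ROOT is a hypothetical testing override; it defaults to /sys.
nvidia_pci_devices() {
    _root="${SYSFS_ROOT:-/sys}"
    for _dev in "$_root"/bus/pci/devices/*; do
        # Skip entries without a readable vendor file (or an unexpanded glob)
        [ -r "$_dev/vendor" ] || continue
        if [ "$(cat "$_dev/vendor")" = "0x10de" ]; then
            # PCI address followed by the class code (0x030000 = VGA controller)
            printf '%s %s\n' "${_dev##*/}" "$(cat "$_dev/class" 2>/dev/null)"
        fi
    done
}
```

If `nvidia_pci_devices` prints nothing while dmesg shows 0000:01:00.0 being enumerated at boot, that suggests something removed the device after enumeration rather than the hardware never appearing.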
As far as I can tell, I've rolled back everything I did (going by my histfile) up to the point where it stopped working. The only other change I can think of is that I upgraded my kernel from 6.6.10 to 6.7.9; could that have caused it? I also tried adding pcie_port_pm=off to the kernel parameters, as suggested on the Arch Wiki, but still nothing. I'm at a loss here; does anyone have any ideas?
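One way to check that a parameter such as pcie_port_pm=off actually reached the running kernel is to look for it in /proc/cmdline (and `uname -r` shows which kernel is actually running). A minimal sketch; `CMDLINE_FILE` is a hypothetical override used only for testing.

```shell
#!/bin/sh
# Sketch: print the value of a kernel command-line parameter, if present.
# CMDLINE_FILE is a hypothetical testing override; it defaults to /proc/cmdline.
kernel_param_value() {
    # /proc/cmdline is a single space-separated line: put one word per line,
    # then print whatever follows "NAME=" on a matching line
    tr ' ' '\n' < "${CMDLINE_FILE:-/proc/cmdline}" | sed -n "s/^$1=//p"
}
```

`kernel_param_value pcie_port_pm` should print `off` if the parameter was applied, and nothing if the bootloader entry was not picked up.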

EDIT: I’m using the nvidia-dkms package
EDIT2: one kernel downgrade later and it's still not appearing, so that's not it.
EDIT3: fixed, see comments

  • nimmo@lem.nimmog.uk · 3 months ago

    I have the same issue on my desktop. I'd assumed it was something I'd done (it usually is), but I had to admit defeat and boot into a backup OS so that I could get on with the tasks I need to get done. I'm assuming it's a problem with the nvidia-dkms package that will be resolved in time, as people have reported similar issues in the past.

    • taaz@biglemmowski.win · 3 months ago

      I think I had this occur to me once and it was something really dumb but I can’t remember what.

      @thomasdouwes@sopuli.xyz just for the sake of trying everything, you could rebuild the DKMS modules and the initramfs, then reboot:

      sudo dkms autoinstall --force -k "$(uname -r)" # -k takes the kernel release; $(uname -r) is the running one, change it to target another kernel
      sudo mkinitcpio -P
      

      Edit: an exhaustive list of what I would try:

      • check that the drivers and modprobe blacklists make sense (this one is broad and requires digging into the Arch Wiki, but the Optimus laptop I had required blacklisting some drivers from early loading, afaik)
      • fiddle with rescans and power states in the /sys/bus/pci folders for the GPU
      • check that mkinitcpio.conf makes sense; also look for a .pacnew (/etc/mkinitcpio.conf.pacnew) and see whether its changes might affect the system
      • downgrade kernel - already tried
      • downgrade dkms packages
      • update the BIOS and firmware from Windows
      • cold boot the laptop (shut down, remove AC and battery, leave it off for a few seconds)
      • on Windows, look in ROG Armoury/MSI Center for any toggles that could affect the GPUs (iGPU/dGPU), such as power states, optimizations, etc.
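A rough sketch of the first two items, assuming the usual Arch paths for modprobe configuration. It lists `blacklist`/`install` lines that mention nvidia, prints the runtime power-management setting (`power/control`, normally "on" or "auto") of each NVIDIA PCI function, and triggers a bus rescan by writing 1 to /sys/bus/pci/rescan. `MODPROBE_DIRS` and `SYSFS_ROOT` are hypothetical overrides for testing; the rescan needs root on a real system.

```shell
#!/bin/sh
# Sketch of a few checks from the list above. MODPROBE_DIRS and SYSFS_ROOT
# are hypothetical testing overrides; the defaults are the usual Arch paths.

# Print "file:line" for modprobe lines that blacklist or override nvidia modules
find_nvidia_modprobe_rules() {
    for _d in ${MODPROBE_DIRS:-/etc/modprobe.d /usr/lib/modprobe.d}; do
        for _f in "$_d"/*.conf; do
            [ -r "$_f" ] || continue
            grep -HE '^[[:space:]]*(blacklist|install)[[:space:]]+nvidia' "$_f"
        done
    done
}

# Print the runtime PM setting ("on" or "auto") of each NVIDIA PCI function
nvidia_power_states() {
    _root="${SYSFS_ROOT:-/sys}"
    for _dev in "$_root"/bus/pci/devices/*; do
        [ -r "$_dev/vendor" ] || continue
        [ "$(cat "$_dev/vendor")" = "0x10de" ] || continue
        printf '%s %s\n' "${_dev##*/}" "$(cat "$_dev/power/control" 2>/dev/null)"
    done
}

# Ask the kernel to re-enumerate all PCI buses (root required on a real system)
pci_rescan() {
    echo 1 > "${SYSFS_ROOT:-/sys}/bus/pci/rescan"
}
```

A rescan only helps if nothing removes the device again: a udev rule that matches ACTION=="add" and writes ATTR{remove}="1" will fire again on the re-added device.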
      • Thomas Douwes@sopuli.xyz (OP) · 3 months ago

        Looks like you were right about the udev rules earlier. I ran a pacman command to find all untracked files in /usr, and found /usr/lib/udev/rules.d/50-remove-nvidia.rules there. Contents:

        # Automatically generated by EnvyControl
        
        # Remove NVIDIA USB xHCI Host Controller devices, if present
        ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c0330", ATTR{power/control}="auto", ATTR{remove}="1"
        
        # Remove NVIDIA USB Type-C UCSI devices, if present
        ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x0c8000", ATTR{power/control}="auto", ATTR{remove}="1"
        
        # Remove NVIDIA Audio devices, if present
        ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x040300", ATTR{power/control}="auto", ATTR{remove}="1"
        
        # Remove NVIDIA VGA/3D controller devices
        ACTION=="add", SUBSYSTEM=="pci", ATTR{vendor}=="0x10de", ATTR{class}=="0x03[0-9]*", ATTR{power/control}="auto", ATTR{remove}="1"
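Each of these rules matches a PCI device with NVIDIA's vendor ID as soon as it is added and writes 1 to its `remove` attribute, so the kernel detaches the device right after enumeration; that is consistent with the GPU appearing in early dmesg and then vanishing. Rules can live in /etc/udev/rules.d, /run/udev/rules.d and /usr/lib/udev/rules.d, so a search across all three might be sketched like this (`UDEV_RULE_DIRS` is a hypothetical override for testing):

```shell
#!/bin/sh
# Sketch: print udev rule files that mention NVIDIA's vendor ID (0x10de)
# and also write ATTR{remove}="1". UDEV_RULE_DIRS is a hypothetical testing
# override; the defaults are the standard udev rule directories.
find_nvidia_remove_rules() {
    for _d in ${UDEV_RULE_DIRS:-/etc/udev/rules.d /run/udev/rules.d /usr/lib/udev/rules.d}; do
        for _f in "$_d"/*.rules; do
            [ -r "$_f" ] || continue
            # Fixed-string matches, so the braces and quotes need no escaping
            if grep -qF '0x10de' "$_f" && grep -qF 'ATTR{remove}="1"' "$_f"; then
                printf '%s\n' "$_f"
            fi
        done
    done
}
```

After deleting the offending file, `udevadm control --reload` followed by a PCI rescan (or simply a reboot) should bring the device back.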
        
        

        Looks like EnvyControl left some extra files behind after uninstalling.
        Personally, I think it's pretty weird that it put runtime files in /usr/lib; if they were in /etc I would have found them quickly.
        The GPU is back on the bus now, and I can run optimus-manager to get my extra screen. Thank you for the help troubleshooting this issue.
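The exact pacman command isn't given in the thread. One possible approach, sketched here as an assumption, compares every regular file under a directory against `pacman -Qlq`, which prints the paths owned by installed packages. Expect some false positives from files generated by hooks (module indexes, caches, DKMS builds).

```shell
#!/bin/sh
# Sketch: list regular files under a directory (default /usr) that no
# installed package claims, using the file lists printed by `pacman -Qlq`.
untracked_files() {
    _dir="${1:-/usr}"
    _owned=$(mktemp)
    # Package file lists include directories with a trailing slash; drop them
    pacman -Qlq | grep -v '/$' | LC_ALL=C sort -u > "$_owned"
    # comm -23 keeps lines that appear only in the first (sorted) input
    find "$_dir" -type f 2>/dev/null | LC_ALL=C sort | LC_ALL=C comm -23 - "$_owned"
    rm -f "$_owned"
}
```

Both inputs are sorted under the same `LC_ALL=C` collation because `comm` requires identically ordered input.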

    • Thomas Douwes@sopuli.xyz (OP) · 3 months ago
      [ 1501.764754] nvidia-nvlink: Nvlink Core is being initialized, major device number 234
      [ 1501.764761] NVRM: No NVIDIA GPU found.
      [ 1501.765791] nvidia-nvlink: Unregistered Nvlink Core, major device number 234