I have a computer with:

  • System: Ubuntu 14.04
  • GPU: NVIDIA GTX 1080 Ti

About a year ago I installed the system and then installed CUDA 8.0 with the NVIDIA driver on this computer. The GPU and CUDA had been working correctly until today, when I tried to install a newer version of CUDA.

For various reasons I wanted to replace the installed CUDA 8.0 with CUDA 10.0. First I uninstalled the old driver with nvidia-uninstall, and then uninstalled the old toolkit with /usr/local/cuda-8.0/bin/uninstall_cuda_8.0.pl. After that I installed CUDA 10.0 together with the new driver, using the runfile installer downloaded from this page. However, the installation failed.

After several unsuccessful debugging attempts I gave up, uninstalled the new driver and CUDA 10.0, and reinstalled CUDA 8.0 with the runfile installer downloaded from this page. That installation succeeded, but now nothing GPU-related works anymore, including pycuda, pyopencl and tensorflow. All of these packages report that they cannot find a GPU device.
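Since two toolkits were installed and removed with different uninstallers, it may be worth checking what is actually left under /usr/local. A minimal sketch, assuming the runfile installers used their default layout (/usr/local/cuda-X.Y plus a /usr/local/cuda symlink); list_cuda_installs is a hypothetical helper, not part of any NVIDIA tool:

```python
import glob
import os

def list_cuda_installs(prefix="/usr/local"):
    """Return (toolkit directories, target of the 'cuda' symlink or None).

    Assumes the default runfile layout <prefix>/cuda-X.Y and <prefix>/cuda.
    """
    # "cuda-*" matches versioned toolkit dirs but not the bare "cuda" symlink.
    dirs = sorted(d for d in glob.glob(os.path.join(prefix, "cuda-*")) if os.path.isdir(d))
    link = os.path.join(prefix, "cuda")
    # Resolve the symlink so we can see which toolkit it currently points at.
    target = os.path.realpath(link) if os.path.islink(link) else None
    return dirs, target

if __name__ == "__main__":
    dirs, target = list_cuda_installs()
    print("Toolkit directories: %s" % (dirs or "none"))
    print("/usr/local/cuda -> %s" % (target or "(no symlink)"))
```

A leftover cuda-10.0 directory, or a symlink still pointing at it, would suggest the uninstall did not fully complete.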


Update:

I have since tried to uninstall all NVIDIA components with sudo apt-get --purge remove nvidia-*, in addition to nvidia-uninstall and uninstall_cuda_8.0.pl, but the problem remains. The error messages and the system logs have changed, though.
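After the purge, one way to confirm that no NVIDIA or CUDA packages are still registered with dpkg (including ones in the rc state, i.e. removed but with configuration files left behind) is to filter the dpkg -l listing. Note that anything installed by a runfile installer is not tracked by dpkg at all, so an empty result does not rule out leftover runfile files. A sketch with a hypothetical helper find_gpu_packages; the name keywords are an assumption:

```python
import re
import subprocess

# A package row of `dpkg -l`: desired state, status, optional error flag "R",
# then package name and version. Header lines do not match this pattern.
ROW = re.compile(r"^([uihrp][ncHUFWti]R?)\s+(\S+)\s+(\S+)")

def find_gpu_packages(dpkg_output, keywords=("nvidia", "cuda")):
    """Return (state, name, version) for packages whose name contains a keyword."""
    hits = []
    for line in dpkg_output.splitlines():
        m = ROW.match(line)
        if m and any(k in m.group(2).lower() for k in keywords):
            hits.append(m.groups())
    return hits

if __name__ == "__main__":
    try:
        out = subprocess.check_output(["dpkg", "-l"]).decode("utf-8", "replace")
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not run dpkg -l: %s" % exc)
    else:
        for state, name, version in find_gpu_packages(out):
            print("%-4s %-40s %s" % (state, name, version))
```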


Here are some of my system logs:

In the Python interpreter, pycuda fails:

Python 2.7.6 (default, Nov 23 2017, 15:49:48) 
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import pycuda.driver as cuda
>>> import pycuda.autoinit
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/pycuda/autoinit.py", line 5, in <module>
    cuda.init()
pycuda._driver.RuntimeError: cuInit failed: no CUDA-capable device is detected
>>> 
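As the traceback shows, pycuda.autoinit just calls cuda.init(), which wraps the CUDA driver API function cuInit. To separate pycuda from the driver itself, the same call can be made directly through ctypes. A minimal sketch, assuming the driver library is installed under the usual soname libcuda.so.1; code 100 is CUDA_ERROR_NO_DEVICE, the error behind the "no CUDA-capable device is detected" message, and probe_driver_api is a hypothetical helper:

```python
import ctypes

CUDA_SUCCESS = 0
# Driver API error code whose message is "no CUDA-capable device is detected".
CUDA_ERROR_NO_DEVICE = 100

def probe_driver_api(libname="libcuda.so.1"):
    """Call cuInit/cuDeviceGetCount directly and describe the result."""
    try:
        lib = ctypes.CDLL(libname)
    except OSError as exc:
        return "could not load %s: %s" % (libname, exc)
    rc = lib.cuInit(0)  # CUresult cuInit(unsigned int Flags)
    if rc == CUDA_ERROR_NO_DEVICE:
        return "cuInit: CUDA_ERROR_NO_DEVICE (same failure as pycuda)"
    if rc != CUDA_SUCCESS:
        return "cuInit failed with CUresult %d" % rc
    count = ctypes.c_int(0)
    rc = lib.cuDeviceGetCount(ctypes.byref(count))  # CUresult cuDeviceGetCount(int *count)
    if rc != CUDA_SUCCESS:
        return "cuDeviceGetCount failed with CUresult %d" % rc
    return "driver API sees %d device(s)" % count.value

if __name__ == "__main__":
    print(probe_driver_api())
```

If this reports the same error, the problem sits below pycuda, pyopencl and tensorflow, in the driver installation.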

nvidia-smi reports:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.26                 Driver Version: 375.26                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  ERR!                Off  | 0000:01:00.0      On |                  N/A |
| 28%   52C    P8    15W / 300W |     43MiB / 11168MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      1868    G   /usr/lib/xorg/Xorg                              40MiB |
+-----------------------------------------------------------------------------+
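One thing to rule out after switching between toolkits is a driver that is too old for the installed toolkit. A small sketch that extracts the driver version from nvidia-smi output and compares it with a per-toolkit minimum; the two entries in MIN_DRIVER are assumptions taken from the compatible-driver table in NVIDIA's CUDA release notes and should be checked there, and both helper names are hypothetical:

```python
import re

# Minimum Linux driver per toolkit release (assumption: from the "CUDA Toolkit
# and Compatible Driver Versions" table in NVIDIA's release notes; verify there).
MIN_DRIVER = {
    "8.0.61": (375, 26),    # CUDA 8.0 GA2
    "10.0.130": (410, 48),  # CUDA 10.0
}

def parse_driver_version(smi_text):
    """Extract (major, minor) from nvidia-smi or nvidia-smi -a output."""
    m = re.search(r"Driver Version\s*:\s*(\d+)\.(\d+)", smi_text)
    if m is None:
        raise ValueError("no 'Driver Version' field found")
    return int(m.group(1)), int(m.group(2))

def driver_supports(smi_text, toolkit):
    """True if the reported driver meets the minimum for the given toolkit."""
    return parse_driver_version(smi_text) >= MIN_DRIVER[toolkit]
```

For the header above, 375.26 meets the CUDA 8.0 GA2 minimum, so a plain driver/toolkit version mismatch does not seem to explain the error on its own.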

dmesg | grep nvidia reports:

[    2.370841] nvidia: loading out-of-tree module taints kernel.
[    2.370844] nvidia: module license 'NVIDIA' taints kernel.
[    2.374116] nvidia: module verification failed: signature and/or required key missing - tainting kernel
[    2.380809] nvidia-nvlink: Nvlink Core is being initialized, major device number 242
[    2.383631] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  375.26  Thu Dec  8 18:04:14 PST 2016
[    2.385803] [drm] [nvidia-drm] [GPU ID 0x00000100] Loading driver
[    2.717844] init: nvidia-prime main process (1094) terminated with status 127
[    7.447032] nvidia-modeset: Allocated GPU:0 (GPU-3727ccd9-f1fc-78c9-f908-5e1edf205194) @ PCI:0000:01:00.0
[   72.737634] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 241
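Most of these lines are routine: the taint messages only say that an out-of-tree, unsigned, non-GPL module was loaded. The line that stands out is the nvidia-prime job exiting with status 127, which in a shell conventionally means "command not found" and could point at something the purge removed. A hypothetical filter, suspicious_dmesg, for log lines reporting a non-zero exit status:

```python
import re

# "[    2.717844] message" -> timestamp and message text
DMESG_LINE = re.compile(r"^\[\s*(\d+\.\d+)\]\s+(.*)$")
EXIT_STATUS = re.compile(r"terminated with status (\d+)")

def suspicious_dmesg(lines):
    """Return (timestamp, message) for lines reporting a non-zero exit status."""
    hits = []
    for line in lines:
        m = DMESG_LINE.match(line.strip())
        if not m:
            continue
        ts, msg = float(m.group(1)), m.group(2)
        status = EXIT_STATUS.search(msg)
        if status and int(status.group(1)) != 0:
            hits.append((ts, msg))
    return hits
```

It can be fed, for example, with subprocess.check_output(["dmesg"]).decode().splitlines().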

nvidia-smi -a reports the following (note that the Product Name field is "Unknown Error"):

==============NVSMI LOG==============

Timestamp                           : Thu Sep 27 10:16:41 2018
Driver Version                      : 375.26

Attached GPUs                       : 1
GPU 0000:01:00.0
    Product Name                    : Unknown Error
    Product Brand                   : GeForce
    Display Mode                    : Enabled
    Display Active                  : Enabled
    Persistence Mode                : Disabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 1920
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : N/A
    GPU UUID                        : GPU-3727ccd9-f1fc-78c9-f908-5e1edf205194
    Minor Number                    : 0
    VBIOS Version                   : 86.02.40.00.2E
    MultiGPU Board                  : No
    Board ID                        : 0x100
    GPU Part Number                 : N/A
    Inforom Version
        Image Version               : G001.0000.01.04
        OEM Object                  : 1.1
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization mode         : None
    PCI
        Bus                         : 0x01
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x1B0610DE
        Bus Id                      : 0000:01:00.0
        Sub System Id               : 0x11117377
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 1
            Link Width
                Max                 : 16x
                Current             : 16x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays since reset         : 0
        Tx Throughput               : 0 KB/s
        Rx Throughput               : 0 KB/s
    Fan Speed                       : 0 %
    Performance State               : P8
    Clocks Throttle Reasons
        Idle                        : Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
        Sync Boost                  : Not Active
        Unknown                     : Not Active
    FB Memory Usage
        Total                       : 11168 MiB
        Used                        : 43 MiB
        Free                        : 11125 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 5 MiB
        Free                        : 251 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 2 %
        Encoder                     : 0 %
        Decoder                     : 0 %
    Ecc Mode
        Current                     : N/A
        Pending                     : N/A
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                Total               : N/A
        Aggregate
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                Total               : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending                     : N/A
    Temperature
        GPU Current Temp            : 43 C
        GPU Shutdown Temp           : 96 C
        GPU Slowdown Temp           : 93 C
    Power Readings
        Power Management            : Supported
        Power Draw                  : 14.68 W
        Power Limit                 : 300.00 W
        Default Power Limit         : 300.00 W
        Enforced Power Limit        : 300.00 W
        Min Power Limit             : 125.00 W
        Max Power Limit             : 330.00 W
    Clocks
        Graphics                    : 240 MHz
        SM                          : 240 MHz
        Memory                      : 405 MHz
        Video                       : 544 MHz
    Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Default Applications Clocks
        Graphics                    : N/A
        Memory                      : N/A
    Max Clocks
        Graphics                    : 1999 MHz
        SM                          : 1999 MHz
        Memory                      : 5505 MHz
        Video                       : 1708 MHz
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes
        Process ID                  : 1868
            Type                    : G
            Name                    : /usr/lib/xorg/Xorg
            Used GPU Memory         : 40 MiB
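The field that stands out is Product Name: Unknown Error, which appears to be the same symptom as ERR! in the Name column of the nvidia-smi summary. A sketch with hypothetical helpers that collect the "Key : Value" lines of nvidia-smi -a and list the fields carrying one of these error markers (the marker strings are taken from the output above):

```python
import re

# "    Product Name                    : Unknown Error" -> key, value.
# Requiring whitespace before the colon skips lines like "GPU 0000:01:00.0".
FIELD = re.compile(r"^\s*([^:]+?)\s+:\s*(.*)$")
ERROR_MARKERS = ("Unknown Error", "ERR!")

def parse_nvsmi_fields(text):
    """Map 'Key : Value' lines of nvidia-smi -a output to their values.

    Nested sections reuse keys such as 'Current', so only the first occurrence
    of each key is kept; that is enough for top-level fields like 'Product Name'.
    """
    fields = {}
    for line in text.splitlines():
        m = FIELD.match(line)
        if m:
            fields.setdefault(m.group(1), m.group(2).strip())
    return fields

def flag_errors(fields):
    """Return the sorted keys whose value contains an error marker."""
    return sorted(k for k, v in fields.items() if any(mk in v for mk in ERROR_MARKERS))
```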

I can't figure out what is wrong or how to fix it. Could anyone help?
