My video card crashes from time to time. It's quite annoying, but I live with it -- usually I just restart the graphics with sudo systemctl restart lightdm.service, or if needed reboot the whole system.
In this particular instance the systemctl call hangs, and I don't want to reboot since I have a long-running job on the machine.
The crash is logged in dmesg as:
[944520.212254] Call Trace:
[944520.212256] [<ffffffff818384d5>] schedule+0x35/0x80
[944520.212257] [<ffffffff8183b625>] schedule_timeout+0x1b5/0x270
[944520.212280] [<ffffffffc0235244>] ? dce_v6_0_program_watermarks+0x514/0x720 [amdgpu]
[944520.212282] [<ffffffffc0196d2c>] kcl_fence_default_wait+0x1cc/0x260 [amdkcl]
[944520.212287] [<ffffffff815b4f50>] ? fence_free+0x20/0x20
Clearly the amdgpu module crashed. I would like to restart it, so I tried:
sudo modprobe -r amdgpu
modprobe: FATAL: Module amdgpu is in use.
And when I try to find out who is using amdgpu, I get:
lsmod | grep amdgpu
amdgpu 2129920 7
amdttm 102400 1 amdgpu
amdkcl 32768 1 amdgpu
i2c_algo_bit 16384 1 amdgpu
drm_kms_helper 155648 1 amdgpu
drm 364544 10 drm_kms_helper,amdgpu,amdkcl,amdttm
Basically there are 7 "things" using the module, and I have no idea how to find them so that I can remove the amdgpu module.
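For reference, the 7 I mean is the third column of the lsmod output (the "Used by" count). The names after it are the modules that depend on the one listed, and the amdgpu line has none, so as far as I can tell the 7 holders are not other modules. A minimal sketch of how I read that count; module_refcount is just a helper name I made up, not a standard tool:

```shell
# Print the "Used by" count for one module from lsmod-style text on stdin.
# module_refcount is a hypothetical helper name, not part of kmod.
module_refcount() {
    # Column 1 is the module name, column 3 is the reference count.
    awk -v mod="$1" '$1 == mod { print $3 }'
}

# Usage on a live system:
#   lsmod | module_refcount amdgpu
```

On my machine this prints 7 for amdgpu, which matches the output above but says nothing about what actually holds the references.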
Question: Is there any reasonable way to reload the module, without rebooting the system? Or is there a better way to get my video back?