
Just had a scary Linux moment!

nickdalzell

Extreme Android User
I had left my gaming rig on for a month this summer just in case I wanted to quickly play some games to de-stress after work (I also keep it offline to avoid updates, and don't play multiplayer PC games anyway). It's a CyberPower PC that I wiped Windows from and put Linux (Ubuntu GamePack 20.04) onto. Other than one game (Flight Simulator 2020) everything works well.

I recently shut it down to clean the dust out, and it was a proper shutdown (sudo shutdown -h now) and it indeed powered off fine. A little over a month later I felt like de-stressing with some Farming Simulator 17. First bootup, black screen, nothing, not even the CyberPower PC logo. That's fine, it's done this before, something about DisplayPort. Turned it off, back on again. Saw the logo this time, GREAT! But oh, it's not. Now it's stuck at the spinning boot logo. Hit ESC. It's showing a ton of messages about mdadm (RAID?! I ain't got no raid!) and then it freezes after showing that my /dev/sdb2 filesystem is 'clean' right after a message saying 'recovering journal' and the message before that 'Gave up waiting for suspend/resume device'.

Huh? I know I did a halt command, not a sleep command! That's gotta be a mistake! I couldn't have 'broken' Linux just by a darned shutdown could I?

Tried the noresume boot option. Well, now that message is gone, but it still freezes 'recovering journal' and 'filesystem clean'.

Try recovery boot. OK so now I can get a prompt. Tried an FSCK. NOPE, it gives up outright ain't gonna have none of that! Odd, I've done this before and it worked fine with the force flag, ain't gonna touch my SSD huh? OK I'll try something else.

Read some posts online that this type of boot freeze can be a full or almost full filesystem. Huh? The SSD is 250GBs and I know I ain't used all that up yet! Plus this system has never gone online since I installed all my Steam contents over a year ago! Ain't nothing changed bruh!

But I went along and humored the Ask Ubuntu post. "df -h". Sure enough my '/' which is /dev/sdb2 is showing as ZERO bytes free. HOW?! I didn't do nothin! I try deleting everything in /var/log and /tmp. Got 1GB back. But still showing as 'zero bytes free' (0% of 16GB on /) What is going on here?
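A minimal sketch of how one might track down what is eating a root filesystem like this, assuming GNU du and sort are available. The function name biggest_dirs is made up for illustration. The -x flag keeps du on a single filesystem, so a separate /home partition and the snap loop mounts are not counted.

```shell
# Sketch: list the largest top-level directories on one filesystem.
# biggest_dirs is a hypothetical helper name, not a standard tool.
biggest_dirs() {
    # $1 = directory to inspect (default /)
    # $2 = how many entries to show (default 10)
    # -x: stay on this filesystem, -k: sizes in KiB, one level deep
    du -x -k --max-depth=1 "${1:-/}" 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

Run it as root (for example `sudo sh -c '. ./script.sh; biggest_dirs / 10'`, where script.sh is whatever file holds the function) and then repeat it on the largest directory it reports to narrow things down.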

A few hours later, and long past my game time (about time to go to bed! so much for de-stressing, now I'm seriously worried that many hours of gameplay and tons of mods are now gone!) I stumble on a post that uses tune2fs to give root 0 bytes 'reserved' whatever that is. I try it, and WOW now it boots normally! Quickly run BleachBit as admin and remove any packages that I don't need, and now it's back to the 3/4 used state.

What happened? This system never goes online and there's nothing running that should be eating space up. Nothing changes except me saving games which only affects my /home partition. I don't have any clue what filled up root or if I should worry it's gonna happen again? Maybe I should just keep this system on a UPS and on forever?
 
This is a puzzler. Odd that fsck wasn't able to do anything but you were able to use tune2fs to fix the problem. As for shutdown being the source of the problem, I doubt that it was. The problem was already present and didn't become evident until you eventually booted the system back up, or in other words shutdown was incidental but something did corrupt the file system table.
Looking through man tune2fs the -m option flag shows:

"-m reserved-blocks-percentage
Set the percentage of the filesystem which may only be allocated by privileged processes. Reserving some number of filesystem blocks for use by privileged processes is done to avoid filesystem fragmentation, and to allow system daemons, such as syslogd(8), to continue to function correctly after non-privileged processes are prevented from writing to the filesystem. Normally, the default percentage of reserved blocks is 5%."

For several years now I've always used tune2fs to set the reserve down to 2-3%, which for today's multi-gigabyte drives adds a few GBs of free storage and still saves 2 or 3 percent of reserved blocks. But zero does seem risky: going by the man page, it leaves nothing for root-owned daemons to keep writing once the filesystem fills up.
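To put a number on the reserve: 5% of a 16GiB root partition is roughly 0.8GiB that df's Avail column does not count for ordinary users. The sketch below reads the real figures from `tune2fs -l` output, assuming the usual "Reserved block count:" and "Block size:" lines. The function name reserved_mib is invented for illustration.

```shell
# Sketch: print the reserved space of an ext2/3/4 filesystem in MiB.
# Reads `tune2fs -l <device>` output on stdin.
# reserved_mib is a hypothetical helper name.
reserved_mib() {
    awk -F: '
        /^Reserved block count/ { gsub(/ /, "", $2); count = $2 }
        /^Block size/           { gsub(/ /, "", $2); size = $2 }
        END { printf "%d\n", count * size / 1048576 }
    '
}

# Typical use (needs root, device name taken from this thread):
#   sudo tune2fs -l /dev/sdb2 | reserved_mib
# Lowering the reserve to 2% instead of 0:
#   sudo tune2fs -m 2 /dev/sdb2
```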

I'd still boot up from a boot disc/media and run fsck a couple of times. And if you have smartmontools installed, run a long test with smartctl and see if that might reveal anything out of sorts.
Could be just some one-off glitch though. I recall my /tmp filled up completely because of some Adobe Flash plugin issue several years ago. So glad Flash is dead. But that wasn't as dramatic as your story, I just had to clean out /tmp.
 
Fsck wouldn't run because there literally was no space left on the drive to allow it to scan. It needed some free space to do its thing and there was none. I mean zero bytes free. It's still holding somewhere around 3/4 used up, which is where it should be but something is running amok filling it up slowly. I just can't locate them. I can't delete anything in / without breaking the system since that's all critical system services, and everything in /home where I live (and where the games run) is on a separate partition and it's not even 85% used yet. Deleting stuff from /home or /var does not even affect root at all. BleachBit found and deleted some 200MiB of data but I can't do that in a terminal since I can't run that without a GUI.

I just have to watch the space widget in my panel. I used to get a lot of pop-ups about 'WARNING Root filesystem at "/" is almost full' and dismissed it as a bug since nothing should be filling it up, and turned that annoyance off.

I still don't know what is filling it up though. It's gradual but cumulative. Only thing I could find online refers to 'old Kernels' but since it never goes online, that isn't possible since it will never download updates. Logs are only taking up to 1GiB of space, which I got back in terminal, but that doesn't affect the percentage of space on /dev/sdb2.

I did notice when attempting to boot last night that I get some hang where it shows:

a start job is running for dev/disk by uuid (1m 30s) for something that I never saw before that suddenly increased boot time from a mere second or two to 5 minutes, but nothing is out of whack in /etc/fstab that I can find.
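One possible cause of a 90-second "start job" wait on a by-uuid device is an /etc/fstab entry (often swap) whose UUID no longer matches any disk. That is a guess for this system, not a diagnosis. A rough sketch for checking it, with the hypothetical helper name missing_fstab_uuids:

```shell
# Sketch: print UUIDs referenced in fstab that have no matching device.
# missing_fstab_uuids is a hypothetical helper name.
missing_fstab_uuids() {
    # $1 = fstab path (default /etc/fstab)
    # $2 = directory of UUID symlinks (default /dev/disk/by-uuid)
    fstab=${1:-/etc/fstab}
    uuid_dir=${2:-/dev/disk/by-uuid}
    # Take the first field of every non-comment line that starts with UUID=
    awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' "$fstab" |
    while read -r uuid; do
        [ -e "$uuid_dir/$uuid" ] || echo "$uuid"
    done
}
```

Calling `missing_fstab_uuids` with no arguments checks the live system. Any UUID it prints is one the boot process would sit waiting for.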

There's tons of 'core' files, but I am unsure if I can safely delete those. They look like important stuff. Running Synaptic and removing pre-installed clutter (ain't gonna use LibreOffice on a gaming rig or the GIMP) doesn't seem to affect the root filesystem. It's set to the standard 16GiB and shouldn't be filling up like this. But the system, games and all, ran fine prior to shutdown. The only odd thing that happened was my monitor kept going to sleep and I specifically set the system to NOT sleep the monitor, but it would still wakeup on a keypress.
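If those 'core' files are crash dumps (files literally named core or core.<pid>), they are memory snapshots of crashed programs and are generally not needed for the system to run. Whether that is what they are here is an assumption, so the sketch below only lists candidates with their size in KiB and deletes nothing. The function name list_core_candidates is invented for illustration.

```shell
# Sketch: list files that look like crash dumps by name, with size in KiB.
# list_core_candidates is a hypothetical helper name. Nothing is removed.
list_core_candidates() {
    # $1 = directory to search (default /); -xdev stays on one filesystem
    find "${1:-/}" -xdev -type f \( -name core -o -name 'core.[0-9]*' \) \
        -exec du -k {} + 2>/dev/null
}
```

Running `file` on anything it lists should say "core file" for a real dump. Anything else is best left alone.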

There was nothing showing up in /tmp.

Output of df -h as of now:

Code:
Filesystem      Size   Used   Avail  Use%  Mounted on
udev            16G    0      16G    0%    /dev
tmpfs           3.2G   3.5M   3.2G   1%    /run
/dev/sdb2       16G    14G    1.6G   90%   /
tmpfs           16G    206M   16G    2%    /dev/shm
tmpfs           5.0M   4.0K   5.0M   1%    /run/lock
tmpfs           16G    0      16G    0%    /sys/fs/cgroup
/dev/loop0      128K   128K   0      100%  /snap/bare/5
/dev/loop2      62M    62M    0      100%  /snap/core20/1361
/dev/loop1      56M    56M    0      100%  /snap/core18/2284
/dev/loop3      219M   219M   0      100%  /snap/gnome-3-34-1804/72
/dev/loop5      62M    62M    0      100%  /snap/core20/1518
/dev/loop4      56M    56M    0      100%  /snap/core18/2409
/dev/loop7      219M   219M   0      100%  /snap/gnome-3-34-1804/77
/dev/loop10     82M    82M    0      100%  /snap/gtk-common-themes/1534
/dev/loop12     51M    51M    0      100%  /snap/snap-store/547
/dev/loop13     47M    47M    0      100%  /snap/snapd/16010
/dev/loop11     249M   249M   0      100%  /snap/gnome-3-38-2004/99
/dev/loop6      255M   255M   0      100%  /snap/gnome-3-38-2004/106
/dev/loop9      55M    55M    0      100%  /snap/snap-store/558
/dev/loop8      66M    66M    0      100%  /snap/gtk-common-themes/1519
/dev/loop14     44M    44M    0      100%  /snap/snapd/14978
/dev/sdb3       397G   308G   69G    82%   /home  *(on a separate HDD)
tmpfs           3.2G   40K    3.2G   1%    /run/user/1000
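The /dev/loop entries at 100% are read-only snap images and always show as full, so they are noise when looking for a filesystem that is really running out of room. A small sketch that picks out only the mounts over a chosen threshold from `df -P` output (the function name over_threshold is made up for illustration):

```shell
# Sketch: print mount points whose usage is at or above a threshold.
# Reads `df -P` output on stdin. over_threshold is a hypothetical helper name.
over_threshold() {
    # $1 = threshold in percent (default 85)
    awk -v limit="${1:-85}" '
        NR > 1 {
            pct = $5
            sub(/%/, "", pct)
            if (pct + 0 >= limit) print $6, $5
        }
    '
}

# Typical use, leaving out snap images and RAM-backed filesystems
# (-x is the exclude-type option of GNU df):
#   df -P -x squashfs -x tmpfs -x devtmpfs | over_threshold 85
```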
 
When a computer fails to boot after sitting for a period of time, the first thing I look at is the CMOS/CMOS battery. If it's marginal or dead it could corrupt or reset your CMOS making it read memory and/or storage incorrectly - especially if you had changed any settings from the default.
 
Ignoring warning messages and not updating are generally not going to help keep any operating system stable and reliable.

Do you have your /home in a separate partition? Might be time to just do a clean install, with a current OS version.
 
Not only is my /home on a separate partition, it's on a separate hard drive. I think I heard that recommendation from our very own MoodyBlues.

I want to find out what's filling up root first. I don't worry about updates because I don't believe in them (long story that goes well outside the scope of Linux or this thread) but the system is NEVER on any network or the internet. There is no risk of security or stability as it remains forever standalone.

The boot time is increased, but it boots. I don't know what 'start job for /dev/disk by uuid' means and it's done this since February this year. It does it about three times during booting, and then goes to the login screen. When it was freezing it was due to a root filesystem overflow from lack of space. It never kernel panicked, it would still respond to the keyboard and reboot, and the only screen output referred to the filesystem being 'clean' which should be a good thing.

I could resize the partition for / to over 100GiB if I wanted, but that doesn't solve the problem of it eventually filling up with god knows what. Just delays it further into the future.

I've had this Linux system installed since 2019 when I bought the PC. I stand by the fact that nothing has changed since. First thing I did when I got the fresh system installed then was download Steam from apt and get all my library downloaded, installed, and all the video card tools installed, then pulled the plug on the internet. I just wanted something I could fire up to play games. That's all it does. Boots up, can launch Steam offline, and play my entire library. I kept it as minimalist as possible to allow all my resources to be for gaming. I don't twitch, or Discord or multiplayer. I just enjoy offline titles.
 
UPDATE:

I found the culprit. After removing snaps (WTF Ubuntu?!) my root filesystem is only HALF used. NOT 90%+.
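For anyone who wants to keep snaps but reclaim the space: snapd keeps older revisions of each snap around, and the image files live under /var/lib/snapd/snaps on the root filesystem. A rough sketch for listing the disabled (superseded) revisions from `snap list --all` output, assuming the usual column layout with Notes last. The function name disabled_snaps is invented for illustration.

```shell
# Sketch: print "name revision" for every disabled snap revision.
# Reads `snap list --all` output on stdin.
# disabled_snaps is a hypothetical helper name.
disabled_snaps() {
    awk 'NR > 1 && $NF ~ /disabled/ { print $1, $3 }'
}

# Typical use (needs root):
#   snap list --all | disabled_snaps | while read -r name rev; do
#       sudo snap remove "$name" --revision="$rev"
#   done
# To see how much space the snap images take on /:
#   du -sh /var/lib/snapd/snaps
```

Setting `sudo snap set system refresh.retain=2` tells snapd to keep the minimum number of old revisions from then on.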
 