In the mid-nineties I worked in a research institute. There was a large shared Novell drive which was always on the verge of being full. Almost every day we were asked to clean up our files as much as possible. There were no disk quotas for some reason.<p>One day I was working with my colleague, and when the fileserver was full he went to a project folder and removed a file called balloon.txt, which immediately freed up a few percent of disk space.<p>Turned out that we had a number of people who, as soon as the disk had some free space, created large files in order to reserve that free space for themselves. About half the capacity of the fileserver was taken up by balloon.txt files.
For everyone saying "This isn't a real solution!" I'd like to explain why I think you're wrong.<p>1) It's not intended to be a Real Solution(tm). It's intended to buy the admin some time to solve the Real Issue.<p>2) Having a failsafe on standby such as this will save an admin's butt when it's 2am and PagerDuty won't shut up, and you're just awake enough to apply a temp fix and work on it in the morning.<p>3) Because "FIX IT NOW OR ELSE" is a thing. Okay, sure. Null the file and then fill it with 7GB. Problem solved, for now. Everybody is happy and now I can work on the Real Problem: Bob won't stop hoarding spam.<p>That is all.
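For point 3, the "null the file and fill it back up with 7GB" shuffle might look something like this; the path and sizes here are assumptions for illustration, not anything from the article:<p><pre><code> truncate -s 0 /var/spacer.img        # frees the reserved 8 GB at once
 fallocate -l 7G /var/spacer.img      # re-reserve a slightly smaller block for next time
                                      # (fall back to dd from /dev/zero if the filesystem lacks fallocate support)
</code></pre>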
Same idea as this game development legend<p><a href="https://www.dodgycoder.net/2012/02/coding-tricks-of-game-developers.html" rel="nofollow">https://www.dodgycoder.net/2012/02/coding-tricks-of-game-dev...</a><p>> he had put aside those two megabytes of memory early in the development cycle. He knew from experience that it was always impossible to cut content down to memory budgets, and that many projects had come close to failing because of it. So now, as a regular practice, he always put aside a nice block of memory to free up when it's really needed.
A lot of tips in this thread are about how to better alert when you get low on disk space, how to recover, etc. but I'd like to highlight the statement: "The disk filled up, and that's one thing you don't want on a Linux server—or a Mac for that matter. When the disk is full nothing good happens."<p>As developers, we need to be better at handling edge cases like out of disk space, out of memory, pegged bandwidth and pegged CPU. We typically see the bug in our triage queue and think in our minds "Oh! out of disk space: Edge case. P3. Punt it to the backlog forever." This is how we get in this place where every tool in the toolbox simply stops working when there's zero disk space.<p>Especially on today's mobile devices, running out of disk space is common. I know people who install apps, use them, then uninstall them when they're done, in order to save space, because their filesystem is choked with thousands of pictures and videos. It's not an edge case anymore, and should not be treated as such.
The author seems to forget that ext-based filesystems keep 5% of disk space available for root at all times by default, known as "reserved blocks".[0] That means if a non-root user uses all of the available space, it wasn't really all of the space -- root still has access to 5% free space within that partition. That's exactly the same as the useless 8GB file but in an officially-supported manner. If you run out of disk space, you actually have 5% left for root. So log in as root and fix the issue. Simple.<p>Also:<p>> On Linux servers it can be incredibly difficult for any process to succeed if the disk is full. Copy commands and even deletions can fail or take forever as memory tries to swap to a full disk and there's very little you can do to free up large chunks of space.<p>Why would memory swap to disk when the disk is full? I feel like the author is conflating potential memory pressure issues with disk issues.<p>How many serious production-grade servers even use swap, which usually just causes everything to grind to a halt if memory becomes full?<p>[0] <a href="https://ma.ttias.be/change-reserved-blocks-ext3-ext4-filesystem-linux/" rel="nofollow">https://ma.ttias.be/change-reserved-blocks-ext3-ext4-filesys...</a>
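For reference, checking and temporarily releasing that reserved-blocks cushion is quick (the device name below is an assumption):<p><pre><code> tune2fs -l /dev/sda1 | grep 'Reserved block count'   # see how much is held back
 tune2fs -m 0 /dev/sda1                               # temporarily release the reserve
 tune2fs -m 5 /dev/sda1                               # restore the default once things are fixed
</code></pre>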
One other option is increasing the reserved block count ( <a href="https://ma.ttias.be/change-reserved-blocks-ext3-ext4-filesystem-linux/" rel="nofollow">https://ma.ttias.be/change-reserved-blocks-ext3-ext4-filesys...</a> ). This has the nice side effect of increasing the space available for critical daemons.<p>If you haven't customised this, in a pinch you can still lower it down a bit to buy some time.
> On Linux servers it can be incredibly difficult for any process to succeed if the disk is full. Copy commands and even deletions can fail or take forever as memory tries to swap to a full disk and there's very little you can do to free up large chunks of space.<p>This reasoning doesn't make sense. On Linux, swap is preallocated. This is true regardless of whether you're using a swap partition or a swap file. See man swapon(8):<p>> The swap file implementation in the kernel expects to be able to write to the file directly, without the assistance of the filesystem. This is a problem on files with holes or on copy-on-write files on filesystems like Btrfs.<p>> Commands like cp(1) or truncate(1) create files with holes. These files will be rejected by swapon.<p>I just verified on Linux 5.8.0-48-generic (Ubuntu 20.10) / ext4 that trying to swapon a sparse file fails with "skipping - it appears to have holes".<p>Now, swap is horribly slow, particularly on spinning rust rather than SSD. I run my systems without any swap for that reason. But swapping shouldn't fail on a full filesystem, unless you're trying to create & swapon a new swapfile after the filesystem is filled.
Ah, the classic 'speed-up loop' approach: <a href="https://thedailywtf.com/articles/The-Speedup-Loop" rel="nofollow">https://thedailywtf.com/articles/The-Speedup-Loop</a><p>About the blogpost itself:<p><i>The disk filled up, and that's one thing you don't want on a Linux server—or a Mac for that matter. When the disk is full nothing good happens.</i><p>I had this happen a few times on a Mac, and every time I was shocked that if the disk gets full you cannot even delete a file and the only option is a full system reboot. I was also unable to save any open file, even to an external disk, and suffered minor data loss every time because of that.<p>What is the proper way of dealing with such an issue on macOS? (or other systems, if they behave the same way)
One thing that many Linux/Unix users do not know is that all commonly used filesystems have a "reserved" amount of space to which only "root" can write. The typical format (mkfs) default is to leave 5% of the disk reserved. The reserved space can be modified (by root) any time, and it can be specified as a block count or a percentage.<p>As long as your application does not have root privileges, it will hit the wall when the free+reserved space runs out. Instead of the clumsy "spacer.img" solution, one could simply (temporarily) reduce the reserved space to quickly recover from a disk full condition.
This reminds me of Perl’s esoteric $^M variable. You assign it some giant string, and in an out-of-memory condition, the value is cleared to free up some emergency space for graceful shutdown.<p>“To discourage casual use of this advanced feature, there is no English long name for this variable.”<p>But the language-build flag to enable it has a great name: -DPERL_EMERGENCY_SBRK, obviously inspired by emergency brake.
I happen to run a couple of small servers myself and here's a better version of this approach. Create a cron job that will run a simple self-testing script once every few hours. My self-test does this:<p>1. Checks that all domains can be accessed via HTTP and HTTPS. If not, DNS might have died.<p>2. Checks that a few known CMS-generated pages contain some phrases they should contain. If not, SQL might have died.<p>3. Checks that the HTTPS certificate has enough runway left. If not, certbot might have died.<p>4. Sends a basic email message from my domain to a gmail account. Receives it via IMAP and sends a reply. Then, verifies the reply. This catches a whole bunch of mail-related issues.<p>5. Checks the free RAM and disk space. Updates an internal "dashboard" page and sends me an email if they are off.<p>It only took a couple of hours to hack this together and I must say, I get a much better night time sleep ever since.
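A minimal sketch of such a self-test in plain shell, assuming a working local mail command; the domains, phrases, thresholds and address are all hypothetical, and the mail round-trip check (step 4) is omitted for brevity:<p><pre><code> #!/bin/sh
 # Hypothetical self-test; domains, phrases and thresholds are assumptions.
 set -u
 alert() { echo "$1" | mail -s "selftest: $1" admin@example.com; }

 # 1. HTTP/HTTPS reachability (DNS/webserver check)
 for url in http://example.com https://example.com; do
     curl -fsS -o /dev/null --max-time 15 "$url" || alert "unreachable: $url"
 done

 # 2. A CMS-generated page still contains an expected phrase (database check)
 curl -fsS https://example.com/about | grep -q "Founded in 1999" || alert "CMS content check failed"

 # 3. Certificate has at least 14 days of runway left (certbot check)
 echo | openssl s_client -servername example.com -connect example.com:443 2>/dev/null \
     | openssl x509 -noout -checkend $((14*24*3600)) || alert "certificate expires soon"

 # 5. Disk usage above 80 percent? (step 4, the mail round trip, is omitted here)
 df -P / | awk 'NR==2 { sub("%","",$5); if ($5+0 > 80) exit 1 }' || alert "disk usage high on /"
</code></pre>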
Lots of comments assailing this approach as a poor replacement for monitoring miss the point. Of course monitoring and proactive repair are preferable - but those are systems that can also fail!<p>This is a low cost way to make failure of your first line of defense less painful to recover, and seems like a Good Idea for those managing bare-metal non-cattle systems.
This reminds me of an old gamedev story that I have no idea how to find. The project was getting near to shipping, they had cut all the space they could cut, but they still needed another megabyte of space. After a week of this, the senior dev told the narrator to meet him in his office, and he closed the door. He opened one of the project files and deleted a 1 MB static array. "At the beginning of development I always reserve space for just this occasion," he said. Shortly afterwards he emerged from his office, announced that he had been able to find some extra space, and was lauded as a hero.
When I worked at SevOne, we had 10x500 MB files on each disk that were called ballast files. They served the same purpose, but there were a couple nice tools built in to make sure they got repopulated when disk space was under control, plus alerting you whenever one got "blown." IIRC it could also blow ballast automatically in later versions, but I don't remember it being turned on by default.
This is why the invention of LVM was such a good idea even for simpler systems (where some people claimed it was useless overhead). In my old sysadmin days I <i>never</i> allocated a full disk. The "menace" of an almost full filesystem was usually enough to incentivize cleanups but, when necessity came, the volume could be easily expanded.<p>I guess a big file is not a bad idea either.
Since the late 90s, this was always my solution:<p><pre><code> tune2fs -m 2 /dev/hda1
</code></pre>
That sets the root reserve on a disk. It's space that only root can use, but also you can change it on the fly. So if you run out of userland space you can make it smaller, and if a process running as root fills your disk, well, you probably did something real bad anyway. :)<p>But yeah, this is a pretty good hack.
How is this better than sounding alarms when free disk space drops below 8GB? If you’re going to ignore the alarms, then you’re going to have the same problem after you remove your spacer file and the disk fills up again!
An architect once told me that he always plans for a solid gold block hidden away in the cellar.<p>Once the project invariably goes over budget, he drops the plans for the gold and frees up extra funds.<p>Edit: I think it was a large marble slab. Same thing.
My understanding is this is why one should partition a drive. If you have a data partition, a swap partition, and an OS partition, you can get around issues where a server’s lack of disk space hoses the whole system.
If you happen to use ext as your default filesystem, check the output of tune2fs; it's possible your distro has conveniently defaulted some 2-5% of disk space as "reserved" for just such an occasion. As the root user, in a pinch, you can set that to 0% and immediately relieve filesystem pressure, buying you a little bit more time to troubleshoot whatever the real problem is that filled the disk in the first place.
This points to a much more serious problem. This is 2021 and the technology is from the 90s, with a really poor user experience design. Your car warns you when you're low on fuel, but your server doesn't if you're low on critical resources.
No. Careful partitioning is the solution to this problem. Monitor the growth of your partitions and make sure nothing on rootfs or other sensitive partitions grow significantly.
To extend space in any filesystem in the root volume group on AIX you need space in /tmp. Years ago, while working for a major bank, I proposed creating such a dummy file in /tmp exactly for the purpose of extending filesystems. It saved us several times :)
Back in my early university days the disks always seemed to be full at inconvenient times on the shared Unix systems we used. Some students resorted to "reserving" disk space when available. Which of course made the overall situation even worse.
All my servers have an alarm when disk usage goes above 70%. It sends an email every hour once the disk usage goes above 70%. Never had a server go down because of disk space issues after adopting this practice.<p>Also, one of the main reasons server disks go full is generally log files. Always remember to "logrotate" your log files and you will not have this issue that much.<p>Yes, one more thing: for all user-uploaded files use external storage like NFS or S3.
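Something along these lines is enough for the hourly alert; the 70% threshold, paths and address below are assumptions:<p><pre><code> #!/bin/sh
 # Hypothetical hourly check (crontab: 0 * * * * /usr/local/bin/disk-alert.sh)
 THRESHOLD=70
 df -P | awk -v t="$THRESHOLD" 'NR > 1 && 0+$5 > t { print $6 " is at " $5 }' > /tmp/disk-alert.txt
 if [ -s /tmp/disk-alert.txt ]; then
     mail -s "disk usage above ${THRESHOLD}% on $(hostname)" admin@example.com < /tmp/disk-alert.txt
 fi
</code></pre>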
This really goes to show, there is more than one way to skin a cat. Yeah the guy could probably overhaul his entire approach to system administration, but also...this works. Well-placed hacks are maybe my favorite thing.
This won't work with ZFS, as it may be impossible to delete a file on ZFS when the disk is full. The equivalent in ZFS is to create an empty dataset with reserved space.
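On ZFS that lever might look roughly like this (the pool and dataset names are assumptions):<p><pre><code> # Reserve 8 GB in a dedicated, otherwise unused dataset
 zfs create -o refreservation=8G tank/spacer
 # In an emergency, shrink or drop the reservation instead of deleting a file
 zfs set refreservation=none tank/spacer
</code></pre>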
It's interesting to me that linux doesn't natively reserve a little space to allow basic commands like directory listing and file deletion to function even with a full disk.<p>Because really the biggest problem when I've had a partition get full, is I sometimes can't even delete the offending log file.
Linux has this built in...<p>By default, only root can use the last 5% of disk space.<p>That means you can fire up a root shell and know you have a buffer of free space to resolve the issue.
This is an old trick for when you need to deploy to media with a fixed size - floppy/CD-ROM/etc. Make a file that is 5-10% the size of your media and don't remove unless you're running out of space in crunch time.
An alternative approach here... make sure (all) your filesystems are on top of LVM. This reduces the steps needed to grow your free space. Whether you have an 8 GB empty file lying around or an 8 GB block device to attach, LVM will happily take them both as PVs, add them to your VGs, and finally expand your LVs.<p>some reading if LVM is new and you want to know more: <a href="https://opensource.com/business/16/9/linux-users-guide-lvm" rel="nofollow">https://opensource.com/business/16/9/linux-users-guide-lvm</a><p>edit to add: pv=physical volume, vg=volume group, lv=logical volume
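For reference, turning a spare block device (or a loop device over a spare file kept on a different disk) into more space might look like this; the device, VG and LV names are assumptions:<p><pre><code> losetup -f --show /mnt/otherdisk/spare.img   # only if starting from a file; prints e.g. /dev/loop0
 pvcreate /dev/loop0                          # turn it into a physical volume
 vgextend vg0 /dev/loop0                      # add it to the volume group
 lvextend -r -l +100%FREE /dev/vg0/root       # grow the LV and (-r) the filesystem on it
</code></pre>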
if you are on an ext filesystem, reducing the reserved percentage on the full filesystem can save the day. it's more or less this same trick built in to the filesystem<p>IIRC 5% is reserved when the filesystem is created, and if it gets full you can run:<p>tune2fs -m 4 /dev/whatever<p>which will instantly make 1% of the disk available.<p>of course it should be used sparingly and restored when finished
A great idea, but it still leaves the possibility for performance issues prior to an admin's ability to address it. Something like two 4 GB blocks might work better: if you get within, say, 200 MB of the storage limit you remove the first one and trigger an email/text/whatever to the admin, so they can address it before it goes further. It's an early warning and an automated solution. Then, if the situation continues, the second 4 GB block is also automatically removed, with another message sent to the admin. Nothing fails silently.
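A sketch of that staged release, with paths, sizes, threshold and address as assumptions:<p><pre><code> #!/bin/sh
 # Hypothetical staged ballast release, run from cron every few minutes.
 FREE_KB=$(df -Pk / | awk 'NR==2 { print $4 }')
 if [ "$FREE_KB" -lt 204800 ] && [ -f /var/ballast.1 ]; then
     rm -f /var/ballast.1
     echo "released /var/ballast.1 (${FREE_KB} KB were free)" \
         | mail -s "ballast 1 blown on $(hostname)" admin@example.com
 elif [ "$FREE_KB" -lt 204800 ] && [ -f /var/ballast.2 ]; then
     rm -f /var/ballast.2
     echo "released /var/ballast.2 (${FREE_KB} KB were free)" \
         | mail -s "ballast 2 blown on $(hostname)" admin@example.com
 fi
</code></pre>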
This is why I insist on data and root partitions on all the machines I administer. Go ahead and kill the data partition, at least the root partition will keep the system up and running.
For ext* filesystems, you can use tune2fs to change the reserved block percentage to accomplish this in what might - depending on your preferences - be a more graceful way.<p>Basically it lets you knock 8 GB or more (although it's a percentage instead, 5% by default) off of the disk space available to non-root users.<p>When it hits 100% and things start breaking, that reserve can be used by root to do compression safely, move things around, and so on. Alternatively the reserve percentage can be changed with a single command (by root), to allow non-root processes more space while the admin contemplates what to do next.<p>One nice aspect of using the reserve instead of a file is that it prevents runs of "du" from including the file in their results. Another is that it's pretty much impossible to accidentally remove the reserve (or for some other admin to find it and decide it's superfluous).<p>This is less effective at sites that have a lot of services running as root, in which case only your approach is fully effective. I want to say "But who <i>does</i> that nowadays...", but it happens.<p>tune2fs apparently also supports allowing members of a certain unix group or user to have access instead of solely root.<p>The core command for all this is:<p><pre><code> tune2fs -m <reserved-percent> <device>
</code></pre>
One other thing you might want to worry about: inode exhaustion. tune2fs has an inode reserve % as well - and trying to emulate this by creating a few hundred thousand files instead would be... inelegant.
The real question... Why does Linux or at least the common filesystems get stuck so easily running out of disk space? Surely normal commands like `rm` should still function.
Once upon a time, I wanted to cache large and expensive-to-pull files on many thousands of servers. Problem is, the disk space on these servers was at a premium and meant to be sold for customer use. The servers did have scratch space on small disks, but that was used by the OS.<p>So I wrote an on-disk cache system that would monitor disk usage, and start to evict and shrink its disk space usage. It would take up to N gigabytes of disk (configurable) for the purpose of caching, and maintain an M-gigabyte free-disk-space buffer.<p>Say you had 100 GiB of total space on a partition, with 8 GiB used for cache with a 2 GiB headroom. As legitimate/regular (customer) space usage increased and reached 91 GiB, the cache would see 9 GiB available, and removing the 2 GiB buffer, would start to evict items to resize to 7 GiB, and so on until it had evicted everything.<p>When this system deployed, it started to trigger low-disk-space alerts earlier than before. At first that seemed like a problem, but the outcome is that we were now getting low-disk-space alerts with more advance warning, and the cache bought some time as it kept resizing down to free up space. It kind of, in a way, served the same purpose as described in this blog post.<p>Overall this cache was pretty neat and still is, I bet. There are probably ways to do similar things with fancy filesystems (or obscure features) but this was a quick thing to deploy across all servers without having to change any system setting or change the filesystem.<p>I sometimes wish I had done this as open source, because it would be convenient to use locally on my laptop, or on many servers.
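The eviction loop at the heart of something like this can be sketched in a few lines of shell; the directory, buffer size and oldest-first policy below are assumptions, not the original system:<p><pre><code> #!/bin/sh
 # Hypothetical eviction loop: keep removing the oldest cache entries until the
 # partition holding the cache has BUFFER_KB of headroom again.
 CACHE_DIR=/scratch/cache
 BUFFER_KB=$((2 * 1024 * 1024))   # 2 GiB headroom
 while [ "$(df -Pk "$CACHE_DIR" | awk 'NR==2 { print $4 }')" -lt "$BUFFER_KB" ]; do
     victim=$(ls -t "$CACHE_DIR" | tail -n 1)   # oldest entry by mtime
     [ -n "$victim" ] || break                  # cache is empty, nothing left to evict
     rm -rf "$CACHE_DIR/$victim"
 done
</code></pre>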
hope you're not running -o compress=lz4 , because you are going to be in for a big surprise when you try to pull this emergency lever! you may be shocked to see you don't actually get much space back!<p>i do wonder how many FS would actually allocate the 8GB if you, for example, opened a file, seeked to 8GB mark, and wrote a character. many file systems support "sparse files"[1]. for example on btrfs, i can run 'dd if=/dev/zero of=example.sparse count=1 seek=2000000' to make a "1GB" file that has just one byte in it. btrfs will only allocate a very small amount in this case, some meta-data to record an "extent", and a page of data.<p>i was expecting this article to be about a rude-and-crude overprovisioning method[2], but couldn't guess how it was going to work. SSDs notably perform much much better when they have some empty space to make shuffling data around easier. leaving a couple GB for the drive to do whatever can be a colossal performance improvement, versus a full drive, where every operation has to scrounge around to find some free space. i wasn't sure how the author was going to make an empty file that could have this effect. but that's not what was going on here.<p>[1] <a href="https://wiki.archlinux.org/index.php/sparse_file" rel="nofollow">https://wiki.archlinux.org/index.php/sparse_file</a><p>[2] <a href="https://superuser.com/questions/944913/over-provisioning-an-ssd-does-it-still-hold/944915" rel="nofollow">https://superuser.com/questions/944913/over-provisioning-an-...</a>
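To make sure the spacer is actually allocated rather than sparse, you can either ask the filesystem for real extents or write real blocks, and then compare apparent size with allocated size; the file name is an assumption, and as the parent notes, zero blocks may still compress away on a compressing filesystem:<p><pre><code> fallocate -l 8G spacer.img                             # allocates real extents on ext4/xfs (not just a hole)
 # or: dd if=/dev/zero of=spacer.img bs=1M count=8192   # actually writes zero blocks
 du -h --apparent-size spacer.img                       # the size the file claims to be
 du -h spacer.img                                       # the blocks actually allocated on disk
</code></pre>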
Oh man, reminds me of a Game Dev war story I read years ago. This purportedly happened in those console days with very limited memory capabilities.<p>In some game studio, as a project neared its release, the team was still struggling with memory issues. No matter what they did, they had an overrun of just about 2MB. The artists had reduced their polygon counts drastically, the programmers had checked every possible leak, had optimized algorithms and buffers the best they could, but the 2MB overrun just kept haunting them.<p>That's when the VP of Engineering stepped in. Calling the TL of the project into a closed-door optimization code review, they had the source code on a large screen and the TL talked the VP through everything the team had done so far to stay within the memory budget.<p>As the TL finished the walkthrough, the VP opened the mother-of-all files and deleted a cryptic variable declaration to the effect of:<p><pre><code> int toLiveBuffer[2000000];
</code></pre>
The VP then explained that he had hidden this declaration in their codebase after a project that had to optimize drastically late in the development cycle. But first he wanted to make sure that the team had done their homework.<p>And poof. They emerged from the closed-door meeting jubilant and victorious. The game was ready for prime time!
A fun problem on a Mac is that if you're using APFS for your filesystem, if it fills up, you can't delete any files. It's caught me out a handful of times, and each time, the only way to recover is to reboot, and thankfully I've had more free disk space after a reboot.<p>I'm not going to try to understand the logic as to why APFS requires free space in order to delete files (via any method, including dd)
In theory, this is a good idea, but doesn't protect you in all cases. I have had instances on a few of my application servers where an event happened that dumped GB's worth of log data to the log files in a matter of a couple of minutes and filled up the drive (Thanks fast SSDs!). If I employed the strategy in the article, it would have only bought me a couple of more minutes worth of time, if that!
> even deletions can fail or take forever<p>> in a moment of full-disk crisis I can simply delete it and buy myself some critical time to debug and fix the problem<p>Uhh...
> <i>On Linux servers it can be incredibly difficult for any process to succeed if the disk is full. Copy commands and even deletions can fail or take forever as memory tries to swap to a full disk</i><p>I don't understand this. Swap is either a swap partition, or a specific swap file, all of which allocated in advance, so the fullness of the storage should have no bearing.
I keep my databases on a separate filesystem from root, var, or anything system critical for this reason. Even with the 8GB space waster, if you aren't on top of your disk usage you'd have down time when you fill up the filesystem containing the DB. I might be missing something here, but this does not seem like a good solution to this problem.
I have an empty leader on my hard drive so that I can recover if I accidentally nuke the front of it with dd while making a live USB. So it's not a bad idea, and it's super effective: so far it hasn't been tested, and hopefully it never will need to be.
>The disk filled up, and that's one thing you don't want on a Linux server—or a Mac for that matter. When the disk is full nothing good happens.<p>I found a bug with time machine where it wouldn't delete local copies properly and filled my hard drive until I couldn't do anything. The OS slowly stopped working. At first I couldn't copy or save anything, then deleting files made more files. It was so bad that the `rm` command eventually wouldn't work from recovery or the local OS. I could do nothing. I had to format.<p>It happened again and I learned to manually delete the time machine local snapshots, but it was crazy how hard it was to recover once it took all my storage. That bug is fixed now.
With virtual servers this should not be necessary, as it is easy enough to add some disk space. After all, this should not be a common issue in production environments, but more like a once in a decade problem.<p>With physical servers it might be a different story and might be a good idea. I tend to size filesystems to the requirements I have and enlarge them when required (it gives you a periodic reminder to think about what waste you have accumulated). That way, I can still add space even if the filesystem has been filled up. However, if you do it just to have some space when you need it, it probably is overkill and to have an empty buffer file is a lot easier to handle.
So far, my first stop to temporarily get more disk space was to reduce the size of the swapfile which on a lot of servers seems to be allotted >1x the requirement.<p>Will be switching to this hack! Perfect illustration of the KISS principle (Keep it simple, stupid).
This reminds me of a similar story in a classic Gamasutra article[1] (the section is "The Programming Antihero", and I'd recommend the other pages of the article for a few good chuckles). Apocryphal or not, it makes for a good story.<p>> I can see how sometimes, when you're up against the wall, having a bit of memory tucked away for a rainy day can really make a difference. Funny how time and experience changes everything.<p>[1] <a href="https://www.gamasutra.com/view/feature/132500/dirty_coding_tricks.php?page=4" rel="nofollow">https://www.gamasutra.com/view/feature/132500/dirty_coding_t...</a>
As hacks go, it's a good one. I also like it because you don't have to be root to implement it and you don't have to reconfigure your file system params in ways that might or might not be great for other reasons.
This reminded me of an embedded Java project that I worked on 20 years ago. The VM had only 10 MB of RAM and properly dealing with out-of-memory exceptions was a must. The most effective strategy was to preallocate something like a 200K array. Then on any memory exception the code released that array and set a global flag. The flag was queried throughout the code to aggressively minimize memory usage until it dropped to a tolerable limit.<p>The preallocated buffer was essential. Without it the typical result was a recursive out-of-memory that eventually deadlocked/crashed the VM with no recovery.
I have a dual-boot laptop with Windows and Linux, and use the NTFS partition to share data between them.<p>Recently I extracted a large archive from Linux onto the NTFS partition, and the partition filled up.<p>Then Windows did not start anymore.<p>Linux would only mount the partition as read-only, because it was marked dirty after the failed start. Finally I found a tool to reset the mark and delete the files.<p>Now Windows starts again, but my user account is broken. It always says "Your Start menu isn't working. We'll try to fix it the next time you sign in.", then I sign out, and it is still broken.<p>I had to make a new user account.
This sounds like you should, instead, use the "Filespace Reserved for Root" functionality of your filesystem, which exists specifically for this contingency. The default for ext3 is 5%.
Since HN likes to bash Windows: I can tell you that Windows runs pretty much normally if the disk is full. Had that happen many times and intentionally did this for tests as well.<p>Even disconnecting the disk technically doesn't break the OS. Because of the "Windows To Go" feature, the OS can detect this and pauses.<p>(Note: Windows To Go is officially removed from current versions but the code that freezes is still there. However, whether that works with your hardware is basically a gamble... so yeah, don't try this at home/work.)
Dumb idea. Read the man page for tunefs. The file system has something called min free which does the same thing. However, this does not interfere with wear leveling. Dummy data does.
Isn’t using LVM and holding some space back a better solution for this?<p>Also I keep databases on their own partition so that nothing else can accidentally fill up the space and lead to data loss.
Would be better to leave 8GB unpartitioned and then expand the partition. An 8GB file on an SSD is removing 8GB worth of blocks from being able to participate in wear leveling.
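Growing the last partition into that unpartitioned tail later is a couple of commands; the device, partition number and ext4 filesystem here are assumptions:<p><pre><code> parted /dev/sda resizepart 2 100%   # grow the last partition to the end of the disk
 resize2fs /dev/sda2                 # then grow the ext4 filesystem to match
</code></pre>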
Reminds me of the cron task I set up once, long time ago, on a bare metal server. It would kill and relaunch a web service every 4 hours.<p>The service in question didn't require high availability (it was a mailing list processing/interface thing, if I remember correctly) but it had some memory leak which would eventually devour all the memory in the server, in about 2 days.<p>This hack served its purpose well, until the service was eventually replaced by something else.
What I don't understand about this approach is why you think it actually does anything for you? What you do instead of this is set up an alert to monitor disk space at the right threshold for you, and then have a contingency plan for how to add more space to your environment.<p>It seems like you have sort of done that, but in this case you are actually allowing your system to get into a bad state before you react.<p>Perhaps it's better to be proactive instead of reactive.
Tell all your SecDevOps friends how this file can also pull double duty as a ransomware canary.<p><a href="https://blog.urbackup.org/371/ransomware-canary" rel="nofollow">https://blog.urbackup.org/371/ransomware-canary</a><p><a href="https://support.huntress.io/article/136-ransomware-canaries-technical-details" rel="nofollow">https://support.huntress.io/article/136-ransomware-canaries-...</a>
This reminded me of that joke about two guys who meet in the middle of the Savanna. One is carrying a phone booth and the other one an anvil. So, the one with the anvil asks:<p>- Why do you carry a phone booth around?
- Oh, you see, it's for the lions. If I see a lion, I drop the booth, step inside and I'm safe. What's with the anvil?
- It's for the lions too. If I see a lion, I drop the anvil and I can run way faster!<p>Good trick anyway!
I've been doing this for years too. I also learned instead of one big file, halving several files is also useful so you can release space in "chunks". If you need everything at once you can wildcard the delete.<p>Such flexibility has been invaluable over the years. Thankfully with block storage and modern operating systems/file systems growing volumes can be significantly easier for most servers.
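Creating the chunks is a one-liner; the names and sizes here are just an example:<p><pre><code> for i in 1 2 3 4; do fallocate -l 2G /var/spacer.$i; done   # four 2 GB chunks instead of one 8 GB file
 rm /var/spacer.1     # later: release a single chunk...
 rm /var/spacer.*     # ...or everything at once
</code></pre>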
I maintain a small fleet of CI machines (mostly Macs) and run into this issue as well from time to time. The free space idea is nice, but I ran into the problem that under very critical disk space I can't even ssh in or delete a file, because there is simply not enough space to execute the simple command. A reboot to get rid of some temp files helps me in these situations to get some control back.
On the subject of "inverted" thinking like this, I recently added a test to a test suite that is intended to fail some day. The test will eventually fail when a bug (for which we developed a workaround and added the aforementioned test to confirm the fix) is fixed in one of our open source dependencies. When the test fails, we'll know to remove the workaround (and the test)!
Please use LVM (Logical Volume Manager) if you really are afraid of filling up disks.<p>If the disk would ever fill up:<p><pre><code> 1. Buy an additional virtual disk
2. Add the disk to the LVM volume group
3. Expand the Logical volume
</code></pre>
A really good primer on LVM:<p><a href="https://wiki.archlinux.org/index.php/LVM" rel="nofollow">https://wiki.archlinux.org/index.php/LVM</a>
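Assuming the new virtual disk shows up as /dev/sdb and the usual vg0/root naming (both assumptions), those three steps translate to:<p><pre><code> pvcreate /dev/sdb                         # 2. prepare the new disk as a physical volume
 vgextend vg0 /dev/sdb                     #    and add it to the volume group
 lvextend -r -l +100%FREE /dev/vg0/root    # 3. expand the logical volume and its filesystem
</code></pre>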
In the days of minicomputers, Data General's first 16-bit operating system, RDOS, required that the main file be "contiguous". Not only that, there was some model of disk they sold where the OS file had to be close to the edge for speed in loading. Prudent sysadmins would create empty contiguous files in the favored space against the next upgrade.
This is an old technique. For example, some game developers back in the early days used to put dummy files in the game data space, and code the entire game with less space so that if later more space was needed, it was just a matter of deleting the dummy files. In that context, it kinda forces you to be smarter about your game assets and code.
Proof that the future is here, but just unevenly distributed -- we have technology for dynamic disk expansion, but implementation & integration just isn't present/slick enough to make it available to even tech-inclined hosting consumers just yet.<p>Guess this is another place of differentiation that some of these platforms could offer.
The full-disk problem on Linux machines has been at least partially solved for many decades: keep /home, /tmp, /var, and /usr each on its own partition. This reduces the problem, if not removing it completely. There is a small disadvantage: a reduction in fungibility of the disk space.
Another trick you can use is to adjust the reserved space in the FS; ext4 can do this very quickly and free up space.<p>However on a sketchy drive this is obviously not a wise move.<p>Actually wait, ext2/3/4 all have reserved block counts you can free.<p><pre><code> # tune2fs -m 3 /dev/md2
# Setting reserved blocks percentage to 3%</code></pre>
This is not the right solution. It's like setting your clock 5 minutes ahead to trick yourself into thinking it's 9:00 am when it's really 8:55 am. It doesn't work.<p>The better solution is simple monitoring. Alert when a limit is passed. Increase the limit to 16 GB of disk space remaining if paranoid.
We write a data intensive desktop app, and when you are close to disk full, we reduce functionality so you can’t make the problem worse, or lose work because of the disk full situation. The thing is that we know that more than half of that user’s data is ours, so our data is often the cause.
I saw this in the CockroachDB docs last month: <a href="https://www.cockroachlabs.com/docs/v20.2/cockroach-debug-ballast" rel="nofollow">https://www.cockroachlabs.com/docs/v20.2/cockroach-debug-bal...</a><p>This technique even has a name: ballast file
Showing off that i’m not a sysadmin, but wouldn’t a monitoring daemon work? Once disk usage grows past a certain uncomfortable threshold you get an email/notification to see what’s up. I mean you obviously are monitoring other server vitals anyway right?
> Copy commands and even deletions can fail or take forever as memory tries to swap to a full disk<p>That's only a problem if your memory is full as well, and even then, I've never encountered a server that uses a swapfile instead of a swap partition.
I am reminded of a tweet that suggested adding a sleep() call to your application that makes some part of it needlessly slow, so that you can give users a reason to upgrade when there's a security fix (it's 1 second faster now)!
Doesn't ext[2,3,4] reserve 5% of the space on the disc for this very reason?<p>This can be adjusted with tune2fs -m <percentage> /dev/sda1<p>You can check the reserved blocks with sudo tune2fs -l /dev/sda1 | grep 'Reserved block count'
I remember a discussion here about a dude who did this with memory in game development. People didn't like the idea very much.<p>To me it has a taste of domain squatting or GPU scalping, but you don't do it with strangers, but your team.
In most VMware clusters that use resource pools extensively I've always maintained a small emergency CPU reservation on a pool that would never use it, just in case I had to free up some compute without warning.
That's a dumb idea?<p>Iirc some filesystems allow you to reserve a percentage of blocks for this particular use case (recovery by root).<p>Ext2/3 for sure, ext4 probably too.<p>Not sure you can do that on linode on the rootfs, since the filesystem is mounted, tho.
Really good idea. After looking at the linked article about dd, I guess this wouldn't work as well if one was using a file system with compression. In that case maybe /dev/urandom would be better?
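A sketch of that variant (the file name and size are assumptions): random data does not compress, so the reservation stays real on a compressing filesystem.<p><pre><code> dd if=/dev/urandom of=spacer.img bs=1M count=8192 status=progress
</code></pre>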
That’s what tune2fs is for <a href="https://www.unixtutorial.org/commands/tune2fs" rel="nofollow">https://www.unixtutorial.org/commands/tune2fs</a>
I used this technique on dev and IST servers precisely 11 years back. Getting storage would be a day's task, which would stall current activity. This helped: 1.5 GB across 5 files.
I thought this was gonna be about the obscenely large sparse file /var/log/lastlog.<p>I really wish they would move it from a sparse mmap()'d file to a btree or something.
mkfs has an option to reserve a percentage or number of blocks/inodes for root of a file system. It's the file system equivalent of empty files.<p>Usually when free space is exhausted, it's exhausted for non-root users. You get that same "time to fix stuff by deleting the file" benefit by using tunefs to change that root reserved space to zero.<p>Plus have /var/log on a separate file system and make sure that your log rotations are based on size as well as time.
I am surprised Windows did not make the short list of OSes that get lost when disk space is almost gone.<p>I have been there a couple of times and it is a land of crazy, unpredictable behavior.
On a Mac, I'm often puzzled why the OS says both "you're low on disk space" *and* "you have 23GB available" on the same disk.
This is like carrying around a pound of beef because you refuse to look up the address of a McDonald's 7 minutes away.<p>Set up quotas or implement some damn monitoring -- if you're not monitoring something as simple and critical as disk usage, what else are you not monitoring?
On the chat team at Twitch in the early days after the Twitch Plays Pokemon crisis [1], we started artificially doubling all chat traffic through our systems, then dropping the doubles just before they would be sent to users. [2]<p>Not only did it give us a "big red button" to press during emergencies like OP, but it revealed important logical scaling issues before they became real problems.<p>[1]: tldr; 1 million people playing a single game of Pokemon Red by using chat to send button presses<p>[2]: <a href="https://www.twitch.tv/videos/93572955" rel="nofollow">https://www.twitch.tv/videos/93572955</a>
having an 8gb file you know you can delete isn't really all that helpful if everything has already gone disk-full-fracked. you should really have an alarm on free space, especially if you're an indie.
If you don’t have monitoring to tell you when the disk is more than X% full, then you’re at risk for more failure scenarios than just a full disk (usually trivial to buy time by deleting old logs).