"Test your backups" is so easy to say, but quite difficult for many to do. There are a lot of shops that probably don't know how to recreate a machine from scratch. How many systems are developed as balls of clay. Little bits added and smeared in over time until the ball just gets bigger, but each piece lost in the process. How many folks can go through their local config files and explain all of entries, how many can even tell which ones they have changed, or why? Especially when they were changed by Frank, but he left 2 years ago.<p>You'd like to think you can just restore the system from backup and it'll just light back up. But how do you test this without cratering your existing system? Like a boat in a basement, many system are built in-situ and can be very rigid.<p>Modern environments like cloud computing and creation scripts can mitigate this a bit organically, but how many of these systems are just a tower running Windows w/SQL Server and who knows what else? Plus whatever client software is on the client machines.<p>How do you test that in isolation?<p>At least read the media to see if it can be read (who doesn't love watching a backup tape fail halfway through the restore).<p>Simply, it takes a lot of engineering to make a system that can be reliably restored, much less on a frequent basis. And this is all engineering that doesn't impact the actual project -- getting features to users and empowering the business. Which can make the task even more difficult.
One thing I've always wondered: How do you prevent ransomware from ruining your backups, too?<p>Lots of ransomware tends to be "delayed" – it can't encrypt everything instantly, so it encrypts a little bit each minute. During those minutes, isn't it conceivable that the now-encrypted files replace your old backups?<p>I suppose this isn't really a "backup" but rather a "mirroring strategy." But for certain kinds of data -- photos, video, large media -- can you really afford any other kind of strategy?<p>The other question I have is related to that first point: since ransomware can't encrypt everything all at once, how is it possible for your system to continue to function? As you can tell, I'm a ransomware noob, but it's quite interesting to me from an engineering standpoint. Does the system get into a "half encrypted" state where, if you rebooted it, it would fail to boot at all? Or does ransomware targeted at businesses tend to be more of a surgical strike, where it targets and wipes out specific datastores before anyone notices?<p>(It's the "before anyone notices" part that I'm especially curious about. Isn't there some kind of alarm that could be raised more or less instantly, because something detects unexpected binary blobs being created on your disk?)
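On the detection question: one rough heuristic is that freshly encrypted files look like random noise, so their byte entropy sits near the maximum of 8 bits per byte. A toy sketch of that idea in Python (the scanned path, age window, and threshold are made-up examples, and compressed media like zip/jpeg/mp4 will trigger false positives):<p><pre><code>import math, os, time
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    # Bits of entropy per byte; ~8.0 means the bytes look random/encrypted.
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def scan_recent_files(root: str, max_age_s: int = 3600, threshold: float = 7.5):
    # Flag files modified within the last hour whose contents look encrypted.
    now = time.time()
    suspicious = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if now - os.path.getmtime(path) > max_age_s:
                    continue
                with open(path, "rb") as f:
                    sample = f.read(64 * 1024)  # first 64 KiB gives a rough score
            except OSError:
                continue
            if shannon_entropy(sample) >= threshold:
                suspicious.append(path)
    return suspicious

for path in scan_recent_files("/srv/shares"):  # hypothetical share path
    print("high-entropy recent file:", path)
</code></pre>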
This really hits home. Decades ago I was working at a place that did daily tape backups. I remember thinking, this is unreal - there's literally a room filled with tapes.<p>One day, I asked if they had ever performed a recovery off of the tapes, as I questioned whether the tapes were even being written to. (NOTE: Backups were not my job at all.)<p>Why had I brought this up? I would be in the server room and never saw the blinky lights on the tape drive... well... blink. Everyone literally laughed at me, thought I was a grade A moron.<p>A year later, servers died... Popped in the tape... Blank. No worries, they had thousands more of these tapes. Sadly, they were all MT. They had to ship hard drives to a recovery shop, and it was rather expensive.<p>I left shortly after this.
To be fair, it seems like a lot of backup systems were (properly) designed to recover data for when a single computer or drive or database fails or gets overwritten or specifically attacked -- but not for a wide-ranging attack where <i>every networked computer gets wiped</i>.<p>All the scenarios in this article are great to think about (recovery time, key location, required tools), but they're still all at the backup <i>design</i> phase. The headline of "test your backups" seems misleading -- you need to design all these things in <i>before</i> you can even try to test them.<p>It seems like a real problem here is simply that backup strategies were often designed before Bitcoin ransomware became prevalent, and execs have been told "we have backups" without probing deeper into whether they're the right kind of backup.<p>In other words, there's no such single thing as "having backups", but rather <i>different types</i> of backup+recovery <i>scenarios</i> that either are or aren't covered. (And then yes, test them.)
Yes, test your backups regularly.<p>When I worked for a large insurance firm, we would run drills every 6 months to perform off-site disaster recovery and operational recovery tests to validate our recovery processes. Everything was tested: WAN links, domain controllers, file backups, mainframe recovery and so much more. We were more or less ready for a nuke to drop.<p>Obviously this costs money, but if you're an insurance firm, not being able to recover would cost way more than running DR and OR recovery drills every 6-12 months.
Not really just a backup and restore. You need to be able to rebuild from zero. I think of it more as a disaster recovery exercise, and for those… you are only as good as your last _real_ rehearsal. That may mean a suitcase of tapes, a sheet of paper, and a rack of blank servers.
Then you have the problem of release of confidential information. For this reason, the sweetest target for ransomware is the company that can neither recover its data nor afford to have it publicly posted or monetised by the gang.
Oh, and you do store those backups offline, don't you? Ransomware gangs have been known to loiter and observe their target for weeks to learn how to sabotage the backups when the time comes.
One thing that has irked me about everyone's flippant comments about moving to the cloud is that the "devops as a recovery mechanism" generally only works for single-app startups or small shops with only a few dozen simple VMs at most.<p>Some of my customers have <i>thousands</i> of VMs in their cloud, and they aren't cloned cattle! They're pets. Thousands upon thousands of named pets. Each with their individual, special recovery requirements. This then has a nice thick crust of PaaS and SaaS layered on top in a tangle of interdependencies that no human can unravel.<p>Some resources were built using ARM templates. Some with PowerShell scripts. Some with Terraform. A handful with Bicep. Most with click-ops. These are kept in any one of a <i>dozen</i> source control systems, and deployed mostly manually by some consultant that has quit his <i>consulting company</i> and can't be reached.<p>Most cloud vendors "solve" this by providing snapshots of virtual machines as a backup product.<p>Congratulations big vendors! We can now recover exactly one type of resource out of <i>hundreds</i> of IaaS, PaaS, and SaaS offerings. Well done.<p>For everything else:<p><pre><code> WARNING: THIS ACTION IS IRREVERSIBLE!
</code></pre>
Fantastic. No worries though, I can... err... export the definition, right? Wrong. That doesn't work for something like 50% of all resource types. Even if it "works", good luck restoring inter-dependent resources in the right order.<p>Okay, you got things restored! Good job! Except now your DNS Zones have picked a different random pool of name servers and are inaccessible for days. Your PaaS systems are now on different random IP addresses and no one can access them because legacy firewall systems don't like the cloud. All your managed identities have reset their GUIDs and lost their role assignments. The dynamically assigned NIC IP addresses have been scrambled. Your certificates have evaporated.<p>"But, but, the cloud is redundant! And replicated!" you're just itching to say.<p>Repeat after me:<p><pre><code> A synchronous replica is not a backup.
A synchronous replica is not a backup.
A synchronous replica is not a backup.
</code></pre>
Repeat it.<p>Do you know what it takes to <i>obliterate</i> a cloud-only business, permanently and irreparably?<p>Two commands.<p>I won't repeat them here, because like Voldemort's name it simply <i>invites trouble</i> to speak them out loud.
I’m a novice and am dealing with data that isn’t too complicated, large, or important. My approach is to build restore directly into the normal workflow. I test my backups by using them each week.<p>A stack is spawned from a database backup and once it passes tests, replaces the previous one.<p>Not sure how smart this all is but my goal is to learn through application.
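For what it's worth, a minimal sketch of that weekly "restore is the deploy" loop, assuming Postgres; the paths and helper scripts (backups/latest.dump, run_smoke_tests.py, promote_stack.sh) are hypothetical stand-ins:<p><pre><code>import subprocess, sys

BACKUP = "backups/latest.dump"   # hypothetical path to the newest dump
RESTORE_DB = "restore_test_db"   # hypothetical scratch database

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Build a fresh database purely from the backup.
run("dropdb", "--if-exists", RESTORE_DB)
run("createdb", RESTORE_DB)
run("pg_restore", "--no-owner", "-d", RESTORE_DB, BACKUP)

# 2. Run the normal test suite against the restored copy.
result = subprocess.run([sys.executable, "run_smoke_tests.py", "--db", RESTORE_DB])

# 3. Only promote the restored copy if the tests pass.
if result.returncode == 0:
    run("./promote_stack.sh", RESTORE_DB)  # hypothetical promotion step
else:
    print("restore failed smoke tests; keeping the old stack")
</code></pre>The nice property of this loop is that the backup gets exercised every single week, so a silently broken dump shows up long before you need it in anger.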
3-2-1 Backup Rule:<p>Three copies of your data. Two "local" but on different media (disk/tape, disk/object storage), and at least one copy offsite.<p>Then yes, absolutely perform a recovery and see how long it takes. RTOs need to be really low. Recovering from object storage is going to take at least an order of magnitude more time than on-prem.<p>Also, storage snapshots/replications are not backups, so stop using them as such. Replication is good for instant failover, but if your environment is hacked the replicas are probably going to be destroyed as well.
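On the "see how long it takes" point: the cheapest way to know whether the on-prem and object-storage paths actually fit your RTO is to time a scripted restore drill. A minimal sketch, where restore.sh and its flags are placeholders for whatever your real restore tooling is and the 4-hour RTO is just an example:<p><pre><code>import subprocess, time

RTO_SECONDS = 4 * 3600  # example agreed recovery time objective: 4 hours

def timed_restore(label: str, cmd: list[str]) -> float:
    # Run one restore path end-to-end and report how long it took.
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    elapsed = time.monotonic() - start
    status = "OK" if elapsed <= RTO_SECONDS else "MISSES RTO"
    print(f"{label}: {elapsed / 60:.1f} min ({status})")
    return elapsed

# Both commands are stand-ins for your actual restore procedures.
timed_restore("on-prem disk", ["./restore.sh", "--source", "local-repo"])
timed_restore("object storage", ["./restore.sh", "--source", "s3-repo"])
</code></pre>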
Don't just test your backups. Make sure your automation can't clobber or tamper with your backups. This includes both local and disaster recovery sites. Give your pen-test team super-user privs on your automation and give them Amazon gift cards if they can tamper with your backups. If they can't mess with the backups, give the gift cards to whoever designed and hardened your infrastructure.
There is another approach. Scrub old data you don't need.<p>2-3 year email retention on corp email.<p>Paper files for sensitive client info (or don't keep it).<p>We can reinstall Office / Windows / Active Directory etc.<p>Mandatory 2FA on Google Suite?<p>Git codebases on GitHub etc for LOB apps (we can rebuild and redeploy).<p>We use the lock features in S3 for copies of data that must be kept. Not sure I can even unlock to delete as account owner without waiting for the timeouts.
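That S3 feature is Object Lock: it has to be turned on when the bucket is created, and a default retention in COMPLIANCE mode can't be shortened or removed by anyone, including the account root, until it expires. A minimal boto3 sketch, with example bucket name, key, local file, and retention period (region settings omitted for brevity):<p><pre><code>import boto3

s3 = boto3.client("s3")

# Object Lock can only be enabled at bucket creation; versioning comes with it.
s3.create_bucket(
    Bucket="example-backup-archive",   # example name
    ObjectLockEnabledForBucket=True,
)

# COMPLIANCE mode: locked object versions cannot be deleted or overwritten,
# even by the root account, until the retention period has passed.
s3.put_object_lock_configuration(
    Bucket="example-backup-archive",
    ObjectLockConfiguration={
        "ObjectLockEnabled": "Enabled",
        "Rule": {"DefaultRetention": {"Mode": "COMPLIANCE", "Days": 90}},
    },
)

# Any backup written to the bucket now picks up the default 90-day lock.
with open("latest.dump", "rb") as body:              # example local backup file
    s3.put_object(Bucket="example-backup-archive",
                  Key="db/2024-01-01.dump", Body=body)
</code></pre>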
They also threaten to leak your data if you don't pay. They know a lot of orgs can restore the data and won't need it decrypted. There's no real defense against this (other than good security practices).
There's no glamour in backups.<p>Even less glamour in great backups.<p>Even less in testing backups.<p>And there's a lot of glory in "slashing the IT budget with no disruptions in operations, cutting the fat is good for business".
Isn't this more of a "We don't want our client / customer information released to The World At Large" question? I would think most business entities have backups of some kind (Scripps being the only exception I can think of), and will pay the ransom to keep any sensitive information off the market.<p>Edit: Should have added that I find it hard to believe that companies have PB of data backed up. I could believe GB, and maybe even TB, but PB is pretty hard to swallow. The past three companies I've worked for (25 year span) had, at most, a couple of gigs of sensitive information that couldn't be easily replicated.
Reminiscing - I once worked on a "backup server" that ran NetWare 3.11 and QIC80 drive.<p>The customer was quite convinced their backup was fine. They could hear the QIC going with the standard <i>weeeeee, tick, tick, tick, weeeeeee, tick, tick, tick</i> sound every night as they were leaving.<p>When I ejected the drive, the tape was nearly clear. All the material have been wiped clean off of it, and was sitting at the bottom of the server in a neat gray, brown pile (at least that is how I remember it, & I am sticking to it).<p>Since they never had to restore, they never checked.
How does one learn how to do proper backups? Using my throwaway, as I suspect my company doesn't do them (and even if they do, I don't know where they are or what to do with them, since the main engineer on my piece of software left).
That's not how modern ransomware works.<p>It's now common to see data extracted, with the ransom covering non-disclosure of your corporate data.<p>But yes, agreed on backups in terms of restoring operations.
"Test your backups" is very good advice, but it will do almost noting to protect you against ransom attacks.<p>A ransom attack works because one of the first things attackers do when they gain entry to the system is locate and encrypt the backups.<p>Having tested backups is great, but it will not protect you from ransom attacks.
I've heard from companies using ZFS that suddenly saw a massive increase in disk space usage... a result of copy-on-write for all the new encrypted files. They then restored to a snapshot from before the encryption started and were able to resume business (after a purge and decoupling of all suspicious machines).<p>Could it be that simple? Sure, snapshots are not backups, but this is not a hardware failure.<p>In my home situation I do a periodic rsync over ssh to a Pi3 at my parents' place. It's super simple, but I can just browse the sshfs to check if stuff is there and working. And when I need it, I drive there and get the disk itself. Sure, this is not relevant for a large business, but for us self-hosters it is a nice solution. The backup is manual, by choice, so I never overwrite it accidentally.
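For the curious, the mechanics are roughly this: ZFS snapshots are read-only and nearly free to take, and a rollback discards everything written after the chosen snapshot. A minimal sketch driving the standard zfs commands from Python; the dataset and snapshot names are made up, and in a ransomware scenario you'd run this on the (uncompromised) storage server, not the infected client:<p><pre><code>import subprocess
from datetime import datetime, timezone

DATASET = "tank/shares"  # hypothetical dataset holding the file shares

def zfs(*args):
    subprocess.run(["zfs", *args], check=True)

# Run this from cron/systemd: each call takes a read-only, copy-on-write snapshot.
stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M")
zfs("snapshot", f"{DATASET}@auto-{stamp}")

# After an incident: list snapshots, pick the last one taken *before* the
# encryption started, and roll the dataset back to it (everything newer is lost).
subprocess.run(["zfs", "list", "-t", "snapshot", "-r", DATASET], check=True)
zfs("rollback", "-r", f"{DATASET}@auto-20240101-0300")  # example snapshot name
</code></pre>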
Lots of good info here - it's also worth pointing out that if you're compromised, you may not have all the backups you think you do.<p>A lot of the attackers out there are adding the step of disabling and deleting local snapshot-style backups as part of their attack, because they don't want all their hard work to get thrown out the window with a simple OS-level rollback (side note - if your endpoint security vendor tries to sell you rollback as a ransomware protection feature, run).<p>For this reason, data backed up to tape or some other physical media that gets removed is much more likely to survive a breach than volume shadow copies and snapshots. Test the hard stuff!
Depressingly few organisations I've worked in or with have a clearly defined set of RPOs and RTOs for their essential systems, and similarly few have regular processes to test their backups and archives to either confirm the process works, or to determine how long a recovery would take.<p>This stuff is all <i>conceptually</i> very easy to do - but politically extremely difficult to agree on the definitions, and then obtain the resources on the ops side, presumably because it's yet another of those things that fits in the category of being hidden and non-urgent, and therefore a low priority, right up until the moment it isn't.
If your disaster recovery process isn't tested, you actually don't have any disaster recovery. It's not only about 'how long it takes' it's also about whether or not it works at all. Can you rebuild from scratch? What happens if your entire infrastructure goes down at the same time? What happens if a datacenter you rely on just disappears? What happens if you lose access to your systems? Can you lose access to your systems? IMHO one of the only silver lining of these attacks is that organizations are starting to ask these questions more often.
There's been a lot of good advice here about backups and disaster recovery.<p>But there's also a lot of other stuff to consider:<p>Compartmentalization. Finance and Engineering and Sales only need to interact in limited ways. How about some firewalls between them, limiting types of access?<p>Location isolation. Why does something that happens in Peoria affect Tuscaloosa? Once a ransomware gang breaches a perimeter, why is it allowed countrywide (or worldwide) access to a company?<p>Monitoring. Aren't there tools that can alert on various anomalous patterns? All of a sudden, gigabytes of data start being exfiltrated? All of a sudden, processes fire up on a multitude of servers? Monitoring these things is hard to do at scale, but surely possible?<p>Microsoft. In 2002, Bill Gates "Finally Discovers Security". How much longer will Microsoft be given a free pass? How many more "critical" vulnerabilities will their software have? <a href="https://www.wired.com/2002/01/gates-finally-discovers-security/" rel="nofollow">https://www.wired.com/2002/01/gates-finally-discovers-securi...</a><p>I could go on and on. But why should I? Why can't MBA-type CEOs take IT seriously? Why can't they hire competent people and fund them and listen to them?
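On the monitoring point: even a crude egress-volume alarm catches the "gigabytes suddenly leaving" case. A toy sketch using the psutil library; the threshold, window, and print-based alert are placeholders for whatever a real shop would wire into its SIEM or paging system:<p><pre><code>import time
import psutil  # third-party: pip install psutil

THRESHOLD_BYTES = 5 * 1024**3   # example: alert if >5 GiB leaves in one window
WINDOW_SECONDS = 300            # example: 5-minute window

last_sent = psutil.net_io_counters().bytes_sent
while True:
    time.sleep(WINDOW_SECONDS)
    sent = psutil.net_io_counters().bytes_sent
    delta = sent - last_sent
    last_sent = sent
    if delta > THRESHOLD_BYTES:
        # Placeholder alert: in practice, page someone / raise a SIEM event.
        print(f"ALERT: {delta / 1024**3:.1f} GiB egress in the last "
              f"{WINDOW_SECONDS // 60} minutes")
</code></pre>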
"A backup not restored is a backup not made." Should be on the wall in every IT department. Together with "Snapshots are NOT backups".
My very first mentor when I started my IT career 30 years ago told me “your job is to make sure the backups are working. Everything else is icing on the cake.” Still true.<p>But one caveat: your backups can get screwed up if you back up data that has already been encrypted by ransomware. One easy way to defend against this is to have a tier 2 backup with a time delay, e.g. it backs up the backup files from a week ago.<p>Tier 1 backup is just a normal backup. You test it frequently, of course, to make sure that you can restore it, but you don’t have to do any extraordinary work to detect ransomware in tier 1.<p>Tier 2 backs up the backup, but only after a certain number of days have passed. That number of days is the window that you have to detect that ransomware has infected you. If you ever find a ransomware infection, you isolate and turn off tier 2 immediately to preserve a known clean state, and once you have everything rebuilt clean and patched, you restore from tier 2.<p>You use tier 1 for restores in non-ransomware situations because it’s necessarily more up-to-date.
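A minimal sketch of that delayed tier-2 copy, assuming the tier-1 backups are plain files and using made-up paths; the only trick is refusing to copy anything younger than the detection window:<p><pre><code>import os, shutil, time

TIER1 = "/backups/tier1"   # hypothetical: normal nightly backups land here
TIER2 = "/backups/tier2"   # hypothetical: delayed copy, ideally offline/immutable
DELAY_DAYS = 7             # detection window: only copy backups older than this

cutoff = time.time() - DELAY_DAYS * 86400
os.makedirs(TIER2, exist_ok=True)

for name in sorted(os.listdir(TIER1)):
    src = os.path.join(TIER1, name)
    dst = os.path.join(TIER2, name)
    # Skip anything newer than the window (it may already be encrypted),
    # and anything already copied on a previous run.
    if os.path.getmtime(src) > cutoff or os.path.exists(dst):
        continue
    shutil.copy2(src, dst)
    print("tier-2 copy:", name)
</code></pre>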
A huge step forward was using zfs with snapshots for my system and backup devices... A local zfs filesystem might still be a bit dangerous, because theoretically ransomware could delete the snapshots, but if you use a server / share with zfs underneath, you are pretty safe.<p>There are some decent guides for Arch:<p>Official: <a href="https://wiki.archlinux.org/title/Install_Arch_Linux_on_ZFS" rel="nofollow">https://wiki.archlinux.org/title/Install_Arch_Linux_on_ZFS</a><p>Encrypted: <a href="https://forum.level1techs.com/t/manjaro-root-on-zfs-with-encryption/170428" rel="nofollow">https://forum.level1techs.com/t/manjaro-root-on-zfs-with-enc...</a>
Plainly the best defense against ransomware. We had two attacks that probably could not have been avoided. Some employee will at some point open that funny-looking mail. Scammers know our names and mail signatures (the colorful footer text, not the cryptographic signature for mails). Maybe they get some detail wrong, like a department or a contact address, but otherwise the mails looked genuine.<p>A sensible backup solution let us come back within 30 minutes of severe attacks. The police tried to negotiate with the attackers but sadly couldn't get them. But at least they didn't get any money.<p>We bought a backup solution that is quite expensive and does snapshots every 15 minutes, I believe. Worth it.
I always thought paying the Ransom was about the customer, PII, financial, and HR records/systems that are breached, less about getting the business back online. What a sorry state of affairs that it's both.
What if the ransomware has been hiding for a few months before it activated? You'd either have overwritten the last clean backup by then or you'd restore the ransomware too. Or am I missing something?
Don't test your backups. Test your recovery.<p>This may seem like the same thing, but one of the reasons why those ransomware gangs are so successful is because paying the ransom promises to get you back into business <i>now</i>. You probably still want to do a full remediation afterwards, but paying means that you can do that while your business is running.<p>To make it unattractive to pay the ransom, you don't have to only be able to recover, you have to be able to do so quickly.
I see the point of testing backups and it's good advice. But the real problem is that preparing for an emergency that never comes is very expensive in both productivity and cost. You can be 99.99% ready for a ransomware attack, but that readiness costs a lot and hits your organization's productivity hard. Yet there's a large possibility that the preparedness will go to waste because it will never be used.<p>We need to find solutions that are very inexpensive but effective. I can only think of a cloud-based solution where it is trivial to reset and start over. I suspect that disaster recovery as a service (RaaS) should be part of any cloud-based service. I get that some companies are so large and complicated that it would be impossible to provide that service for them, but there are plenty of small to mid-size companies for which the service would be feasible. So it could be offered as part of any service package.<p>RaaS has the great advantage that the costs can be shared among many companies, so no one company needs to carry the large cost of something that may never be used. It would also solve the problem of constant upkeep. It's hard to prepare for a disaster, but it's even harder to keep the preparation current for as long as it's needed. In addition, the increase in complexity for bad actors would decrease the incentives for ransomware in general.<p>This won't happen now, but given time it would largely fix the current situation for all.
If you have a cloud, it's not "test your backup".<p>It's "have an automated restore" ready. Maybe on a different cloud. With clouds, you can test standing up your entire infrastructure/system stack across even a couple hundred machines or more in automated fashion, and then tear the whole thing down.
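A minimal sketch of that kind of drill, assuming the stack is defined in Terraform; the workspace name, the environment variable, and smoke_test.py are hypothetical. The idea is simply: stand everything up in a scratch workspace, prove it works, then tear it all down.<p><pre><code>import subprocess, sys

def tf(*args):
    # Runs Terraform against a dedicated DR workspace/account, never production.
    subprocess.run(["terraform", *args], check=True)

tf("init")
tf("workspace", "select", "dr-drill")   # assumes a dedicated DR workspace exists
tf("apply", "-auto-approve", "-var", "environment=dr-drill")

# Hypothetical smoke test: restore data into the fresh stack and hit health checks.
ok = subprocess.run([sys.executable, "smoke_test.py", "--env", "dr-drill"]).returncode == 0

tf("destroy", "-auto-approve", "-var", "environment=dr-drill")
sys.exit(0 if ok else 1)
</code></pre>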
I think we need to treat such threats like a rare disease, which could strike whether you are ready or not. We need to check for it, but be prepared to get it anyway.<p>Industry should start thinking more about insurance, negotiation and investigation services.
Is this the death of ops?<p>Common ops mantra used to be “A backup does not exist until you’ve restored it”. Having a blob of data means nothing- being able to continually restore it and integrity check it is everything.
By the way, that includes the "ransom" paid to the good guys who provide hard drive recovery services.<p>It runs into the hundreds of dollars.
Unfortunately this doesn’t cover the case where the ransomware group is threatening to leak your data unless you pay :/<p>It’s also good to consider, if ransomware gives attackers access to your network, whether there’s anything stopping them from accessing (and encrypting/overwriting/deleting) your backups.