Funny seeing this here now, as I _just_ finished archiving an old MyBB PHP forum. I used `wget`, though, and it took 2 weeks and 260GB of uncompressed disk space (12GB compressed with zstd); the process was not interruptible, so I had to start over each time my hard drive got full. Maybe I should have given HTTrack a shot to see how it compares.

If anyone wants to know the specifics of how I used wget, I wrote them down here: https://github.com/SpeedcubeDE/speedcube.de-forum-archive

Also, if anyone has experience archiving similar websites with HTTrack and knows how it compares to wget for my use case, I'd love to hear about it!
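For a rough idea of the wget side: a generic recursive-mirror invocation looks something like the sketch below. This is not the exact command I used (the real flags and the MyBB-specific workarounds are in the repo above), and the URL and directory names are placeholders:

    wget --mirror --page-requisites --adjust-extension --convert-links \
         --no-parent --wait=1 --random-wait -e robots=off \
         --directory-prefix=forum-mirror https://forum.example.com/

    # compress the result with zstd (GNU tar 1.31+)
    tar --zstd -cf forum-mirror.tar.zst forum-mirror/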
One time I was trying to create an offline backup of a botanical medicine site for my studies. Somehow I had turned off the link-depth limit and allowed it to follow offsite links, and then forgot about it. A few days later the machine crashed with a full disk from trying to cram as much of the WWW as it could onto it.
This saved me a ton back in college in rural India without Internet in 2015. I would download whole websites from a nearby library and read them at home.

I've read py4e, OSTEP, and PG's essays using this.

I am who I am because of HTTrack. Thank you.
I also recommend trying https://crawler.siteone.io/ for web copying/cloning.

A real copy of the netlify.com website for demonstration: https://crawler.siteone.io/examples-exports/netlify.com/

A sample analysis of the netlify.com website, which this tool can also provide: https://crawler.siteone.io/html/2024-08-23/forever/x2-vuvb0oi6qxkr-ku79.html
Oh wow, that brings back memories. I used HTTrack in the late '90s and early 2000s to mirror interesting websites from the early internet, over a modem connection (and early DSL).

Good to know it's still around. However, now that the web is much more dynamic, I guess it's not as useful as it was back then.
I don't get it: the last release listed is from 2017, while on GitHub I see more recent releases... So did the developer of the GitHub repo take over and keep updating/upgrading it? Very good!
I tried the Windows version two years ago. The site I copied was our on-prem issue tracker (FogBugz), which we were replacing.
HTTrack did not work because of all the JavaScript rendering, and I could not figure out how to make it log in.
What I ended up doing was embedding a browser (WebView2) in a C# desktop app. You can intercept all the images/CSS, and once the JavaScript rendering is complete, write out the DOM content to an HTML file.
It's also nice that you can log in by hand if needed, and you can generate all the URLs from code.
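For anyone who wants the same idea without building a desktop app: here is a rough sketch of the flow using Playwright in Python instead of my WebView2/C# setup, purely for illustration. The URL and file names are placeholders:

    # Sketch: render a JS-heavy page in a real browser, log in by hand,
    # then save the rendered DOM plus all fetched resources (as a HAR file).
    # pip install playwright && playwright install chromium
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)  # visible window so you can log in by hand
        context = browser.new_context(record_har_path="resources.har")  # captures images/CSS/JS responses
        page = context.new_page()

        page.goto("https://issues.example.internal/case/123")  # placeholder URL
        input("Log in in the browser window, then press Enter... ")
        page.wait_for_load_state("networkidle")  # wait for JavaScript rendering to settle

        # write out the DOM as it exists after rendering
        with open("case-123.html", "w", encoding="utf-8") as f:
            f.write(page.content())

        context.close()  # flushes resources.har to disk
        browser.close()

Generating all the URLs from code then just becomes a loop over `page.goto(...)` and the file write.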
I use it to download sites whose layouts I like and want to reuse for landing pages and static pages for random projects. I strip out all the copy and keep the skeleton to put my own content in. Most recently link.com, column.com, and increase.com. I don't have the time nor the youth to start with all the JavaScript & React stuff.
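For reference, the basic grab I mean is just a one-liner like this (URL, output directory, and filter are placeholders; HTTrack's defaults handle the rest):

    # mirror one site into ./example-mirror, staying on its own domain
    httrack "https://example.com/" -O ./example-mirror "+*.example.com/*"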
Can an archive saved by HTTrack Website Copier be opened locally in https://replayweb.page, or do they use different save formats?