Funny seeing this here now, as I _just_ finished archiving an old MyBB PHP forum. I used `wget`, though, and it took 2 weeks and 260GB of uncompressed disk space (12GB compressed with zstd); the process was not interruptible, so I had to start over each time my hard drive got full. Maybe I should have given HTTrack a shot to see how it compares.

If anyone wants to know the specifics of how I used wget, I wrote them down here: https://github.com/SpeedcubeDE/speedcube.de-forum-archive

Also, if anyone has experience archiving similar websites with HTTrack and knows how it compares to wget for my use case, I'd love to hear about it!
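For a rough idea of the wget side: a generic recursive-mirror invocation looks something like the sketch below. This is not the exact command I used (the real flags and the MyBB-specific workarounds are in the repo above), and the URL and directory names are placeholders:

    wget --mirror --page-requisites --adjust-extension --convert-links \
         --no-parent --wait=1 --random-wait -e robots=off \
         --directory-prefix=forum-mirror https://forum.example.com/

    # compress the result with zstd (GNU tar 1.31+)
    tar --zstd -cf forum-mirror.tar.zst forum-mirror/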
One time I was trying to create an offline backup of a botanical medicine site for my studies. Somehow I had turned off the link-depth limit and allowed it to follow offsite links, and then forgot about it. A few days later the machine crashed with a full disk from trying to cram as much of the WWW as it could onto it.
This saved me a ton back in college in rural India without Internet in 2015. I would download whole websites from a nearby library and read them at home.

I've read py4e, OSTEP, and PG's essays using this.

I am who I am because of HTTrack. Thank you.
I also recommend trying https://crawler.siteone.io/ for web copying/cloning.

A real copy of the netlify.com website for demonstration: https://crawler.siteone.io/examples-exports/netlify.com/

A sample analysis of the netlify.com website, which this tool can also provide: https://crawler.siteone.io/html/2024-08-23/forever/x2-vuvb0oi6qxkr-ku79.html
Oh wow, that brings back memories. I used HTTrack in the late '90s and early 2000s to mirror interesting websites from the early internet, over a modem connection (and early DSL).

Good to know it's still around. However, now that the web is much more dynamic, I guess it's not as useful as it was back then.
I don't get it: the last release listed is from 2017, while on GitHub I see more recent releases... So did the developer of the GitHub repo take over and keep updating/upgrading it? Very good!
I tried the Windows version two years ago. The site I copied was our on-prem issue tracker (FogBugz), which we were replacing.
HTTrack did not work because of all the JavaScript rendering, and I could not figure out how to make it log in.
What I ended up doing was embedding a browser (WebView2) in a C# desktop app. You can intercept all the images/CSS, and once the JavaScript rendering is complete, write out the DOM content to an HTML file.
It's also nice that you can log in by hand if needed, and you can generate all the URLs from code.
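For anyone who wants the same idea without building a desktop app: here is a rough sketch of the flow using Playwright in Python instead of my WebView2/C# setup, purely for illustration. The URL and file names are placeholders:

    # Sketch: render a JS-heavy page in a real browser, log in by hand,
    # then save the rendered DOM plus all fetched resources (as a HAR file).
    # pip install playwright && playwright install chromium
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=False)  # visible window so you can log in by hand
        context = browser.new_context(record_har_path="resources.har")  # captures images/CSS/JS responses
        page = context.new_page()

        page.goto("https://issues.example.internal/case/123")  # placeholder URL
        input("Log in in the browser window, then press Enter... ")
        page.wait_for_load_state("networkidle")  # wait for JavaScript rendering to settle

        # write out the DOM as it exists after rendering
        with open("case-123.html", "w", encoding="utf-8") as f:
            f.write(page.content())

        context.close()  # flushes resources.har to disk
        browser.close()

Generating all the URLs from code then just becomes a loop over `page.goto(...)` and the file write.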
I use it to download sites whose layouts I like and want to reuse for landing pages and static pages for random projects. I strip out all the copy and keep the skeleton to put my own content in. Most recently link.com, column.com, and increase.com. I don't have the time nor the youth to start with all the JavaScript & React stuff.
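For reference, the basic grab I mean is just a one-liner like this (URL, output directory, and filter are placeholders; HTTrack's defaults handle the rest):

    # mirror one site into ./example-mirror, staying on its own domain
    httrack "https://example.com/" -O ./example-mirror "+*.example.com/*"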
Can an archive saved by HTTrack Website Copier be opened locally in https://replayweb.page, or do they use different save formats?