
Ask HN: How to manage a large static HTML site?

20 points by scwoodal, over 3 years ago

I was given an 11 GB zip file that contains the static HTML output from a CMS. Edits to content were made in the CMS, which would then output static files to be copied to a web server.

Due to cost, that CMS will no longer be used. A replacement CMS has been identified, but until that system can be procured and brought online there needs to be a temporary (6-12 months) solution that manages the existing content.

What tools/solutions would you recommend to manage this?

20 comments

anamax, over 3 years ago

[1] What's their plan for moving data from the old CMS (which will be unavailable) to the replacement CMS?

If they're not moving data, then throw out the zip file and put up a "we don't have any content" page (because that's what they'll have when the new system comes up).

If they are moving data, where is that data coming from when the new CMS arrives? (The answer had better not be said zip file, because that's an ETL problem from hell. It might be easy to get to 80-90%, but the last 5% will never happen and they won't understand why. Start looking for a new job, because you'll be fired.)

[2] Are they happy with NO changes to the site while they get and bring up the new CMS? If the answer is no, run away. If the answer is yes, they're either lying or wrong. Regardless, run away.

"Run away" is because you're being asked to build an ad hoc CMS because they're cheap bastards who can't plan ahead.

If you don't want to run away, migrate to an open-source CMS, which will be the real CMS until they get around to buying/installing said replacement CMS.

Hint: they won't.
codingdave, over 3 years ago

My first answer would be that this is not a tech question, it is a business problem. Let leadership know that the consequence of getting rid of the original CMS is vastly increased time and energy to maintain the content. They will need to be flexible on turnaround time for changes.

After that, without implementing a CMS, your best bet is just to extract it all to a folder and hand-edit the HTML using VS Code. Hopefully all the design work on each page will stay static, so you don't need to worry about templating and can just edit content. Then push changes to S3, and all should work.

Basically, you are going back in time to 1997, before CMSes were common and webmasters hand-crafted all changes. It will suck, but hopefully it will show leadership why CMSes exist in the first place, so maybe they can speed up getting the new one.
mattl, over 3 years ago

Are the pages templated? See if you can identify common areas of templating and replace them, so that you can get the site into some kind of static site generator, or even use server-side includes to make things easier going forward.
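
One rough way to check whether the pages share a template is to look for the lines that every page starts and ends with. The sketch below is only an illustration under strong assumptions: the function names are hypothetical, the files are assumed to be UTF-8, and the CMS is assumed to have emitted the shared header and footer line-for-line identically. In practice a per-page `<title>` near the top will cut the shared header short, so treat the result as a first hint rather than a finished template.

```python
from pathlib import Path


def common_prefix(rows):
    """Return the longest run of leading lines shared by every row."""
    shared = []
    for group in zip(*rows):
        if len(set(group)) != 1:
            break
        shared.append(group[0])
    return shared


def split_template(pages):
    """Split pages (lists of lines) into shared header, shared footer, bodies."""
    header = common_prefix(pages)
    # Strip the header first so header and footer can never overlap.
    rest = [page[len(header):] for page in pages]
    footer = common_prefix([r[::-1] for r in rest])[::-1]
    bodies = [r[:len(r) - len(footer)] for r in rest]
    return header, footer, bodies


def load_pages(root):
    """Read every .html file under root as a list of lines (assumed layout)."""
    return [p.read_text(encoding="utf-8", errors="replace").splitlines()
            for p in sorted(Path(root).rglob("*.html"))]
```

Calling `split_template(load_pages("site"))` on a sample of pages (where `site` is a placeholder directory name) gives a quick feel for how much of each file is boilerplate before committing to a static site generator.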
smackeyacky, over 3 years ago

Put it in GitHub.

Add a GitHub Action that pushes the file updates to production.

If you are using something like AWS S3, this is very simple but very neat.

I have a tiny S3/GitHub-based CMS for my static content, and the workflow is simple:

Edit page(s) and update images, CSS files, JavaScript stuff.

Test locally. I have a Python script that acts like a web server so I can verify the changes. I have a few other scripts that update menus and other housekeeping items as well. Python is great for that kind of stuff.

Once verified, push to the production branch.

Pushing there fires off the GitHub Action that syncs the files with S3 in my AWS account.

Now you have a complete history of all edits to the content system.
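
The commenter's preview script is not shown, so the following is only a guess at what such a "Python script that acts like a web server" could look like, built on the standard library's `http.server`. The function name `start_preview` and the default port are assumptions made for this illustration.

```python
import functools
import http.server
import threading


def start_preview(directory, port=8000):
    """Serve `directory` on localhost in a background thread and return the server."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

In an interactive session, `server = start_preview("site")` makes the extracted files browsable at `http://127.0.0.1:8000/`, and `server.shutdown()` stops it. It binds to localhost only, which suits a pre-push check but not production hosting.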
iKnowKungFoo, over 3 years ago

"there needs to be a temporary (6-12 months) solution that manages the existing content."

Famous last words. I once found an egregious security violation at a previous employer. I was told it was a "temporary fix", one that had been in place for 7 years.

You might look into putting all of the static files into source control, but ignore the folders full of uploaded documents (the videos, etc.). Then you can at least track changes and deploy updated HTML via a managed process. You should back up the large files and figure out how you're going to serve those.
IceDane, over 3 years ago

How much can this CMS have cost you? 11 GB of storage doesn't cost jack, so that can't be it.

The only real solution to this is to set up the old CMS again, then set up the replacement one, and then do a proper export while the old one is still running, until you're up and running with the new one.

It also doesn't take 6-12 months to set up a CMS. Hire some bored college student to set it up on DigitalOcean for $20/h or something.

Then finally you need to fire the moron who decided this was a good idea.
Aspos, over 3 years ago

If the static HTML works as-is and requires no server-side work, then just deploy it to AWS S3. Rock-solid, cheap, fast, automatically scalable, and there is literally nothing to manage other than pointing your domain to S3 once.

https://docs.aws.amazon.com/AmazonS3/latest/userguide/WebsiteHosting.html
_the_inflator, over 3 years ago

I migrated several huge CMSes in Financial Services and know your pain. Here is what I did, and I hope you can use it.

2 things:
- knowledge about the structure
- math, math, math

You will need both exercises later on for the other CMS as well.

Do the math: how many changes over time? Which department/person needs to make changes, and why? Which files and content are affected? Prioritise: if a change does not need to go live immediately, postpone it. Is there critical content? Do you need a workflow for certain changes (legal approval etc.)?

You might need to mix different approaches:
- using some scripts
- using a deployment pipeline
- using a lightweight CMS for parts of the content
- using a dev team who makes the changes manually (speed and cost usually do not warrant hacking together a CMS for 6 months, so manual changes can be sufficient)

For some frequently changed pages it can make sense to have some sort of mini CMS, or even simply a small team of business and dev people who make the changes manually.

Do not overengineer at this point, since this can lead to a fragile solution which backfires: "Why do we need the new CMS? It works currently."

Have fun! :)
nitwit005, over 3 years ago

This feels a bit paradoxical. The solution to managing content would be some sort of content management system, which is what you got rid of.
cweagans, over 3 years ago

It seems like a huge problem for leadership to have decided that you're just going to go 6-12 months without a CMS. If your website is critical to the business, then it seems (perhaps) worthwhile to just pay for another year on your current CMS while you procure a new one.

If you really can't change that, then forget about the HTML output. Dump the data from the current CMS in some machine-readable format (there's presumably a database, right?), export to Markdown, and bring it into something like Hugo (https://gohugo.io/). You'll have to do some work to re-assemble the site, but if you don't have a system like that, you're gonna have a bad time if you have to adjust the design of the site or something. A giant pile of HTML is really not useful to anyone except as output from a build, IMO.
rchaud, over 3 years ago

Do the static pages work if uploaded to an FTP server? As in, do links between pages connect to the right page? How about menus?

If they do work, you can create a shortlist of the most important pages that require dynamic content, and edit those to include the dynamic scripts or CMS functions (such as a nav menu), so they can be managed by non-technical staff.
tyingq, over 3 years ago

You might consider doing the work to at least break out common sections that historically changed often, like navigation menus, footer text, etc.

Even if you just inject them with old-school server-side includes, you might save a lot of heartache.

The devil is in the details, though. Which tools to use would have a lot to do with how the pages are structured and how they work.
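
As an illustration of the server-side-include idea, the sketch below swaps a page's navigation block for an SSI directive and returns the removed markup so it can be saved once as a shared fragment. It rests on assumptions not stated in the thread: that the menu lives in a single, non-nested `<nav>` element, that the web server has SSI enabled, and that `/includes/nav.html` is where the fragment will be stored. The names `extract_nav` and `SSI_NAV` are invented for this example.

```python
import re

# Assumes the menu is one non-nested <nav>...</nav> element per page.
NAV_BLOCK = re.compile(r"<nav\b.*?</nav>", re.DOTALL | re.IGNORECASE)
# Hypothetical location of the shared fragment on the web server.
SSI_NAV = '<!--#include virtual="/includes/nav.html" -->'


def extract_nav(html):
    """Return (page with the nav replaced by an SSI directive, nav markup or None)."""
    match = NAV_BLOCK.search(html)
    if match is None:
        return html, None
    replaced = html[:match.start()] + SSI_NAV + html[match.end():]
    return replaced, match.group(0)
```

A regex is a blunt tool for HTML; it is used here only because CMS output tends to be regular. Pages where the function returns `None` are worth listing and checking by hand.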
literallyaduck, over 3 years ago

Replace:

Deploy as-is in read-only mode, and strangle the pages with the new system. Keep the URLs the same so you don't break links.

Reduce:

Collect logs of the traffic and suggest culling pages that have gone unused for 1 year, 6 months, etc.

Reuse:

Move the media items first and keep the links the same.

Keep a separation between content and application code.

Blogspam:

Open source your product and list yourself as the expert.

Be your own boss:

Now that you are the expert, sell consulting services.
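
The "Reduce" step could be approached roughly as follows: compare the pages on disk with the paths that appear in the access logs. This is a sketch under assumptions the thread does not confirm, namely that the logs use the Common or Combined Log Format, that URL paths map directly to files under the site root, and that a trailing `/` is served from `index.html`. The function names are hypothetical.

```python
import re
from pathlib import Path
from urllib.parse import unquote, urlsplit

# Matches the request and status fields of a Common/Combined Log Format line.
REQUEST = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+" (\d{3})')


def requested_paths(log_lines):
    """Collect URL paths that were served successfully (2xx or 304)."""
    seen = set()
    for line in log_lines:
        m = REQUEST.search(line)
        if m and (m.group(2).startswith("2") or m.group(2) == "304"):
            path = unquote(urlsplit(m.group(1)).path)
            if path.endswith("/"):
                path += "index.html"  # assumed directory index name
            seen.add(path)
    return seen


def unused_pages(site_root, log_lines):
    """List .html files under site_root that never appear in the logs."""
    root = Path(site_root)
    seen = requested_paths(log_lines)
    pages = ("/" + p.relative_to(root).as_posix() for p in root.rglob("*.html"))
    return sorted(page for page in pages if page not in seen)
```

The output is a list of candidates to discuss, not a list to delete: a page with no hits in the log window may still be linked from elsewhere or needed for legal reasons.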
space-chess-com, over 3 years ago

Any changes made to the HTML will need to be reflected in the new CMS.

Keeping track of those changes will be the hardest part.

I would suggest exporting the CMS data now and importing it into a temporary CMS solution in the meantime.

Then update the HTML files and the CMS when necessary.
sigg3, over 3 years ago

Congratulations!

You are become teh webmaster.
cpach, over 3 years ago

11 GB for only HTML…? How many documents are in there…?
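
The question can be answered without unpacking anything, because a zip archive's directory already lists every entry and its uncompressed size. Below is a small sketch using the standard `zipfile` module; the function name `inventory` and the grouping by file extension are choices made for this illustration.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def inventory(zip_path):
    """Return {extension: (file_count, total_uncompressed_bytes)} for an archive."""
    counts, sizes = Counter(), Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            ext = PurePosixPath(info.filename).suffix.lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += info.file_size
    return {ext: (counts[ext], sizes[ext]) for ext in counts}
```

If most of the bytes turn out to be video or PDF uploads rather than `.html`, that supports keeping the large media out of source control and versioning only the pages.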
sandreas, over 3 years ago

How would the migration to the new CMS be done? How many changes are made per day?
ksec, over 3 years ago

Who will be using the CMS/system? Non-tech users?

Roughly how many pages of HTML?
technobabbler, over 3 years ago

I don't think this is as intractable a problem as some of the doomsayers here make it out to be. It just takes work and tedium. Working for small businesses, I've had to do this several times and it wasn't a huge deal, just a few weeks of work. If you're getting paid to do it, why not do it, and then brag about it on your resume later?

Here's how I would approach the problem:

First, evaluate the source content (your big zip file). Did the old CMS output relatively clean HTML? Are certain sections (header, footer, main menu, etc.) relatively consistent and thus easily templatized? Is the body text all in one div that can be parsed out? If you're lucky, the CMS was structured to begin with, meaning the title is always one field that produces one consistent HTML output, the hero image is another field, etc. If you're NOT lucky, the old CMS might've allowed rich text fields which become custom HTML. That may be trickier, but still not unsolvable.

Once you understand the source content, investigate your use cases. Who needs to be able to edit this stuff, and how often? What's their level of technical expertise (e.g. can they work with raw HTML, or do they need WYSIWYGs)? Do they need a proper publication workflow (drafting, user roles, previous versions, etc.)? In other words: of all the stuff a good CMS does, what is mission-critical to retain during this "temporary" period? Keep in mind that the new CMS will probably keep getting pushed back, because if it's not a priority now, it never will be. So basically, you're being asked, despite what they say, to design a system that will likely BE the new CMS.

Here's the fun part. Now you get to compare and choose a new architecture for all this content. Your options, in no particular order:

* A bunch of flat HTML files that only devs can edit, by hand, with no modularization or templatization. Bad idea if you want to maintain any sort of consistency between pages. And you have to really love regex if you ever need to do bulk updates (e.g. updating a broken link, or changing a sponsor logo, across all the files).

* Move it into ANOTHER, cheaper cloud CMS. It would be dirt-cheap to host a WordPress instance, for example, especially if you were coming from an overpriced "enterprise" CMS. If your bosses can't afford $1000/mo, can they afford $80/mo for a good WordPress host (Pantheon, WP Engine, etc.)? There are also headless options like DatoCMS, GraphCMS, Sanity, Prismic.io, etc. Headless means you get to design the frontend to your spec, and importantly for your use case, it means you decouple the number of visitors from the CMS costs (because you're caching the frontend with a CDN, and the backend CMS never knows or cares how many hits you get). Prismic, Sanity, and GraphCMS offer free plans up to thousands of records; if you build a static frontend against that and cache it, you may be able to keep using that headless CMS for free or very cheaply. Even if you have to upgrade, the starter plans can be affordable. There is a LOT of variance in CDN pricing, from ridiculous ripoffs targeting enterprise managers to very reasonable options targeting small-time devs who know such things shouldn't cost so ridiculously much. Negotiate with the wallet people to find a good balance between dollars and hours.

* If they really won't spring for that, how about a self-hosted CMS like Strapi, Ghost, Grav, or Pico? Some of these use databases, others use flat-file Markdown folders, but in general they cost zero dollars (just a lot of time to configure and deploy).

* If you really can't use a CMS in the meantime, you can make your own "janky CMS" by templatizing the reusable parts of the HTML code (headers, footers, etc.) into plain ol' PHP server-side includes, like in the 90s. Then process all the other pages to replace those parts with the new templates, such that each individual page only has its unique content and not all the site structure boilerplate around it (now handled by your includes).

Next is the hard part. Once you decide on an architecture, you have to do the semi-automated extract-transform-load work: bulk processing your flat HTML files and either stripping them down to unique content and/or importing that content into your architecture, be it a cloud CMS (repeated API calls) or a self-hosted flat-file system (lots of new files, possibly HTML to Markdown conversions). Either way, you will end up in a much better place than with a billion flat HTML files. In other words, even if you don't CALL it a CMS, you should make yourself a CMS of some sort.

Assuming you get all the data in someplace, you then have to build a new frontend on top of it, reusing those HTML headers, footers, etc. that you previously stripped out. If it's WordPress, you're building a WordPress theme. If it's a headless CMS, I'd recommend using Next.js along with Vercel or Gatsby or Netlify. If it's a flat-file CMS, it's entirely up to you... whatever language or framework you're comfortable with. Symfony and Laravel are still good choices if you like PHP. Other languages have different frameworks.

Eh... that's a simplified overview of the workflow. Feel free to ask for more details. This really isn't an impossible problem, just an annoying one. I'd also do it for/with you if you pay me...
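
To make the "extract" half of that ETL step concrete, here is a minimal sketch that pulls the page title and the plain text of one container element out of a page, using only the standard library's `html.parser`. Several things are assumptions for the sake of illustration: that the unique content sits in one element with a known `id` (the default `content` is invented), that the CMS output closes its tags explicitly, and that plain text is enough for a first pass. The class and function names are hypothetical.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ContentExtractor(HTMLParser):
    """Collect the <title> text and the text inside one container element."""

    def __init__(self, container_id):
        super().__init__(convert_charrefs=True)
        self.container_id = container_id
        self.title = ""
        self.chunks = []
        self._in_title = False
        self._depth = 0  # greater than 0 while inside the container

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        if tag in VOID_TAGS:
            return
        if self._depth:
            self._depth += 1
        elif dict(attrs).get("id") == self.container_id:
            self._depth = 1

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        if self._depth and tag not in VOID_TAGS:
            self._depth -= 1

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._depth and data.strip():
            self.chunks.append(data.strip())


def extract(html, container_id="content"):
    """Return {'title': ..., 'body': ...} for one page."""
    parser = ContentExtractor(container_id)
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), "body": "\n".join(parser.chunks)}
```

Each resulting dictionary could then be written out as a Markdown file or sent to a CMS import API. Pages with implicitly closed tags (for example `<p>` without `</p>`) will confuse the depth counter, so a real migration would likely want a more forgiving parser and a manual review of the pages that come out empty.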
prirun, over 3 years ago

I just had to do a similar thing with the HashBackup web site. It was hosted on Google Sites for about 10 years, but Google came out with Sites v2 last year and everyone had to convert by Sep 2021. Of course I ignored it too long, Sep came along, and Google froze the HashBackup site, forcing me to convert to v2.

Okay, so I did a preview migration. The new site looked terrible. They lost all kinds of formatting, and the menus were confusing. I couldn't figure them out myself! I sent several notes to Google's "Report a problem with the conversion" link, but never heard anything from them. Publishing the v2 site was a non-starter.

The HashBackup site is nowhere near this scale. I got a dump from Google Sites in a zip file that was about 20 MB and included all of Google's JavaScript. I could have hosted it somewhere like that, but the HTML was sort of a mess. Sites lets you edit content by highlighting it and clicking Bold or whatever. But the problem is, if you change the font of some text and then go back later and extend it, it's easy for the text fragments to become disjoint, so the resulting HTML doesn't look anything like what a person would write. So while it would have worked for a while to edit existing pages, there's no way I could have added or deleted a page.

I ended up using Antora, on recommendations from a recent HN topic, mainly because I liked the professional look of it. Static generators like Hugo tend to have lots of themes, but many of them look amateurish to me. Antora uses AsciiDoc for content, which matched my goal when Sites v2 was announced: get the site into some kind of lightweight markup.

To do the conversion to AsciiDoc, I started out doing some hand editing to get familiar with the kinds of HTML Google Sites generated. There are *a lot* of variations. Once I started seeing some patterns, I wrote a Python script to handle some of the conversions. So the loop was:

1. Run the Python script on all HTML files to convert them to adoc.

2. Pick an adoc file to review. I did this from smallest file to largest, so that by the time I got to the larger files, more of the conversion would be automated.

3. Review the adoc file and see what trash is still there that might be easily fixed in the Python script; make changes to the script and re-run it. Loop in this step until the easy automated stuff is done.

4. Finish editing the adoc file by hand, run Antora to generate the site, and review the new page, using the old page as a reference if necessary.

5. Once the new adoc file looks right, delete the .html file. This step is very important to prevent the automatic converter from overwriting your new adoc file. I made that mistake more than once, of course, but the file was still in an Emacs buffer, so I was okay.

I never got the converter to the point where it would work 100%, but it did maybe 80% of the tedious editing and allowed me to focus on what was left. And it was a lot easier to see some erroneous HTML in an adoc text file than it was to try to edit a verbose HTML file that Sites created.

Here's the script I used. It's a big ugly hack, but maybe it will help somebody.

http://www.hashbackup.com/gstoad.py

I'm really happy with the new HashBackup site, and the best part is, it's now just 1 MB of AsciiDoc text that will be easy to maintain. It took about 2 weeks to do the conversion, adjust the Antora UI layout, and adjust the links.
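
The loop described above can be sketched roughly as follows. This is not the linked `gstoad.py`, whose contents are not reproduced in the thread; it is a much smaller illustration with an invented rule table and invented function names. It assumes UTF-8 files and only handles a few simple tags, and it relies on the same safeguard as step 5: a page whose `.html` file has been deleted is never converted again, so its hand-edited `.adoc` stays untouched.

```python
import re
from pathlib import Path

# Ordered (pattern, replacement) rules; extend them as review turns up new patterns.
RULES = [
    (re.compile(r"<h1(?:\s[^>]*)?>(.*?)</h1>", re.I | re.S), r"\n= \1\n"),
    (re.compile(r"<h2(?:\s[^>]*)?>(.*?)</h2>", re.I | re.S), r"\n== \1\n"),
    (re.compile(r"<(?:b|strong)>(.*?)</(?:b|strong)>", re.I | re.S), r"*\1*"),
    (re.compile(r"<(?:i|em)>(.*?)</(?:i|em)>", re.I | re.S), r"_\1_"),
    (re.compile(r'<a\s[^>]*href="([^"]*)"[^>]*>(.*?)</a>', re.I | re.S),
     r"link:\1[\2]"),
    (re.compile(r"</?p(?:\s[^>]*)?>", re.I), "\n\n"),
]


def to_adoc(html):
    """Apply each rule in order, then collapse runs of blank lines."""
    text = html
    for pattern, replacement in RULES:
        text = pattern.sub(replacement, text)
    return re.sub(r"\n{3,}", "\n\n", text).strip() + "\n"


def convert_all(root):
    """Convert every remaining .html file, smallest first; return the order used."""
    pages = sorted(Path(root).rglob("*.html"), key=lambda p: p.stat().st_size)
    for page in pages:
        adoc = to_adoc(page.read_text(encoding="utf-8", errors="replace"))
        page.with_suffix(".adoc").write_text(adoc, encoding="utf-8")
    return pages
```

Whatever the rules leave behind stays visible as raw HTML in the `.adoc` output, which is the point: it shows what to add to the rule table next, or what to fix by hand.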