Using Microsoft Word with Git

216 points by the_dripper over 4 years ago

26 comments

codingdave over 4 years ago
My SaaS deals primarily with legal documents that for years had been maintained with Word. The pain of emailing documents is real, but the comfort level with how Word works is also real. Over the years, most organizations have developed internal workflows to share and send documents around that bypass the pains, and while they may not be perfect, they work.

The funny thing is that the document authors like these ways of working. It is the tech people who don't. I've seen "Git for Word" proposed many times a year for a while now. And all of the ideas are interesting, but none of them appeal to my audience because they don't care about git's feature set. Nobody wants to branch and merge. Nobody wants a straight version history. ("Nobody" meaning nobody in my market, not nobody in the world.)

They want a storytelling experience. They want to know the why, not the what. And the workflow tends to be unidirectional, not with collaborative changes coming back together, but with expanding changes as each person adds their ideas and makes changes for a specific instance of using a document. The experience we build for them brings in pieces of version history, pieces of comments, pieces of telling the story of why something was done, so people down the line can have more context to decide whether to accept or reject the changes.

It isn't that "Git for Word" is a bad idea - on the contrary, it would be great if someone pulls it off. My point is that building something that improves on Word isn't actually about the software, it is about the document workflows. If you find groups who work like software devs do, where documents receive small updates from a team, and bring all changes together for a final product, there is probably a market. But when evaluating such ideas, there has to be a reality check of whether the actual use of the documents truly matches the use case for git.
tomashubelbauer over 4 years ago
I've mentioned this in a similar thread a few months back, but it looks like it could be relevant here, too:

https://github.com/TomasHubelbauer/modern-office-git-diff

I've made this script which automatically extracts the Office file format (which is a ZIP archive of XML documents) and versions the XML documents and their extracted text contents alongside the binary Office file. This is done using a Git hook and it seems to work pretty well. If you're in need of versioning Office documents, this might be a good enough solution for you.

Edit: I should also address why I don't use the built-in Office versioning feature. The reason is that I like to be able to view the diffs in Git. I don't want to have to use Office just to see the changes. My solution offers that. Because the extracted XML and text contents are tracked alongside the original, each commit's diff has the binary change as well as the textual diff, which in my experience is good enough to tell the gist of changes. And you're using the standard Git / text manipulation tools you would use with any other diff.
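As a rough illustration of the "extracted text contents" idea described above, here is a minimal Python sketch. It is not the linked script: the function name `docx_to_text` is hypothetical, and it assumes the document body lives in `word/document.xml` and uses the standard WordprocessingML namespace. Paragraphs nested inside other paragraphs (text boxes, for example) would be emitted more than once, so treat the output as a diffing aid, not a faithful export.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace used inside word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_text(docx_bytes: bytes) -> str:
    """Return one line of plain text per paragraph found in the document."""
    # A .docx file is a ZIP archive; the main body is one XML part inside it.
    with zipfile.ZipFile(io.BytesIO(docx_bytes)) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for paragraph in root.iter(f"{{{W_NS}}}p"):
        # Text is split across runs; join every <w:t> inside the paragraph.
        runs = [node.text or "" for node in paragraph.iter(f"{{{W_NS}}}t")]
        lines.append("".join(runs))
    return "\n".join(lines)
```

Writing that text to a file next to the binary (from a pre-commit hook, say) would give Git something line-oriented to diff.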
unnah over 4 years ago
On Windows you can just use TortoiseGit, it can do diff and even merge by calling Word's internal compare tools. I can attest that diff works fine (differences show up as if you had used track changes within Word), but I haven't had occasion to try merging Word documents with TortoiseGit yet. The same functionality was already available in TortoiseSVN.
bugmen0t over 4 years ago
FWIW, if you're using LibreOffice Writer you can save your file as a flat ODT file (.fodt), which gives you a version-controllable format.
binbag over 4 years ago
I don’t understand why this 6-year-old article has been posted when current Microsoft 365 versions of Word et al. have built-in version control and real-time collaboration.
MarcScott over 4 years ago
I really don't understand how Word remains so popular. It was created at a time when few people had internet access, and was designed to produce printed documents. It was the perfect tool to write newsletters, flyers, articles, academic papers and manuscripts. The world has moved on though, and I fail to see Word's relevance today, other than the sheer number of people that are familiar with it.

Word is expensive, proprietary and the XML it generates is unfathomable. There are so many better FOSS tools and systems that we could be using. If you're collaborating on a document then markdown or LaTeX has you covered. You get version control through git and multiple people can contribute. If you're writing a book or article, then the graphic designers and typesetters are going to make the design decisions, not the author, so why bother messing around with fonts and colours and the infuriating placement of images and tables?

I authored a kid's book on coding, and the process was a nightmare. I authored in markdown, used pandoc to convert and then further edited in LibreOffice, to be able to send stuff through in docx format. Then revisions were sent back in docx and I had to reverse the whole process, so I could maintain my plain-text version of the book. Then the proofs were sent through as PDFs, which I then had to mark up for corrections. Many of the mistakes were due to the crappy way Word places images. In the end I just bought a copy of Word, and submitted to the way my publisher wanted me to work, which disrupted the authorial process.

It's time we ditched Word, in the same way we ditched VHS and DVD. It's an outdated technology that remains dominant just because everyone uses it at school, and then refuses to move on. If schools insisted that all homework was submitted in something like markdown, we'd see a dramatic change in a very short period of time. (BTW when I was teaching CS, my kids authored in markdown and submitted on GitHub.)

Right, rant over - but I've been talking about this for years - http://coding2learn.org/blog/2014/04/14/please-stop-sending-me-your-shitty-word-documents/
bovermyer over 4 years ago
I gave up using Word to write manuscripts when I switched to Markdown documents in git.

In the last few months, though, I gave up on Markdown to switch to a more robust format - LaTeX. Before I switched, I didn't know LaTeX at all, but I knew from my reading that it had the features I needed.
rhn_mk1 over 4 years ago
Not long ago I read some article here on HN that the world is still waiting for a git equivalent for documents. This seems like a good start.

Now we need a native diff viewer for structured files, where the changes are presented with attribution either side by side, or alongside (like gitk, or like the GitLab diff viewer).

Then we need an editor that supports doing the gitty stuff natively, so that the non-technical writer doesn't have to worry about creating repos and committing the changes from the command line.
PaulHoule over 4 years ago
You do know that a Word document is really a ZIP file? The text content is inside an XML document that, in principle, GitHub would work on. All you have to do is unzip the document, store the directory in GitHub and repack it for Word to use.
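The unzip / store / repack round trip described above could look roughly like this in Python. The helper names `unpack_docx` and `repack_docx` are hypothetical, and the sketch only covers the archive handling: it does not check that Word accepts the repacked file in every case, so it is worth testing on copies first.

```python
import zipfile
from pathlib import Path


def unpack_docx(docx_path: Path, dest_dir: Path) -> None:
    """Extract every part of the .docx archive into dest_dir."""
    with zipfile.ZipFile(docx_path) as archive:
        archive.extractall(dest_dir)


def repack_docx(src_dir: Path, docx_path: Path) -> None:
    """Zip the files under src_dir back into a .docx archive."""
    with zipfile.ZipFile(docx_path, "w", zipfile.ZIP_DEFLATED) as archive:
        # Sorted order keeps the archive layout stable between runs.
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # Member names inside a ZIP always use forward slashes.
                archive.write(path, path.relative_to(src_dir).as_posix())
```

The unpacked directory is what would be committed; `repack_docx` rebuilds a file to open in Word after a checkout.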
josteink over 4 years ago
> since earlier this month Pandoc can read Word documents in docx format.

Given this line, I think it's fair to add (2014) to the title.

This is pretty old news by now :)
jacobmischka over 4 years ago
Before Word integrated its own improved version tracking in more modern versions, during my undergrad I participated in a research project to add version tracking to Word documents by abusing its zip file format[1]. My research partner created a plugin to manage the versions, and my main contribution was a Java tool that attempted version merging[2].

It wasn't fleshed out or usable, but it was an interesting project. I was impressed at how open the Word/Office format was; this was before Microsoft's reemergence into openness and open source.

[1]: https://dl.acm.org/doi/10.1145/2723147.2723152

[2]: https://github.com/jacobmischka/Vvord
usrusr over 4 years ago
Ages ago I wrote a little Word VBA that exported a plaintext copy to go along with the .doc every time I hit save. Worked quite well for eyeballing the changes in a diff. Obviously you don't get merge support for .doc, but since that was still running on SVN where workflows tend to be less merge-heavy (or was it still CVS? I feel old..) and I was working solo anyways, the human-readable diff worked well enough.
jksmith over 4 years ago
I'm using Fossil for my book. My book is about business systems simplicity so it's a great fit. If I hadn't started using sqlite for a project, I would have never even heard of Fossil. What a great, beautifully simple combination. Don't add complexity unless the complexity is worth the dysfunction it addresses.
Gaelan over 4 years ago
It's currently badly broken (see the issues, someone points out what needs fixing), but I have a tool that uses Word's built-in track changes functionality as a git diff backend: https://github.com/Gaelan/WordGit
ivan_ah over 4 years ago
Wow nice. I'm a big fan of git's `--word-diff` option for text edits. The output is almost as good as `latexdiff` and so much faster.

Another useful trick is to pipe the ANSI-colored terminal output through `aha` (https://github.com/theZiz/aha or `brew install aha`), which produces HTML output, e.g.

    git wdiff | aha > ~/Desktop/mydiff.html

You can then send the file mydiff.html to collaborators by email or add it to a CI build script.
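For readers without `aha` at hand, the same idea (a word-level diff rendered as HTML) can be sketched in Python with the standard library. This is an illustration only, not how `git --word-diff` or `aha` work internally; the function name `word_diff_html` is hypothetical, and it splits on whitespace, so the original spacing and line breaks are not preserved.

```python
import difflib
import html


def word_diff_html(old: str, new: str) -> str:
    """Mark word-level deletions with <del> and insertions with <ins>."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    parts = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        removed = html.escape(" ".join(old_words[i1:i2]))
        added = html.escape(" ".join(new_words[j1:j2]))
        if tag == "equal":
            parts.append(removed)
        if tag in ("delete", "replace"):
            parts.append(f"<del>{removed}</del>")
        if tag in ("insert", "replace"):
            parts.append(f"<ins>{added}</ins>")
    return " ".join(parts)


print(word_diff_html("the quick fox", "the slow fox"))
# prints: the <del>quick</del> <ins>slow</ins> fox
```

Escaping with `html.escape` matters here because document text may itself contain `<` or `&`.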
bisrig over 4 years ago
Back in the bad old days of version control (thinking of VSS here), I was overall pretty satisfied with how the check-in/check-out mechanics worked for Word docs and the like. In this case you have the benefit of the sequential workflow, in fact enforced or hinted by the tool itself, while also getting rid of the recurrent weakness of email-based document storage. There were plenty of other things to dislike about VSS (like, pretty much the rest of them) but it wasn't so bad for maintaining documents.
noyesno over 4 years ago
I work with telco standards and the organizations that I follow use Word documents. The way we keep a paper trail of all the changes to a new standard’s draft is by separating the change proposals into their own documents (using change marks against the latest agreed draft) and only allowing a named editor to actually implement the agreed change proposals back to the master document. The change proposal documents, together with the meeting minutes, create a perfect history of who proposed what changes and when.
tmaly over 4 years ago
I implemented something just like this but in CVS for legal documents back in 2007.

I am finally replacing it with a SharePoint solution. It's a headache to have to maintain versions for non-technical people.
erichdongubler over 4 years ago
Has anybody used SimulDocs[0], which sells itself as "version control for Microsoft Word documents"? I've been really curious if it's a decent solution in this space, but I tend to keep myself away from Word docs in my life recently.

[0]: https://www.simuldocs.com/features/version-control-for-microsoft-word
formercoder over 4 years ago
On a related note - anyone have a good PDF comparer?
vivekkalyan over 4 years ago
My solution is inspired by this blog post but creates a global attributes file. I documented it here:

https://www.vivekkalyan.com/using-git-for-word

I tend to prefer markdown for most things, but find it hard to beat Word in terms of simplicity of elegant designs for, say, resumes.
lovetocode over 4 years ago
.docx is just an archive format. If I remember correctly, the contents inside the .docx archive are plain text. Can’t we just use version control inside of there? We would, of course, have to figure out a way to have git unpack and pack the archive each time.
Apofis over 4 years ago
At first, I'm like... but it's just a zipped archive of XML and other content files which can be used with git successfully, but yeah there's a mess in there. It's not really meant to be human readable.
greenie_beans over 4 years ago
yes! i have a collection of tweets where fiction/non-fiction writers joke about naming their versions different things. i'm like, use git?

i wrote a novella using a folder system + text editor + git. i'm trying to put that into a web app. don't know how useful it would be for other people though. and don't know if it will ever be finished because i need to write.
chromedev over 4 years ago
Just use Markdown (or a similar markup language) and a tool like pandoc to convert to Word if necessary.
winrid over 4 years ago
This is also covered in the book Pro Git, if you have the patience to read 400 pages about Git.