My SaaS deals primarily with legal documents that for years had been maintained with Word. The pain of emailing documents is real, but the comfort level with how Word works is also real. Over the years, most organizations have developed internal workflows to share and send documents around that bypass the pains, and while they may not be perfect, they work.<p>The funny thing is that the document authors like these ways of working. It is the tech people who don't. I've seen "Git for Word" proposed many times a year for a while now. And all of the ideas are interesting, but none of them appeal to my audience because they don't care about git's feature set. Nobody wants to branch and merge. Nobody wants a straight version history. ("Nobody" meaning nobody in my market, not nobody in the world.)<p>They want a storytelling experience. They want to know the why, not the what. And the workflow tends to be unidirectional, not with collaborative changes coming back together, but with expanding changes as each person adds their ideas and makes changes for a specific instance of using a document. The experience we build for them brings in pieces of version history, pieces of comments, and pieces of the story of why something was done, so people down the line have more context to decide whether to accept or reject the changes.<p>It isn't that "Git for Word" is a bad idea - on the contrary, it would be great if someone pulls it off. My point is that building something that improves on Word isn't actually about the software, it is about the document workflows. If you find groups who work like software devs do, where documents receive small updates from a team and all changes are brought together for a final product, there is probably a market. But when evaluating such ideas, there has to be a reality check of whether the actual use of the documents truly matches the use case for git.
I've mentioned this in a similar thread a few months back, but it looks like it could be relevant here, too:<p><a href="https://github.com/TomasHubelbauer/modern-office-git-diff" rel="nofollow">https://github.com/TomasHubelbauer/modern-office-git-diff</a><p>I've made this script which automatically extracts the Office file format (which is a ZIP archive of XML documents) and versions the XML documents and their extracted text contents alongside the binary Office file. This is done using a Git hook and it seems to work pretty well. If you're in need of versioning Office documents, this might be a good enough solution for you.<p>Edit: I should also address why I don't use the built-in Office versioning feature. The reason is that I like to be able to view the diffs in Git. I don't want to have to use Office just to see the changes. My solution offers that. Because the extracted XML and text contents are tracked alongside the original, each commit's diff contains the binary change as well as a textual diff, which in my experience is good enough to get the gist of the changes. And you're using the standard Git / text manipulation tools you would use with any other diff.
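For anyone wondering what the extraction step could look like, here is a minimal Python sketch. It is not the linked script, just an illustration under a few assumptions: the file is a .docx whose body lives in `word/document.xml`, and the helper names `docx_text` and `extract_for_versioning` are made up for this example. It ignores headers, footnotes and text boxes.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used by word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_text(docx_path):
    """Return the document body as plain text, one line per paragraph."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    lines = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's visible text is usually split across several <w:t> runs.
        lines.append("".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t")))
    return "\n".join(lines)


def extract_for_versioning(docx_path, out_dir):
    """Hypothetical helper: unpack the XML parts and write a text rendition."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(docx_path) as archive:
        archive.extractall(out / "xml")
    (out / "document.txt").write_text(docx_text(docx_path), encoding="utf-8")
```

A pre-commit hook could call `extract_for_versioning` for each staged .docx and stage the output directory too, so the commit carries both the binary and something diffable.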
On Windows you can just use TortoiseGit, it can do diff and even merge by calling Word's internal compare tools. I can attest that diff works fine (differences show up as if you had used track changes within Word), but I haven't had occasion to try merging Word documents with TortoiseGit yet. The same functionality was already available in TortoiseSVN.
I don’t understand why this 6-year-old article has been posted when current Microsoft 365 versions of Word et al. have built-in version control and real-time collaboration.
I really don't understand how Word remains so popular. It was created at a time when few people had internet access, and was designed to produce printed documents. It was the perfect tool to write newsletters, flyers, articles, academic papers and manuscripts. The world has moved on though, and I fail to see Word's relevance today, other than the sheer number of people that are familiar with it.<p>Word is expensive, proprietary and the XML it generates is unfathomable. There are so many better FOSS tools and systems that we could be using. If you're collaborating on a document then markdown or LaTeX has you covered. You get version control through git and multiple people can contribute. If you're writing a book or article, then the graphic designers and typesetters are going to make the design decisions, not the author, so why bother messing around with fonts and colours and the infuriating placement of images and tables.<p>I authored a kid's book on coding, and the process was a nightmare. I authored in markdown, used pandoc to convert and then further edited in libreoffice, to be able to send stuff through in docx format. Then revisions were sent back in docx and I had to reverse the whole process, so I could maintain my plain-text version of the book. Then the proofs were sent through as PDFs, which I then had to mark up for corrections. Many of the mistakes were due to the crappy way Word places images. In the end I just bought a copy of Word, and submitted to the way my publisher wanted me to work, which disrupted the authorial process.<p>It's time we ditched Word, in the same way we ditched VHS and DVD. It's an outdated technology that remains dominant just because everyone uses it at school, and then refuses to move on. If schools insisted that all homework was submitted in something like markdown, we'd see a dramatic change in a very short period of time.
(BTW when I was teaching CS, my kids authored in markdown and submitted on GitHub)<p>Right, rant over - but I've been talking about this for years - <a href="http://coding2learn.org/blog/2014/04/14/please-stop-sending-me-your-shitty-word-documents/" rel="nofollow">http://coding2learn.org/blog/2014/04/14/please-stop-sending-...</a>
I gave up using Word to write manuscripts when I switched to Markdown documents in git.<p>In the last few months, though, I gave up on Markdown to switch to a more robust format - LaTeX. Before I switched, I didn't know LaTeX at all, but I knew from my reading that it had the features I needed.
Not long ago I read some article here on HN that the world is still waiting for a git equivalent for documents. This seems like a good start.<p>Now we need a native diff viewer for structured files, where the changes are presented with attribution either side by side, or alongside (like gitk, or like gitlab diff viewer).<p>Then we need an editor that supports doing the gitty stuff natively, so that the non-technical writer doesn't have to worry about creating repos and committing the changes from the command line.
You do know that a Word document is really a ZIP file? The text content is inside an XML document that, in principle, GitHub would work on. All you have to do is unzip the document, store the directory in GitHub and repack it for Word to use.
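A rough Python sketch of that unzip/repack round trip, using only the standard library. The function names `unpack` and `repack` are just for illustration, and whether Word is happy with a repacked archive is something you would want to verify on real documents.

```python
import zipfile
from pathlib import Path


def unpack(docx_path, directory):
    """Extract every part of the .docx archive into a directory."""
    with zipfile.ZipFile(docx_path) as archive:
        archive.extractall(directory)


def repack(directory, docx_path):
    """Zip the directory back into a .docx, using paths relative to its root."""
    root = Path(directory)
    with zipfile.ZipFile(docx_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(root.rglob("*")):
            if path.is_file():
                # Archive names use forward slashes, e.g. "word/document.xml".
                archive.write(path, path.relative_to(root).as_posix())
```

Write the repacked file outside the unpacked directory, otherwise `repack` would try to include its own output. The unpacked directory is what you would commit; `repack` rebuilds the file whenever someone needs to open it in Word.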
> since earlier this month Pandoc can read Word documents in docx format.<p>Given this line, I think it's fair to add (2014) to the title.<p>This is pretty old news by now :)
Before Word integrated its own improved version tracking in more modern versions, during my undergrad I participated in a research project to add version tracking to Word documents by abusing its zip file format[1]. My research partner created a plugin to manage the versions, and my main contribution was a Java tool that attempted version merging[2].<p>It wasn't fleshed out or usable, but it was an interesting project. I was impressed at how open the Word/Office format was, this was before Microsoft's reemergence into openness and open source.<p>[1]: <a href="https://dl.acm.org/doi/10.1145/2723147.2723152" rel="nofollow">https://dl.acm.org/doi/10.1145/2723147.2723152</a><p>[2]: <a href="https://github.com/jacobmischka/Vvord" rel="nofollow">https://github.com/jacobmischka/Vvord</a>
Ages ago I wrote a little Word VBA that exported a plaintext copy to go along with the .doc every time I hit save. Worked quite well for eyeballing the changes in a diff. Obviously you don't get merge support for .doc but since that was still running on SVN where workflows tend to be less merge-heavy (or was it still CVS? I feel old..) and I was working solo anyways the human-readable diff worked well enough.
I'm using Fossil for my book. My book is about business systems simplicity so it's a great fit. If I hadn't started using sqlite for a project, I would have never even heard of Fossil. What a great, beautifully simple combination. Don't add complexity unless the complexity is worth the dysfunction it addresses.
It's currently badly broken (see the issues, where someone points out what needs fixing), but I have a tool that uses Word's built-in track changes functionality as a git diff backend: <a href="https://github.com/Gaelan/WordGit" rel="nofollow">https://github.com/Gaelan/WordGit</a>
Wow nice. I'm a big fan of git's `--word-diff` option for text edits. The output is almost as good as `latexdiff` and so much faster.<p>Another useful trick is to pipe the ANSI-colored terminal output through `aha` (<a href="https://github.com/theZiz/aha" rel="nofollow">https://github.com/theZiz/aha</a> or `brew install aha`) which produces HTML output, e.g.<p><pre><code> git wdiff | aha > ~/Desktop/mydiff.html
</code></pre>
You can then send the file mydiff.html to collaborators by email, or generate it as a step in a CI build script.
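If you want the same idea without git, here is a small Python sketch of a word-level diff built on the standard `difflib` module. It only imitates the `[-removed-]` / `{+added+}` markers of git's plain word-diff output; it is not how git implements it, and it collapses whitespace and line breaks, so treat it as a toy for single paragraphs.

```python
import difflib


def word_diff(old, new):
    """Mark removed words as [-...-] and added words as {+...+}."""
    a, b = old.split(), new.split()
    pieces = []
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            pieces.append(" ".join(a[i1:i2]))
            continue
        if tag in ("delete", "replace"):
            pieces.append("[-" + " ".join(a[i1:i2]) + "-]")
        if tag in ("insert", "replace"):
            pieces.append("{+" + " ".join(b[j1:j2]) + "+}")
    return " ".join(pieces)


print(word_diff("the quick brown fox", "the slow brown fox"))
# prints: the [-quick-] {+slow+} brown fox
```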
Back in the bad old days of version control (thinking of VSS here), I was overall pretty satisfied with how the check-in/check-out mechanics worked for Word docs and the like. In this case you have the benefit of the sequential workflow, in fact enforced or hinted by the tool itself, while also getting rid of the recurrent weakness of email-based document storage. There were plenty of other things to dislike about VSS (like, pretty much the rest of them) but it wasn't so bad for maintaining documents.
I work with telco standards and the organizations that I follow use Word documents. The way we keep a paper trail of all the changes to a new standard’s draft is by separating the change proposals into their own documents (using change marks against the latest agreed draft) and only allowing a named editor to actually implement the agreed change proposals back to the master document. The change proposal documents, together with the meeting minutes, create a perfect history of who proposed what changes and when.
I implemented something just like this but in CVS for legal documents back in 2007.<p>I am finally replacing it with a SharePoint solution. It's a headache to have to maintain versions for non-technical people.
Has anybody used SimulDocs[0], which sells itself as a "version control for Microsoft Word documents"? I've been really curious if it's a decent solution in this space, but I tend to keep myself away from Word docs in my life recently.<p>[0]: <a href="https://www.simuldocs.com/features/version-control-for-microsoft-word" rel="nofollow">https://www.simuldocs.com/features/version-control-for-micro...</a>
My solution is inspired by this blog post but creates a global attributes file. I documented it here:<p><a href="https://www.vivekkalyan.com/using-git-for-word" rel="nofollow">https://www.vivekkalyan.com/using-git-for-word</a><p>I tend to prefer markdown for most things, but find it hard to beat Word in terms of simplicity of elegant designs for, say, resumes.
.docx is just an archive format. If I remember correctly the contents inside the .docx archive are plain text. Can’t we just use version control inside of there? We would have to of course figure out a way to have git unpack and pack the archive each time.
At first, I'm like... but it's just a zipped archive of XML and other content files which can be used with git successfully, but yeah there's a mess in there. It's not really meant to be human readable.
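Part of the mess is that the XML parts tend to be written as a few enormous lines, which line-based diffs handle badly. One possible workaround, sketched here in Python, is to re-indent a part before diffing it. The function name `pretty_part` is invented for this example, and the output is meant for reading only: re-indenting changes whitespace, so it should not be packed back into the document.

```python
import xml.dom.minidom
import zipfile


def pretty_part(docx_path, part="word/document.xml"):
    """Return one XML part of the archive re-indented, one element per line."""
    with zipfile.ZipFile(docx_path) as archive:
        raw = archive.read(part)
    return xml.dom.minidom.parseString(raw).toprettyxml(indent="  ")
```

Even then you are diffing markup, not prose, so a text rendition is usually easier on the eyes.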
yes! i have a collection of tweets where fiction/non-fiction writers joke about naming their versions different things. i'm like, use git?<p>i wrote a novella using a folder system + text editor + git. i'm trying to put that into a web app. don't know how useful it would be for other people though. and don't know if it will ever be finished because i need to write.