I'm looking for information extraction software that takes historical legal agreements as input and reports:<p>1. Changes in the text between the documents<p>2. Changes in other attributes of the documents (e.g., word count)<p>3. % change over time in the text and attributes (e.g., the text in the 1986 version of the doc is 56% different from the text in the 1985 version)<p>Can anyone please point me to software that might fit this particular need?<p>Thanks in advance!
Michael
I don't know of software designed specifically for doing this with legal documents, but you could probably do most of it quite easily with some simple Unix tools and a little scripting. I'm guessing the documents aren't plain text, so the first step would be to extract the text. For example, if they're PDFs you could use pdftotext.<p>Then for:<p>1. diff, diff viewers like xxdiff (the name may have changed somewhat recently), or git<p>2. wc<p>3. diff with some scripts to automatically process the documents, count the number of words in the documents, and write to a CSV file
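As a rough illustration of step 3, here is a minimal Python sketch that uses only the standard library. It assumes the text has already been extracted (e.g., with pdftotext) and that the versions are supplied oldest first. The function names, the CSV column names, and the definition of "% different" (one minus difflib's word-level similarity ratio) are my own assumptions, not an established measure for legal documents.

```python
import csv
import difflib
import sys


def word_count(text):
    """Count whitespace-separated words, roughly what `wc -w` reports."""
    return len(text.split())


def percent_changed(old, new):
    """Word-level difference as a percentage (assumed definition).

    SequenceMatcher.ratio() is a similarity score in [0, 1]; this treats
    1 - ratio as the fraction of text that changed.
    """
    matcher = difflib.SequenceMatcher(None, old.split(), new.split(), autojunk=False)
    return round((1 - matcher.ratio()) * 100, 1)


def text_diff(old, new, old_label, new_label):
    """Return a unified diff of two versions, compared line by line."""
    lines = difflib.unified_diff(
        old.splitlines(), new.splitlines(),
        fromfile=old_label, tofile=new_label, lineterm="",
    )
    return "\n".join(lines)


def compare_versions(versions):
    """Compare each version with its successor.

    `versions` is a list of (label, text) pairs ordered oldest first.
    """
    rows = []
    for (old_label, old_text), (new_label, new_text) in zip(versions, versions[1:]):
        old_wc, new_wc = word_count(old_text), word_count(new_text)
        # Relative word-count change is undefined for an empty older version.
        wc_change = round((new_wc - old_wc) / old_wc * 100, 1) if old_wc else None
        rows.append({
            "from": old_label,
            "to": new_label,
            "old_words": old_wc,
            "new_words": new_wc,
            "word_count_change_pct": wc_change,
            "text_change_pct": percent_changed(old_text, new_text),
        })
    return rows


def write_csv(rows, out):
    """Write the comparison rows to an open text file object as CSV."""
    fields = ["from", "to", "old_words", "new_words",
              "word_count_change_pct", "text_change_pct"]
    writer = csv.DictWriter(out, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Made-up sample text standing in for two extracted agreements.
    versions = [
        ("1985", "The tenant shall pay rent monthly.\nThe term is one year."),
        ("1986", "The tenant shall pay rent quarterly.\nThe term is two years."),
    ]
    print(text_diff(versions[0][1], versions[1][1], "1985", "1986"))
    write_csv(compare_versions(versions), sys.stdout)
```

Comparing word by word rather than character by character keeps the percentage from being dominated by whitespace and line-wrapping differences, which are common in text extracted from PDFs.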
You can evaluate the project <a href="http://eucases.eu/" rel="nofollow">http://eucases.eu/</a>.
You can also evaluate the Akoma Ntoso standard.
Here you can find an editor:
<a href="https://legixinfo.wordpress.com/2015/07/02/coming-soon-a-new-web-based-editor-for-akoma-ntoso/" rel="nofollow">https://legixinfo.wordpress.com/2015/07/02/coming-soon-a-new...</a>
Apache UIMA ( <a href="https://uima.apache.org/" rel="nofollow">https://uima.apache.org/</a> ) and GATE ( <a href="https://gate.ac.uk/ie/" rel="nofollow">https://gate.ac.uk/ie/</a> ) come to mind.<p>Those are not ready-made software products, though, but rather frameworks that allow you to implement information extraction (IE) algorithms. While not exactly trivial, implementing something like what you're describing is definitely possible with GATE.