I am currently researching / designing a kind of annotation system for digital
materials (PDFs, EPUBs, scanned books) meant primarily for experts to create
annotations over the source material of their domain. The system will allow quick
navigation, linking, referencing, creating hierarchies of tags, scripting and
heavy domain-specific customization.<p>Think of a professor of literature annotating thousands of pages of Dumas
over the span of months for easier lookup later / notes / categorization / research,
creating a graph of knowledge you can interact with in the process, where
facts reference the source material or build on other facts.<p>This system might be used for a while by a few people, and after
a few months / 5 years / a decade it might be retired.<p>What general advice do you have to make sure that when that happens, someone
(people who found the files on the internet, students, archivists, former users)
can recover the data without much burden?
Simple, documented(!) formats.<p>There is much to be said for text-based formats (plus some standard format for images). For example, annotations could be HTML with some extra data and UI added in (but still readable with a normal browser), or something JSON-based, or ...
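<p>To make the JSON-based option concrete, here is a rough sketch of what a self-describing annotation file could look like. Everything in it is an assumption for illustration: the field names (`source_file`, `page`, `quote`, `note`, `tags`, `refs`), the format name `example-annotations`, and the helper functions are hypothetical, not taken from any existing tool. The point is only that the file carries its own version number and a short human-readable description, so someone finding it years later can work out the structure with a text editor.

```python
import json

# Hypothetical description embedded in every file, so the format documents itself.
README = (
    "Each annotation points at a source file and page, quotes the passage, "
    "and may reference other annotations by id via 'refs'."
)


def make_annotation(ann_id, source_file, page, quote, note, tags, refs=()):
    """Build one annotation as a plain dict (all field names are illustrative)."""
    return {
        "id": ann_id,
        "source_file": source_file,  # e.g. the PDF or EPUB the quote comes from
        "page": page,
        "quote": quote,              # the annotated passage, stored verbatim
        "note": note,
        "tags": list(tags),
        "refs": list(refs),          # ids of other annotations this one builds on
    }


def dump_annotations(annotations):
    """Serialize annotations to pretty-printed, UTF-8-friendly JSON text."""
    doc = {
        "format": "example-annotations",  # hypothetical format name
        "version": 1,
        "readme": README,
        "annotations": list(annotations),
    }
    return json.dumps(doc, indent=2, ensure_ascii=False, sort_keys=True)


def load_annotations(text):
    """Parse the JSON text back into a list of annotation dicts."""
    doc = json.loads(text)
    return doc["annotations"]
```

Storing the quoted passage verbatim next to the page number is a deliberate redundancy in this sketch: if the original PDF goes missing or its page numbering changes, the annotation still makes sense on its own.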