There are numerous classification and cataloguing systems. Dewey Decimal is only one of a large set. I'd strongly suggest looking at the US Library of Congress's <i>two</i> sets of classifiers. The first is the Library of Congress Classification, a set of 21 alphabetically-denoted categories (A-Z, excluding I, O, W, X, and Y). It superseded an earlier scheme devised by Thomas Jefferson, whose collection, sold to Congress in 1815, rebuilt the Library's holdings.<p>The LoC <i>also</i> has a set of <i>subject headings</i>, which is a <i>controlled vocabulary</i> used to <i>describe</i> works.<p>The chief difference between the two is that any given work is assigned <i>one</i> Classification, which is used for shelving and retrieval, but can have <i>multiple</i> subject headings, which are used for general cataloguing.<p>Paul Otlet, an early 20th-century archivist, created the <i>universal decimal classification</i>, based on Dewey, for a project similar to what you seem to be after: a collection of <i>information</i> rather than <i>works</i>, in a project called the Mundaneum, in many ways a precursor to Google, though based on index cards. Much of the original was destroyed by Nazi Germany during WWII.<p>The UDC shares with the Dewey and Library of Congress classifications the benefit of being in widespread extant use, which is to say that these classifications reflect current informational needs, have been revised from historical standards which may no longer be especially suitable, <i>and</i> have established bodies and procedures for further updates and revisions.<p>The Library of Congress classification & subject headings, as works of the US government, are also in the public domain, though useful & usable electronic formats are not readily available to the best of my knowledge. 
I believe the Library of Congress sells various products, however.<p>There are other classifications as well, several of which I've heard of though I've not used them: the colon classification, Bliss Bibliographic Classification, and several national / language-specific classifications (e.g., German, Japanese, Chinese, Korean, Russian).<p>Among the more interesting classifications is SuDocs, the Superintendent of Documents Classification, developed and maintained by the U.S. Government Publishing Office. This is not a universal or subject-based system, but is principally organised <i>by Federal department or agency</i>, followed by additional classes for subordinate offices, category classes (e.g., annual reports, bulletins, law, ...), and book numbers. Whilst likely not directly useful to you, it's an example of a classification designed for the specifics of a particular organisational context.<p>There are <i>bibliographic standards</i> such as the Dublin Core, which defines 15 metadata elements (though without clearly defining their specifications, meanings, or encodings): Contributor, Coverage, Creator, Date, Description, Format, Identifier, Language, Publisher, Relation, Rights, Source, Subject, Title, and Type. You'll find many content management systems incorporate Dublin Core to some extent.<p>Another bibliographic standard is MARC (MAchine-Readable Cataloging), which emerged from the Library of Congress beginning in the 1960s. It is <i>arcane</i> and shows the heavy influence of the mainframe-computer (and punch-card data storage) practices of the era.<p>It's quite useful to think of how any system you specify will be used, by whom, and how it will be applied and maintained:<p>- Who will be using the classification?<p>- What purposes will it serve? 
Especially any processes related to rights management, documentation, provenance, and/or regulatory frameworks.<p>- What <i>processes</i> will it serve?<p>- Who will maintain the classification itself?<p>- Who will maintain the catalogue, e.g., adding new records, updating and correcting existing records, deaccessioning records?<p>- What if any extant standards are there in my specific field? (I don't know, for example, what if any machine-learning standards exist.)<p>Keep in mind that <i>classifications</i> tend to be married to the notion of <i>physical records stored in a specific location</i>, which is generally not the case for electronic data. For the latter, <i>useful descriptions of the work or information</i>, <i>information on its provenance</i>, <i>unique item identifier(s)</i>, and <i>cross references between related items</i> (e.g., source or derived data) might be more relevant information to capture.<p>There are various schools of thought on cataloguing and classification generally. Over the past few decades, "self-describing" works, often based on full-text search and some relevance measure, have become popular --- essentially Google and other online general web search indices. Hashes (e.g., SHA-256 checksums) are another self-descriptive tool. They are useful for identifying <i>a specific file</i> but tell you nothing about related works (e.g., the plain text, PDF, and ePub versions of a document, or JPEG, PNG, and SVG versions of a graphic). The advantage of the self-descriptive approach is that human inputs are relatively minimal, and documents self-describe through their contents and relations to other materials and factors. The disadvantage is that this approach lacks any central coordination, uniform classification, quality controls, or validation of self-described contents. 
The traditional practice (owing much to Melvil Dewey himself) of having an <i>independent</i> cataloger role affords greater control and consistency, but tends to be hopelessly out of date with new incoming information --- something of an age-old problem in the library field.<p>In practice hybrid systems are probably most feasible, with assigned bibliographic characteristics being added to works as time permits and need arises. My own view is that the cataloguing <i>workflow</i> should be explicitly recorded in the bibliographic metadata, and that the levels of automated and manual review and assignment should be coded to works as well.<p>There are several schools of library & information science, and you might want to poke around their course offerings, syllabi, etc., for information. The School of Information at UC Berkeley (previously SIMS) and the Information programme at Pittsburgh are two of which I'm specifically aware. Wikipedia of course has a more extensive listing: <<a href="https://en.wikipedia.org/wiki/List_of_information_schools" rel="nofollow noreferrer">https://en.wikipedia.org/wiki/List_of_information_schools</a>><p>There are also organisations working with electronic data collections at scale, including the Internet Archive and the Wikimedia Foundation, most particularly Wikidata, which might be of interest or relevance to you.<p>Wikipedia also has articles on most of the classifications and topics I've mentioned here: Library of Congress Classification & Subject Headings, Universal Decimal Classification, Colon Classification, Paul Otlet, and more.
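<p>To make the hybrid idea a bit more concrete, here's a rough Python sketch of what a record for electronic data might look like. It borrows a handful of Dublin Core element names (Identifier, Title, Description, Source, Subject, Relation), uses a SHA-256 checksum as the item identifier, and tags each record with a review level. The <i>CatalogueRecord</i> class, the <i>ReviewLevel</i> values, and the helper functions are all my own invention for illustration, not part of Dublin Core or any other standard, so treat it as a starting point only.

```python
import hashlib
from dataclasses import dataclass, field
from enum import Enum


class ReviewLevel(Enum):
    """Hypothetical workflow states; not drawn from any standard."""
    AUTOMATED = "automated"   # fields filled in by software only
    MANUAL = "manual"         # a human cataloguer has reviewed the record


@dataclass
class CatalogueRecord:
    # Field names loosely follow Dublin Core elements.
    identifier: str                                  # SHA-256 hex digest of the content
    title: str
    description: str = ""
    source: str = ""                                 # provenance: where the item came from
    subjects: list = field(default_factory=list)     # controlled-vocabulary terms
    relations: list = field(default_factory=list)    # identifiers of related items
    review_level: ReviewLevel = ReviewLevel.AUTOMATED


def make_record(content: bytes, title: str, **extra) -> CatalogueRecord:
    """Create a record whose identifier is derived from the content itself."""
    digest = hashlib.sha256(content).hexdigest()
    return CatalogueRecord(identifier=digest, title=title, **extra)


def link_related(a: CatalogueRecord, b: CatalogueRecord) -> None:
    """Cross-reference two records, e.g. the text and PDF versions of a work."""
    if b.identifier not in a.relations:
        a.relations.append(b.identifier)
    if a.identifier not in b.relations:
        b.relations.append(a.identifier)


def mark_reviewed(record: CatalogueRecord, subjects: list) -> None:
    """Record a manual pass: assign subject terms and raise the review level."""
    record.subjects = list(subjects)
    record.review_level = ReviewLevel.MANUAL
```

The hash identifies one specific file, and the explicit relations list supplies what the hash alone can't tell you: that the plain text and PDF are versions of the same document. The review level lets you ingest everything automatically and have a cataloguer catch up later.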