I am endlessly fascinated with content tagging systems

590 points by redbar0n over 2 years ago

72 comments

errantmind over 2 years ago

Instagram's tagging system was actually really effective at categorizing content and discovery because each hashtag was treated as a node in a (giant) graph, where each node has multiple properties, including post count (number of posts using a tag), 'velocity' (number of posts using a particular tag per unit time), etc. I could write up a big post about it, as I made a study of it when I created a web app for finding the most relevant tags a few years ago.

All that to say there was a lot to their system and it worked because users became aware that they were rewarded for using the most relevant tags. Using irrelevant tags was punished. This guided users towards using a mix of relevant popular and niche tags to maximize their reach, which, in turn, further improved the tagging system.

Instagram's tagging system isn't as important anymore as their algorithm has deemphasized it, in favor of other methods for classification and discovery, but there were a couple of golden years where it worked very well. Most users still look back on those years as the 'good times' even if they don't know exactly why. I'd go so far as to say they ruined the app after they deemphasized tags (and added way too many ads).

dahdum over 2 years ago

I adore tagging systems and have worked on them in several different applications and implementations, but there are always pitfalls and trade-offs, and it's possible to bury yourself.

Nowadays I nearly always store the assigned tags as an integer array column in Postgres, then use the intarray extension to handle the arbitrary boolean expression searches like "((1|2)&(3)&(!5))". I still have a tags table that stores all the metadata / hierarchy / rules, but for performance I don't use a join table. This has solved most of my problems. Supertags just expand to OR statements when I generate the expression. Performance has been excellent even with large tables thanks to pg indexing.
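For anyone who wants to picture the expression-generation step, here is a minimal sketch of how supertags might expand into OR groups before the string is handed to intarray's `query_int` type. The `SUPERTAGS` mapping, the tuple-based expression format, and the `items`/`tag_ids` names in the comment are all assumptions made for illustration, not the commenter's actual code.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical supertag table: supertag id -> ids of the plain tags it stands for.
SUPERTAGS: Dict[int, List[int]] = {100: [1, 2]}

# An expression is a tag id, or a tuple like ("and", a, b), ("or", a, b), ("not", a).
Expr = Union[int, Tuple]


def to_query_int(expr: Expr) -> str:
    """Render a boolean tag expression as a query_int-style string."""
    if isinstance(expr, int):
        members = SUPERTAGS.get(expr)
        if members is None:
            return str(expr)
        # A supertag expands to an OR over its member tags.
        return "(" + "|".join(str(m) for m in members) + ")"
    op, *args = expr
    if op == "not":
        (arg,) = args
        # Operands are either a bare id or already parenthesised.
        return "!" + to_query_int(arg)
    if op in ("and", "or"):
        joiner = "&" if op == "and" else "|"
        return "(" + joiner.join(to_query_int(a) for a in args) + ")"
    raise ValueError(f"unknown operator: {op!r}")


expression = to_query_int(("and", 100, 3, ("not", 5)))
# The string would then be bound as a parameter in something like
# (table and column names are made up):
#   SELECT id FROM items WHERE tag_ids @@ %s::query_int
```

Keeping the expansion in application code means the tags table can change what a supertag covers without touching the stored arrays.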

xg15 over 2 years ago

I worked with the Wikipedia category system a few years ago, and you could see the problems with hierarchical tagging systems right in action back then. (Though it may have gotten better in the meantime.)

The system appeared simple: There were just two relations, "Article A is a member of category B" and "Category X is a subcategory of category Y".

However, in practice, the community was using this system to represent a whole host of wildly different relationships between items, often with different implications for what a category actually applied to.

E.g., if A has a subcategory B, this could mean one of several things: B might be an additional constraint on the items in A ("American writers" -> "19th century American writers"), the things in B might be more specific than the things in A ("Writers" -> "Novelists"), A might apply to the concept B, not the things in B ("Occupations" -> "Writers"), or A might refer to the category B ("Categories with more than 100 entries" -> "Writers"), and on and on...

Of course those different aspects could even be combined. E.g. "Categories with more than 100 entries" might have a child "Categories with more than 100 entries in need of review", which represents a constraint but might itself contain fewer than 100 entries...

The basic question "Is item X in category Y" becomes impossible to answer generally, because there is no clear indication whether a category applies only to its direct children, to all of its descendants, or only to the subcategories themselves.

I'm sure there are sophisticated ontological systems which would allow users to specify all those different relationships separately. I'm also pretty sure that users would become sloppy after a short time or would disagree about which particular relationship to use in a particular situation...

at_a_remove over 2 years ago

This seems like one of those Eternal Problems that people, whether librarians, programmers, or hobbyists, stumble across, think they'll make headway in, then discover that they've really managed to progress just a few feet across a vast and hostile surface of landmines, pitfalls, and lures. Each "obvious" step (I'll have parent relations to define a context!) is only yet another bargain with the Devil, who laughs at your precautions.

jrochkind1 over 2 years ago

> I can't find anything on how to design and implement anymore more than the barebones basics of a system.

All of this stuff (horse/horses etc) is extensively discussed, maybe look under "taxonomy" or "ontology".

Now, whether you want to use any of those solutions or not or find the discussion useful or not... if you aren't finding anything about it at all, you aren't looking in the right places.

(I learned about it in librarian school)

feoren over 2 years ago

I'm surprised I haven't seen more discussion of how tags are an entry point into plain-old data architecture. It should be obvious that by the time you're using tags for queries like "start-date: BEFORE 2022-03-01", you've created an inner-platform where you're building a plain-old relational database on top of your tags. Stop what you're doing and elevate "start date" out of tag-land and into a more structured representation with more application support.

Many enterprise databases add a memo field called "Comments" to almost every table. Clients very often end up coming up with their own guidelines about how to embed various information in the comments fields that the primary structure is missing. Looking over how clients are using the "comments" fields is a great way to discover new things that should be formally incorporated into the structure of your data architecture. Similarly with tags.

Look at tags as a starting point for adding a bit of loose structure to the frontiers of your data architecture. Mix them in with more structured data architecture. Be ready to "graduate" tags up to the next level of structure when it becomes appropriate. Stop worrying about how to make tagging perfect and embrace it for what it is: an easy way to get started on modeling the parts of the domain that you haven't spent a long time thinking about yet. A good way to understand how users want to use your system. Something you're always revisiting, cleaning up, and using as a source of inspiration. If you see some tags getting out of hand, don't try to improve your tagging system; instead take what those tags are trying to represent and add more structured fields and queries for them. This pipeline of less to more structure should be constantly playing out in a healthy, evolving system.

maoeurk over 2 years ago

I've been wanting to make a datalog tagging system for a few things for a while now but don't have the energy to actually do it. Essentially the idea is to encode relationships allowing for very specific queries like: "show me pictures of a person wearing a green hat looking at another person" which is not something most tagging systems could reasonably do.

Breaking that down, that'd be something like:

    wearing(person1, hat), is_hat(hat), is_green(hat), is_person(person1), is_person(person2), looking_at(person1, person2).

I wanted to apply this to Brazilian Jiu Jitsu videos to be able to find very specific queries like, "matches where player 1 gets a takedown, gets swept by player 2, and player 2 wins by submission". A sufficiently well tagged data set would let you find specific stories and sequences of events in a way that I don't think a non-computational query system could do.

Most of the effort and value around a system like this would be building a community of people to tag the data and tools to make that tagging easy... and perhaps a more user friendly query UI.
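To make the idea a bit more concrete, here is a rough sketch of how such a conjunctive query could be evaluated over tagged facts in plain Python. This is a naive backtracking matcher, not a real Datalog engine (no rules, no recursion), and several things are assumptions for illustration: the fact layout with a photo id as the first argument, the capitalised-variable convention, and all the sample data.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Fact = Tuple[str, ...]        # (predicate, arg1, arg2, ...)
Bindings = Dict[str, str]     # variable name -> bound constant


def is_var(term: str) -> bool:
    # Assumed convention: variables start with an uppercase letter.
    return term[:1].isupper()


def match(pattern: Fact, fact: Fact, env: Bindings) -> Optional[Bindings]:
    """Try to unify one goal with one fact; return extended bindings or None."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    env = dict(env)
    for p, f in zip(pattern[1:], fact[1:]):
        if is_var(p):
            if env.setdefault(p, f) != f:
                return None   # variable already bound to something else
        elif p != f:
            return None
    return env


def query(facts: List[Fact], goals: List[Fact],
          env: Optional[Bindings] = None) -> Iterator[Bindings]:
    """Yield every set of bindings that satisfies all goals."""
    env = {} if env is None else env
    if not goals:
        yield env
        return
    for fact in facts:
        new_env = match(goals[0], fact, env)
        if new_env is not None:
            yield from query(facts, goals[1:], new_env)


# Made-up facts; the first argument is the photo the fact belongs to.
FACTS: List[Fact] = [
    ("is_person", "photo1", "alice"), ("is_person", "photo1", "bob"),
    ("is_hat", "photo1", "hat1"), ("is_green", "photo1", "hat1"),
    ("wearing", "photo1", "alice", "hat1"),
    ("looking_at", "photo1", "alice", "bob"),
    ("is_person", "photo2", "carol"), ("is_person", "photo2", "dave"),
    ("is_hat", "photo2", "hat2"),
    ("wearing", "photo2", "carol", "hat2"),
    ("looking_at", "photo2", "carol", "dave"),
]

GOALS: List[Fact] = [
    ("wearing", "Photo", "P1", "Hat"), ("is_hat", "Photo", "Hat"),
    ("is_green", "Photo", "Hat"), ("is_person", "Photo", "P1"),
    ("is_person", "Photo", "P2"), ("looking_at", "Photo", "P1", "P2"),
]

photos = {env["Photo"] for env in query(FACTS, GOALS)}
```

Only the first photo has a green hat, so it is the only one that satisfies every goal. A real system would want indexing rather than a scan of every fact per goal.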

kortex over 2 years ago

I think tag aliases are fine, but in my opinion, tags should not have hierarchies. That is just opening the can of ontology worms, and most systems are ill-equipped to deal with ontologies...including ontological systems.

Tags are just dumb strings which label data. They are basically KeyValues, where the value is just always equal to True. We don't think of KVs as hierarchical unless they are explicitly a path string, and in that case, they are forced to be a plain tree with no cycles or diamonds.

VectorLock over 2 years ago

One example of an unexpectedly rich and deep tagging ontology is the Danbooru "Anime" image board [NSFW] https://danbooru.donmai.us/

didgetmaster over 2 years ago

I created a new kind of object store where tagging is one of its key features. Each data object (called a Didget - short for Data Widget) can have a set of contextual tags attached. This is true whether the Didget holds file data like a photo, a document, or a piece of software, or if it holds other kinds of structured or semi-structured data (relational tables, folders, configuration, etc.).

Each defined tag has a data type (STRING, INTEGER, DATETIME, etc.) and a two-level context. Like a column in a relational table within a columnar store, all the values for the same defined tag are stored together. This makes querying extremely fast.

So you can define tags like Person.FirstName, Event.Wedding, FileSystem.Extension and then attach values to files and other kinds of content. You can then query the system (e.g. Find all photos where Person.FirstName = 'Billy') based on their tags.

I have created containers with 200M of these objects and put a dozen or so tags on each one. It can run queries that return in just a couple of seconds.

Demo Video: https://www.youtube.com/watch?v=dWIo6sia_hw

fleddr over 2 years ago

As many commenters have mentioned (as does the article), hierarchical tags are a pain, if not an impossibility, to get right. Related tags, though, can be done on the cheap and are surprisingly powerful, fun and cool under the right conditions.

Say you have a massive database of photos, each photo having tags. As an example we'll use the tag "United States", which is used as a tag on 50,000 photos. Next, you go over each of those 50,000 photos and check which other tags were used, and sort them by occurrence.

This reveals useful and often surprising implicit relations between tags. The relation can be of any type, hierarchical or otherwise. It reveals relations never explicitly mapped or maintained. It's organic, which kind of fits the philosophy of tagging.
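The co-occurrence count described above fits in a few lines. This is only a sketch under the assumption that the photos and their tag sets fit in memory as a dict; the `PHOTOS` data and the `related_tags` name are invented for the example.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def related_tags(photos: Dict[str, Set[str]], tag: str,
                 top: int = 5) -> List[Tuple[str, int]]:
    """Count which other tags appear on photos that carry `tag`."""
    counts: Counter = Counter()
    for tags in photos.values():
        if tag in tags:
            counts.update(tags - {tag})   # every co-occurring tag gets +1
    return counts.most_common(top)        # sorted by occurrence, highest first


# Made-up sample data: photo id -> set of tags.
PHOTOS: Dict[str, Set[str]] = {
    "p1": {"United States", "New York", "skyline"},
    "p2": {"United States", "New York"},
    "p3": {"United States", "Grand Canyon"},
    "p4": {"France", "Paris"},
}

related = related_tags(PHOTOS, "United States")
```

On a real database this would more likely be a GROUP BY over the photo/tag table, but the logic is the same.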

aaviator42 over 2 years ago

A few months ago I worked on some proof-of-concept code for searching tagged data: https://github.com/aaviator42/Cha

I now work full-time in a role where part of my duties is designing a content tagging system and its search functionalities. It's very interesting and fun! Lots of puzzles.

How do you weigh different tags? How do you do fuzzy searching ('city' should match with plural ('cities'), misspellings ('citys'), etc)?

How do you program the system so that 'hotdog' is not matched with 'hot' and 'dog'? What about synonyms? What about regional terminology and synonym tables?

Then there's one-to-one and one-to-many and many-to-one mapping.

As a side project I'm also working on a beta public search engine that I'll launch on HN sometime in the next year or so, where I'm having similar puzzles.

roberthahn over 2 years ago

I'm so happy to see people talk about this! I too am endlessly fascinated with content tagging systems.

Hillel's thoughts are completely unsurprising to me so I guess I've come to similar conclusions.

I do notice that we seem to care about different things though - where Hillel appears to focus on tag types (and the implementation challenges that go with that) I focus more on human factors, like: what problem are we solving? For whom? How do we maintain relevance (and power) in tagging systems (and for whom)?

I'm of the opinion that tagging systems should not be made by the few for the many but by each person for themselves. Which, of course, sucks because that puts the onus on everyone who wants tagged content to do their own work. But I believe the output of that investment would be quite valuable and useful!

An easy example I could use might be recommendation engines. Assume I have a database of tags (a tag cloud?), and I know you have similar interests to me. If you also have a tag cloud, I could input links to both of our tag clouds into a purpose-built recommendation engine to discover new content I might not have consumed yet.

micromacrofoot over 2 years ago

This reminds me of a talk from Clay Shirky about categorization and general ontology. It's interesting to read in hindsight, because it's from when recommendation algorithms were in their infancy.

Warning PDF: https://ia800203.us.archive.org/10/items/Ontology_is_Overrated_Categories_Links_and_Tags/shirky.pdf

> This is what we're starting to see with del.icio.us, with Flickr, with systems that are allowing for and aggregating tags. The signal benefit of these systems is that they don't recreate the structured, hierarchical categorization so often forced onto us by our physical systems. Instead, we're dealing with a significant break -- by letting users tag URLs and then aggregating those tags, we're going to be able to build alternate organizational systems, systems that, like the Web itself, do a better job of letting individuals create value for one another, often without realizing it.

subpar over 2 years ago

I've done this professionally in a couple different settings, from building topic classifiers for news events (it is sometimes hard to know when one news event should stop and another start) to creating tagging systems for audio recordings of group conversations (where topics often merge in and out of each other, often within a single sentence).

I'm currently working on classifying non-speech, non-musical sound and it can be useful to piggyback on an existing knowledge system, though they tend to be industry-specific. As an example, Google's ontology for sound identification [1] is a nice starting point for general classification, whereas the taxonomy [2] used by the audio post-production industry (sound effects, foley, etc) is structurally quite different (which isn't surprising, but it sure is fun!). From a totally different field (electro-acoustic composition), the work of Michel Chion and Pierre Schaeffer [3] adds psychoacoustic elements to more traditional measurable characteristics, i.e. how the sound is perceived and comprehended is just as important as its medium of travel and its source. It is helpful to see what others have done before you so you can pick and choose elements of their work to incorporate into your own.

1: https://github.com/audioset/ontology

2: https://docs.google.com/spreadsheets/d/1b2UhKpcOAE-jd1edOsxCJALqttyL761h/edit#gid=1490464926

3: [big pdf!] https://monoskop.org/images/0/01/Chion_Michel_Guide_To_Sound_Objects_Pierre_Schaeffer_and_Musical_Research.pdf

counttheforks over 2 years ago

Anyone have a suggestion for a tagging filesystem that is maintained? Or if not a filesystem, something that at least works? I still feel like this is the best way to organize personal photos and media, and while https://www.tagsistant.net/ is pretty good it hasn't been updated in 6 years and is fairly buggy.

joshu over 2 years ago

there's a massive difference between tagging-for-self-recall and tagging-for-other-recall. when i invented tagging the first was paramount, but the latter has become dominant and has very different design considerations

one interesting note: you can infer a bunch of hierarchical information since people frequently tag from broader to more specific, topicwise.

some things can be tagged by multiple people and you can thus infer synonyms as well. this can thus be fixed in search.

CobrastanJorji over 2 years ago

One weird content tagging system I recall was Amazon's "Amapedia" (https://en.wikipedia.org/wiki/Amapedia). It was a product wiki, a way for people to curate information of all sorts about Amazon products. It allowed each product to be arbitrarily tagged. It was short-lived, failed, and abandoned, for all of the reasons you'd immediately expect.

What was neat about it was that it must have involved someone a little too interested in set theory. A product was an article, and a product could have tags, but tags were themselves articles, and so tags could also have tags, and those tags could also belong to tags, etc.

The whole system was focused on these tags. If you wanted to compare two products, you'd compare the pages, and the comparison would focus on the differences in the tags of the two pages. Tags could have values, too, so products could have a "RAM" tag, and each RAM tag would have an associated value for that page, but the RAM page itself would have general information about RAM as a concept (which would probably have tags itself...). Searching worked the same way. You could search for pages with certain tags or tags whose values were greater/less/equal to whatever values.

Anyway, it was a fun and interesting way to do content tagging that did not work out.

heliophobicdude over 2 years ago

My similar issue is with names in source code.

Fuzzy matching names and interrogating the contributor about the changes being checked in. Questions to ask the contributor: are the names similar to any of these other names? Is there an opportunity to use the same name or are they different concepts?

Code grows and grows and becomes harder to grep if things are named inconsistently.

turnsout over 2 years ago

This is the reason the Semantic Web never took off: people on the internet can't even agree on what a "sandwich" is, let alone the exact hierarchy of ontology.

This is an area where large language models have a role to play: whatever you're hoping to achieve with user-generated tags can probably be achieved with ML-powered associations or navigation. And the potential benefit is that it could be tailored to each user, so you're only surfacing "Hot Dogs" when certain users click "Sandwich."

philip1209 over 2 years ago

We spent a lot of time building tagging systems to organize technology skills on https://www.moonlightwork.com.

The coolest part was training a collaborative filter on the tags. So, when you add "Django" as a skill, it could recommend "Python" as a related skill. This made for some refined user experiences.

Getting typeahead search right took a lot of refinement. Here is some of the logic we ended up implementing over time:

1. Exact matches get prioritized first (e.g. "Go")

2. Abbreviation support (e.g., "AWS" for "Amazon web services" or "ROR" for "Ruby on Rails")

3. Names that start with the query should go before non-leading matches (e.g., "Ru" should return "ruby" before "task runner")

4. We tracked an "Aliases" column for each tag to enhance search. So, "golang" was an alias for "go".
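The ranking rules listed above could be sketched roughly like this. It is a guess at the shape of the logic, not the actual Moonlight implementation: abbreviations are assumed to live in the same aliases list, the three score tiers (exact, prefix, substring) and the alphabetical tie-break are my own choices, and the sample tags are made up.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Tag:
    name: str
    aliases: List[str] = field(default_factory=list)  # abbreviations live here too


def score(tag: Tag, query: str) -> Optional[int]:
    """Lower is better: 0 exact, 1 prefix, 2 substring; None means no match."""
    q = query.strip().lower()
    if not q:
        return None
    best: Optional[int] = None
    for candidate in [tag.name, *tag.aliases]:
        c = candidate.lower()
        if c == q:
            s = 0
        elif c.startswith(q):
            s = 1
        elif q in c:
            s = 2
        else:
            continue
        best = s if best is None else min(best, s)
    return best


def typeahead(tags: List[Tag], query: str, limit: int = 10) -> List[str]:
    scored = [(score(t, query), t.name) for t in tags]
    # Sort by tier first, then alphabetically so results are stable.
    hits = sorted((s, name.lower(), name) for s, name in scored if s is not None)
    return [name for _, _, name in hits[:limit]]


# Made-up sample tags.
TAGS = [
    Tag("Go", ["golang"]),
    Tag("Ruby"),
    Tag("Task Runner"),
    Tag("Amazon Web Services", ["AWS"]),
    Tag("Ruby on Rails", ["ROR"]),
    Tag("Django"),
    Tag("MongoDB"),
]

ru_results = typeahead(TAGS, "Ru")
go_results = typeahead(TAGS, "go")
```

With this data, "Ru" puts both Ruby tags ahead of "Task Runner", and "go" puts the exact match "Go" ahead of tags that merely contain the letters.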

qwerty456127 over 2 years ago

It is crazily sad that non-invasive tagging (without embedding into the file body) is not standardized across OSes and file systems. The only system I know of that supports tags is KDE/Dolphin/Baloo; outside KDE, tagging seemingly is supported only by a handful of incompatible third-party apps.

Sadly I don't expect much progress to happen in this area. Almost nobody cares about storing and organizing files locally nowadays.

I hope it is going to be done sooner or later (there isn't much to do: just standardize some xattrs and something like an RDF schema to be used in an alternative FS stream + add support for these to the standard file management and search tools, this is orders of magnitude easier than implementing a new FS) but probably not soon - it would take a huge stroke of luck to get any resources allocated to this.

xcskier56 over 2 years ago

The hierarchical nature of the information he's talking about really reminds me of the ontologies and terminologies that are used in healthcare to organize medical information. E.g. Ibuprofen 10mg Tab < Ibuprofen < NSAID < ... < Therapeutic Chemical.

This is a field that I'm only tangentially familiar with, but it's a fascinating discipline trying to group and manage all of the different categories of healthcare data. You can use the RxNav tool to look at the RxNorm terminology, which is only one of many terminology systems.

https://mor.nlm.nih.gov/RxNav/search?searchBy=String&searchTerm=ibuprofen

emj over 2 years ago

OpenStreetMap is map data that is basically coordinates with tags on them and relations between those tags. I guess this is true for most GIS software, but there is very little 2D map data that cannot be described in the OSM tagging model.

You can never express everything with tags: you need stats and metadata on metadata, documentation, and a strong heterogeneity which also needs to be able to adapt to new ideas.

https://wiki.openstreetmap.org/wiki/Tags

https://wiki.openstreetmap.org/wiki/Map_features

aaws11 over 2 years ago

https://threadreaderapp.com/thread/1534301374166474752.html

PeterStuer over 2 years ago

Look into AI systems from the 1960s and you will find Semantic Networks. If you just need categories you can go with taxonomies and folksonomies. If you want to (over?) formalize and describe mainly non-agentive structure you look at ontologies.

asdff over 2 years ago

I don't know how people deal with tags. It adds so much friction to me. Naming tags, deciding what rules this tag is supposed to have, deciding what stuff is tagged. I tried the firm approach of being extremely discrete with tags and it took a lot of effort, and I've tried the loose approach of tagging things if they are even slightly related, which imo defeated the whole purpose of organizing things to make it easy to find them later if a lot of tangentially related things share the same tags.

Folders seem a lot more straightforward for me at least, and if I need something in two places at once, there's always ln -s

labrador over 2 years ago

I am too but I've given up. I've collected a lot of data over the years and spent a lot of time trying to organize it so I can find relevant connections. It's just too time consuming. I've decided discerning relationships in unstructured data is where I want to focus.

flanked-evergl over 2 years ago

Look at Wikidata, RDF and the semantic web. This is a fairly well-solved problem that should not be solved differently again.

c7b over 2 years ago

Tags are arguably superior to folders for organising files; unfortunately the major OSes don't seem to agree. I've been using the same (expanded, adapted) folder structure for all my files since I got my first computer, and it's survived multiple OS migrations, being synced between multiple devices with different form factors, multiple changes in life circumstances (school, undergrad, postgrad, work),... I love tags and I've used them in some parts (eg in my old mp3 collection, for academic papers, for Anki flash cards) and I'd love to use a (simple and dumb, not rich enough to enable set theory paradoxes) tagging system to organise my files instead.

However, my experience has left me convinced that the only truly long-term solution for your personal data is flat files sitting on your hard drive inside a simple hierarchical folder structure. Anything else is likely going to rot at some point, after a system change, after some BigTech decides they want to use something else, after a start-up disappears, or it's going to keep you locked into some walled garden. Unless there's something I've missed, if so please let me know.

pessimizer over 2 years ago

> It gets even more complex if tags can have multiple parents, like Wikipedia categories. "American Male Novelists" is a subtag of "American Male Writers" and "American Novelists". Now we have diamond problems, redundancy, a whole host of other edge cases.

I don't understand this problem. I would think that you would have

tag:american

tag:male

tag:novelist

tag:writer,

and tag:novelist would itself be tagged as tag:writer, because all novelists are writers.
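One way to read this suggestion is as flat tags plus a small "tags on tags" table that gets expanded at index or query time. Here is a sketch under that assumption; the `TAG_PARENTS` table and the `expand` helper are hypothetical names, not something from the thread.

```python
from typing import Dict, Set

# Hypothetical "tags on tags" table: each tag maps to the tags applied to it.
TAG_PARENTS: Dict[str, Set[str]] = {"novelist": {"writer"}}


def expand(tags: Set[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the tags plus everything reachable through tags-on-tags."""
    seen: Set[str] = set()
    stack = list(tags)
    while stack:
        tag = stack.pop()
        if tag in seen:
            continue          # also protects against cycles between tags
        seen.add(tag)
        stack.extend(parents.get(tag, ()))
    return seen


item_tags = {"american", "male", "novelist"}
effective = expand(item_tags, TAG_PARENTS)

# "American writers" is then just a subset test on the expanded set.
is_american_writer = {"american", "writer"} <= effective
```

The item only ever stores atomic tags, so the "American Male Novelists" diamond never has to exist as a tag of its own. Whether that covers the other edge cases in the article is a separate question.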

tra3 over 2 years ago

I've been dabbling in personal knowledge bases for a long time now. I remember when I discovered tags -- thought it was the best thing ever. The first good implementation in the wild (for me) was del.icio.us. Eventually I ran into all the problems that the linked thread describes. "Movie" or "movies"? "Book" or "books"?

In any case, I still think flat tag lists are better than a directory tree structure ("Content/Movies" vs "movies, movie, entertainment, science fiction, space travel, aliens").

A recent innovation that I'm enjoying is backlinks. I believe Roam Research was the first major player that showed you related entries via the links that you included, even though a similar concept has existed forever. Then you can generate clouds of relationships and find concepts visually [0].

0: https://noduslabs.com/cases/visualize-connections-notes-roam-research-infranodus/

cranium over 2 years ago

Tags are beautiful. They enable a non-hierarchical way of linking elements together so they form a graph. And graphs are beautiful. But they are also messy and bring a whole cohort of problems that you wouldn't have with trees.

The problem with tags is that they are the first and often only metadata available to represent the complex relationships between elements. So everything goes in it: tags for the semantics (ontology is a rabbit hole in itself), tags for relations with other items, and not forgetting the tags for project management (priorities, people, milestones,...).

Want to empower your tags, for instance adding hierarchy or dynamic tags? Then every tag will get these features and associated problems. A solution would be to have tags of different "types", each processed differently, and migrate the metadata from a "bag of tags" to "a bag of bags of tags". But then tagging wouldn't be as simple as writing a name in a field.

raffraffraff over 2 years ago

I'd love to know what those prolific Spotify engineers think of this.

That was a joke because Spotify doesn't let you tag music.

polote over 2 years ago

A big miss on the list is that words (and so tags) do not mean the same thing to every person and do not even mean the same thing in different contexts.

openfuture over 2 years ago

Why twitter man.. these questions are clearly important but there is a space to discuss them https://matrix.to/#/#datalisp:matrix.org

contextfree over 2 years ago

"Advice: don't let the tag predicates refer to other tags"

But then how would I search by the tag of all tags that do not tag themselves???

cptcobalt over 2 years ago

I can't wait for the author of this thread to discover the AO3 tagging system, which is, frankly, a masterpiece that demonstrates how effective community management can lead to extremely good tagging and categorization, with very little miscategorization.

https://www.wired.com/story/archive-of-our-own-fans-better-than-tech-organizing-information/

https://archiveofourown.org/faq/tags

blueblobover 2 years ago

A lot of the items described are problems in ontologies

NWoodsman over 2 years ago

In my app, users apply a set of tags to a note, but then the app automatically creates hierarchical associations in a tree. There is an exponential number of associations between tags (at one point the design was failing because it was trying to prebuild 100k+ GUI items for these cross-referenced tags), so I had to virtualize the intersection of tags at the exact moment a user expands a tree item.

You cannot plan what tag search will lead you back to the data you want, so every node in the graph must be bidirectional.

dekervin over 2 years ago

I hacked together a small extension to tag hacker news stories. A small presentation here, with the js files for the extension: https://datum.alwaysdata.net/static/extension/index.html

The motivation to finish it partly came from this hn thread: https://news.ycombinator.com/item?id=32970560

PaulHoule over 2 years ago

People look forward to a visit with the ontologist the way they do a visit with the orthodontist.

somat over 2 years ago

My (Chomskyish) hierarchy of tag systems goes something like:

tagged data

key=value tagged data

hierarchically tagged data (we just found the unix filesystem!)

hierarchical key = value tagged data (oh damn, it's ldap, we dug too deep.)

hamasho over 2 years ago

When I started creating a simple blog system as a newbie developer, I needed to design its category/tagging system. Then I was surprised by the lack of good resources on how to design such basic features. I just wanted to know several design patterns and their pros and cons, but I couldn't find any, so I ended up designing my own crappy system.

I hope someone wrote articles on it with actual DB schemas.

photochemsyn over 2 years ago

One area that's illuminating is the effort to annotate the results of whole-genome sequencing projects. Tagging stretches of the genome which represent coherent units of some sort, and then relating them to some functional capability of the organism, is not at all a solved problem.

Here's an overview from 2011 where they're struggling to even get a good tagging system up for single-celled microorganisms (a much easier problem than multicellular genomes like humans):

https://pubmed.ncbi.nlm.nih.gov/22180819/

> "Highlights include the development of annotation assessment tools, community acceptance of protein naming standards, comparison of annotation resources to provide consistent annotation, and improved tracking of the evidence used to generate a particular annotation. The development of a set of minimal standards, including the requirement for annotated complete prokaryotic genomes to contain a full set of ribosomal RNAs, transfer RNAs, and proteins encoding core conserved functions, is an historic milestone."

swyx over 2 years ago

We have a big tagging problem where I work, and yesterday I tried using GPT-3 to assist. Worked well! Code and context: <a href="https://github.com/airbytehq/airbyte/issues/17893" rel="nofollow">https://github.com/airbytehq/airbyte/issues/17893</a>

cpsns over 2 years ago

I’ve written a tagging system from scratch for an existing system and it was one of the most interesting things I’ve worked out. I had total control over how it was implemented and I think I came up with a really nice, minimalist, scalable way to tag things, and to search them.

ethn over 2 years ago

Generally you either use Latent Dirichlet Allocation, exact tags, or a mixture of both. I structure the metric space to weight exact tags more heavily than LDA; you can then create two more classes in that LDA space: the more heavily weighted similar tags, and then the description.

Archelaos over 2 years ago

Those interested in the state of the art of professional tagging systems in cultural heritage may want to look at the CIDOC Conceptual Reference Model (CRM): <a href="https://www.cidoc-crm.org/" rel="nofollow">https://www.cidoc-crm.org/</a>

ok_dad over 2 years ago

I like how they worked out an advanced tagging system's requirements in a ~dozen tweets, starting with the most basic tagging system and working up through a tag hierarchy to a tree to a DAG, and then even talking about K/V tags, etc.

robg over 2 years ago

Surprised no one has nailed a use case for semantic tags and their associations. "Python" with "snake" doesn’t need a hierarchy to be told apart from "Python" with "coding". Why aren’t co-occurrences within and between content samples enough?
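
As a rough sketch of the co-occurrence idea, the code below counts how often two tags appear on the same item and then lists the neighbours of a given tag. The sample items and the function names `cooccurrence` and `neighbours` are assumptions made for illustration; a real system would likely also normalise the counts.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(items):
    """Count how often each unordered pair of tags appears on the same item."""
    counts = Counter()
    for tags in items:
        # Sorting gives every pair a canonical (a, b) order with a < b.
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return counts


def neighbours(counts, tag):
    """Tags seen alongside `tag`, with their co-occurrence counts."""
    result = Counter()
    for (a, b), n in counts.items():
        if a == tag:
            result[b] += n
        elif b == tag:
            result[a] += n
    return result


# Hypothetical sample content.
items = [
    ["python", "coding", "tutorial"],
    ["python", "coding", "django"],
    ["python", "snake", "reptile"],
]
counts = cooccurrence(items)
```

Here the neighbours of "snake" and the neighbours of "coding" share nothing except "python", which is the kind of signal that could separate the two senses without any hierarchy.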

jshandling over 2 years ago

I set out building my first full-stack webapp [0] to make a custom theme-based tagging/organizational system for musical ideas. I did not initially realize all the hairy design choices inherent in this domain, but have found it humbling and educational.

Remaining features to be implemented include in-app audio recording, editing, and custom labeling outside of the main tree-structured organizational system.

I'd appreciate any thoughts or suggestions if anyone cares to take a look!

[0] <a href="https://www.soundseeker.app/" rel="nofollow">https://www.soundseeker.app/</a>

AtlasBarfed over 2 years ago

Eh, the diamond problem and transitive issues don't exist, because what is being reduced to is simply a set and membership. If expansions / aliases / synonyms / multi-membership produce overlaps, who cares, it's a set of hashes. The overwrites only represent wasted computation.

Really this is a simpler version of multiple inheritance. You don't have the issue of conflicting method signatures and implementations, only names.

The only danger is names meaning different things. You need your tags to be relatively unique to the meaning.
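
A minimal sketch of that argument, assuming tag implications are stored as a plain mapping (the `implies` table and the `expand` function are hypothetical names): expanding a tag just collects everything reachable into a set, so arriving at the same tag by two routes costs a redundant lookup and nothing more.

```python
def expand(tag, implies):
    """Return `tag` plus every tag reachable through the `implies` mapping."""
    seen = set()
    stack = [tag]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already expanded; this also guards against cycles
        seen.add(current)
        stack.extend(implies.get(current, ()))
    return seen


# Hypothetical implication table with a diamond: both routes from
# "kitten" end at "animal".
implies = {
    "kitten": ["cat", "baby-animal"],
    "cat": ["animal"],
    "baby-animal": ["animal"],
}
```

The sketch does nothing about the danger named in the last paragraph: if one name carries two meanings, the set happily merges them.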

PaulHoule over 2 years ago

Maybe it's the project I am working on but right now I see the ideal search interface to be something like an OWL class axiom, that is, I am searching for instances of a class that has the following restrictions:
<pre><code>
 * subclass of Actor
 * subclass of Singer
 * has been in at least 7 movies
 * was born after December 3, 1980
 * has been married to at most 3 other people
</code></pre>
These can be intersected, unioned, complemented, etc.
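
The restrictions listed above can be sketched as composable predicates. This is plain Python filtering over in-memory records, not OWL reasoning; the `Person` record, its field names, and the helper functions are all assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Person:
    name: str
    classes: frozenset
    movie_count: int
    born: date
    spouse_count: int


def all_of(*preds):
    """Intersection: every restriction must hold."""
    return lambda x: all(p(x) for p in preds)


def any_of(*preds):
    """Union: at least one restriction must hold."""
    return lambda x: any(p(x) for p in preds)


def negate(pred):
    """Complement of a restriction."""
    return lambda x: not pred(x)


def is_a(cls):
    """Membership in a named class."""
    return lambda p: cls in p.classes


query = all_of(
    is_a("Actor"),
    is_a("Singer"),
    lambda p: p.movie_count >= 7,
    lambda p: p.born > date(1980, 12, 3),
    lambda p: p.spouse_count <= 3,
)

# Hypothetical records.
people = [
    Person("A", frozenset({"Actor", "Singer"}), 9, date(1985, 1, 1), 1),
    Person("B", frozenset({"Actor"}), 12, date(1990, 6, 1), 0),
]
matches = [p.name for p in people if query(p)]
```

Because each restriction is just a function, a complemented or unioned query is built the same way as the original one.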

system2 over 2 years ago

I am increasingly hating twitter being used for blogging.

didip over 2 years ago

If you don’t want to think too hard, just funnel the tag information into a search engine like Elasticsearch. It already handles stemming, stop words, aliases, etc.

acchow over 2 years ago

Sounds like they are trying to embed the search semantics in the data storage. Why not treat search as a distinct problem?

josefrichter over 2 years ago

I was fascinated by ontologies 10 years ago. Since then, I've been studying the human brain, only to realize that this is basically an effort to build a software version of the human brain. Maybe it's possible, but it's definitely not feasible in 99.9% of cases. The closest thing we have is some machine learning approaches.

tinco over 2 years ago

If you're at the point where you're adding hierarchies to your tags, I think you're fighting a losing battle. At that point, why not do what Google does and just make a BERT embedding? No way you're going to manually achieve the full extent of complexity of how humans group and describe things.

pphysch over 2 years ago

My current solution to this problem is just putting a JSONB column in relevant tables. GIN indexes do the heavy lifting as needed.

This lets us implement arbitrary, queryable ontologies on top of the data without requiring further database instrumentation (aside from creating an index now and then).

Tomte over 2 years ago

Also great on the topic of tagging, with more information about the AO3 scheme: <a href="https://idlewords.com/talks/fan_is_a_tool_using_animal.htm" rel="nofollow">https://idlewords.com/talks/fan_is_a_tool_using_animal.htm</a>

terpimost over 2 years ago

I was interested in that too. I stopped as soon as I realized that any good search in a tagging system would be just a full-text search. E-commerce catalogs have detailed filters, but I think people use at most 2 properties in addition to a simple name search.

ggm over 2 years ago

Approximate date is the bugbear of photo tagging. EXIF, Dublin Core, and vendors can't agree on what to do. Camera manufacturers don't care because at the time of the shot, the date is fixed. It's archival, scanned, and copied pre-digital work that has the problem.

kristianp over 2 years ago

In a more readable form: <a href="https://threadreaderapp.com/thread/1534301374166474752.html" rel="nofollow">https://threadreaderapp.com/thread/1534301374166474752.html</a>

redbar0n over 2 years ago

A very insightful thread by Hillel Wayne on content tagging systems and their challenges.

Their ubiquitous use (in library and information sciences, and popular social networks like Instagram, Twitter, and Pinterest), their deceptive ease of implementation, and "obvious advantages" over hierarchies/folders mean that almost every developer has run (or will run) into them at one point or another.

Feel free to comment with good theory and case studies on tagging systems. (Good case studies on how to model an advanced tag system in a graph database would be especially interesting.)

throwaway920102 over 2 years ago

Empornium aka luminance has a great tagging system.

endisneigh over 2 years ago

Is there an optimal tagging system, performance-wise? Seems like there could be a database just for tagging.

nickm12 over 2 years ago

A lot of the author's questions can be answered by "use an inverted index".
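
For readers who have not met the term, an inverted index maps each tag to the set of items that carry it, so boolean tag queries become set operations. The sketch below is a minimal in-memory version; the `TagIndex` class and its method names are invented for this example and are not taken from the thread or from any particular library.

```python
from collections import defaultdict


class TagIndex:
    """Inverted index: maps each tag to the set of item ids carrying it."""

    def __init__(self):
        self._postings = defaultdict(set)
        self._items = set()

    def add(self, item_id, tags):
        """Register an item under each of its tags."""
        self._items.add(item_id)
        for tag in tags:
            self._postings[tag].add(item_id)

    def with_tag(self, tag):
        """Items carrying `tag` (empty set for an unknown tag)."""
        return set(self._postings.get(tag, ()))

    def all_of(self, *tags):
        """Items carrying every given tag (AND)."""
        sets = [self.with_tag(t) for t in tags]
        return set.intersection(*sets) if sets else set(self._items)

    def any_of(self, *tags):
        """Items carrying at least one given tag (OR)."""
        return set().union(*(self.with_tag(t) for t in tags))

    def without(self, tag):
        """Items not carrying `tag` (NOT)."""
        return self._items - self.with_tag(tag)


# Hypothetical items.
index = TagIndex()
index.add(1, ["python", "coding"])
index.add(2, ["python", "snake"])
index.add(3, ["coding", "rust"])
```

With this structure, `index.all_of("python", "coding")` is a set intersection and `index.without("python")` is a set difference, which covers the AND/OR/NOT searches discussed in the thread.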

k__ over 2 years ago

I don't know much about this topic.

The only thing I learned: if you think you have a taxonomy, then you don't.

scottmcdot over 2 years ago

Yet we still can't search for multiple hashtags on Instagram.

taylorbuley over 2 years ago

Pro tip: use stemming!

wtf77 over 2 years ago

I am endlessly fascinated by how twitter has now become a dumping ground for complex topics that are difficult to read and follow. But what happened to the old blogs?
