Ask HN: Why is Confluence Wiki Search so bad?

176 points by nicktorba over 3 years ago

The title says it all. To me, the most important component of a wiki is search. With that said, why is Confluence wiki search basically unusable?

(By unusable, I mean I can never find the page I am looking for when I search. Basically, I have to maintain my own wiki of important links I may need to reference in the future.)

36 comments

polote over 3 years ago

Searching a corporate wiki is pretty difficult because, unlike something like Google, you can't use the context of a search query to recommend content.

* First, you have only a few occurrences of the same search query in your search history (because only a few people have searched for similar words in the past).
* Nor can you use synonyms or remove stop words to recommend better content ("IT" can mean "information technology" or the pronoun, "THE" can be an acronym, ...).

So basically the only thing you can do is search for words. Confluence is worse than that because it tries to remove stop words and does things that break exact-match search. But this is a difficult job. Ways to improve search: allow multiple titles, index with tags and attributes, only do exact word matches, allow users to suggest content for a specific search query, search autocompletion, live search while typing ... (many things that Confluence doesn't care about).

You also have to respect access rights when returning documents. Each document can have rights from its folder or from the document itself, inherited from team access or user access, so this is really computation-intensive too, unless you pre-compute rights.

(I'm working on a competitor [0] of Confluence and have put plenty of hours of work into that specific issue, and I can tell you this is really hard.)

[0] https://dokkument.com
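
As a rough illustration of the combination described in this comment (exact word matching with no stop-word removal, tag indexing, and pre-computed access rights), here is a minimal Python sketch. The `Doc` class, the `allowed_groups` field, and the scoring weights are all assumptions made for the example; this is not how Confluence or dokkument work internally.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    title: str
    body: str
    tags: set = field(default_factory=set)
    # Assumption: inherited folder/team/user rights were flattened into
    # one set of group names ahead of time ("pre-computed rights").
    allowed_groups: set = field(default_factory=set)


def visible(doc, user_groups):
    """A document is visible if the user shares at least one group with it."""
    return bool(doc.allowed_groups & user_groups)


def exact_search(docs, query, user_groups):
    """Exact word match with no stop-word removal; title and tag hits rank first."""
    terms = query.lower().split()
    scored = []
    for doc in docs:
        if not visible(doc, user_groups):
            continue  # never leak documents the user may not read
        title_words = doc.title.lower().split()
        body_words = doc.body.lower().split()
        tags = {t.lower() for t in doc.tags}
        # Every query term must appear somewhere, as a whole word.
        if not all(t in title_words or t in body_words or t in tags for t in terms):
            continue
        # Arbitrary illustrative weights: title 3, tag 2, body 1 per occurrence.
        score = sum(
            3 * (t in title_words) + 2 * (t in tags) + body_words.count(t)
            for t in terms
        )
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]
```

Because "it" is never treated as a stop word here, a query for "it" still finds a page titled "IT onboarding", which is the failure mode the comment attributes to stop-word removal.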

simonw over 3 years ago

The good news here is that the Confluence API is actually really good, and very easy to integrate with.

I wrote a custom search engine that worked by running on cron, pulling in all of the content from Confluence and writing it into a SQLite table with SQLite full-text search enabled (using https://sqlite-utils.datasette.io/en/stable/python-api.html#full-text-search), then sticking a https://datasette.io/ interface in front of it.
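
A minimal sketch of the indexing half of that idea, using only Python's standard-library `sqlite3` module and SQLite's FTS5 extension rather than sqlite-utils or Datasette. It assumes the Python build ships SQLite with FTS5 compiled in (most do), and that the pages have already been fetched from Confluence by some other job; the table layout and the title weight of 10 are assumptions for the example.

```python
import sqlite3


def build_index(pages):
    """Create an in-memory FTS5 index from (title, body, url) tuples."""
    conn = sqlite3.connect(":memory:")
    # url is stored but not tokenised, so it never matches a query.
    conn.execute("CREATE VIRTUAL TABLE pages USING fts5(title, body, url UNINDEXED)")
    conn.executemany("INSERT INTO pages (title, body, url) VALUES (?, ?, ?)", pages)
    conn.commit()
    return conn


def search(conn, query, limit=10):
    """Return (title, url) rows, best match first.

    bm25() returns lower values for better matches, so ascending order
    is correct. The weights (10.0 for title, 1.0 for body) are an
    arbitrary choice to make title hits count more.
    """
    rows = conn.execute(
        "SELECT title, url FROM pages "
        "WHERE pages MATCH ? "
        "ORDER BY bm25(pages, 10.0, 1.0) "
        "LIMIT ?",
        (query, limit),
    )
    return rows.fetchall()
```

In a cron-driven setup the index would live in a file on disk and be rebuilt or updated on each run. Note that FTS5 has its own query syntax, so raw user input containing punctuation may need quoting before it is passed to `MATCH`.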

PaulHoule over 3 years ago

Most search engines are pretty bad because the developers of most search engines don't do any work to improve relevance.

This methodology works

https://ccc.inaoep.mx/~villasen/bib/AN%20OVERVIEW%20OF%20EVALUATION%20METHODS%20IN%20TREC%20AD%20HOC%20IR%20AND%20TREC%20QA.pdf

and I used it to tune up the relevance of a search engine for patents to the point where users could immediately perceive that it worked better than other products.

After I worked on that I wound up talking to the developers and/or marketing people for many enterprise search engines, and few of them, if any, did any kind of formal benchmarking of relevance.

People at one firm told me that they used to go to TREC conferences because they thought it got them visibility, but they decided it didn't, so they quit going.

A message I got repeatedly was that these firms thought that the people who bought the search engines didn't care much about relevance, but they did care about there being 200 or more plug-ins to import data from various sources.

In principle the tuning is unique to the text corpus. One reason for that is that there is a balancing act between a search engine that prefers small documents (they have spiky vectors that look more like query vectors) and one that prefers large documents (they have so many words they match everything). Different corpora have different distributions of document sizes, not to mention different distributions of the words that appear.

Few organizations are willing to do the work to tune up a search engine (you have to decide about the relevance of 10,000+ document hits), but I've had the experience that you can beat the pants off the defaults even using a generic tuning. For instance, that patent search engine was tuned up against the GOV2 corpus instead of a patent corpus. A small patent corpus showed us we were on the right track, however.
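
To make "formal benchmarking of relevance" concrete, here is a small Python sketch of the kind of measurement a TREC-style evaluation relies on: given human relevance judgments per query, score a ranked result list with precision at k and average precision, then average over queries. The function names and data shapes are assumptions for the example, not the exact procedure from the linked paper.

```python
def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k results that were judged relevant."""
    top = ranked[:k]
    return sum(1 for doc in top if doc in relevant) / k


def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant doc appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for position, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / position
    return total / len(relevant)


def mean_average_precision(runs, judgments):
    """Average AP over all judged queries.

    runs:      {query: [doc ids in ranked order]}
    judgments: {query: {doc ids judged relevant}}
    """
    return sum(
        average_precision(runs.get(q, []), rel) for q, rel in judgments.items()
    ) / len(judgments)
```

With a fixed set of judgments, two engine configurations can be compared by running the same queries through both and comparing the resulting scores, which is what makes tuning measurable rather than anecdotal.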

modeless over 3 years ago

Because "enterprise" tools are bought by people who don't have to use them, so improvements that actually matter to users are not a priority.

abeppu over 3 years ago

I'll take a stab at actually guessing why, aside from the issue that people making purchasing decisions don't see how bad it is until work has already gone into bringing in docs and pushing people to use it.

Aside from the organizational issues, I think there's a problem where basically no search system can be good for every org with any kind of internal info and different queries from perhaps several distinct types of users with different goals. To get good, a system needs to improve through at least rudimentary ML. At its simplest, if Alice searches for X today and clicks doc3, then when Bob searches for X tomorrow, doc3 should rank higher. This requires collecting and aggregating click-stream data, and using this count info (with cardinality #docs x #queries) at search time. But sometimes it requires a richer model relating search terms to terms in relevant (clicked) docs and optimizing for some measure of search quality (NDCG, etc.).

All of this requires detailed access to docs, search/click histories, and a fair amount of computation and storage. But customers have legit reasons for wanting these docs to be accessible only by their own employees. And they don't want to dedicate their own staff to improving such a system. No one wants to hear that their model retraining ran out of memory, etc. So shipping a simple system which doesn't improve but doesn't have moving parts becomes a local optimum.
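
The Alice-and-Bob example can be sketched in a few lines of Python: keep a click count per (query, document) pair, add a damped boost to the base text score at search time, and measure the effect with NDCG. The `ClickBooster` class, the logarithmic damping, and the weight are assumptions chosen for illustration, not a description of any shipping system.

```python
import math
from collections import defaultdict


class ClickBooster:
    """Re-rank results using per-query click counts."""

    def __init__(self, weight=1.0):
        self.weight = weight
        self.clicks = defaultdict(int)  # (query, doc_id) -> click count

    def record_click(self, query, doc_id):
        self.clicks[(query.lower(), doc_id)] += 1

    def rerank(self, query, scored):
        """scored is a list of (doc_id, base_score); returns doc ids, best first."""
        q = query.lower()
        boosted = [
            # log1p damps the boost so one popular page cannot dominate forever.
            (base + self.weight * math.log1p(self.clicks.get((q, doc_id), 0)), doc_id)
            for doc_id, base in scored
        ]
        boosted.sort(key=lambda pair: pair[0], reverse=True)
        return [doc_id for _, doc_id in boosted]


def ndcg(ranked, gains, k=10):
    """Normalised discounted cumulative gain of a ranking.

    gains maps doc_id -> graded relevance (0 if missing).
    """
    dcg = sum(
        gains.get(doc, 0) / math.log2(position + 1)
        for position, doc in enumerate(ranked[:k], start=1)
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(position + 1) for position, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg else 0.0
```

The click table is exactly the "#docs x #queries" count data mentioned above; in practice it is sparse, which is why a dictionary keyed by pairs is used here instead of a dense matrix.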

abridgett over 3 years ago

It didn't really seem to have any prioritisation, e.g. around titles, headings or any metadata (view count, edits, last updates). Agree completely it was awful.

OTOH I'm also a believer that you should be able to navigate to the right information.

People seem to think that writing pages is sufficient. A library works because pages are gathered in books and organised by sections, and there is an army of librarians to keep it running smoothly.

I treat documentation like code: DRY and refactoring apply just the same. E.g. I might split a page up so that some common part can be re-used. I'll cull obsolete information or mark it obsolete. I'll _also_ update headings to help them show up in searches.

walrus01 over 3 years ago

I don't understand why people use Confluence.

I can gain far more functionality with a properly implemented self-hosted MediaWiki server (the same code that runs Wikipedia itself) with a number of useful plugins installed and enabled.

It doesn't require a rocket-science level of apache2+php7+mariadb knowledge to set up. The instructions are really quite straightforward.

mdoms over 3 years ago

I used to work at Atlassian but NOT on Confluence, and I have no special information about this. But I can tell you that internally it is well known how awful the search is (they run one of the biggest known instances of Confluence), and there have been many spikes and projects to improve it. I have spoken to lots of people and asked why it continues to be so bad, but all I get is hand-waving about how it's such a hard problem.

Honestly I wish I knew more, but it was like pulling teeth trying to get people there to speak openly about why it's so hard when it is solved in so many other products.

thedogeye over 3 years ago

It's unbelievably bad. This is literally the only thing you need a wiki for. I can't believe this is the market leader. Notion is going to crush them.

leetrout over 3 years ago

So I am interested in this space. There are some alternatives out there, but I suspect companies will be concerned about letting a 3rd party have access to the data needed. If you are interested in this space and would be willing to chat with me about what you're looking for OR what you are currently using, I'd love to chat! My email is my username at gmail.com

Some existing tooling:

* Google Cloud Search has a Confluence connector: https://developers.google.com/cloud-search/docs/connector-directory
* Elastic Workplace Search has a connector: https://www.elastic.co/guide/en/workplace-search/current/workplace-search-confluence-server-connector.html
* Lessonly has / had a thing called Obie: https://www.lessonly.com/blog/how-to-search-better-in-confluence-documentation-and-workspaces/
* Raytion: https://www.raytion.com/connectors/raytion-confluence-connector

cybrexalpha over 3 years ago

As others have said, Atlassian don't care about you, the user. Their products are piles of features that perform well in feature comparisons, with the minimum amount of effort put into UX.

"Atlassian Tools" is on my list of automatic rejections for companies I'm thinking of working at for this reason.

VWWHFSfQ over 3 years ago

In my experience almost everything that Atlassian makes is total garbage. Bitbucket, Jira, Confluence, etc. are all horribly slow to the point of being unusable, and most of it has very poor UI/UX. I pretty much don't recommend anything they make. It's not surprising at all that a fundamental feature of a wiki, search, doesn't work very well.

EamonnMR over 3 years ago

I crossed a huge milestone last week. I actually found something I was looking for in Confluence.

giantg2 over 3 years ago

In my experience, it's usually that the person who created the page did not title it with something that a person would search for.

The organization of most teams' documentation is horrendous at my company. There are at least 3 different pages I have to go to for how-to articles, and that's just within my current team's space. Not to mention there's limited information on those pages.

Documentation is an afterthought. We've also seen a lot of attrition this year. I'm the senior person on my team as a midlevel. I have one contractor whose term is up in a couple of months and one junior. They can't fill the 4 positions that have been open for 2-3 months.

CPLX over 3 years ago

This thread is well timed. I was just about to pick a wiki solution and was leaning towards Confluence. But search is really important to me.

What's the prevailing wisdom these days on the best solution for an internal knowledge base/wiki platform?

RegW over 3 years ago

I'm amazed to see this here.

My colleagues and I have been grumbling for ages that our instance of Confluence must be really badly configured. If you put in a single-word search term, there will be lots of results, but no guarantee that pages containing that word in the title (or body) will appear above ones where it doesn't.

The search problem was solved long ago by Apache Solr/Lucene, although this may not be true for multiple languages.

dangoor over 3 years ago

I agree. This is why I've tried to make use of Confluence's other tools to make content findable and also improve search…

1. Give pages labels. This lets you insert a label-based index, and also makes it possible to narrow search by label.
2. Use spaces. Separate the content into spaces based on who is likeliest to need that information. You can narrow search by space, and put a search box on the page in the space.
3. Use the hierarchy. You have to put the pages somewhere in the hierarchy anyway, so try to make it reasonable.
4. Make useful index pages. Obviously, this doesn't scale, but if you can provide people with useful starting points, it will help them. For example, at Khan Academy we have a space for the whole org with a front page to get you to every team's front page. The engineering team has a front page with a small collection of useful & commonly-used links.
5. If you have a page in your hierarchy with a lot of content underneath it, add a search box on that page that constrains the search to that set of pages.

The biggest problem Confluence search has is that it's terrible with relevance, and using its tools to narrow down the search can improve the relevance of the results considerably.
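
For anyone scripting this kind of narrowing, the same constraints (space, label, a parent page) can be expressed as a Confluence Query Language (CQL) string. The Python sketch below only builds the string; the helper name `build_cql` and its parameters are assumptions for the example, and the field names (`space`, `label`, `ancestor`, `text`) should be checked against Atlassian's CQL documentation for the Confluence version in use.

```python
def build_cql(text, space=None, labels=(), ancestor_id=None):
    """Build a CQL query that narrows a full-text search.

    space:       space key to restrict to (e.g. a team's space)
    labels:      labels that must all be present on the page
    ancestor_id: numeric id of a parent page; restricts the search
                 to pages underneath it in the hierarchy
    """
    clauses = []
    if space:
        clauses.append(f'space = "{space}"')
    for label in labels:
        clauses.append(f'label = "{label}"')
    if ancestor_id is not None:
        clauses.append(f"ancestor = {int(ancestor_id)}")
    # Escape backslashes and double quotes so the text stays one literal.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    clauses.append(f'text ~ "{escaped}"')
    return " AND ".join(clauses)
```

Each extra clause shrinks the candidate set before relevance ranking is applied, which is why narrowing helps even when the ranking itself is weak.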

boyter over 3 years ago

I found the search pretty iffy at times. There was an existing marketplace app for it that was not much better, so I wrote my own. Then I turned it into a full marketplace app so others could benefit.

It does partial matches anywhere in a word, supports every language even in the same document, and even has regex support for those who need it. It updates instantly, with instant filters.

For example, it can find things like 168.0 in 192.168.0.1, which the existing Confluence search cannot. Or search for AKIA credentials with /AKIA[A-Z0-9]{16}/. I have heard people describe it as Algolia for Confluence, which makes me happy.

https://marketplace.atlassian.com/apps/1225034/better-instant-search-for-confluence?hosting=cloud&tab=overview

As for why their search is so bad? It's probably due to how they apply permissions. Every permission for their search needs to apply per search per user. It makes it complex and hard to apply changes, making it hard to improve things. I imagine it's one of those parts of Confluence that is a major pain to work with.

I think a lot of this is also due to their cloud migration. With the server version, which you could host yourself, the index could be stored on disk. With cloud they suddenly need to keep the index state somewhere persistent, but they also want to dynamically scale up and down.

Lastly, they also apply stop words, stemming and such, using out-of-the-box Lucene. Lucene is a great tool, but it can also be a pain to work with. You can see problems when you mix languages on the page too, such as having Thai, Chinese and English on a single page, which confuses the Lucene tokeniser.
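
The two query styles mentioned here (a partial match such as 168.0 inside 192.168.0.1, and a regex such as /AKIA[A-Z0-9]{16}/) can be illustrated with a plain linear scan in Python. This is only a sketch of the matching behaviour, assuming pages are held in a dictionary of title to body text; it says nothing about how the marketplace app is actually implemented, and a real tool would need an index to stay fast.

```python
import re


def find_pages(pages, needle):
    """Return titles of pages whose body matches the needle.

    A needle wrapped in slashes (/.../) is treated as a regular
    expression; anything else is a case-insensitive substring match,
    so it can hit anywhere inside a word or number.
    """
    if len(needle) > 2 and needle.startswith("/") and needle.endswith("/"):
        pattern = re.compile(needle[1:-1])
    else:
        pattern = re.compile(re.escape(needle), re.IGNORECASE)
    return [title for title, body in pages.items() if pattern.search(body)]
```

A tokenising engine splits 192.168.0.1 into terms before indexing, so a fragment that straddles token boundaries has nothing to match against; substring matching sidesteps that at the cost of scanning raw text.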

dmpanch over 3 years ago

We are using Confluence for our public and internal wikis. Its search is bad and really slow, but no matter how much everyone hates it, the market does not provide worthy alternatives.

When choosing 3 years ago, we used the following criteria:

* WYSIWYG editor. Writing documentation must take minimal effort for any user.
* Flexible access permissions for various parts of the documentation. Public documentation is open to anonymous users; the internal documentation is divided into many sections with access for certain groups.
* Multilingual support. Not out of the box, but possible with plugins.
* Multilingual PDF export. In some markets, some customers prefer to have exported manuals.
* The ability to inherit articles. We need to be able to make edits once, instead of duplicating the same articles.
* A relatively modern appearance. Wiki engines are familiar to many because the whole world uses Wikipedia, but this does not make them more pleasing to the eye, if I can say so.

3 years have passed and I periodically look at alternatives. So far only wiki.js seems like a good solution, but it's not even close yet.

sideproject over 3 years ago

I use BitBucket because it's free and I've been using it for a long time. Maybe GitHub is faster, but I don't access BitBucket enough to justify migrating the ~50 repos I have. Can't be bothered. Its UI/UX? Meh. I got used to it.

I use Confluence and Jira because, again, we use them at work. So I guess I'm using them because I have to. I also understand it's a pain to move our company from one to another (oh, we've had discussions about moving to Coda and others), but again, I'm not taking on that project. Again, UI/UX, search: all meh. They work and I got used to them.

The inconvenience of using them does not justify the amount of time I would need to spend to overcome it. Some things, you just have to let slide.

sharva over 3 years ago

Yes, both Jira and Confluence search are frustrating at times. This is one of the big wins of using Glean (https://glean.com) for me as a developer :-)

BuyMyBitcoins over 3 years ago

On a Confluence instance that covers the whole of the Fortune 500 company I work for, I do NOT want to search over the corpus of all the documents hosted on it. I want a persistent search filter where I can easily restrict my results within certain parameters without having to constantly re-filter my results.

I think most search engine designers want to make the index as broad as possible, but the problem seems to be that people rarely want such broad searches. What they really want are very detailed indices and metadata implications over well-trodden folders.

Krssst over 3 years ago

In my understanding, you have to prefix all your keywords with "+" for all of them to be necessary for a page to be included in your results. This makes the behavior slightly closer to Google.
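
If that is how the instance behaves, the prefixing is easy to automate. The Python sketch below rewrites a query so every term is required, leaving quoted phrases intact and skipping terms that already carry an operator. The helper name is hypothetical, and whether a given Confluence version honours the `+` prefix is taken from the comment above rather than verified.

```python
import re


def require_all_terms(query):
    """Prefix each term or quoted phrase with '+' so all are required."""
    # A token is either a double-quoted phrase or a run of non-space characters.
    tokens = re.findall(r'"[^"]*"|\S+', query)
    out = []
    for token in tokens:
        if token[0] in "+-" or token.upper() in {"AND", "OR", "NOT"}:
            out.append(token)  # already has an operator, or is one
        else:
            out.append("+" + token)
    return " ".join(out)
```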

jacquesm over 3 years ago

Try Gmail. More than a decade on and still no partial word match.

Cryptonic over 3 years ago

Yes, it only finds you crap results. Not sure why they have the most naive search algorithm out there. Maybe good search needs more AI and CPU power than we think.

Maybe this is something Google should take on: a search plugin for Confluence where Google's crawlers log in from time to time for internal crawling, to enable non-public search requests on that data. That would boost knowledge workers' efficiency a lot. I hope somebody from Google reads this and takes on the challenge. I'm sure companies would pay a lot for this.

hyperation over 3 years ago

Same experience for me. However, I started to be more diligent about tagging Confluence pages whenever I see them lacking tags, and that definitely helps with the searches.

irvingprime over 3 years ago

Compared to Jira search, Confluence search is quite good.

deevin9 over 3 years ago

My company uses Coveo [www.coveo.com] for their intranet. They have a native connector for Confluence, and it works MUCH better: https://docs.coveo.com/en/1716/index-content/install-the-coveo-plugin-for-atlassian-confluence

staplung over 3 years ago

It's been a long time since I worked at Google, but when I did (10 yrs ago), the search system for the intranet was notoriously awful. Part of the reason was that PageRank tends not to work so well in places where things aren't heavily cross-linked, which is a hard place to get to if your search system already sucks.
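
The cross-linking point is easy to see with a toy PageRank. The Python sketch below is a textbook power iteration, not Google's production algorithm; the damping factor of 0.85 and the handling of pages without outgoing links (their rank is spread evenly) are conventional assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Compute PageRank by power iteration.

    links maps each page to the list of pages it links to.
    Returns {page: rank}, with ranks summing to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

On an intranet where pages rarely link to each other, every page ends up with the same rank, so the link signal carries no information about which page is the authoritative one.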

phone8675309 over 3 years ago

YSK: You can save pages for reference later if you don't want to create a page in Confluence to do it: https://support.atlassian.com/confluence-cloud/docs/save-a-page-for-later/

If you want the best of both worlds, you can use the "Favorite Pages Macro" on any page to reference all of the pages that you have saved for later, which makes keeping that page up to date with your latest changes to saved pages trivial.

xs83 over 3 years ago

I've not really found search in any wikis like this to be good. Notion is a beautiful wiki-type tool let down by its absolutely atrocious search capability.

nitwit005 over 3 years ago

I don't think it's unusually bad. Rather, if an app offers open ended search, it will generally generate fairly poor results.

sahinyanlik over 3 years ago

I really wish that one day I could search Bitbucket the way I can search GitHub.

marcodiego over 3 years ago

Let's stop asking "why does a closed feature in a closed product work so badly?" type questions. The only appropriate answer is: because customers continue to use it.

itomato over 3 years ago

Lucene

Kalanos over 3 years ago

Confluence search is great! I could always find what I needed. In fact it's my favorite feature about Confluence. I'd say it's my favorite search outside of Google.