Looking at https://explore2.marginalia.nu/search?domain=simonwillison.net — now that's an interesting service. The web has felt isolating since it became commercialized. Bloggers are living in the Google dark ages right now. Having information like this be readily accessible could help us find each other and get the band back together. The open web can be reborn.
BTW, if anyone wants to dabble in this problem space, I make, among other things, the entire link graph available here: https://downloads.marginalia.nu/exports/

(hold my beer as I DDOS my own website by offering multi-gigabyte downloads on the HN front page ;-)
In 2012 I was trying to turn my PhD thesis into a product: a better guitar tab and song lyrics search engine. The method was precisely this: use cosine similarity on the content itself (musical instructions parsed from the tabs, or the tokens of the lyrics).

This way I not only got much better search results than with PageRank, there was another benefit of this approach: you could cluster the results and pick each subsequent result from a distinct cluster. With Google you would not just get a bad result at number 1; results 1-20 would be near duplicates of just a few distinct efforts.

Unfortunately I was a terrible software engineer back then and had much to learn about making a product.
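A minimal sketch of the two ideas described above, assuming scikit-learn is available; the documents, query, and cluster count are made up for illustration and this is not the original system. It ranks by cosine similarity of content vectors, then diversifies the result list by taking the first hit from each cluster before falling back to the rest.

```python
# Sketch: content-based ranking plus cluster-based diversification.
# Documents, query, and parameters are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "verse chorus G D Em C strumming pattern",
    "verse chorus G D Em C strum pattern acoustic",
    "fingerstyle arrangement drop D tuning tab",
    "lyrics only no chords just the words",
]
query = "G D Em C strumming"

vectorizer = TfidfVectorizer()
doc_vecs = vectorizer.fit_transform(docs)
query_vec = vectorizer.transform([query])

# Rank every document by cosine similarity to the query.
scores = cosine_similarity(query_vec, doc_vecs).ravel()
ranked = sorted(range(len(docs)), key=lambda i: -scores[i])

# Cluster the documents, then surface one result per cluster first
# so near-duplicates don't fill the top of the list.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(doc_vecs)

results, seen_clusters = [], set()
for i in ranked:
    if labels[i] not in seen_clusters:
        results.append(i)
        seen_clusters.add(labels[i])
# Once every cluster is represented, append the remaining documents in score order.
results += [i for i in ranked if i not in results]
print(results)
```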
The author describes calculating cosine similarity of high-dimensional vectors. If these are sparse binary vectors, why not just store a list of nonzero indexes instead? That way your “similarity” is just the length of the intersection of the two sets of indexes. Maybe I’m missing something.
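For sparse binary vectors the two views almost coincide: cosine similarity reduces to the intersection size divided by the geometric mean of the set sizes, so storing only the nonzero indices is indeed enough; the raw intersection count just drops the normalization. A small sketch (the index sets are hypothetical):

```python
import math

def cosine_binary(a_indices, b_indices):
    """Cosine similarity of two sparse binary vectors, given only their
    sets of nonzero indices: |A ∩ B| / sqrt(|A| * |B|)."""
    if not a_indices or not b_indices:
        return 0.0
    overlap = len(a_indices & b_indices)
    return overlap / math.sqrt(len(a_indices) * len(b_indices))

# Illustrative index sets.
a = {2, 5, 7, 11}
b = {5, 7, 13}
print(cosine_binary(a, b))  # normalized cosine similarity
print(len(a & b))           # raw overlap, as suggested in the comment
```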
I love the random page: https://search.marginalia.nu/explore/random

It makes me feel like I'm on the old open web again.
I am surprised nobody has thought about looking at the page content itself to help fight spam. If a blog has nothing except paid affiliate links (Amazon, etc.), ads, and popups after page load (newsletter signups, etc.), then it should probably be downranked.

I have actually been developing something like that, but it does more, including downranking certain categories of sites that contain unnecessary filler, such as some recipe sites.
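A rough sketch of the kind of content-based downranking heuristic described here; the signals, thresholds, and weights are entirely made up and this is not any search engine's actual algorithm:

```python
# Hypothetical content-quality penalty; all numbers are illustrative.
from dataclasses import dataclass

@dataclass
class PageSignals:
    total_links: int
    affiliate_links: int        # e.g. Amazon tag links
    has_newsletter_popup: bool
    word_count: int
    filler_word_count: int      # e.g. boilerplate padding before a recipe

def quality_penalty(p: PageSignals) -> float:
    """Return a multiplier in (0, 1]; lower means the page ranks further down."""
    penalty = 1.0
    if p.total_links and p.affiliate_links / p.total_links > 0.5:
        penalty *= 0.5
    if p.has_newsletter_popup:
        penalty *= 0.8
    if p.word_count and p.filler_word_count / p.word_count > 0.6:
        penalty *= 0.6
    return penalty

print(quality_penalty(PageSignals(40, 30, True, 2000, 1500)))
```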
It gives plausible results for websites similar to HN:

https://explore2.marginalia.nu/search?domain=news.ycombinator.com
Aww, sadly nothing for my own websites!

This is such a great idea. Often when I find a small blog or site, I want more like it, and this is the perfect tool to discover that. It's a clear and straightforward idea in retrospect, as all really great ideas tend to be!
It concludes that a certain www.example.com is 42% similar to example.com, when they are exactly the same site: one redirects to the other. The only difference is the domain name, and even those character strings are more than 42% similar.
I have searched the GitHub repo for information about page ranking.

I am a newbie at SEO. I would greatly appreciate it if Marginalia provided a clean README about its ranking algorithm.

On the Marginalia search front page we have access to search keywords; the ranking algorithm is important enough to at least be discussed in layman's terms.

How do you optimize a page so that it gets a high ranking?

I understand this could be in the code documentation, but I have not yet checked it, sorry.
Goodhart's law: "When a measure becomes a target, it ceases to be a good measure."

This could be helpful in the short term, but I'm skeptical about the long term, as it will end up just as gamed.