> It’s you who chooses what sites we crawl

Yeah, but you still reserve the right to *not* crawl sites (or to remove them from your index), yes? So there's still the *opportunity* to do evil.

I'm still waiting for a "raw" search spidering provider. One that:

1. runs a web-spidering cluster — one that's only smart enough to know what robots.txt is, to know how to follow links in HTML pages, and to obey response caching-policy headers;

2. captures the spidering process losslessly, as e.g. HAR transcript files;

3. packs those HAR transcript files, a few million at a time, into tar.xz.tar files (i.e. grab a "chunk" of N HAR files; group them into subdirs by request Host header; archive each subdir, and compress those archives independently; then archive all the compressed archives together without compression) — and then uploads these semi-random-access archives to a CDN or private BitTorrent tracker (or any other data-delivery system that enables clients to retrieve only the blocks/byte-ranges of the files they're interested in). (A rough packing sketch follows at the end of this comment.)

4. generates a TOC for the semi-random-access files, as a stream of tuples (signed archive URL, chunk byte-range, hostname, compressed URL list), and pushes these to a managed, reliable message queue on an IaaS, publishing each entry to both an all-hostnames topic and a per-hostname topic. (I say an IaaS, as this allows consumers to set up their own consumer-groups on these topics within their own IaaS project, and then pay the costs of message retention in those consumer-groups themselves.) (The TOC-entry shape, and the byte-range fetch it enables, are also sketched below.)

5. also buffers these TOC-entry streams into files (e.g. Parquet files), one archive series per topic, and hosts these alongside the HAR archives. Prune TOC topic-stream entries once they are at least N days old AND have been successfully "offlined" into a hosted TOC-stream archive.

---

This "web-spidering-firehose data-lake as-a-Service" architecture would enable pretty much anyone to build whatever arbitrary search *index* they want downstream of it, containing as much or as little of the web as they want — where each consumer only needs to do as much work as is required to fetch and parse the HARs of the domains they've decided they care about indexing.

This architecture would also be "temporal" (akin to a temporal RDBMS table) — as a consumer of this service, you wouldn't see "the current version" of a scraped URL, but rather *all previous attempts to scrape that URL, and what happened each time*. (This means that no website could ever retroactively censor the dataset by adding a robots.txt "Disallow *" *after* scrapes have already happened. Their robots.txt config would prevent *further* scraping, but *previous* scrapes would be retained.)

And in fact, in this architecture, the HTTP interaction *to retrieve /robots.txt for a domain* would produce a HAR transcript that gets archived like any other. Domains restricted from crawling by robots.txt would still get regular HAR transcripts recorded *of the result of checking that their /robots.txt still restricts crawling*. (Reducing over these /robots.txt HAR transcripts is how a consumer-indexer would decide whether it should currently be showing or hiding a domain in its built index; a minimal version of that reduce is sketched below.)
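---

To make step 3 concrete, here's a minimal packing sketch, assuming the HARs for one chunk are already in memory and keyed by "hostname/transcript-id.har" (that keying and the function name are mine; the rest is just the stdlib tarfile/lzma modules):

    # Sketch of the step-3 packing: one uncompressed outer tar whose members are
    # independently-xz'd per-host tars, so consumers can byte-range a single host.
    import io
    import lzma
    import tarfile
    from collections import defaultdict

    def pack_chunk(har_files: dict[str, bytes]) -> bytes:
        """har_files maps 'hostname/transcript-id.har' -> raw HAR bytes."""
        by_host: dict[str, dict[str, bytes]] = defaultdict(dict)
        for path, blob in har_files.items():
            host, _, name = path.partition("/")
            by_host[host][name] = blob

        outer_buf = io.BytesIO()
        with tarfile.open(fileobj=outer_buf, mode="w") as outer:  # outer tar: no compression
            for host, members in sorted(by_host.items()):
                inner_buf = io.BytesIO()
                with tarfile.open(fileobj=inner_buf, mode="w") as inner:
                    for name, blob in sorted(members.items()):
                        info = tarfile.TarInfo(name=f"{host}/{name}")
                        info.size = len(blob)
                        inner.addfile(info, io.BytesIO(blob))
                compressed = lzma.compress(inner_buf.getvalue())  # independent .xz per host
                info = tarfile.TarInfo(name=f"{host}.tar.xz")
                info.size = len(compressed)
                outer.addfile(info, io.BytesIO(compressed))
        return outer_buf.getvalue()

Because the outer tar is written without compression, each per-host .tar.xz member sits at a fixed, recordable byte offset inside it; that (offset, length) pair is exactly the "chunk byte-range" field the TOC in step 4 would publish.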
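And here's what a step-4 TOC entry, and the consumer-side fetch it enables, might look like. The field names, the URL scheme, and the assumption that the archive host honors HTTP Range requests are all mine; publishing each entry to both an all-hostnames topic and a per-hostname topic (say, har-toc.all and har-toc.<hostname>) is then just two produce calls on whatever queue technology you picked:

    # Sketch of a step-4 TOC entry plus the consumer-side fetch it enables.
    # Field names and the hosting/URL scheme are assumptions, not a real spec.
    import io
    import lzma
    import tarfile
    import urllib.request
    from dataclasses import dataclass

    @dataclass(frozen=True)
    class TocEntry:
        archive_url: str              # signed URL of the outer (uncompressed) tar
        byte_range: tuple[int, int]   # (offset, length) of this host's .tar.xz member
        hostname: str
        url_list_xz: bytes            # xz-compressed, newline-separated URL list

    def fetch_host_transcripts(entry: TocEntry) -> dict[str, bytes]:
        """Range-fetch one host's .tar.xz out of the outer tar and unpack its HARs."""
        start, length = entry.byte_range
        req = urllib.request.Request(
            entry.archive_url,
            headers={"Range": f"bytes={start}-{start + length - 1}"},
        )
        with urllib.request.urlopen(req) as resp:
            member = resp.read()      # just this host's compressed subarchive
        inner = tarfile.open(fileobj=io.BytesIO(lzma.decompress(member)), mode="r:")
        return {m.name: inner.extractfile(m).read() for m in inner.getmembers() if m.isfile()}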
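Finally, a minimal sketch of the /robots.txt reduce from the last paragraph: fold a domain's /robots.txt HAR transcripts, oldest first, into a single "show this domain right now?" flag. The way I poke at the HAR structure here is simplified, and the permission check is delegated to the stdlib robots.txt parser rather than hand-rolled:

    # Sketch of the consumer-side reduce over a domain's /robots.txt HAR transcripts.
    # HAR field access is simplified; the Disallow logic is the stdlib parser's.
    from urllib.robotparser import RobotFileParser

    def domain_currently_visible(robots_hars: list[dict]) -> bool:
        """robots_hars: parsed HAR dicts for past fetches of /robots.txt, oldest first.
        Returns whether the latest meaningful fetch permits crawling '/'. Older
        transcripts stay in the data lake regardless; this only drives show/hide."""
        visible = True  # never saw a robots.txt -> crawlable by default
        for har in robots_hars:
            entry = har["log"]["entries"][0]
            status = entry["response"]["status"]
            if status == 404:
                visible = True        # explicit "no robots.txt here"
            elif status == 200:
                body = entry["response"]["content"].get("text", "")
                parser = RobotFileParser()
                parser.parse(body.splitlines())
                visible = parser.can_fetch("*", "/")
            # other statuses (5xx etc.) leave the last known state unchanged
        return visible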