These "augmented intelligence" applications are so exciting to me. I'm not as interested in autonomous artificial intelligence. Computers are tools to make my life easier, not meant to lead their own lives!<p>There's a big up-front cost of building a notes database for this application, but it illustrates the point nicely: encode a bunch of data ("memories"), and use an AI like GPT to retrieve information ("remembering"). It's not a fundamentally different process from what we do already, but it replaces the need for me to spend time on an automatable task.<p>I'm excited to see what humans spend our time doing once we've offloaded the boring dirty work to AIs.
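The encode-then-retrieve loop described above can be sketched in a few lines. This is a toy illustration, not the author's actual pipeline: the `embed` function here is just a hashed bag-of-words stand-in for a real embedding model (SBERT, OpenAI embeddings, etc.), which would also match paraphrases rather than only shared words.

```python
import numpy as np

# Toy stand-in for a real embedding model; it hashes words into a
# fixed-size bag-of-words vector. A real model (SBERT, OpenAI, ...)
# would capture meaning beyond literal word overlap.
def embed(text, dim=64):
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

# "Encode memories": embed every note once, up front.
notes = [
    "Met Alice to discuss the garden redesign",
    "Ideas for the quarterly budget review",
    "Recipe notes: sourdough starter feeding schedule",
]
index = np.stack([embed(n) for n in notes])

# "Remembering": embed the query and rank notes by cosine similarity.
def search(query, k=2):
    scores = index @ embed(query)
    return [notes[i] for i in np.argsort(-scores)[:k]]

print(search("sourdough feeding", k=1))
```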
In case folks are interested in trying it out, I just released the Obsidian plugin[1] for Khoj (<a href="https://github.com/debanjum/khoj#readme">https://github.com/debanjum/khoj#readme</a>) last week.<p>It creates a natural language search assistant for your second brain. Search is incremental and fast. Your notes stay local to your machine.<p>There's also a (beta) chat API that allows you to chat with your notes[2]. But that uses GPT, so notes are shared with OpenAI if you decide to try that.<p>It is not ready for prime time yet, but it may be something to check out for folks who are willing to be beta testers.
See the announcement on reddit for more details[3]<p>Edit: Forgot to add that khoj works with Emacs, Org-mode as well[4]<p>[1]: <a href="https://obsidian.md/plugins?id=khoj" rel="nofollow">https://obsidian.md/plugins?id=khoj</a><p>[2]: <a href="https://github.com/debanjum/khoj#chat-with-notes">https://github.com/debanjum/khoj#chat-with-notes</a><p>[3]: <a href="https://www.reddit.com/r/ObsidianMD/comments/10thrpl/khoj_an_ai_search_assistant_for_your_second_brain/?utm_source=share&utm_medium=web2x&context=3" rel="nofollow">https://www.reddit.com/r/ObsidianMD/comments/10thrpl/khoj_an...</a><p>[4]: <a href="https://github.com/debanjum/khoj/tree/master/src/interface/emacs#readme">https://github.com/debanjum/khoj/tree/master/src/interface/e...</a>
One of my biggest dreams is a self-hosted AI that always listens through my phone and automatically takes notes, puts events in my calendar, sets reminders, and templates journal entries. A true personal assistant to keep my increasingly complex life in order.<p>I'd love a system where I can just point a search engine at my brain. I tried really hard for a while, but I just didn't have the discipline or memory to exhaustively document everything.<p>An AI that can do this kind of thing in the background would be an absolute godsend for ADHD and ASD people.
Slight overkill to use GPT, though it works for the author and I can see that it’s the low-hanging fruit, being available as an API. But this can also be done locally, using SBERT, or even (faster, though less powerful) fastText.<p>Also, it’s helpful not to cut paragraphs into separate pieces, but rather to use a sliding-window approach, where each paragraph retains the context of what came before, and/or the breadcrumbs of its parent headlines.
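The sliding-window-plus-breadcrumbs idea could look something like this. A minimal sketch, assuming notes are already split into (breadcrumb, paragraphs) sections; the data structure and function names are mine, not from any particular tool.

```python
# Instead of embedding bare paragraphs, prepend each one's heading
# "breadcrumbs" and a window of preceding paragraphs, so the chunk
# that gets embedded keeps its surrounding context.
def chunk_with_context(sections, window=1):
    """sections: list of (breadcrumb, [paragraphs]) pairs."""
    chunks = []
    for breadcrumb, paragraphs in sections:
        for i, para in enumerate(paragraphs):
            context = paragraphs[max(0, i - window):i]
            chunks.append(" > ".join([breadcrumb] + context + [para]))
    return chunks

doc = [
    ("Notes > Gardening", ["Roses need pruning in spring.",
                           "They also prefer morning sun."]),
    ("Notes > Finance", ["Budget review is due Friday."]),
]
for c in chunk_with_context(doc):
    print(c)
```

With this, the second gardening paragraph is embedded together with its parent headline and the sentence before it, so a query like "rose pruning" can still reach the "morning sun" paragraph.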
I wonder whether your individually trained chat bot will be allowed to assert the Fifth Amendment right against self-incrimination to stop it from talking when the police interview it. And if it's allowed, do you or it decide whether to assert it? What if the two of you disagree?<p>Similar questions for civil trials, divorce proceedings, child custody....
Would be nice to see some indication of how well it works in his case.<p>I worked on a ‘Semantic Search’ product almost 10 years ago that used a neural network to do dimensionality reduction. The scoring function took inputs from both the ‘gist vector’ and the residual word vector, which was possible to calculate in that case because the gist vector was derived from the word vector and the transform was reversible.<p>I’ve seen papers in the literature that come to the same conclusion about what it takes to get good similarity results with older models: a significant amount of the meaning in a text is in pointy words that might not be captured by the gist vector. Maybe you do better with an LLM, since the vocabulary is huge.
I've been thinking of using GPT or similar LLMs to extract flashcards to use with my spaced repetition project (<a href="https://github.com/trane-project/trane/">https://github.com/trane-project/trane/</a>). As in you give it a book and it creates the flashcards for you and the dependencies between the lessons.<p>I played around with ChatGPT and it worked pretty well. I have a lot of other things on my plate to get around to first (including starting a math curriculum) but it's definitely an exciting direction.<p>I think LLMs and AI are not anywhere near actual intelligence (ChatGPT can spout a lot of good-sounding nonsense ATM), but the semantic analysis they can do is by itself very useful.
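The flashcard-extraction loop could be sketched as: prompt the model for "Q:/A:" pairs, then parse its reply. Everything below is hypothetical; the LLM call is stubbed with a canned reply (`fake_llm` is a placeholder for a real OpenAI or local-model call), and the prompt format is just one plausible choice.

```python
import re

# Ask an LLM to emit "Q: ... / A: ..." pairs for a passage, then parse
# them into flashcards. In practice you'd send `prompt` to a real model.
def make_prompt(passage):
    return ("Extract flashcards from the text below as lines of the form\n"
            "Q: <question>\nA: <answer>\n\n" + passage)

def fake_llm(prompt):
    # Stand-in for a real completion call; returns a canned reply.
    return "Q: What year did X happen?\nA: 1969\nQ: Who did Y?\nA: Alice\n"

def extract_flashcards(passage, llm=fake_llm):
    reply = llm(make_prompt(passage))
    pairs = re.findall(r"Q:\s*(.+?)\nA:\s*(.+?)(?:\n|$)", reply)
    return [{"question": q.strip(), "answer": a.strip()} for q, a in pairs]

cards = extract_flashcards("some book passage")
```

Keeping a strict output format like this makes the parsing step trivial and lets you validate the model's reply before turning it into lesson dependencies.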
This is fascinating.<p>Can I train it on 5 years of stream-of-consciousness morning brain dumps and then say "write blah as me"?<p>Before I do that, I'd love to know whether the training data becomes part of the global knowledge base available to everyone.
Did the author show how this system outputs results? I see an example of a lexical search and the technical implementation, but no example of semantic output showing how it's relevant to the search string without containing that string. The author used the literal search string "failure mode" as their example. I was wondering whether ChatGPT would bring up results relevant to a layperson's interpretation of failure mode, a technical interpretation, or something in between.
Does anybody know how search engines apply semantic search with embeddings? To my knowledge, no practical algorithms exist that find exact nearest neighbors in high-dimensional spaces (such as those in which word/sentence/document vectors live), so those wouldn't give you any benefit compared to an iterative similarity search as applied here, which is obviously totally impractical for real search engines. There are approximate nearest neighbor algorithms such as locality-sensitive hashing, but even they seem impractical at the scale of the indexes that search engines use. So how can, e.g., Google make this work?
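For what it's worth, the LSH idea mentioned above is quite compact to demonstrate. A minimal sketch of random-hyperplane LSH for cosine similarity (not how any particular search engine actually does it; production systems typically use multiple tables and more refined structures like HNSW or IVF):

```python
import numpy as np

# Random-hyperplane LSH: hash each vector to the sign pattern of a few
# random projections. Vectors at a small angle tend to share the same
# sign pattern, so a query only scans its own bucket instead of the
# whole index.
rng = np.random.default_rng(0)
dim, n_bits = 128, 12
planes = rng.normal(size=(n_bits, dim))

def lsh_key(v):
    return tuple((planes @ v > 0).astype(int))

vectors = rng.normal(size=(10_000, dim))
buckets = {}
for i, v in enumerate(vectors):
    buckets.setdefault(lsh_key(v), []).append(i)

def query(q):
    # Exact similarity search, but only within the query's bucket.
    candidates = buckets.get(lsh_key(q), [])
    if not candidates:
        return None
    sims = vectors[candidates] @ q
    return candidates[int(np.argmax(sims))]

# A small perturbation of vectors[0] usually lands in the same bucket,
# so query() finds the original without scanning all 10,000 vectors.
near = vectors[0] + 0.01 * rng.normal(size=dim)
```

With 12 bits the index splits into up to 4096 buckets, so each query scans only a handful of candidates; real systems trade recall back by using several independent hash tables.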
I’d love to have a ChatGPT that was also trained on all of the pages from my “second brain,” Roam Research.<p>Imagine, I could ask it questions about myself, my friends, and my business. It would in many ways know me better than me from reading all my journal entries.<p>How many years are we away from something like this?
I don't imply any judgment (like good/bad), but I tend to suspect that the major (not necessarily intentional) reason all these "second brains" (which I use too) exist, in the grand scheme of things, is to serve as high-quality input for AIs to learn from.
Wow, this was super interesting as someone using a Second Brain daily. Thank you so much for digging into it, putting in the work, and sharing with us all! Much appreciated. I will follow you for more.<p>I am very excited to do more with my Second Brain, but one concern with using ChatGPT or similar, as you point out, is that we'd need to upload all our private and sometimes sensitive notes, which is a no-go for me. So I'm happy that you do everything locally. I wonder what the equivalent would be to train the model to search and answer questions based on our second brain (plus the already-trained information). That's also where Obsidian will win in the long run, as other tools do not keep the data locally. For those tools the data is obviously already in the cloud, so they could train on it, but training on customer-sensitive data would be a big problem. Something I will follow closely.
Desktop search feels like it has stagnated for at least a decade. Yet it's an obvious way to enhance privacy, improve relevance, and even open up entirely new capabilities.