I am a proficient backend/systems developer who wants to learn about AI (pardon the generic term, I don’t know what I don’t know).<p>I don’t know anything about it, and my goal would be to start a learning process that would lead me to a fairly deep understanding of how something like ChatGPT works.<p>To make an analogy, I am not a kernel developer, but I am experienced enough in C and the theory of operating systems/hardware that I could jump into any Linux kernel subsystem's code, for example the virtual memory management, and understand its inner workings within a few days of studying it.<p>I would like to get to this same level of comfort with AI.<p>What would be the recommended studying material to get there? Is there something that starts from first principles, covering the math behind it and such? I have a master's degree in computer engineering, so I have done my fair share of courses in linear algebra/calculus/statistics in the past, but a refresher custom-tailored to AI would be good as well.<p>Thanks
Hi, no offense intended, but have you tried Google? It's easy to just ask here, but a CS degree should also have given you the ability to research a topic without any help, and Google will give you a lot of sources.<p>First, I would find the paper (published by Google engineers) entitled "Attention Is All You Need".
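The core operation that paper introduces is scaled dot-product attention. As a rough taste of what's inside (a minimal NumPy sketch, not a full transformer), it boils down to softmax(QK^T / sqrt(d_k)) V:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # scaled dot-product attention from "Attention Is All You Need":
    # softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (seq, seq) similarity scores
    weights = softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V                # weighted mix of the values

rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))  # 3 tokens, head dim 4
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = attention(Q, K, V)
print(out.shape)  # (3, 4)
```

The real thing adds learned projections, multiple heads, and causal masking on top of this, but this one function is the heart of it.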
Then, search Twitter or YouTube directly for "build ChatGPT from scratch". There are quite a lot of videos, and in particular a longish one made by Andrej Karpathy (formerly Tesla's AI director).
There you'll get how this works and have tokenizing explained, from scratch.<p>And that is your start. Then I would go to Hugging Face and check what's happening there. There is also an HN post about a "search engine for AI related things" from no more than two weeks ago; I can't find it anymore, but it collects all the AI news from several sources, plus papers, tweets, and so on.<p>Upd: found it<p><a href="https://news.mioses.com/" rel="nofollow">https://news.mioses.com/</a>
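On the tokenizing point: the simplest scheme, and the one Karpathy's from-scratch video starts with, is character-level (production LLMs use subword schemes like BPE instead). A minimal sketch:

```python
# Character-level tokenizer: map each distinct character to an integer id.
# Real LLM tokenizers use subword units (BPE etc.), but the idea is the same.
text = "hello world"
vocab = sorted(set(text))                       # [' ', 'd', 'e', 'h', 'l', 'o', 'r', 'w']
stoi = {ch: i for i, ch in enumerate(vocab)}    # string -> int
itos = {i: ch for ch, i in stoi.items()}        # int -> string

def encode(s):
    return [stoi[c] for c in s]

def decode(ids):
    return "".join(itos[i] for i in ids)

print(encode("hello"))          # [3, 2, 4, 4, 5]
print(decode(encode("hello")))  # hello
```

The model never sees text, only these integer sequences; everything downstream (embeddings, attention) operates on the ids.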
Well, you can start from the highest level and get LLaMA or Stable Diffusion running locally. Depending on your hardware, llama.cpp (or a GPU-based LLaMA repo if you have a big GPU) and HuggingFace diffusers' StableDiffusionPipeline are good barebones systems to dissect.
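For the diffusers route, getting a first image out is only a few lines (a sketch, assuming `diffusers`, `transformers`, and `torch` are installed and a CUDA GPU is available; the model id is the commonly used SD 1.5 checkpoint, swap in whichever checkpoint you prefer):

```python
# Sketch: generate one image with HuggingFace diffusers' StableDiffusionPipeline.
# First run downloads several GB of model weights from the Hub.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed checkpoint; any SD 1.x works
    torch_dtype=torch.float16,         # halves VRAM use on GPU
)
pipe = pipe.to("cuda")

image = pipe("a photo of an astronaut riding a horse").images[0]
image.save("astronaut.png")
```

Once it runs, the pipeline itself is a good dissection target: step through its `__call__` and you'll hit the text encoder, the UNet denoising loop, and the VAE decoder one by one.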