TechEcho

4 comments

andsoitis 2 months ago

> For me, it took eight re-reads of Raschka's (eminently clear and readable) explanation to get to a level where I felt I understood it.

It’s interesting to observe in oneself how repetition can result in internalizing new concepts. It is less about rote memorization and more about becoming aware of nuance and letting our minds “see” things from different angles, integrating them with existing world models through augmentation, replacement, or adjustment. The same goes for practicing activities that require some form of motor ability.

Some concepts are internalized less explicitly, as when we “learn” through role-modeling behaviors or through feedback loops in our interactions with people, objects, and ideas (like how to fit into a society).

penguin_booze 2 months ago

One sees from scratch, and then one also sees

    from fancy_module import magic_functions

I'm semi-serious here, of course. To me, for something to be called 'from scratch', the requisite knowledge should be built from the ground up. To wit, I'd want to write the tokenizer myself, but I don't want to derive the laws of quantum physics that make the computation happen.

kureikain 2 months ago

In case anyone wants to read this book and lives in the Bay Area, you can access O'Reilly Media online through your local library. This book is available there.

ForOldHack 2 months ago

Part 8? Wait... Is this a story that wrote itself? 1) I am kidding. 2) At what point does it become self-replicating? 3) Skynet. 4) Kidding - not kidding.

Writing an LLM from scratch, part 8 – trainable self-attention
