For self-directed learning, my favorite has actually been using ChatGPT (GPT-4, especially), because you can just ask as you go along. Some questions I asked (sketches of what I took away from the answers follow the list):<p><pre><code> I have a pytorch ml llm gpt-style model and it has many layers, called "attention" and "feed forward". Can you explain to someone who is highly technical, understands software engineering, but isn't deeply familiar with ML terms or linear algebra what these layers are for?
Where can I get all the jargon for this AI/ML stuff? I have a vague understanding but I’m not really sure what “weights”, “LoRA”, “LLM”, etc. are, so I can’t really understand where each tool and concept fits in. Explain to a knowledgeable software engineer with limited context on ML and linear algebra.
In gpt/llm world, what's pre-training vs fine-tuning?
in huggingface transformers what's the difference between batch size and microbatch size when training?
how do I free cuda memory after training using huggingface transformers?
I'm launching gradio with `demo.queue().launch()` from `main.py`. How can I allow passing command line arguments for port and share=True?
Comment each of these arguments with explanations of what they do (this is huggingface transformers)
 args=transformers.TrainingArguments(
     per_device_train_batch_size=micro_batch_size,  # samples per GPU per forward/backward pass (the micro-batch)
     gradient_accumulation_steps=gradient_accumulation_steps,  # micro-batches to accumulate before each optimizer step
     warmup_steps=100,             # linearly ramp the learning rate up over the first 100 steps
     max_steps=max_steps,          # total optimizer steps; overrides num_train_epochs when > 0
     num_train_epochs=epochs,      # passes over the training set (ignored if max_steps is set)
     learning_rate=learning_rate,  # peak learning rate reached after warmup
     fp16=True,                    # train in 16-bit mixed precision
     logging_steps=20,             # log metrics every 20 steps
     output_dir=output_dir,        # where checkpoints and the final model are written
     save_total_limit=3,           # keep at most 3 checkpoints, deleting the oldest
 ),
In the context of ML, can you explain with examples what "LoRA", "LLM", "weights" are in relation to machine learning, specifically gpt-style language models?</code></pre>
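<p>On the attention / feed-forward question, the mental model I came away with fits in a few lines of PyTorch. This is a simplified sketch (the dimensions, names, and the missing causal mask are my own simplifications, not any real model's code):<p><pre><code>import torch
import torch.nn as nn

class Block(nn.Module):
    # One gpt-style layer. Attention mixes information across token
    # positions; the feed-forward net then transforms each position
    # independently. Real models stack dozens of these blocks.
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(               # position-wise feed-forward
            nn.Linear(d_model, 4 * d_model),
            nn.GELU(),
            nn.Linear(4 * d_model, d_model),
        )

    def forward(self, x):                      # x: (batch, seq_len, d_model)
        h = self.ln1(x)
        a, _ = self.attn(h, h, h)              # causal mask omitted for brevity
        x = x + a                              # residual connection
        x = x + self.ff(self.ln2(x))           # residual connection
        return x
</code></pre>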
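<p>"Weights" turned out to just mean the learned tensors inside those layers, and LoRA is a way to fine-tune without touching them: freeze the original weights and learn a small low-rank delta on top. A toy version of the idea (the concept only; the peft library's actual implementation differs):<p><pre><code>import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    # Wraps a frozen Linear and learns a low-rank update:
    # y = x @ (W + B @ A)^T, where only A and B (tiny) get gradients.
    def __init__(self, base: nn.Linear, rank=8):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)  # freeze the original weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start

    def forward(self, x):
        return self.base(x) + x @ (self.B @ self.A).T

layer = LoRALinear(nn.Linear(512, 512))  # drop-in replacement for the frozen layer
print(sum(p.numel() for p in layer.parameters() if p.requires_grad))  # only the LoRA params train
</code></pre>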
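<p>The batch size vs micro-batch size one is about gradient accumulation: the micro-batch is what actually fits on the GPU per forward/backward pass, and the effective batch size is micro_batch_size * gradient_accumulation_steps, which is exactly how the two TrainingArguments above relate. Hand-rolled, the trick looks roughly like this (the Trainer does it for you; the toy model and data here are stand-ins):<p><pre><code>import torch
import torch.nn as nn

model = nn.Linear(10, 1)                       # stand-in for a real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
data = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(16)]

accum_steps = 4                                # micro-batches per optimizer step
optimizer.zero_grad()
for i, (x, y) in enumerate(data):              # each (x, y) is one micro-batch of 8
    loss = nn.functional.mse_loss(model(x), y) / accum_steps  # scale so grads average
    loss.backward()                            # grads accumulate across backward() calls
    if (i + 1) % accum_steps == 0:             # effective batch size: 8 * 4 = 32
        optimizer.step()
        optimizer.zero_grad()
</code></pre>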
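<p>Freeing CUDA memory boiled down to: drop every Python reference to the model/trainer, garbage-collect, then ask PyTorch to return its cached blocks. A minimal sketch (whether you need all three steps depends on what still holds references to GPU tensors):<p><pre><code>import gc
import torch

if torch.cuda.is_available():
    model = torch.nn.Linear(4096, 4096).cuda()  # stand-in for the trained model
    del model                                   # drop every reference first
    gc.collect()                                # collect cycles still holding GPU tensors
    torch.cuda.empty_cache()                    # hand cached blocks back to the driver
    print(torch.cuda.memory_allocated(), torch.cuda.memory_reserved())  # both near zero now
</code></pre>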
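<p>And the gradio one is plain argparse forwarded into launch(); server_port and share are real launch() parameters, while the flag names are just my choice:<p><pre><code>import argparse
import gradio as gr

def greet(name):
    return f"Hello, {name}!"

demo = gr.Interface(fn=greet, inputs="text", outputs="text")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--port", type=int, default=7860)  # gradio's default port
    parser.add_argument("--share", action="store_true")    # create a public share link
    args = parser.parse_args()
    # e.g. python main.py --port 8080 --share
    demo.queue().launch(server_port=args.port, share=args.share)
</code></pre>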