Wiring LLMs into existing pipelines often ends up bloated, complex, and slow. That's why I created <i>FlashLearn</i>, a streamlined library that mirrors the user experience of scikit-learn. It follows a pipeline-like structure, allowing you to "fit" (learn) skills from sample data or instructions and "predict" (apply) those skills to new data, returning structured results.<p><i>High-Level Concept Flow:</i><p>Your Data --> Load Skill / Learn Skill --> Create Tasks --> Run Tasks --> Structured Results --> Downstream Steps<p>Installation:<p><pre><code> pip install flashlearn
</code></pre>
<i>Learning a New "Skill" from Sample Data</i><p>Just like a fit/predict pattern in scikit-learn, you can quickly "learn" a custom skill from minimal (or no!) data. Here's an example where we create a skill to evaluate the likelihood of purchasing a product based on user comments:<p><pre><code> from flashlearn.skills.learn_skill import LearnSkill
from openai import OpenAI
# Instantiate your pipeline "estimator" or "transformer", similar to a scikit-learn model
learner = LearnSkill(model_name="gpt-4o-mini", client=OpenAI())
data = [
    {"comment_text": "I love this product, it's everything I wanted!"},
    {"comment_text": "Not impressed... wouldn't consider buying this."},
    # ...
]

# Provide instructions and sample data for the new skill
skill = learner.learn_skill(
    data,
    task=(
        "Evaluate how likely the user is to buy my product based on the sentiment in their comment, "
        "return an integer 1-100 on key 'likely_to_buy', "
        "and a short explanation on key 'reason'."
    ),
)
# Save the skill for reuse in pipelines
skill.save("evaluate_buy_comments_skill.json")</code></pre><p><i>Input Is a List of Dictionaries</i><p>Simply wrap each record into a dictionary, much like feature dictionaries in typical ML workflows:<p><pre><code> user_inputs = [
    {"comment_text": "I love this product, it's everything I wanted!"},
    {"comment_text": "Not impressed... wouldn't consider buying this."},
    # ...
]
</code></pre>
<i>Run in 3 Lines of Code - Concurrency Built-in up to 1000 calls/min</i><p><pre><code> from flashlearn.skills import GeneralSkill

# Load the skill we previously saved to "evaluate_buy_comments_skill.json"
skill = GeneralSkill.load_skill("evaluate_buy_comments_skill.json")
tasks = skill.create_tasks(user_inputs)
results = skill.run_tasks_in_parallel(tasks)
print(results)
</code></pre>
<i>Get Structured Results</i><p>Here's an example of structured outputs mapped to indexes of your original list:<p><pre><code> {
  "0": {
    "likely_to_buy": 90,
    "reason": "Comment shows strong enthusiasm and positive sentiment."
  },
  "1": {
    "likely_to_buy": 25,
    "reason": "Expressed disappointment and reluctance to purchase."
  }
}
</code></pre>
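Because the keys are string indexes into your original list, you can join each output back onto the record it came from. A minimal sketch using the sample data from above (shortened reasons for brevity):

```python
# Original inputs and their indexed structured outputs
user_inputs = [
    {"comment_text": "I love this product, it's everything I wanted!"},
    {"comment_text": "Not impressed... wouldn't consider buying this."},
]
results = {
    "0": {"likely_to_buy": 90, "reason": "Strong enthusiasm."},
    "1": {"likely_to_buy": 25, "reason": "Disappointment."},
}

# Merge each output dict into its source record, ordered by numeric index
enriched = [
    {**user_inputs[int(idx)], **output}
    for idx, output in sorted(results.items(), key=lambda kv: int(kv[0]))
]
```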
<i>Pass on to the Next Steps</i><p>You can use each record’s output for downstream tasks such as storing results in a database or filtering high-likelihood leads:<p><pre><code> # 'results' is the dictionary of structured LLM outputs from run_tasks_in_parallel
for idx, result in results.items():
    score = result["likely_to_buy"]
    reason_text = result["reason"]
    # Do something with the score and reason, e.g., store in a DB or pass to the next step
    print(f"Comment #{idx} => Score: {score}, Reason: {reason_text}")</code></pre>
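The lead filtering mentioned above can be a plain dict comprehension. A sketch assuming the structured output shape shown earlier (the threshold value is arbitrary):

```python
# Indexed structured outputs, shaped like the example results above
results = {
    "0": {"likely_to_buy": 90, "reason": "Strong enthusiasm."},
    "1": {"likely_to_buy": 25, "reason": "Disappointment."},
}

THRESHOLD = 75  # arbitrary cutoff for "hot" leads

# Keep only records whose score clears the threshold
hot_leads = {
    idx: output
    for idx, output in results.items()
    if output["likely_to_buy"] >= THRESHOLD
}
```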