TechEcho
Show HN: I hacked LLMs to work like scikit-learn

4 points | by galgia | 4 months ago
Working with LLMs in existing pipelines can often be bloated, complex, and slow. That's why I created FlashLearn, a streamlined library that mirrors the user experience of scikit-learn. It follows a pipeline-like structure, allowing you to "fit" (learn) skills from sample data or instructions and "predict" (apply) those skills to new data, returning structured results.

High-Level Concept Flow:

Your Data --> Load Skill / Learn Skill --> Create Tasks --> Run Tasks --> Structured Results --> Downstream Steps

Installation:

```
pip install flashlearn
```

Learning a New "Skill" from Sample Data

Just like the fit/predict pattern in scikit-learn, you can quickly "learn" a custom skill from minimal (or no!) data. Here's an example where we create a skill to evaluate the likelihood of purchasing a product based on user comments:

```python
from flashlearn.skills.learn_skill import LearnSkill
from flashlearn.client import OpenAI

# Instantiate your pipeline "estimator" or "transformer", similar to a scikit-learn model
learner = LearnSkill(model_name="gpt-4o-mini", client=OpenAI())

data = [
    {"comment_text": "I love this product, it's everything I wanted!"},
    {"comment_text": "Not impressed... wouldn't consider buying this."},
    # ...
]

# Provide instructions and sample data for the new skill
skill = learner.learn_skill(
    data,
    task=(
        "Evaluate how likely the user is to buy my product based on the sentiment in their comment, "
        "return an integer 1-100 on key 'likely_to_buy', "
        "and a short explanation on key 'reason'."
    ),
)

# Save skill to use in pipelines
skill.save("evaluate_buy_comments_skill.json")
```

Input Is a List of Dictionaries

Simply wrap each record into a dictionary, much like feature dictionaries in typical ML workflows:

```python
user_inputs = [
    {"comment_text": "I love this product, it's everything I wanted!"},
    {"comment_text": "Not impressed... wouldn't consider buying this."},
    # ...
]
```

Run in 3 Lines of Code - Concurrency Built-in up to 1000 calls/min

```python
from flashlearn.skills import GeneralSkill  # import path assumed; adjust to your install

# Suppose we previously saved a learned skill to "evaluate_buy_comments_skill.json".
skill = GeneralSkill.load_skill("evaluate_buy_comments_skill.json")
tasks = skill.create_tasks(user_inputs)
results = skill.run_tasks_in_parallel(tasks)
print(results)
```

Get Structured Results

Here's an example of structured outputs mapped to the indexes of your original list:

```json
{
  "0": {
    "likely_to_buy": 90,
    "reason": "Comment shows strong enthusiasm and positive sentiment."
  },
  "1": {
    "likely_to_buy": 25,
    "reason": "Expressed disappointment and reluctance to purchase."
  }
}
```

Pass On to the Next Steps

You can use each record's output for downstream tasks such as storing results in a database or filtering high-likelihood leads:

```python
# Suppose 'flash_results' is the dictionary with structured LLM outputs
for idx, result in flash_results.items():
    desired_score = result["likely_to_buy"]
    reason_text = result["reason"]
    # Now do something with the score and reason, e.g., store in a DB or pass to the next step
    print(f"Comment #{idx} => Score: {desired_score}, Reason: {reason_text}")
```
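For readers curious what running tasks in parallel looks like under the hood, here is a generic sketch using only the standard library's `concurrent.futures`. This is not FlashLearn's actual implementation, and `run_task` is a stand-in for a real LLM call; it just shows how parallel task execution can still yield the index-keyed result shape shown above:

```python
from concurrent.futures import ThreadPoolExecutor

def run_task(task):
    # Stand-in for a real LLM call; returns a structured dict per task.
    return {"likely_to_buy": 50, "reason": f"placeholder for: {task['comment_text']}"}

tasks = [
    {"comment_text": "I love this product!"},
    {"comment_text": "Not impressed."},
]

# Fan the tasks out across worker threads; map() preserves input order,
# so results can be keyed by the original list index.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = {str(i): r for i, r in enumerate(pool.map(run_task, tasks))}

print(results)
```

Because `map()` returns results in input order regardless of which thread finishes first, the index-to-record mapping stays stable even under heavy concurrency.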
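The "filter high-likelihood leads" step mentioned above can be sketched in plain Python. This is library-agnostic (no FlashLearn calls); `flash_results` and the 70-point threshold are illustrative assumptions, not part of the FlashLearn API:

```python
# Structured results in the same shape FlashLearn returns:
# original list index (as a string key) -> dict of extracted fields.
flash_results = {
    "0": {"likely_to_buy": 90, "reason": "Strong enthusiasm and positive sentiment."},
    "1": {"likely_to_buy": 25, "reason": "Disappointment and reluctance to purchase."},
}

# Keep only comments scoring above an (arbitrary) threshold of 70.
THRESHOLD = 70
hot_leads = {
    idx: result
    for idx, result in flash_results.items()
    if result["likely_to_buy"] > THRESHOLD
}

print(hot_leads)  # only comment "0" survives the filter
```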

no comments
