A highly useful tool in my household for dealing with the SEO/tracking scourge that recipe blogs have become is <a href="https://www.paprikaapp.com/" rel="nofollow">https://www.paprikaapp.com/</a>.<p>Hoping someday to have some spare time to integrate this with <a href="https://grocy.info/" rel="nofollow">https://grocy.info/</a> and have a pipeline for recipe -> preparation automation.
I'm surprised nobody has mentioned "Recipe Filter" <a href="https://addons.mozilla.org/en-CA/firefox/addon/recipe-filter/" rel="nofollow">https://addons.mozilla.org/en-CA/firefox/addon/recipe-filter...</a><p>Cuts the fluff and puts the recipe front and center. I wouldn't be able to find recipes online without this.
Paprika 3 (I use the iOS version, but I believe the Mac version has the same function) has a fantastic web scraper for recipes. I've had to correct maybe 1-2 errors across 100 recipes I've brought in from a bunch of different sites. It's super helpful to look through them in a standardized way (and you can sort by ingredient/category) to figure out what to make.
But how will I read about "Dakota", an avid yoga enthusiast who just happens to be a mom, who enjoys making healthy and savory meals for her family while blogging?<p>Seriously, I hope this spells an end to the Google ranking imposed nonsense that makes the simple act of searching for a recipe so insufferable.
Interesting! I wrote <a href="https://plainoldrecipe.com" rel="nofollow">https://plainoldrecipe.com</a> (open source!) to solve this, and inadvertently discovered many of the metadata tags described here.<p>The irony is that the content is required for SEO purposes, but once you’ve landed on the page you don’t want to see it. I wonder if there would be a way to write SEO that only the google bot sees and hide it from humans...
Are there any legal issues with scraping recipe sites in a commercial app like that?<p>I'm assuming ingredients and directions are "facts" so can't be copyrighted, but what about the pictures?
The simple truth is that the core recipes are fact-based and non-copyrightable, and the 1000-word blogspam recipe header is both copyrightable and garners better search result rankings.<p>So the business model is to take facts from the public domain, wrap them in bullshit prose, and then SEO the bullshit to have higher ranking than the naked source facts, for more unique visitors and ad revenue.<p>Comments about "providing recipes for free" are exactly as useful as comments about "providing phone numbers for free" or "providing mailing addresses for free" or "providing the original text of 'Little Women' for free" or "providing the steps of the long division algorithm for free".<p>Obfuscating the public domain is not a valuable service. Automatically removing the obfuscation is valuable. A "Project Gutenberg" style repository of recipes would be recurringly donation-worthy.
This could also be useful for websites that do not print well. I have run into a few occasions where ads and other website elements printed with the actual recipe. The result was a small recipe spread across several pages, mostly covered with other content. There were pictures and text formatting that I could not copy out. Often for stuff like that I just pull the HTML and edit until it prints well, but I would rather have an easier way.
Here's the question... why is it so difficult to do this in Android?<p>Seriously, AndroidDriver for Selenium was last updated 2013... and importing it throws an HttpClient error now. Update that client and you get a class duplication hell that is impossible to exit.<p>All I needed was to interact with 2-3 fields on a webpage but it's been eight hours and now I hate my life.
Cool, now the next interesting step would be to categorize recipes, maybe some kind of clustering algorithm, to see how similar they are and whether they have a common ancestor.<p>When I look at a recipe and notice some unusual proportions I usually check against Joy of Cooking or some other standard book. I've noticed that often everything old is new again.
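A rough sketch of the "how similar are they" part: before reaching for a real clustering algorithm, pairwise Jaccard similarity over ingredient sets is a cheap starting point. The function name and the sample recipes are made up for illustration, and this ignores proportions entirely, which is exactly what you'd want to add next.

```python
def jaccard(ingredients_a, ingredients_b):
    """Share of ingredients two recipes have in common (0.0 to 1.0)."""
    a, b = set(ingredients_a), set(ingredients_b)
    if not a and not b:
        return 1.0  # two empty recipes are trivially identical
    return len(a & b) / len(a | b)


# Hypothetical sample data, not taken from any real cookbook.
pancakes = ["flour", "egg", "milk", "butter"]
crepes = ["flour", "egg", "milk", "sugar"]
print(jaccard(pancakes, crepes))
```

Feeding a matrix of these scores into any off-the-shelf clustering routine would then group the likely "common ancestor" recipes together.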
This is great! It's a wonderful write-up.<p>I've also made something almost identical - Go libraries that scrape recipes for ingredients [1] and instructions [2]. Instead of the LCA method here, in my version I try to find the longest sequence of highest scoring HTML tags and those are "ingredients" or "instructions". It works very well (although I think this one works better).<p>Like the article mentioned, I found that the heuristics for finding HTML elements with ingredients turn out to be surprisingly simple - they usually include just a number, a measurement, and a food! This simple heuristic worked better than other sophisticated things I tried.<p>[1]: <a href="https://github.com/schollz/ingredients" rel="nofollow">https://github.com/schollz/ingredients</a><p>[2]: <a href="https://github.com/schollz/instructions" rel="nofollow">https://github.com/schollz/instructions</a>
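To make the "number, measurement, food" heuristic concrete, here is a minimal Python sketch. It is not how the Go libraries above or the article implement it; the unit list, the regex, and the function name are all assumptions picked for illustration.

```python
import re

# Hypothetical, deliberately short unit list; a real scraper needs far more.
UNITS = {
    "cup", "cups", "tbsp", "tsp", "tablespoon", "tablespoons",
    "teaspoon", "teaspoons", "g", "kg", "oz", "lb", "ml", "l",
    "pinch", "clove", "cloves",
}

# Leading quantity: mixed number ("1 1/2"), fraction ("1/2"), or decimal/integer.
QUANTITY = re.compile(r"^\s*(\d+\s+\d+/\d+|\d+/\d+|\d+(?:\.\d+)?)\s+(.*)$")


def looks_like_ingredient(text):
    """True if the text starts with a number, then a known unit, then a food."""
    match = QUANTITY.match(text)
    if not match:
        return False
    rest = match.group(2).split()
    # Require a unit followed by at least one more word (the food).
    return len(rest) >= 2 and rest[0].lower().rstrip(".") in UNITS


for line in ["2 cups flour", "1 1/2 tsp salt", "Preheat the oven to 350 degrees"]:
    print(line, "->", looks_like_ingredient(line))
```

Note that unitless lines such as "3 eggs" are rejected by this sketch, so a real version would treat the unit as optional and score it rather than require it.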
I saw all the terrible SEOd recipe websites and my first thought was: I should make a better recipe website that is simpler and is better SEOd.<p>---<p>FIRST EXAMPLE:<p>How to cook chicken on a skillet<p>Step 1 -- get this much chicken [picture]<p>Step 2 -- cook on skillet for 5 minutes<p>OPTIONAL -- here are seasonings you may add [pictures]<p>RELATED:<p>- How to cook a lot of chicken on a skillet [LINK]<p>- How to fry chicken breast [LINK]<p>---<p>But then I didn't understand how any of these websites are making money so I didn't do it.
I just started transcribing every recipe I make. Even if you can extract all the essential information from a recipe site, some changes are needed:<p>- I need to convert recipes to metric. I am neither equipped nor inclined to cook in freedom units.<p>- A "can" or a "packet" is not a standard unit of measurement.<p>- Package sizes vary between countries. I often adjust recipes to avoid wasting food.<p>- I cook by mass, not volume. I convert the units, then round them.<p>- Instructions are sometimes too verbose. I make them easier to follow while my hands are busy.<p>- I will make my own changes and I must write them down somewhere.<p>Besides, sites go down and links break. Food.com broke many of my bookmarks a few years ago. Other sites went dark. My recipes are plain text. They are editable, searchable, and available offline.
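The volume-to-mass conversion and rounding described above could be sketched like this. The US volume factors are standard, but the densities are rough illustrative guesses (they vary with brand and how you pack the cup), and the function and table names are hypothetical.

```python
# Millilitres per US volume unit.
ML_PER_UNIT = {"cup": 236.588, "tbsp": 14.787, "tsp": 4.929}

# ASSUMPTION: rough densities in grams per millilitre, for illustration only.
GRAMS_PER_ML = {"water": 1.0, "flour": 0.53, "sugar": 0.85}


def to_grams(quantity, unit, ingredient, round_to=5):
    """Convert a volume measure to grams, rounded to the nearest `round_to` g."""
    grams = quantity * ML_PER_UNIT[unit] * GRAMS_PER_ML[ingredient]
    return round_to * round(grams / round_to)


print(to_grams(2, "cup", "flour"))   # 250
print(to_grams(1, "cup", "water"))   # 235
```

Rounding to the nearest 5 g keeps the numbers easy to hit on a kitchen scale; for spices and leavening a smaller `round_to` is probably wiser.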
Hey Ben, thanks for that write up! You may not have time for this, but your article and the intersection of food/recipes and computer science would make a good book, at least I would read it.<p>I wrote [1] about 12 years ago in Clojure because for health reasons I had to track my intake of vitamin K, then decided to track all nutrients in the USDA nutrition database. I am working on a semantic web product (with another semantic product in planning) but maybe the end of this year will get to rewriting my food web app in Common Lisp and as a macOS app. I am adding a link to your article and these comments here to my notes for that project. Useful stuff.<p>[1] <a href="http://cookingspace.com" rel="nofollow">http://cookingspace.com</a>
Neat write-up, and thanks for putting me on to jsonld.js - looks useful.<p>I'm building <a href="https://simplescraper.io" rel="nofollow">https://simplescraper.io</a> and we're trying to create heuristics to update CSS selectors whenever a website changes. People become unhappy when a scrape task that ran smoothly on Monday suddenly returns nothing on Tuesday so while it's a tough nut to crack it's super important.<p>We use a combination of XPath, historical data and data type (the value may change but the type and length often remain the same or similar) to narrow down the options.<p>Of course there's more sophisticated methods using Machine learning etc. but it's fun to try different approaches to solve this problem.
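A toy version of the "same type, similar length" idea, to show the shape of it: when a selector breaks, score each candidate element's text against the values it used to return. This is only a guess at one way to do it, not how simplescraper.io works, and it leaves out the XPath part entirely; every name, regex, and weight here is an assumption.

```python
import re


def value_type(text):
    """Very coarse type guess for a scraped string."""
    text = text.strip()
    if re.fullmatch(r"[-+]?\d+(\.\d+)?", text):
        return "number"
    if re.fullmatch(r"[$€£]\s?\d[\d,]*(\.\d+)?", text):
        return "price"
    if re.fullmatch(r"https?://\S+", text):
        return "url"
    return "text"


def similarity(candidate, history):
    """Score 0..1: how well a candidate matches past values in type and length."""
    if not history:
        return 0.0
    same_type = sum(value_type(h) == value_type(candidate) for h in history)
    type_score = same_type / len(history)
    avg_len = sum(len(h) for h in history) / len(history)
    length_score = 1 - min(1.0, abs(len(candidate) - avg_len) / max(avg_len, 1))
    return 0.5 * type_score + 0.5 * length_score  # arbitrary equal weights


def best_candidate(candidates, history):
    """Pick the candidate text that most resembles what the scrape used to return."""
    return max(candidates, key=lambda c: similarity(c, history))


history = ["$19.99", "$21.50"]
candidates = ["Add to cart", "$20.25", "https://example.com/item"]
print(best_candidate(candidates, history))
```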
In 2011, Google released "Google Recipe Search", with filtering based on ingredients, cook time, and calories.<p><a href="https://www.wired.com/2011/02/google-recipe-semantic/" rel="nofollow">https://www.wired.com/2011/02/google-recipe-semantic/</a><p><a href="https://latimesblogs.latimes.com/technology/2011/02/google-debuts-recipe-view-search-function-for-cooks.html" rel="nofollow">https://latimesblogs.latimes.com/technology/2011/02/google-d...</a>
I personally just find a recipe, make it as written from the website, and then (if I actually like it), I'll convert it to be sane for actually following and output into Apple Notes.<p>What I mean by that is most recipes call for using wwwaaayyy more intermediary bowls/plates than actually required (e.g. if spices, chopped veggies, and minced garlic are going into the pot at the same time, there's no point in using three bowls) or list ingredients out of order of how you'd actually use them.
So far the best way I've found to search for recipes is to search in a foreign language. Translate what you're looking for, then search and translate back to English. There are still recipe blogs, but 5 instead of 5,000, and usually an authentic dish, not what Michelle The Stir Fry Queen From Michigan thinks constitutes a "Moroccan" dish because it has cinnamon and tomatoes.<p>Would love to see someone put together a search engine that excludes recipe blogs and penalizes SEO.
This is pretty interesting. I wonder how the recipe parsers from MyFitnessPal or Pinterest compare to this. Sometimes I think they do pretty well, but often they miss the mark. My guess is that Pinterest only treats something as a Recipe if it contains the metadata mentioned in the article, and does the easy parse if so. MFP seems to try something a bit more advanced, but I've never been super-impressed with its parsing abilities.
This is great. I made a similar product at No Nonsense Recipes <a href="https://nononsense.recipes" rel="nofollow">https://nononsense.recipes</a> because I was also tired of dealing with all the dreck on recipe sites. I did scrape some recipes to seed the site with but haven't integrated it as a feature yet.<p>I did ignore the photos though, since while recipes are not subject to copyright, photos are.
Off-topic, but I just wanted to mention that Ben's been one of my favorite 'teachers' in YouTube. He has some quality content on React and JS stuff. For those wanting to learn React (including some advanced stuff), check out his channel! And no he didn't pay me to post this here. Hey thanks Ben - I know a bit of React and have used it on a few projects thanks (also) to you.
A surprisingly good UX for recipes is Google Home. Ask it for a recipe, and it will ask if you want directions or ingredients. If you ask for ingredients, it will say them one by one, and pause between them until you ask it for the next one. My son has used it to great effect to make pancakes.
Really nice! I often copy and paste recipes into text files I have locally so this is a great alternative.<p>One feature request (if I may be so bold): it would be great to offer an imperial<->metric convertor. This is predominantly one of the reasons I keep copies of recipes I find and use.
I've been working on something similar for the past couple of days, but the trouble comes with wanting static types. There are a few projects out there that offer either a microdata parser, or types derived from schema.org but nothing that combines the two as yet
This is pretty interesting. I wonder if this metadata could be reused for tutorials of any kind (and not only for food, a.k.a. recipes). A tutorial normally has some requisites, then a step-by-step guide to achieving the goal, and then the final result.
I did something similar a while ago. I still have a DB with half a million recipes somewhere. I didn't continue it because I got stuck with the client side and I didn't find anyone interested in helping me.
Is there any recipe tool out there that can do at least one of the following:<p>1) Scale the quantity of ingredients and cooking time as number of people to be served increases?<p>2) Tell me what dishes I can make with the ingredients I have?
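Not a pointer to an existing tool, but both asks are small enough to sketch. The quantity half of (1) is plain multiplication, kept exact with fractions, and (2) is a subset check against the pantry. Cooking time is left out on purpose since it does not scale linearly with servings. All names and sample data below are hypothetical.

```python
from fractions import Fraction


def scale(ingredients, original_servings, target_servings):
    """Scale (quantity, unit, name) tuples; quantities stay exact fractions."""
    factor = Fraction(target_servings, original_servings)
    return [(Fraction(qty) * factor, unit, name) for qty, unit, name in ingredients]


def makeable(recipes, pantry):
    """Titles of recipes whose ingredients are all present in the pantry."""
    pantry = set(pantry)
    return [title for title, needed in recipes.items() if set(needed) <= pantry]


# Hypothetical sample data.
pancake_ingredients = [("3/4", "cup", "milk"), (2, "", "eggs")]
print(scale(pancake_ingredients, 4, 6))

recipes = {"pancakes": ["flour", "egg", "milk"], "omelette": ["egg", "butter"]}
print(makeable(recipes, ["flour", "egg", "milk"]))
```

The hard part in practice is not this arithmetic but normalizing ingredient names so that "eggs", "egg", and "large eggs" all match the same pantry entry.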
Pleasantly surprised to learn that most recipe sites include structured metadata. Makes sense given the combination of a relatively straightforward schema, and SEO incentive from Google.
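For anyone curious what reading that metadata looks like, here is a standard-library-only Python sketch that pulls schema.org `Recipe` objects out of JSON-LD script tags. It assumes the site uses JSON-LD (some use microdata instead, which this ignores), and the class and function names are my own.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collects the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than fail the whole page


def find_recipes(html):
    """Return every JSON-LD object whose @type includes "Recipe"."""
    collector = JsonLdCollector()
    collector.feed(html)
    recipes, stack = [], list(collector.blocks)
    while stack:
        node = stack.pop()
        if isinstance(node, list):
            stack.extend(node)
        elif isinstance(node, dict):
            types = node.get("@type", [])
            if isinstance(types, str):
                types = [types]
            if "Recipe" in types:
                recipes.append(node)
            # Some sites nest everything under an "@graph" key.
            stack.append(node.get("@graph", []))
    return recipes


page = """<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Recipe", "name": "Pancakes",
 "recipeIngredient": ["2 cups flour", "2 eggs"]}
</script></head><body>Long story about my weekend...</body></html>"""

for recipe in find_recipes(page):
    print(recipe["name"], recipe["recipeIngredient"])
```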
I've been using Tasty. Quick videos showing all the steps and how it's supposed to look along the way. That's the only way I can accept recipes anymore.
This is pretty awesome. I'm currently working on a data pipeline to demonstrate recipe scraping with kafka streams. This is going to be a big help in part of it.
There is markup specifically for recipes. I wonder why it isn't more often used.<p>EDIT: Yes, the article mentions it, but doesn't give a clue why it isn't more prevalent.
Another tool for difficult-to-scrape sites is OCR. There are a few decent free/opensource options available:<p><a href="https://source.opennews.org/articles/so-many-ocr-options/" rel="nofollow">https://source.opennews.org/articles/so-many-ocr-options/</a>
Be careful with this, some recipes are subject to copyright law. I think you can list ingredients of a recipe with no problem, but once you get to exact measurements and prep it somehow switches over to falling under copyright law. There used to be a bunch of open sourced recipe repos/databases...but almost all of them are gone.