# Open Recipes

## About

Open Recipes is an open database of recipe bookmarks.

Our goals are simple:

1. Help publishers make their recipes as discoverable and consumable (get it?) as possible.
2. Prevent good recipes from disappearing when a publisher goes away.

That's pretty much it. We're not trying to save the world. We're just trying to save some recipes.

## Recipe Bookmarks?

The recipes in Open Recipes do not include preparation instructions. This is why we like to think of Open Recipes as a database of recipe bookmarks. We think this database should provide everything you need to *find* a great recipe, but not everything you need to *prepare* a great recipe. For preparation instructions, please link to the source.

## The Database

Regular snapshots of the database will be provided as JSON. The format will mirror the [schema.org Recipe format](http://schema.org/Recipe). We've [posted an example dump of data](http://openrecipes.s3.amazonaws.com/openrecipes.txt) so you can get a feel for it.
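To make that concrete, here is a one-record sketch in Python. The property names come from the schema.org Recipe vocabulary; the values and the exact field set are hypothetical, so check the example dump linked above for what we actually emit.

```python
import json

# A hypothetical recipe-bookmark record using schema.org Recipe
# property names. Field set and values are illustrative only.
record = {
    "name": "Beef Fajitas",
    "url": "http://thepioneerwoman.com/cooking/2013/03/beef-fajitas/",
    "description": "Marinated flank steak with peppers and onions.",
    "image": "http://example.com/images/fajitas.jpg",  # hypothetical URL
    "prepTime": "PT20M",   # ISO 8601 duration: 20 minutes
    "cookTime": "PT15M",
    "recipeYield": "Serves 8",
    "ingredients": "flank steak\nbell peppers\nonions",
    "datePublished": "2013-03-25",
}

# A snapshot would contain one JSON object like this per recipe.
line = json.dumps(record, sort_keys=True)
print(line)
```

Note that there is no `recipeInstructions` property: bookmarks point you at the source for the actual preparation steps.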

## The Story

We're not a bunch of chefs. We're not even good cooks.

When we read about the [acquisition and shutdown of Punchfork](http://punchfork.com/pinterest), we just shook our heads. It was the same ol' story:

> We're excited to share the news that we're gonna be rich! To celebrate, we're shutting down the site and taking all your data down with it. So long, suckers!

This part of the story isn't unique, but it continues. When one of our Studiomates spoke up about her disappointment, we listened. Then, [we acted](https://hugspoon.com/punchfork). What happened next surprised us. The CEO of Punchfork [took issue](https://twitter.com/JeffMiller/status/314899821351821312) with our good deed and demanded that we not save any data, even the data (likes) of users who asked us to save it.

Here's the thing. None of the recipes belonged to Punchfork. They were scraped from various [publishers](https://github.com/fictivekin/openrecipes/wiki/Publishers) to begin with. But we don't wanna ruffle any feathers, so we're starting over.

Use the force; seek the source?

## The Work

Wanna help? Fantastic. We knew we liked you.

We're gonna be using [the wiki](https://github.com/fictivekin/openrecipes/wiki) to help organize this effort. Right now, there are two simple ways to help:

1. Add a [publisher](https://github.com/fictivekin/openrecipes/wiki/Publishers). We wanna have the most complete list of recipe publishers. This is the easiest way to contribute. Please also add [an issue](https://github.com/fictivekin/openrecipes/issues) and tag it `publisher`. If you don't have a GitHub account, you can also email us suggestions at [email protected].
2. Claim a publisher.

Claiming a publisher means you are taking responsibility for writing a simple parser for the recipes from that particular publisher. Our tech ([see below](#the-tech)) will store this in an object type based on the [schema.org Recipe format](http://schema.org/Recipe), and can convert it into other formats for easy storage and discovery.
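As a rough sketch of what mapping scraped data onto schema.org Recipe properties can look like, here is a hypothetical normalizer. The helper and its input field names are ours for illustration; only the output property names come from schema.org.

```python
import re

def normalize_recipe(raw):
    """Map a dict of raw scraped strings onto schema.org Recipe
    property names. Hypothetical helper, not part of the project."""
    def iso_duration(text):
        # Very rough: "1 hour 15 minutes" -> "PT1H15M"
        hours = re.search(r"(\d+)\s*hour", text or "")
        mins = re.search(r"(\d+)\s*min", text or "")
        out = "PT"
        if hours:
            out += hours.group(1) + "H"
        if mins:
            out += mins.group(1) + "M"
        return out if out != "PT" else None

    return {
        "name": (raw.get("title") or "").strip(),
        "url": raw.get("url"),
        "prepTime": iso_duration(raw.get("prep_time")),
        "cookTime": iso_duration(raw.get("cook_time")),
        "ingredients": "\n".join(raw.get("ingredients", [])),
    }

item = normalize_recipe({
    "title": " Beef Fajitas ",
    "url": "http://example.com/beef-fajitas/",  # hypothetical URL
    "prep_time": "20 minutes",
    "cook_time": "1 hour 15 minutes",
    "ingredients": ["flank steak", "bell peppers"],
})
print(item)  # prepTime "PT20M", cookTime "PT1H15M"
```

Once data is in this shape, converting it to JSON dumps or other formats is straightforward.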

Each publisher is a [GitHub issue](https://github.com/fictivekin/openrecipes/issues), so you can claim a publisher by claiming an issue. Just like a bug, and just as delicious. Just leave a comment on the issue claiming it, and it's all yours.

When you have a working parser (what we call a "spider" below), you contribute it to this project by submitting a [GitHub pull request](https://help.github.com/articles/using-pull-requests). We'll use it to periodically bring recipe data into our database. The database will be available initially as data dumps.

## The Tech

To gather data for Open Recipes, we are building spiders based on [Scrapy](http://scrapy.org), a web scraping framework written in Python. We are using [Scrapy v0.16](http://doc.scrapy.org/en/0.16/) at the moment. To contribute spiders for sites, you should have basic familiarity with:

* Python
* Git
* HTML and/or XML

### Setting up a dev environment

> Note: this is strongly biased towards OS X. Feel free to contribute instructions for other operating systems.

To get things going, you will need the following tools:

1. Python 2.7 (including headers)
1. Git
1. `pip`
1. `virtualenv`

You will probably already have the first two, although you may need to install Python headers on Linux with something like `apt-get install python-dev`.

If you don't have `pip`, follow [the installation instructions in the pip docs](http://www.pip-installer.org/en/latest/installing.html). Then you can [install `virtualenv` using pip](http://www.virtualenv.org/en/latest/#installation).

Once you have `pip` and `virtualenv`, you can clone our repo and install requirements with the following steps:

1. Open a terminal and `cd` to the directory that will contain your repo clone. For these instructions, we'll assume you `cd ~/src`.
2. `git clone https://github.com/fictivekin/openrecipes.git` to clone the repo. This will make a `~/src/openrecipes` directory that contains your local repo.
3. `cd ./openrecipes` to move into the newly-cloned repo.
4. `virtualenv --no-site-packages venv` to create a Python virtual environment inside `~/src/openrecipes/venv`.
5. `source venv/bin/activate` to activate your new Python virtual environment.
6. `pip install -r requirements.txt` to install the required Python libraries, including Scrapy.
7. `scrapy -h` to confirm that the `scrapy` command was installed. You should get a dump of the help docs.
8. `cd scrapy_proj/openrecipes` to move into the Scrapy project directory.
9. `cp settings.py.default settings.py` to set up a working settings module for the project.
10. `scrapy crawl thepioneerwoman.feed` to test the feed spider written for [thepioneerwoman.com](http://thepioneerwoman.com). You should get output like the following:

    <pre>
    2013-03-30 14:35:37-0400 [scrapy] INFO: Scrapy 0.16.4 started (bot: openrecipes)
    2013-03-30 14:35:37-0400 [scrapy] DEBUG: Enabled extensions: LogStats, TelnetConsole, CloseSpider, WebService, CoreStats, SpiderState
    2013-03-30 14:35:37-0400 [scrapy] DEBUG: Enabled downloader middlewares: HttpAuthMiddleware, DownloadTimeoutMiddleware, UserAgentMiddleware, RetryMiddleware, DefaultHeadersMiddleware, RedirectMiddleware, CookiesMiddleware, HttpCompressionMiddleware, ChunkedTransferMiddleware, DownloaderStats
    2013-03-30 14:35:37-0400 [scrapy] DEBUG: Enabled spider middlewares: HttpErrorMiddleware, OffsiteMiddleware, RefererMiddleware, UrlLengthMiddleware, DepthMiddleware
    2013-03-30 14:35:37-0400 [scrapy] DEBUG: Enabled item pipelines: MakestringsPipeline, DuplicaterecipePipeline
    2013-03-30 14:35:37-0400 [thepioneerwoman.feed] INFO: Spider opened
    2013-03-30 14:35:37-0400 [thepioneerwoman.feed] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
    2013-03-30 14:35:37-0400 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
    2013-03-30 14:35:37-0400 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
    2013-03-30 14:35:38-0400 [thepioneerwoman.feed] DEBUG: Crawled (200) &lt;GET http://feeds.feedburner.com/pwcooks&gt; (referer: None)
    2013-03-30 14:35:38-0400 [thepioneerwoman.feed] DEBUG: Crawled (200) &lt;GET http://thepioneerwoman.com/cooking/2013/03/beef-fajitas/&gt; (referer: http://feeds.feedburner.com/pwcooks)
    ...
    </pre>

    If you do, [*baby you got a stew going!*](http://www.youtube.com/watch?v=5lFZAyZPjV0)

### Writing your own spiders

For now, we recommend looking at the following spider definitions to get a feel for writing them:

* [spiders/thepioneerwoman_spider.py](scrapy_proj/openrecipes/spiders/thepioneerwoman_spider.py)
* [spiders/thepioneerwoman_feedspider.py](scrapy_proj/openrecipes/spiders/thepioneerwoman_feedspider.py)

Both files are extensively documented, and should give you an idea of what's involved. If you have questions, check the [Feedback section](#feedback) and hit us up.
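At its core, a spider's job is turning a page into Recipe fields. As a stdlib-only illustration of that idea (real spiders use Scrapy's selectors and the project's item classes, and the project itself targets Python 2.7; this sketch uses Python 3 for brevity), here is a hypothetical extractor that pulls schema.org microdata out of a recipe page:

```python
from html.parser import HTMLParser

class MicrodataExtractor(HTMLParser):
    """Collect text inside elements tagged with schema.org itemprop
    attributes. A rough, flat stand-in for Scrapy's selectors."""
    def __init__(self):
        super().__init__()
        self.fields = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if "itemprop" in attrs:
            self._current = attrs["itemprop"]
            self.fields.setdefault(self._current, "")

    def handle_data(self, data):
        if self._current:
            self.fields[self._current] += data.strip()

    def handle_endtag(self, tag):
        self._current = None

# Hypothetical page fragment marked up with schema.org Recipe microdata
html = """
<div itemscope itemtype="http://schema.org/Recipe">
  <h1 itemprop="name">Beef Fajitas</h1>
  <span itemprop="recipeYield">Serves 8</span>
</div>
"""

parser = MicrodataExtractor()
parser.feed(html)
print(parser.fields)  # {'name': 'Beef Fajitas', 'recipeYield': 'Serves 8'}
```

Many publishers don't use microdata at all, which is why each one needs its own spider with site-specific XPath or regex rules.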

To generate your own spider, use the included `generate.py` program. From the `scrapy_proj` directory, run the following (make sure you are in the correct virtualenv):

`python generate.py SPIDER_NAME START_URL`

This will generate a basic spider for you named `SPIDER_NAME` that starts crawling at `START_URL`. All that remains for you to do is fill in the correct info for scraping the name, image, etc. See `python generate.py --help` for other command line options.

We'll use the ["fork & pull" development model](https://help.github.com/articles/fork-a-repo) for collaboration, so if you plan to contribute, make sure to fork your own repo off of ours. Then you can send us a pull request when you have something to contribute. Please follow ["PEP 8 - Style Guide for Python Code"](http://www.python.org/dev/peps/pep-0008/) for code you write.

## Feedback?

We're just trying to do the right thing, so we value your feedback as we go. You can ping [Ed](https://github.com/funkatron), [Chris](https://github.com/shiflett), [Andreas](https://github.com/andbirkebaek), or anyone from [Fictive Kin](https://github.com/fictivekin). General suggestions and feedback to [[email protected]](mailto:[email protected]) are welcome, too.

We're also gonna be on IRC, so please feel free to join us if you have any questions or comments. We'll be hanging out in #openrecipes on Freenode. See you there!