I haven’t touched Feedalizer since 2008, but Michaël Rigart has a reboot of the project on GitHub!
Feedalizer is a small Ruby library that glues together Hpricot with the standard RSS library in a way that makes it easy to transform web pages into RSS feeds. If you ask me, it makes it too easy.
Here is the code that creates a feed for Sydsvenskan’s Nemi comic page:
require "rubygems"
require "feedalizer"
require "time" # Time.parse is defined in the time library

url = "http://sydsvenskan.se/serier/nemi/index.jsp?context=serie"

feedalize(url) do
  feed.title = "Nemi"
  feed.about = "..."
  feed.description = "Daily Nemi strip scraped from Sydsvenskan"

  # Each <option> on the page becomes one RSS item
  scrape_items("option") do |rss_item, html_element|
    link = html_element.attributes["value"]      # URL of that day's strip
    date = Time.parse(html_element.inner_html)   # option label holds the date
    rss_item.title = [feed.title, date.strftime("%Y-%m-%d")].join(", ")
    # Fetch the strip's page and use the 600px-wide image as the description
    rss_item.description = grab_page(link).search("//img[@width=600]")
    rss_item.date = date
    rss_item.link = link
    rss_item.guid.isPermaLink = true
    rss_item.guid.content = link
  end
end
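To make the scraping step a little more concrete, here is a rough sketch of what the block above reads from each matched element. It is not Feedalizer’s implementation: it uses REXML (shipped with Ruby) as a stand-in for Hpricot, which means it assumes well-formed XHTML, and the helper name scrape_options and the example.com URLs are hypothetical.

```ruby
require "rexml/document"
require "time"

# Hypothetical helper: collects [link, date] pairs from <option> elements,
# mirroring what the scrape_items("option") block reads from each element.
# REXML is a strict XML parser (unlike Hpricot), so the input must be
# well-formed XHTML.
def scrape_options(xhtml)
  doc = REXML::Document.new(xhtml)
  REXML::XPath.match(doc, "//option").map do |option|
    link = option.attributes["value"]     # same attribute the Nemi example reads
    date = Time.parse(option.text.strip)  # the option label holds the date
    [link, date]
  end
end

# Made-up markup standing in for the real page
page = <<~XHTML
  <select>
    <option value="http://example.com/nemi/2008-01-15">2008-01-15</option>
    <option value="http://example.com/nemi/2008-01-14">2008-01-14</option>
  </select>
XHTML

scrape_options(page).each do |link, date|
  puts "#{date.strftime('%Y-%m-%d')} #{link}"
end
```

With Hpricot, the parser tolerates the sloppy HTML found on real pages, which is a big part of why Feedalizer builds on it rather than on an XML parser.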
You can find more code in the examples directory.
Install Feedalizer with RubyGems:
  gem install feedalizer
You can also download the files manually or get the development version from the Subversion repository.
Please let me know if you write a script; I’d love to include it with Feedalizer as an example.
Skim through the code and take a look at the included examples. Make sure you know Hpricot and RSS::Maker if you want to do anything fancy.
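As a starting point for RSS::Maker, here is a minimal sketch that builds the same kind of item as the Nemi example by calling Ruby’s standard RSS library directly, without Feedalizer. The RSS 2.0 format, the helper name build_feed and the example.com item URL are assumptions for illustration; the channel link reuses the Sydsvenskan URL from the example above.

```ruby
require "rss"
require "time"

# Hypothetical helper: fills in an RSS 2.0 feed the way the Nemi example
# fills in each rss_item, but through RSS::Maker directly.
# `entries` is a list of [link, date] pairs.
def build_feed(entries)
  RSS::Maker.make("2.0") do |maker|
    maker.channel.title = "Nemi"
    maker.channel.link = "http://sydsvenskan.se/serier/nemi/index.jsp?context=serie"
    maker.channel.description = "Daily Nemi strip scraped from Sydsvenskan"

    entries.each do |link, date|
      maker.items.new_item do |item|
        item.title = ["Nemi", date.strftime("%Y-%m-%d")].join(", ")
        item.link = link
        item.date = date                 # written out as <pubDate>
        item.guid.content = link
        item.guid.isPermaLink = true
      end
    end
  end
end

feed = build_feed([["http://example.com/nemi/2008-01-15", Time.parse("2008-01-15")]])
puts feed.to_s
```

The setters are the same ones used on rss_item in the Nemi example, which is a good hint that Feedalizer’s items are RSS::Maker items under the hood.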
Bug reports, feature requests and cries of despair are welcome at the tracker on RubyForge.
For a list of changes, see CHANGELOG.txt.
The project came to life when Olle and I discussed web scraping at an SSRUG meeting in February 2006. I’ve played around with different HTML parsing libraries since then, and you can find HTree- and Tidy-powered variants in the repository.