Feedalizer – qerub.se

I haven’t touched Feedalizer since 2008, but Michaël Rigart has a reboot of the project on GitHub!

Feedalizer is a small Ruby library that glues together Hpricot with the standard RSS library in a way that makes it easy to transform web pages into RSS feeds. If you ask me, it makes it too easy.

Example

A feed for this page is created with this code:

require "feedalizer" # assumes the gem is installed (see below)
require "time"       # Time.parse lives in the "time" standard library

url = "http://sydsvenskan.se/serier/nemi/index.jsp?context=serie"

feedalize(url) do
  feed.title = "Nemi"
  feed.about = "..."
  feed.description = "Daily Nemi strip scraped from Sydsvenskan"

  # Each <option> element on the page becomes one RSS item.
  scrape_items("option") do |rss_item, html_element|
    link = html_element.attributes["value"]
    date = Time.parse(html_element.inner_html)

    rss_item.title = [feed.title, date.strftime("%Y-%m-%d")].join(", ")
    # Fetch the linked page and use the strip image as the item body.
    rss_item.description = grab_page(link).search("//img[@width=600]")
    rss_item.date = date
    rss_item.link = link

    rss_item.guid.isPermaLink = true
    rss_item.guid.content = link
  end
end

You can find more code in the examples directory.

Feedalizin’

gem install feedalizer

You can also download the files manually or get the development version from the Subversion repository.

Please let me know if you write a script; I’d love to include it with Feedalizer as an example.

Documentation

Skim through the code and take a look at the included examples. Make sure you know Hpricot and RSS::Maker if you want to do anything fancy.
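
If RSS::Maker is new to you, here is a minimal sketch of the RSS half on its own, without Feedalizer or Hpricot. It is not Feedalizer's implementation, only an illustration of the standard library calls that the rss_item properties in the example map onto. The entries array and the example.com URLs are made up and stand in for whatever a scraper would collect.

```ruby
require "rss"

# Hypothetical data standing in for scraped elements.
entries = [
  { title: "Nemi, 2008-01-02",
    link:  "http://example.com/strips/2008-01-02",
    date:  Time.utc(2008, 1, 2) },
  { title: "Nemi, 2008-01-01",
    link:  "http://example.com/strips/2008-01-01",
    date:  Time.utc(2008, 1, 1) }
]

feed = RSS::Maker.make("2.0") do |maker|
  # RSS 2.0 requires a title, link and description on the channel.
  maker.channel.title       = "Nemi"
  maker.channel.link        = "http://example.com/"
  maker.channel.description = "Daily strip, built by hand with RSS::Maker"

  entries.each do |entry|
    maker.items.new_item do |item|
      item.title       = entry[:title]
      item.link        = entry[:link]
      item.date        = entry[:date]
      item.description = "Strip for #{entry[:date].strftime('%Y-%m-%d')}"

      # Use the link as a permanent, unique identifier for the item.
      item.guid.content     = entry[:link]
      item.guid.isPermaLink = true
    end
  end
end

# Serialize the feed to XML.
puts feed
```

On recent Ruby versions rss ships as a bundled gem, so it may need to be listed in your Gemfile.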

Reporting Problems

Bug reports, feature requests and cries of despair are welcome at the tracker on RubyForge.

Version History

See CHANGELOG.txt.

Background

The project came to life when Olle and I discussed web scraping at an SSRUG meeting in February 2006. I’ve played around with different HTML parsing libraries since then, and you can find HTree- and Tidy-powered variants in the repository.