Warning: This hasn’t been touched since 2008 and is abandonware!
    require "feedalizer"
    require "time" # needed for Time.parse

    url = "http://sydsvenskan.se/serier/nemi/index.jsp?context=serie"

    feedalize(url) do
      feed.title = "Nemi"
      feed.about = "..."
      feed.description = "Daily Nemi strip scraped from Sydsvenskan"

      # Each element matching "option" on the page becomes one feed item
      scrape_items("option") do |rss_item, html_element|
        link = html_element.attributes["value"]
        date = Time.parse(html_element.inner_html)

        rss_item.title = [feed.title, date.strftime("%Y-%m-%d")].join(", ")
        rss_item.description = grab_page(link).search("//img[@width=600]")
        rss_item.date = date
        rss_item.link = link
        rss_item.guid.isPermaLink = true
        rss_item.guid.content = link
      end
    end
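The title-building step in the script above can be tried on its own, without Feedalizer or network access. This is only a sketch: the helper name `item_title` is made up for illustration, and the sample text "2008-03-14" is an assumption about what an option element might contain, not something taken from the actual page.

```ruby
require "time"

# Hypothetical helper (not part of Feedalizer): builds an item title
# the same way the script does, from the feed title and the text of
# an <option> element.
def item_title(feed_title, option_text)
  date = Time.parse(option_text)
  [feed_title, date.strftime("%Y-%m-%d")].join(", ")
end

# Assumed sample input; the real page may format its dates differently.
puts item_title("Nemi", "2008-03-14")
```

If `Time.parse` cannot make sense of the text it raises an `ArgumentError`, so a script for a page with unusual date formats would need its own parsing.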
You can find more code in the examples directory.
gem install feedalizer
Please let me know if you write a script. I’d love to include it with Feedalizer as an example.
Bug reports, feature requests and cries of despair are welcome at the tracker on RubyForge.
The project came to life when Olle and I discussed web scraping at an SSRUG meeting in February 2006. I’ve played around with different HTML parsing libraries since then, and you can find HTree- and Tidy-powered variants in the repository.