Tag: scraping

Bulk Scraping

Tags: python, scraping

Published on Friday, January 6th, 2017

One of my friends, who is not a CS concentrator, wanted to scrape the email addresses of all the faculty members listed on this website. Unfortunately, the emails are not on the page itself but on subpages. It would take forever to collect the data by hand, so I helped. To do this, I need to send one request per subpage. Naively, we would send a request, wait for the response, then repeat until we have gone through the whole list of faculty members. This, however, would take a lot of time. We can do better by sending the requests asynchronously. This is feasible because no request depends on the result of another.
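
Here is a minimal sketch of the asynchronous idea using only Python's standard library. It is not necessarily the code from the full post: the URLs, the fake_fetch stand-in (which sleeps instead of making a real HTTP request), the email pattern, and the max_concurrent limit are all assumptions made for illustration.

```python
import asyncio
import re

# Rough email pattern, good enough for a sketch (assumption, not a full RFC match).
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


async def fake_fetch(url):
    """Hypothetical stand-in for an HTTP GET; sleeps to mimic network latency."""
    await asyncio.sleep(0.05)
    name = url.rsplit("/", 1)[-1]
    return f"<p>Contact: {name}@example.edu</p>"


async def scrape_emails(urls, fetch, max_concurrent=10):
    """Fetch every subpage concurrently and return {url: [emails found]}."""
    # Cap the number of in-flight requests so the server is not flooded.
    sem = asyncio.Semaphore(max_concurrent)

    async def scrape_one(url):
        async with sem:
            html = await fetch(url)
        return EMAIL_RE.findall(html)

    # gather() runs all coroutines concurrently and keeps the input order.
    pages = await asyncio.gather(*(scrape_one(u) for u in urls))
    return dict(zip(urls, pages))


def run_demo():
    # Made-up subpage URLs, one per faculty member.
    names = ("alice", "bob", "carol")
    urls = [f"https://example.edu/faculty/{n}" for n in names]
    return asyncio.run(scrape_emails(urls, fake_fetch))


if __name__ == "__main__":
    print(run_demo())
```

Because the subpages are independent, the total time is close to that of the slowest single request rather than the sum of all of them. To scrape a real site, fake_fetch would be replaced by a coroutine that performs an actual HTTP request.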


Huginn

Tags: huginn, rss, scraping

Published on Monday, December 26th, 2016

Yahoo! announced the end of Yahoo! Pipes on June 4, 2015. It breaks my heart to see another good service die. However, I recently found another project with capabilities much like those of Yahoo! Pipes: Huginn


Yahoo! Pipes

Tags: yahoo-pipes, rss, scraping

Published on Sunday, May 11th, 2014

This post is migrated from my old blog.

I don’t know whether RSS and Atom are still popular. I personally use them a lot. Here are some examples of the feeds that I subscribe to.