RCC and UQ ran a half-day Web scraping and data cleanup workshop for UQ staff and students in the Library’s Centre for Digital Scholarship on Friday, 25 November.
Marco Fahmi from UQ’s Faculty of Humanities and Social Sciences and RCC’s Belinda Weaver led the workshop, which attracted 18 participants.
Participants were shown how to extract structured data (e.g. tables) from websites using the Import.io tool. Unlike many other Web scraping tools, Import.io does not require any coding knowledge. Attendees were also shown alternative ways of getting such data, such as the YouTube and Twitter APIs.
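For readers curious about what a point-and-click scraper automates, the sketch below shows one way to pull the rows of an HTML table using only Python's standard library. It is an illustration, not how Import.io works internally; the `TableExtractor` class, the `extract_table` function and the sample HTML are all invented for this example.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # discard any cell left unclosed in this row


def extract_table(html):
    """Return the table rows found in an HTML string as lists of strings."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page fragment, used only to demonstrate the parser.
SAMPLE_HTML = """
<table>
  <tr><th>Tool</th><th>Purpose</th></tr>
  <tr><td>Import.io</td><td>Web scraping</td></tr>
  <tr><td>OpenRefine</td><td>Data cleanup</td></tr>
</table>
"""

rows = extract_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

Real pages are usually messier than this (nested tables, merged cells, content loaded by scripts), which is part of the appeal of a dedicated tool.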
They were then introduced to the OpenRefine tool for data cleanup. Such was the success of this demonstration that one person left the workshop and immediately taught two others how to use OpenRefine.
OpenRefine is a free, open-source tool for wrangling messy data. Import.io is a fee-based service, but it offers 500 free scrapes, which makes it effectively free for the occasional user.
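To give a flavour of the kind of cleanup OpenRefine handles, the sketch below groups spelling variants of the same value by normalising each one to a common key. It is loosely modelled on the key-collision "fingerprint" clustering method described in OpenRefine's documentation, but it is a simplified illustration rather than OpenRefine's own code; the function names and the sample values are invented for this example.

```python
import string
from collections import defaultdict


def fingerprint(value):
    """Normalise a string so that trivially different spellings share a key."""
    cleaned = value.strip().lower()
    # Remove ASCII punctuation such as commas and full stops.
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, drop duplicate tokens, and sort them so that
    # word order no longer matters.
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def cluster(values):
    """Return groups of values that share a fingerprint (two or more members)."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


# Invented sample data: three variants of one name plus one unrelated value.
messy = [
    "University of Queensland",
    "university of queensland ",
    "Queensland, University of",
    "Griffith University",
]

clusters = cluster(messy)
for group in clusters:
    print(group)
```

In OpenRefine itself this kind of grouping is done through the graphical interface, and the user decides which variants to merge, so no code is required.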
Please contact rcc-support@uq.edu.au if you would like to express interest in future Web scraping and data cleanup workshops.