In a recent project, I had to scrape web pages that were only accessible after authorization. Because the authorization process was complicated (“Estonian ID card”), I could not simply fake the login with tools such as Selenium, so I had to execute code inside the browser environment itself. It took me some time to figure out what exactly I was looking for. I ended up with Greasemonkey for Firefox, which allowed me to automatically execute a JavaScript script after each page load. The script consisted of three steps:
- Scrape data from HTML
- Save data to LocalStorage
- Navigate to the next web page
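The three steps above can be sketched as a userscript. This is only a minimal illustration, not the script I actually used: the `@match` URL, the `table.results` and `a.next` selectors, and the function names are all placeholders that you would replace with whatever fits the site you are scraping.

```javascript
// ==UserScript==
// @name     scrape-after-login (hypothetical example)
// @match    https://example.com/records/*
// @grant    none
// ==/UserScript==

// Step 1: scrape data from the HTML.
// The selector 'table.results tr' is an assumption about the page layout.
function scrapeRows(doc) {
  return Array.from(doc.querySelectorAll('table.results tr')).map(row =>
    Array.from(row.querySelectorAll('td')).map(td => td.textContent.trim())
  );
}

// Step 2: save the data to localStorage, one entry per page.
function saveRows(storage, key, rows) {
  storage.setItem(key, JSON.stringify(rows));
}

// Step 3: find the next page; returns null when there is none.
// The selector 'a.next' is also an assumption.
function nextPageUrl(doc) {
  const link = doc.querySelector('a.next');
  return link ? link.href : null;
}

function run(win) {
  const rows = scrapeRows(win.document);
  // Use the current URL as the storage key so pages do not overwrite each other.
  saveRows(win.localStorage, win.location.href, rows);
  const next = nextPageUrl(win.document);
  if (next) {
    // Loading the next page triggers the userscript again.
    win.location.href = next;
  }
}

// Only start automatically when running inside a browser page.
if (typeof window !== 'undefined' && typeof document !== 'undefined') {
  run(window);
}
```

Because the script runs again on every page that matches `@match`, the loop continues on its own until no "next" link is found.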
When the scraping process is done, you can copy the contents of localStorage to the clipboard by running this in the browser's developer console:
copy(JSON.stringify(JSON.stringify(localStorage)));
Then paste the result into a file and save it.
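Note that the command above calls `JSON.stringify` twice, so the pasted text is a quoted JSON string rather than a plain JSON object. To read it back later, it has to be parsed twice. The helper below is a small sketch of that; `parseDump` is a hypothetical name, and it assumes each stored value may itself be JSON (falling back to the raw string when it is not).

```javascript
// Hypothetical helper: turn the pasted clipboard text back into usable data.
function parseDump(pastedText) {
  const inner = JSON.parse(pastedText); // undo the outer stringify -> JSON text
  const storage = JSON.parse(inner);    // undo the inner stringify -> { key: string }
  const result = {};
  for (const [key, value] of Object.entries(storage)) {
    try {
      // localStorage values are always strings; decode them if they hold JSON.
      result[key] = JSON.parse(value);
    } catch {
      // Not JSON: keep the raw string.
      result[key] = value;
    }
  }
  return result;
}

// Usage with a simulated dump (the keys and values here are made up):
const exampleStorage = { 'page-1': JSON.stringify([['a', 'b']]), note: 'plain text' };
const exampleDump = JSON.stringify(JSON.stringify(exampleStorage));
console.log(parseDump(exampleDump));
```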