The internet is full of unstructured information which, with a bit of elbow grease and luck, can be turned to gold. Leading mobile analytics company App Annie, for example, got their start by scraping app rankings from publicly accessible pages.
I wanted to see how easy it would be to turn a website with an ever-changing ranking of items into a spreadsheet I could monitor. I compared two services: Import.io and Blockspring. This guide covers Blockspring (and Diffbot, the service Blockspring uses for scraping).
Blockspring has many integrations out-of-the-box:
Blockspring - Feed your Sheets
Blockspring is a service which allows you to add data feeds to your Google Sheets. If you spend hours or days copying data on websites, pasting it into Google Sheets, and cleaning it to make it nice 'n useful, Blockspring may be a huge time saver for you.
A typical Blockspring workflow looks like this:
- Open a Google Sheet
- Open the Blockspring addon
- Pick the data you're seeking
Blockspring relies on integrations with dozens of other services (Slack, Amazon, etc.) to get the data you want. I chose Diffbot, a web scraping service. Diffbot scrapes a website, and Blockspring dumps it into my Google Sheet.
From Webpage to Sheet in Ten Minutes
This requires signing up for both Blockspring and Diffbot (both have free trials). To begin, sign up for a free trial with Diffbot, get a developer key (emailed to you), and return to Blockspring.
Register for Blockspring and then create a block with Diffbot. Choose "extract a list of products", then fill in the Diffbot API developer key you were given and the URL you'd like to scrape and turn into a Google Sheet. I wanted a list of the top cryptocurrencies, so I chose Coin Market, which lists the top 100 cryptocurrencies by market cap.
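For the curious, the two values the block asks for (a developer key and a page URL) are the same two things a direct Diffbot request needs. Here is a minimal sketch of how such a request URL could be put together in Python. It assumes Diffbot's v3 Product endpoint with `token` and `url` query parameters; I don't know which endpoint the Blockspring block actually calls, and `build_diffbot_url`, the placeholder key, and the example page are all made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: Diffbot's v3 Product API endpoint. The Blockspring block
# may use a different Diffbot endpoint behind the scenes.
DIFFBOT_PRODUCT_ENDPOINT = "https://api.diffbot.com/v3/product"


def build_diffbot_url(token: str, page_url: str) -> str:
    """Build the GET URL for one extraction request (hypothetical helper)."""
    # urlencode percent-escapes the page URL so it survives as a query value.
    query = urlencode({"token": token, "url": page_url})
    return f"{DIFFBOT_PRODUCT_ENDPOINT}?{query}"


# Placeholder key and page; swap in your own developer key and target URL.
request_url = build_diffbot_url("YOUR_DIFFBOT_TOKEN", "https://example.com/top-100")
print(request_url)
```

Fetching that URL with any HTTP client should return JSON describing the page, which is roughly what Blockspring then lays out in the sheet for you.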
After you put in the URL you want to scrape, go to Google Sheets, sign in, click "Add-ons" at the top, and add Blockspring as an add-on.
Now go back to the Blockspring webpage where you're setting up the Diffbot block. In step three you'll see a "Google Sheets" option. Choose that and pick the Google account you just added Blockspring to.
At some point you should see something like this, which sends you back to Sheets:
Now back in Google Sheets, pop open the Blockspring console via the Add-ons menu at the top of Sheets if it's not open already. Click to sign in with your Blockspring account; it should connect automagically.
Now scroll to the Diffbot connector, click it, click "connect", and choose "extract products from webpage":
Click run if needed:
And tada! The data should be imported to your page. You can even schedule a refresh of the data pull hourly, daily, etc.
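If you're wondering what "imported to your page" amounts to, it is essentially a list of extracted records flattened into rows and columns. Below is a rough Python sketch of that flattening step, written out as CSV text. The sample records and the `title` and `offerPrice` column names are hypothetical stand-ins, not real scraped data, and `records_to_csv` is just an illustrative helper.

```python
import csv
import io


def records_to_csv(records, columns):
    """Flatten a list of dicts into CSV text with a fixed column order."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops fields we did not ask for;
    # restval="" leaves a blank cell when a record lacks a column.
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", restval=""
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()


# Hypothetical records, shaped like something a scraper might return.
sample = [
    {"title": "Coin A", "offerPrice": "$100.00", "extra": "ignored"},
    {"title": "Coin B"},
]
csv_text = records_to_csv(sample, ["title", "offerPrice"])
print(csv_text)
```

Each record becomes one row, and missing values simply show up as empty cells, much like the gaps you may notice in the sheet when a page doesn't expose every field.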
Though my page (the cryptocurrencies) wasn't exactly a list of products, Diffbot figured it out.
Verdict - Promising, and no coding required
Based on my brief experience with Blockspring, the service seems promising. I could see it powering all sorts of time-consuming business processes, or even a business itself. And Diffbot, for those not afraid to code, appears to have a lot of functionality.
I wish the Diffbot block had more options and better hints on how to use it. For example, with Import.io, you can choose to only receive an update when items on the page change or new ones are added. The whole excitement of using Diffbot with Blockspring is not needing to program anything, and it's unfortunate that the more powerful features of Diffbot are locked behind coding.
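For anyone willing to write a little code, the "only tell me when something changed" idea can be approximated by comparing two pulls of the same list. This is a small Python sketch of that comparison, not how Import.io or Diffbot actually implement it; the `name` key and the sample rows are made up for illustration.

```python
def diff_snapshots(previous, current, key="name"):
    """Compare two lists of row dicts and report what differs."""
    old = {row[key]: row for row in previous}
    new = {row[key]: row for row in current}
    return {
        # Rows whose key appears only in the newer pull.
        "added": [new[k] for k in new if k not in old],
        # Rows whose key appears only in the older pull.
        "removed": [old[k] for k in old if k not in new],
        # Rows present in both pulls whose values differ.
        "changed": [new[k] for k in new if k in old and new[k] != old[k]],
    }


# Hypothetical snapshots of a ranking taken at two different times.
yesterday = [{"name": "Coin A", "rank": 1}, {"name": "Coin B", "rank": 2}]
today = [{"name": "Coin A", "rank": 2}, {"name": "Coin C", "rank": 1}]

changes = diff_snapshots(yesterday, today)
print(changes)
```

If all three lists come back empty, nothing moved and there is no need to send an update.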
Nevertheless, I accomplished my primary goal for free in just a few minutes without any coding.