You just came up with a brilliant idea: grab all of the data in the world, consolidate it in one location, and make it actually useful for us internet addicts. You spend hours looking for the data you need, only to find that the website you want it from doesn’t have an API.
In the first part, we will scrape the event listing website Peek.com with ParseHub. In the second part, we will create a Python and Flask web app that displays all of the events we gathered from Peek.com.
How to scrape dynamic websites with ParseHub
The next steps will show you how to turn Peek.com into an API using the ParseHub browser extension.
Download the ParseHub browser extension and go to Peek.com
Select all of the cities from the dropdown, and tell ParseHub to show activities for each city
- Using the select tool, click on the dropdown “Things to Do This Week in San Francisco”.
- Use the click tool to open the dropdown. Then use the select tool to click on all of the locations in the opened dropdown.
- Use the list tool to create a new empty JSON object (or empty Excel row) for each selected city. Rename the list “locations”.
- Use the extract tool and rename the extraction “city”. You should see all of the city names extracted into your sample results.
- Use the click tool again to tell ParseHub to click on each city separately.
Click the “see more” button to display all of the activity categories for each city
- Let’s go to the page with all of the results for each city. Find the “See More in San Francisco” button. Using the select tool, click on the button.
- Now, use the navigate tool to tell ParseHub to click on the button and navigate to a new page. Create a new template for the new page by entering “results” into the textbox and clicking “Create New Template”.
Click on the categories on the page and navigate to display all of the activities
- Let’s get all of the categories on the new page, so we know which activity is “under $50” or “Fun for Locals”. Click on one of the categories using the select tool, hold the SHIFT key and click on another category.
- Use the list tool and rename the list “categories”.
- Use the extract tool and rename the extraction “category”. Watch the category names appear in your sample results.
- Let’s click on each category and navigate to our final page that lists all of the activities. Use the navigate tool, type in “activities” and click “Create New Template”.
Filter for activities that cost $100 or less and extract their price and name
- Now, let’s get the price and the name for each activity, but only if the activity costs $100 or less. Click on one of the activity prices, hold the SHIFT key and click on another activity price. Make sure you are selecting the <span> element.
- Use the list tool to create an empty scope (a JSON object) for each activity.
- Use the conditional tool and type $e.text <= 100 into the textbox below.
- Use the extract tool and rename the extraction “price”. Scroll down through your sample results and notice that none of the activities over $100 are present.
- Use the relative select tool to select the name of each activity. Click on the price of one activity, then hover over the name of the same activity. Notice how the name and the location are highlighted. Hold down the Ctrl key (or Cmd key on Mac), press the 2 key once, and watch how only the name is selected. Now click on the name.
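ParseHub evaluates the conditional for you, but it helps to know exactly which prices a "$100 or less" rule keeps. Below is a minimal Python sketch of the same check applied to price strings after the fact. It assumes the price text may look like "$45" or "$1,200.50"; if the currency symbol ends up inside the extracted text, a plain numeric comparison may not behave as expected, so the sketch pulls out the number first. The helper names `parse_price` and `is_affordable` are hypothetical and not part of ParseHub.

```python
import re
from typing import Optional


def parse_price(text: str) -> Optional[float]:
    """Return the first number in a price string such as "$45" or "$1,200.50"."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None  # no number found, e.g. "Free" or an empty string
    return float(match.group().replace(",", ""))


def is_affordable(text: str, limit: float = 100.0) -> bool:
    """Python analogue of the `$e.text <= 100` condition (hypothetical helper)."""
    price = parse_price(text)
    return price is not None and price <= limit


prices = ["$45", "$100", "$129", "$1,200.50", "Free"]
print([p for p in prices if is_affordable(p)])  # prints ['$45', '$100']
```

Text without any number (such as "Free") is treated as not matching here; adjust that choice if you want free activities included.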
Download your data in JSON/CSV
After setting up your project, you need to run it and wait for your results to appear. You can then download all of the data in JSON or CSV format.
- Click “Get Data”, “Run Once”, “Save” and “Run on Servers”. Wait a few minutes and your data should be available for all of the activities in all of the cities.
- Beside the “actions” text, click to download your data in JSON or CSV format.
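The downloaded JSON nests activities inside categories inside locations, which is awkward for a web app that just wants a list of events. Here is a minimal Python sketch that flattens it into one row per activity. It assumes the file follows the list and extraction names used above ("locations", "city", "categories", "category", "price", "name"); the list holding the activities was never named in the steps, so the key "activities" is a hypothetical placeholder. Open your own download and adjust the keys to match.

```python
import json
from typing import Any, Dict, Iterator, List


def iter_activities(data: Dict[str, Any]) -> Iterator[Dict[str, str]]:
    """Yield one flat row per activity from the nested locations/categories data.

    Assumed (hypothetical) shape:
    {"locations": [{"city": ..., "categories": [{"category": ...,
        "activities": [{"name": ..., "price": ...}]}]}]}
    """
    for location in data.get("locations", []):
        for category in location.get("categories", []):
            # "activities" is a placeholder; use the name you gave the activity list
            for activity in category.get("activities", []):
                yield {
                    "city": location.get("city", ""),
                    "category": category.get("category", ""),
                    "name": activity.get("name", ""),
                    "price": activity.get("price", ""),
                }


def load_activities(path: str) -> List[Dict[str, str]]:
    """Read a downloaded JSON file and return the flattened rows."""
    with open(path, encoding="utf-8") as f:
        return list(iter_activities(json.load(f)))


# Placeholder data in the assumed shape, not real Peek.com results
sample = {
    "locations": [
        {
            "city": "San Francisco",
            "categories": [
                {
                    "category": "Under $50",
                    "activities": [{"name": "Example tour", "price": "$45"}],
                }
            ],
        }
    ]
}
rows = list(iter_activities(sample))
print(rows)
```

Flat rows like these map directly onto table rows or template loops in the Flask app, and using `.get` with defaults means a missing field yields an empty string instead of a crash.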
Find your API key and Project token
To interact with your project through the ParseHub API and automatically feed the data you scraped into your mobile or web application, first find your API key and project token.
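Once you have both values, your app can pull the results of the latest finished run instead of downloading files by hand. The sketch below uses only the Python standard library and assumes ParseHub's v2 REST endpoint `projects/{PROJECT_TOKEN}/last_ready_run/data` with `api_key` and `format` query parameters; check the current ParseHub API documentation before relying on it. The response may arrive gzip-compressed, so the sketch checks for the gzip magic bytes before parsing. The function names are hypothetical.

```python
import gzip
import json
import urllib.parse
import urllib.request
from typing import Any

API_BASE = "https://www.parsehub.com/api/v2"


def last_run_data_url(project_token: str, api_key: str, fmt: str = "json") -> str:
    """Build the URL for the data of the project's most recent finished run."""
    query = urllib.parse.urlencode({"api_key": api_key, "format": fmt})
    return f"{API_BASE}/projects/{project_token}/last_ready_run/data?{query}"


def fetch_last_run_data(project_token: str, api_key: str) -> Any:
    """Download and decode the JSON results of the last finished run."""
    url = last_run_data_url(project_token, api_key)
    with urllib.request.urlopen(url, timeout=30) as resp:
        raw = resp.read()
    if raw[:2] == b"\x1f\x8b":  # gzip magic bytes: decompress before parsing
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))
```

For example, `fetch_last_run_data("YOUR_PROJECT_TOKEN", "YOUR_API_KEY")` should return the same nested data as the JSON download. Keeping the API key out of your source code (for instance, reading it from an environment variable) is a sensible habit.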