How to Turn Existing Web Pages Into RESTful APIs With Import.io

The first thing you need to do is put the extractor into edit mode. You'll do this by selecting the extractor you want to edit in the menu bar on the left side of the Import.io Dashboard as shown next in Figure 12.

Figure 12: Import.io has a feature that allows you to edit an extractor's columns.

Once the extractor is selected, click the Edit button in the upper right of the Dashboard. (See Figure 12.)

Adding a Custom Column

Once the extractor is in edit mode, you'll proceed with the customization. The Edit page displays all of the columns that the extractor discovered automatically. You're going to delete all the columns and replace them with the columns for job Title, Description, City, and State. To delete all the columns, click the Delete all columns button, as shown in Figure 13, below.

Figure 13: Deleting all the columns from an extractor allows you to create a fresh customization.

Clicking the Delete all columns button removes all columns from the UI, but leaves behind a default column labeled, New column, as shown next in Figure 14. This column is not bound to any data, nor is the label, New column, currently of any particular use, so you need to do two things. First, you'll need to give the column a useful name. Then, you'll need to bind the column to data from the target web page.

To rename the column, click the down arrow icon on the right side of the column. Clicking the down arrow displays a context menu that contains the Rename column item. (Please see Figure 14, next.)

Figure 14: Click the down arrow icon to display the edit choices for a given column.

Clicking the Rename column context menu item puts the column header into edit mode. Once in edit mode, enter a new value for the column header. In this case you'll enter the column header value, Title. (Please see Figure 15, callout (1) shown next.)

Now you need to bind some data to the column. You'll do this by clicking the field on the web page that contains the data of interest, starting at the first occurrence of that field on the page. After you click the first occurrence, click the next occurrence, as shown in Figure 15, callout (2) below. Clicking the same field in sequence trains the extractor to identify all occurrences of the field in the page. You can tell that the extractor has been properly trained to identify a field's pattern by looking at the column dialog on the right side of the web page, as shown in Figure 15, callout (3) next. Import.io automatically fills the column dialog with the data the extractor has been trained to identify.

Figure 15: Import.io allows you to train an extractor to identify information on a web page as fields to be bound to a column of structured data.

Having trained the extractor to identify data for the Title column, you'll use the same training technique to create columns for Description, City, and State.
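As a rough illustration of what these four columns amount to, each row the extractor produces can be thought of as a small record. The sketch below is an assumption for illustration only: the JobRow class, its lowercase field names, and the title and description values are hypothetical and do not reflect Import.io's actual response format. At this stage the City and State columns both still hold the combined Boston, MA string.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class JobRow:
    """One extracted row; fields mirror the four custom columns (hypothetical shape)."""
    title: str
    description: str
    city: str
    state: str


# Placeholder values, not real extracted data. City and State are both
# bound to the same "City, State" text until they are cleaned up.
row = JobRow(
    title="Example job title",
    description="Example job description",
    city="Boston, MA",
    state="Boston, MA",
)

# Serialize the record the way a JSON API might return it (assumed shape).
row_json = json.dumps(asdict(row))
print(row_json)
```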

Take a look at the video for creating a custom column
We've made a video that you can view on YouTube that walks you through the details of creating a custom column in Import.io. You can view the video here.
 

Formatting Data in a Custom Column with Regular Expressions

You'll have occasions when a column displays information that's really two pieces of data. The most common example is the city, state string value, in which the single string implicitly contains two data fields, City and State. Figure 16 (next) shows you that an extractor has been trained to identify a field of data that contains both City and State information that's separated by a comma. As you can see in Figure 16, when it comes to having valid data in a City column, the value Boston, MA will not suffice. All you want is the value, Boston.

Fortunately, Import.io allows you to display a column's data according to a regular expression you specify. Thus, to satisfy the previous example, you can separate City from State and show only city values. Let's take a look at how to do this.

As shown in Figure 16 callout (1), you click the down arrow icon on the right side of the City column label. Clicking the icon displays a context menu that has the Set regular expression item.

Figure 16: Select [Set regular expression] (1) to apply a regular expression to a column's data.

Click the Set regular expression context menu item, as shown in Figure 16 callout (1). Clicking the context menu item displays the Set regular expression dialog box shown next in Figure 17.

Enter the regular expression, ^(.*), as shown below in Figure 17, callout (1). The meaning of that particular regular expression is "capture all the characters from the beginning of a line up to the comma." (Strictly speaking, .* is greedy, so in a value with more than one comma the capture would run to the last comma; a City, State value such as Boston, MA has only one.) In the Replace text field, enter $1. This tells Import.io to replace the values in each row of the column with the group of characters captured by the regular expression. In this case the group is identified as $1, the first group. You can see by looking in the column dialog on the right (Figure 17, callout (2)) that the regular expression is working as desired.

Figure 17: The [Set regular expression] dialog allows you to apply a regular expression to all data in a column.
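To see what this pattern does outside the Import.io dialog, here is a minimal Python sketch using the standard re module. It assumes the trailing comma is part of the pattern (^(.*),) and that a value with no match is left unchanged; the function name extract_city and the sample rows other than Boston, MA are made up for illustration. Python reads the first captured group with group(1) rather than $1.

```python
import re

# Assumed pattern: capture everything from the start of the value up to a comma.
CITY_PATTERN = re.compile(r"^(.*),")


def extract_city(value: str) -> str:
    """Return the first captured group, or the value unchanged if there is no comma."""
    match = CITY_PATTERN.search(value)
    return match.group(1) if match else value


# "Boston, MA" comes from the article; the other rows are hypothetical.
rows = ["Boston, MA", "San Francisco, CA", "Remote"]
cities = [extract_city(r) for r in rows]
print(cities)  # ['Boston', 'San Francisco', 'Remote']
```

Returning the original value when nothing matches is a design choice of this sketch, not a statement about how Import.io treats non-matching rows.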

You'll take the same approach to editing the State column. The extractor for the State column has been trained to identify the City, State sequence of characters, but you're only interested in the characters relevant to State. So you'll click the down arrow icon in the State column as you did with the City column to select, Set regular expression. However, this time you're going to set a regular expression that starts at the first comma encountered and captures all characters after that comma.

Figure 18, callout (1), next, shows the regular expression, ,\s(.*), entered in the dialog, with the captured characters replacing the column values accordingly. Figure 18, callout (2), shows the result of applying the regular expression.

Figure 18: Applying the regular expression.
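The State pattern can be tried the same way. This is a sketch under the same assumptions as before: the helper name extract_state and the Austin, TX row are hypothetical, and leaving a non-matching value unchanged is a choice made here for illustration.

```python
import re

# Pattern from the article: a comma, one whitespace character, then capture the rest.
STATE_PATTERN = re.compile(r",\s(.*)")


def extract_state(value: str) -> str:
    """Return the text after the first ', ' or the value unchanged if absent."""
    match = STATE_PATTERN.search(value)
    return match.group(1) if match else value


# "Boston, MA" comes from the article; "Austin, TX" is a made-up row.
raw_values = ["Boston, MA", "Austin, TX"]
states = [extract_state(v) for v in raw_values]
print(states)  # ['MA', 'TX']
```

Note that \s requires exactly one whitespace character after the comma, so a value such as Boston,MA would not match this pattern; a variant like ,\s*(.*) would tolerate a missing space.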

Adding Default Values

Defining a default value for a column lets you supply a character or number when no data is present, rather than leaving the field blank. Set a default by clicking the column's Set default value context menu item. (Remember, you access a column's context menu by clicking the down arrow icon, as shown in Figure 19 callout (1).)

Figure 19: You apply a default value for a column by selecting Set default value from the Edit menu.

Clicking Set default value in a column's context menu shows the Enter default value dialog. (See Figure 20, next.) You enter the value you want to use as default in the Default value text field. In this case we enter the string, Unknown. (Please see Figure 20, callout (1).) Then click the Save and Close button.
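The effect of a default value can be sketched in a few lines of Python. This is an illustration only: the with_default helper and the sample column are hypothetical, and treating None, empty, and whitespace-only values as "no data" is an assumption rather than documented Import.io behavior.

```python
from typing import List, Optional


def with_default(value: Optional[str], default: str = "Unknown") -> str:
    """Return the default when the value is missing or blank, else the value itself."""
    if value is None or value.strip() == "":
        return default
    return value


# Hypothetical State column with some rows that have no data.
state_column: List[Optional[str]] = ["MA", "", None, "CA"]
filled = [with_default(v) for v in state_column]
print(filled)  # ['MA', 'Unknown', 'Unknown', 'CA']
```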

Bob Reselman is a nationally known software developer, system architect and technical writer/journalist. Over a career that spans 30 years, Bob has worked for companies such as Gateway, Cap Gemini, The Los Angeles Weekly, Edmunds.com and the Academy of Recording Arts and Sciences, to name a few. Bob has written 4 books on computer programming and dozens of articles about topics related to software development technologies and techniques as well as the culture of software development.
 
