Guidelines for Creating a RESTful API

Marcello La Rocca, Data Visualization Engineer, SwiftIQ
Jul. 11 2014, 02:14PM EDT

Development with RESTful APIs enables organizations to integrate and thus more fully deliver on the promise of social, mobile, analytics and the cloud. In this tutorial we will provide a step-by-step guide to developing RESTful APIs.

REST is commonly identified with Roy Fielding’s 2000 PhD thesis, in which he first used the acronym REST (Representational State Transfer) and defined it as “a coordinated set of architectural constraints that attempts to minimize latency and network communication while at the same time maximizing the independence and scalability of component implementations.”

According to Fielding, REST is defined by four interface constraints:

  1. Identification of resources
  2. Manipulation of resources through representations
  3. Self-descriptive messages
  4. Hypermedia as the engine of application state

As far as we are concerned, this means that:

  • Every interesting resource must have its own unique URI, which identifies the resource and can be used to request an instance of the resource. For example, http://pweb-14.appspot.com/videos/tutorial/ could identify the list of videos matching the query “tutorial.”
  • Resources are manipulated using HTTP requests, usually through GET and POST methods, but PUT and DELETE might be more appropriate for some operations.
  • Each identified resource can be returned in various formats (such as HTML, XML or JSON), and each of these formats is a representation of the identified resource; RESTful applications may support more than one representation of the same resource at the same URL, using the Accept HTTP header that is passed by the client to the server with each request for a resource. (The same goes for uploaded resources, using the Content-Type HTTP header.)
  • Communication must be stateless: Each client request and server response is a self-descriptive message; that is, each message contains all the information necessary to complete the task.
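
To make the role of the Accept header more concrete, here is a minimal sketch of server-side content negotiation. The function name choose_representation and the SUPPORTED tuple are assumptions made for illustration, not part of the API built in this tutorial, and the parsing is deliberately simplified: it honors q-values but understands no wildcard other than */*.

```python
# Hypothetical helper: pick a representation from an Accept header.
SUPPORTED = ("application/json", "application/xml", "text/html")


def choose_representation(accept_header, supported=SUPPORTED):
    """Return the supported media type the client prefers, or None."""
    candidates = []
    for position, item in enumerate(accept_header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        quality = 1.0
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        # Sort key: highest quality first, then order of appearance.
        candidates.append((-quality, position, media_type))

    for neg_quality, _, media_type in sorted(candidates):
        if neg_quality == 0:
            continue  # q=0 means "not acceptable"
        if media_type == "*/*":
            return supported[0]
        if media_type in supported:
            return media_type
    return None
```

A server could call this with the value of the incoming Accept header and answer with an error status when it returns None.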

RESTful principles also provide strategies to handle CRUD actions using HTTP methods; an example mapping could be the following:

  • GET /videos - Retrieves a list of videos (We won’t use this one because it wouldn’t make sense for our app.)
  • GET /videos/tutorial - Retrieves the list of videos that match the keyword “tutorial”
  • GET /videos/id/:id - Retrieves a specific video--in particular, the one whose id is :id
  • POST /videos - Creates a new video on the server (We won’t implement this method in our API.)
  • PUT /videos/:id - Updates the video whose id is :id (We won’t implement this method in our API.)
  • PATCH /videos/:id - Partially updates the video whose id is :id (We won’t implement this method in our API.)
  • DELETE /videos/:id - Deletes the video whose id is :id (We won’t implement this method in our API.)

As you might have noticed, it is better to use the plural form, even when we retrieve/upload a single instance. This helps to avoid complicating the API: Think about the confusion that would be caused by odd plurals, or the bugs caused by mistakenly appending/not appending an “s” at the end of the URL!

Of the seven methods listed above, the first can’t be applied to our specific domain (we can’t return a list of all the videos on YouTube, after all), while posting, updating and deleting videos would require OAuth authentication and are beyond the scope of this tutorial.

When choosing between the GET and POST methods, use this rule of thumb: Use POST when the action performed modifies the persistent data; use GET if--and only if--the action has no side effects. By default, crawlers (at least the “polite” ones) follow only GET methods. If you use GET, for instance, to delete an item from a list that can be accessed without authentication, you’ll end up with an empty list as soon as a search engine scans your website.
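
The rule of thumb above can be sketched as a small guard that runs when routes are declared. The names SAFE_METHODS, is_safe and check_route are illustrative assumptions; the set itself lists the methods that the HTTP specification defines as safe (read-only).

```python
# Methods defined as "safe" (read-only) by the HTTP specification.
SAFE_METHODS = frozenset(["GET", "HEAD", "OPTIONS", "TRACE"])


def is_safe(method):
    """Return True if the HTTP method must not modify server state."""
    return method.upper() in SAFE_METHODS


def check_route(method, modifies_data):
    """Raise ValueError when a state-changing action is exposed via a safe method."""
    if modifies_data and is_safe(method):
        raise ValueError(
            "%s must not be used for actions with side effects" % method.upper())
    return True
```

With such a check in place, registering "delete an item" under GET fails immediately instead of waiting for a crawler to find it.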

We can also take relations into account. For example, if we wanted to restrict a keyword search to the videos related to a given video, or to the videos available in a particular region, we could use the following:

  • GET /videos/tutorial/related/:id - Retrieves the list of videos that match the keyword “tutorial” and are related to the video whose id is :id
  • GET /videos/tutorial/regions/:region - Retrieves the list of videos for a particular country that match the keyword “tutorial”

And, of course, the same design can be applied to the other CRUD methods.

Finally, we need to consider filtering and sorting. It is best to avoid retrieving the complete list of results for a query. (Sometimes we are even prevented from doing so, such as with the YouTube Data API.) Once we cap the number of results retrieved, the sorting criteria become very relevant, because they decide which results make the cut.

It’s always important to remember that simple means usable, and keeping it simple is almost always better when it comes to APIs.

To keep our API as simple as possible, implement filtering and sorting with query parameters in the URL:

  • GET /videos/tutorial?max_results=10&first_result=6  - Retrieves (at most) 10 videos that match the keyword “tutorial” starting from the sixth result
  • GET /videos/tutorial?max_results=5&order=rating - Retrieves the first five videos that match the keyword “tutorial,” with results sorted by users’ ratings
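
As a rough illustration of how such URLs break down, the sketch below splits a request URL into keywords and query parameters using only the standard library. It is written for Python 3 (urllib.parse), and parse_video_query is a hypothetical helper; inside a webapp2 handler the framework does this work for us.

```python
from urllib.parse import urlsplit, parse_qs


def parse_video_query(url):
    """Split a request URL into its keywords and query parameters."""
    parts = urlsplit(url)
    # The path has the form /videos/<keywords>
    keywords = parts.path.rstrip("/").split("/")[-1]
    # parse_qs returns a list of values per key; keep the first one.
    query = {key: values[0] for key, values in parse_qs(parts.query).items()}
    return keywords, query


keywords, query = parse_video_query(
    "/videos/tutorial?max_results=10&first_result=6")
print(keywords, query)
```

Note that the values come back as strings, which is why they still need to be validated and converted before use.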

From Theory to Python

We’ll build the features listed above in an incremental way, starting from the simplest one.

  • GET /videos/:keywords - Retrieves the list of videos that matches the parameter :keywords

What we need is a way to route the access to our site, capturing parameters. In this case, we want to create a rule for paths starting with “/videos/”, and then capture the text. Since we already know we will be allowing relations, and hence need more complex subpaths after the keywords, we can assume that every character after “/videos/” and until the next slash (‘/’), or the end of the URL string, will be captured as the keywords parameter.
To do so, we add a specific rule to the list of routes passed to the webapp2.WSGIApplication constructor:

app = webapp2.WSGIApplication([
    ('/videos/([^/]+)/?', SimpleVideoSearchHandler)
], debug=True)

By using regular expressions, we can capture the text between our path identifier and the end of the URL. (If the URL is ended by another slash, it will still match our rule, but the slash won’t be included in the keywords parameter.)
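
The behavior of this rule can be checked in isolation with the re module. The sketch below assumes the pattern is anchored at both ends of the path (an assumption about how the router applies it); VIDEO_ROUTE and capture_keywords are names made up for this example.

```python
import re

# Anchored at both ends (an assumption about how the router applies the rule).
VIDEO_ROUTE = re.compile(r"^/videos/([^/]+)/?$")


def capture_keywords(path):
    """Return the captured keywords, or None if the path does not match."""
    match = VIDEO_ROUTE.match(path)
    return match.group(1) if match else None
```

Under that assumption, both /videos/tutorial and /videos/tutorial/ yield "tutorial", while a longer path such as /videos/tutorial/related/abc does not match this rule at all.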

If we also want to support complex queries with relations, as in:

  • GET /videos/:keywords/related/:id - Retrieves the list of videos that match the parameter :keywords and are related to the video whose id is :id
  • GET /videos/:keywords/regions/:region - Retrieves the list of videos for a particular country that match the parameter :keywords

we need two more rules:

app = webapp2.WSGIApplication([
    ('/videos/([^/]+)/?', SimpleVideoSearchHandler),
    ('/videos/([^/]+)/regions/([^/]+)/?', VideoSearchHandlerWithRegion),
    ('/videos/([^/]+)/related/([^/]+)/?', RelatedVideoSearchHandler)
], debug=True)
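
To see how a table of rules like this one tells the three kinds of request apart, here is a small stand-alone sketch. It follows the URL scheme listed earlier (/related/:id and /regions/:region), uses handler names as plain strings, and assumes each pattern is anchored at both ends; resolve is a hypothetical function, not part of webapp2.

```python
import re

# (pattern, handler name) pairs following the URL scheme described in the text.
ROUTES = [
    (r"/videos/([^/]+)/?", "SimpleVideoSearchHandler"),
    (r"/videos/([^/]+)/regions/([^/]+)/?", "VideoSearchHandlerWithRegion"),
    (r"/videos/([^/]+)/related/([^/]+)/?", "RelatedVideoSearchHandler"),
]


def resolve(path):
    """Return (handler name, captured groups) for the first matching rule."""
    for pattern, handler in ROUTES:
        match = re.match("^%s$" % pattern, path)
        if match:
            return handler, match.groups()
    return None, ()
```

Because the keywords group cannot contain a slash, the first rule never swallows the longer paths, so the order of the rules does not matter here.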

The router setup is almost done. We might want to add a rule to handle the case in which none of the rules above is matched, just to take control of the error response we provide. We could either add it here, or, if we plan to return a static error page (possibly with instructions), we can handle it through the app.yaml configuration file. I opted for a hybrid approach. (See the gist repository for the final version.)

Now we have to create the handlers for these three paths. The great thing is that the parameters extracted from the URL will be sent directly to the Handler instance that will take care of the request. For example, we will define the get method for the SimpleVideoSearchHandler, taking into account that it will take two parameters (self and the captured keywords):

class SimpleVideoSearchHandler(webapp2.RequestHandler):
    def get(self, keywords):
        params = extract_params(self.request)
        query_result = search_youtube_videos(keywords, params=params)
        responde_with_results(self.response, query_result)

and for the other two paths:

class RelatedVideoSearchHandler(webapp2.RequestHandler):
    def get(self, keywords, related_id):
        params = extract_params(self.request)
        query_result = search_youtube_videos(keywords, params=params, related_id=related_id)
        responde_with_results(self.response, query_result)
 
class VideoSearchHandlerWithRegion(webapp2.RequestHandler):
    def get(self, keywords, regionCode):
        params = extract_params(self.request)
        query_result = search_youtube_videos(keywords, params=params,
                                             regionCode=regionCode.upper())
        responde_with_results(self.response, query_result)
 
The code above is structured to leverage three helper functions defined in the global scope (outside the specific classes). This is designed on purpose to embrace the DRY principle: Don’t Repeat Yourself. That is, if two methods, functions or classes do the same thing (except for, possibly, some minor details), there should be only one of them.

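We can use still another design improvement: Since the three handlers share some methods (that is, some behavior) that have no other uses, it is a good idea to create a class hierarchy and have the common methods pushed up to the common ancestor (from which the three classes above will need to inherit):

import json
import webapp2
from apiclient.discovery import build   #googleapiclient.discovery in newer releases of the library

#YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION and DEVELOPER_KEY are assumed
#to be defined here as module-level constants

class VideoSearchHandler(webapp2.RequestHandler):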
  #list all the possible valid values for the order
  VALID_ORDER_CRITERIA = set(['date', 'rating', 'relevance', 'title', 'videoCount', 'viewCount'])
  DEFAULT_SORTING_CRITERION = 'relevance'
 
  def validate_order(self, criterion):
    """ Validate the criterion passed, by verifying it is among the ones acceptable by the API
    """
    return (criterion if criterion in VideoSearchHandler.VALID_ORDER_CRITERIA
            else VideoSearchHandler.DEFAULT_SORTING_CRITERION)
 
  def extract_positive_int(self, param_name, default_value):
    """ Validate a positive int parameter – if validation fails return the default value provided
    """
    try:
      v = int(self.request.get(param_name, default_value=default_value))
      if v <= 0:
        return default_value
      else:
        return v
    except (TypeError, ValueError):   #int() raises ValueError for non-numeric strings
      return default_value
 
  def extract_params(self):
    """ Extract the query parameters from the URL and, after validation returns them as a
        dictionary.
    """
    return  {
              'first_result': self.extract_positive_int('first_result', 1),
              'max_results': self.extract_positive_int('max_results', None),
              'order': self.validate_order(self.request.get('order', default_value=None))
            }
 
  def search_youtube_videos(self, keywords, params={}, related_id=None, regionCode=None):
    youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION, developerKey=DEVELOPER_KEY)
 
    #prepare the parameters for the list method
    search_params = {
        'q':keywords,
        'part':"id,snippet",
        'type':'video'
    }
 
    if related_id is not None:
        search_params['relatedToVideoId'] = related_id
    if regionCode is not None:
        search_params['regionCode'] = regionCode
    if params['order'] is not None:
        search_params['order'] = params['order']
    if params['max_results'] is not None:
        search_params['maxResults'] = params['max_results']
 
    # Call the search.list method to retrieve results matching the keywords.
    search_response = youtube.search().list(
      **search_params   #unpack the dictionary to a list of named parameters
    ).execute()
 
    # Keep only the fields we need from each result (channels and playlists are
    # already excluded by 'type': 'video' above).
    result_transform = lambda search_result: {
                          'id': search_result['id']['videoId'],
                          'title': search_result['snippet']['title'],
                          'thumbnail': search_result['snippet']['thumbnails']['default'],
                          'date': search_result['snippet']['publishedAt']
                         }
    # Build a real list, so that json.dumps can always serialize the result.
    return [result_transform(item) for item in search_response.get("items", [])]
 
  def responde_with_results(self, results):
    self.response.headers['Content-Type'] = 'application/json'
    self.response.out.write(json.dumps(results))
 
class SimpleVideoSearchHandler(VideoSearchHandler):
  def get(self, keywords):
    params = self.extract_params()
    query_result = self.search_youtube_videos(keywords, params=params)
    self.responde_with_results(query_result)
 
class RelatedVideoSearchHandler(VideoSearchHandler):
  def get(self, keywords, related_id):
    params = self.extract_params()
    query_result = self.search_youtube_videos(keywords, params=params, related_id=related_id)
    self.responde_with_results(query_result)
 
class VideoSearchHandlerWithRegion(VideoSearchHandler):
  def get(self, keywords, regionCode):
    params = self.extract_params()
    query_result = self.search_youtube_videos(keywords, params=params,
                                              regionCode=regionCode.upper())
    self.responde_with_results(query_result)

 
Methods are pretty standard:

  • extract_params extracts the filtering and ordering parameters from the URL, using a couple of helper methods to ensure they assume acceptable values
  • responde_with_results writes the results to the response stream
  • search_youtube_videos, instead, performs the actual operations on the YouTube Data API
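
The validation performed by extract_params can be tried out without a running server. The sketch below is a stand-alone approximation: it reads from a plain dictionary instead of self.request, and positive_int and DEFAULT_ORDER are names chosen for this example only.

```python
VALID_ORDER_CRITERIA = frozenset(
    ["date", "rating", "relevance", "title", "videoCount", "viewCount"])
DEFAULT_ORDER = "relevance"


def positive_int(raw_value, default):
    """Convert raw_value to a positive int, falling back to default."""
    try:
        value = int(raw_value)
    except (TypeError, ValueError):
        return default
    return value if value > 0 else default


def extract_params(raw):
    """raw is a plain dict of query-string values (a stand-in for self.request)."""
    order = raw.get("order")
    return {
        "first_result": positive_int(raw.get("first_result"), 1),
        "max_results": positive_int(raw.get("max_results"), None),
        "order": order if order in VALID_ORDER_CRITERIA else DEFAULT_ORDER,
    }
```

Invalid or missing values silently fall back to defaults here; an API could just as reasonably reject them with an error response.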

There is, however, one step that is noteworthy: Since we have to specify different parameters according to the type of constraints we need to enforce, we could be tempted to write a different method for each case, or at least to handle the different cases with an if/else waterfall.

But there is a more compact way to do it, using Python’s unpacking operator for dictionaries: When a dictionary is prefixed with a double asterisk in a function call (and only there), it is unpacked into a list of named arguments.

In other words:

search(a=1, b=2, c=3)

is equivalent to:

search(**{'a': 1, 'b': 2, 'c': 3})

As you can see, the three versions of the get method in the child classes are pretty similar, except for the number and kind of parameters they get and in turn pass to search_youtube_videos; this looks, again, like a cry for DRY. In the final code provided on gist, we applied this further optimization, together with Python conventions to highlight methods intended as private. (As you may know, there is no information hiding in Python, at least not in the way it works in Java or C++.)
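
A short, self-contained sketch of dictionary unpacking follows. Here search is only a stand-in that echoes the arguments it receives (it does not call YouTube), and build_search_params is a hypothetical helper that adds optional keys only when they are set.

```python
def search(q, part="id,snippet", **optional):
    """Stand-in for the real API call: just returns the arguments it received."""
    received = {"q": q, "part": part}
    received.update(optional)
    return received


def build_search_params(keywords, related_id=None, region_code=None):
    """Collect only the parameters that were actually provided."""
    params = {"q": keywords}
    if related_id is not None:
        params["relatedToVideoId"] = related_id
    if region_code is not None:
        params["regionCode"] = region_code
    return params


# Both calls pass exactly the same named arguments.
explicit = search(q="tutorial", regionCode="US")
unpacked = search(**build_search_params("tutorial", region_code="US"))
```

The benefit is that a single call site serves every combination of optional parameters.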

Of course, as with every principle and rule, you need to take a number of factors into consideration before applying it. For example, every function call costs a little performance and requires a little more memory (for the call stack). When using Python classes, the difference can be particularly significant. Therefore, if we are designing a set of functions that will be called very often in a time-sensitive context, we might prefer to trade design and maintainability for better performance. But, remember: This kind of tuning should be limited to time-sensitive applications and performed with the help of a profiler, to spot the critical parts of your code where optimization can actually produce improvements.
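
One way to measure rather than guess is sketched below, using the standard timeit module to compare a loop that makes one function call per item with an equivalent inlined loop. The function names are invented for this example, the timings depend entirely on the machine and interpreter, and for whole programs the standard-library profiler cProfile is the more appropriate tool.

```python
import timeit


def add(a, b):
    return a + b


def sum_with_calls(values):
    total = 0
    for v in values:
        total = add(total, v)   # one extra function call per item
    return total


def sum_inline(values):
    total = 0
    for v in values:
        total += v              # same work, no call overhead
    return total


values = list(range(1000))
t_calls = timeit.timeit(lambda: sum_with_calls(values), number=200)
t_inline = timeit.timeit(lambda: sum_inline(values), number=200)
print("with calls: %.4fs, inline: %.4fs" % (t_calls, t_inline))
```

Only if the measured gap matters for your workload is it worth giving up the cleaner design.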

One last thing we need to implement: the filter on the first result retrieved. This is actually one thing that has been made a little bit more complicated in the new API, in comparison with the previous ones.

Using API v1 and v2, you could just set two parameters--start_index and max_results--and you’d get (at most) max_results entries, starting from the one at index start_index.

In Data API v3, results are paginated. This means that when you start a search, you are delivered a “page” with at most 50 results (five by default, but you can set this value using the new maxResults parameter, as we discussed earlier). If you try to ask for more than 50 results, an HttpError will be raised.

Instead, for each query you run, you get two tokens, one for the previous and one for the next page of results. You can use either of them to run another query that retrieves that page of results. On top of that, page size is not fixed, but depends on the maxResults parameter. (So, you can see how this can lead to confusion.)
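
The token-based navigation can be illustrated with a stub that imitates a paginated API. Everything in the sketch below is made up for the example (fake_search, the "offset-" token format, the integer items); only the nextPageToken and prevPageToken field names mirror those of the real responses.

```python
PAGE_TOKEN_PREFIX = "offset-"   # made-up token format for this stub


def fake_search(total_available, max_results=5, page_token=None):
    """Stand-in for a paginated API: returns one page plus navigation tokens."""
    start = int(page_token[len(PAGE_TOKEN_PREFIX):]) if page_token else 0
    end = min(start + max_results, total_available)
    response = {"items": list(range(start + 1, end + 1))}
    if end < total_available:
        response["nextPageToken"] = PAGE_TOKEN_PREFIX + str(end)
    if start > 0:
        response["prevPageToken"] = PAGE_TOKEN_PREFIX + str(max(start - max_results, 0))
    return response


def fetch_all(total_available, max_results):
    """Follow nextPageToken until the last page; return items and query count."""
    items, queries, token = [], 0, None
    while True:
        page = fake_search(total_available, max_results, token)
        queries += 1
        items.extend(page["items"])
        token = page.get("nextPageToken")
        if token is None:
            return items, queries
```

The point to notice is that a page can only be reached through the token handed out with a neighboring page, so there is no way to jump straight to an arbitrary index.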

To make sequential navigation of results easier, some schools of thought encourage developers to save an instance of the search response and use caching to a significant degree. However, this forces developers to run four queries and transfer 150 useless records even if they are just interested in, say, the 151st to 160th results. The model does make sense from YouTube’s point of view, apparently leading to improved efficiency on back-end servers. In any case, this system complicates the process of applying index-based filtering, and there are still open issues that prevent it from working properly in all situations. Since it goes well beyond the purpose of this post, we will just consider, as an example, filtering results between the first and the 50th.
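
The arithmetic behind those figures can be written down explicitly. In this sketch, pagination_cost is a hypothetical helper and the page size of 50 is the maximum mentioned above; results are numbered from 1.

```python
def pagination_cost(first, last, page_size):
    """Cost of reaching results first..last (1-based, inclusive) page by page."""
    queries = -(-last // page_size)                      # ceiling division
    discarded = ((first - 1) // page_size) * page_size   # whole pages before `first`
    return queries, discarded


print(pagination_cost(151, 160, 50))   # (4, 150)
```

Three full pages have to be fetched and thrown away before the fourth query returns the page that contains the results we actually want.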

The search_youtube_videos method only changes slightly:

  def search_youtube_videos(self, keywords, params, related_id=None, regionCode=None):
    youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION, developerKey=DEVELOPER_KEY)

    #prepare the parameters for the list method
    search_params = {
        'q':keywords,
        'part':"id,snippet",
        'type':'video'
    }

    if related_id is not None:
        search_params['relatedToVideoId'] = related_id
    if regionCode is not None:
        search_params['regionCode'] = regionCode
    if params['order'] is not None:
        search_params['order'] = params['order']

    #first_result is 1-based: 1 means "start from the first result"
    first_result = params.get('first_result', 1)
    #To keep things simple, check that the first result falls in the first results page
    #(YOUTUBE_MAX_RESULTS_PER_PAGE is assumed to be a module-level constant equal to 50)
    if first_result > YOUTUBE_MAX_RESULTS_PER_PAGE:
      raise ValueError("Invalid parameter: first_result must be at most %d" %
                       YOUTUBE_MAX_RESULTS_PER_PAGE)

    if params['max_results'] is not None:
      #Ask for enough results to cover the requested window...
      max_result = first_result - 1 + params['max_results']
      #...but, to keep things simple, never more than the first results page can hold
      if max_result > YOUTUBE_MAX_RESULTS_PER_PAGE:
        max_result = YOUTUBE_MAX_RESULTS_PER_PAGE
      search_params['maxResults'] = max_result

    # Call the search.list method to retrieve results matching the keywords.
    search_response = youtube.search().list(
      **search_params   #unpack the dictionary to a list of named parameters
    ).execute()

    # Keep only the fields we need from each result (channels and playlists are
    # already excluded by 'type': 'video' above).
    result_transform = lambda search_result: {
                          'id': search_result['id']['videoId'],
                          'title': search_result['snippet']['title'],
                          'thumbnail': search_result['snippet']['thumbnails']['default'],
                          'date': search_result['snippet']['publishedAt']
                         }

    #Drop the results that come before first_result (converted to a 0-based index)
    items = search_response.get("items", [])[first_result - 1:]
    return [result_transform(item) for item in items]
 
We have just added some validation for the filtering parameters and, in the last lines, we discard the results that precede first_result. The page returned by the API call holds at most search_params['maxResults'] items, but it can hold fewer if there aren’t that many results; slicing the list handles both cases. (Note that pageInfo.totalResults reports the approximate size of the whole result set, not the number of items in the page, so it is not needed here.)
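
The index arithmetic behind this kind of filtering can be isolated in a small helper and tested on plain lists. The sketch below assumes first_result is 1-based (so 6 means "starting from the sixth result") and that only the first page of at most 50 results is considered; page_window is a name invented for this example.

```python
YOUTUBE_MAX_RESULTS_PER_PAGE = 50   # page-size limit mentioned above


def page_window(items, first_result=1, max_results=None):
    """Return up to max_results items of one page, starting at first_result (1-based)."""
    if not 1 <= first_result <= YOUTUBE_MAX_RESULTS_PER_PAGE:
        raise ValueError("first_result must be between 1 and %d"
                         % YOUTUBE_MAX_RESULTS_PER_PAGE)
    start = first_result - 1                      # convert to a 0-based index
    if max_results is None:
        stop = YOUTUBE_MAX_RESULTS_PER_PAGE
    else:
        stop = start + max_results
    return items[start:min(stop, YOUTUBE_MAX_RESULTS_PER_PAGE)]
```

Python slices tolerate bounds beyond the end of the list, so a page with fewer items than requested needs no special handling.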

Do you have any additional advice for creating a RESTful API? Experiences to share? Please let us know in the comments section.
 

Marcello La Rocca As a developer I'm focusing on JavaScript, Python and Java (Android), but I have a weakness for algorithms. Lately, apparently I also became a tech blogger! My personal blog: mlarocca.github.io - Follow me on Google+
