How To Use Dailymotion API to Leverage Video Search

Marcello La Rocca, Data Visualization Engineer, SwiftIQ
Jun. 12 2014, 12:00PM EDT

In our series on video APIs, we have so far focused on the YouTube Data API--in particular, on its subset devoted to video search. But YouTube is not the only video hosting service on the Net, and competitors are gaining slices of the market. One key contender is Dailymotion, a French social video site. Popular in Europe, Dailymotion has one very big thing going for it: Very often, in countries where YouTube has been censored (such as in Turkey), Dailymotion has not been blocked. Further, there has been much interest in Dailymotion of late, including Yahoo's offer to acquire the company. Although that deal was blocked, Yahoo’s and others' interest indicates that Dailymotion is a service worth watching.

In this report we will focus on the video subset of the API.

To better understand how to make it work, we should probably start from the end: the response type of the API. All responses use the JSON format, but there are two kinds of responses: one returned when you query for a single item (for example, when you retrieve a single, specific video) and another returned when you expect the answer to be a list of items.

The former consists of a list of properties whose keys match the fields requested. Consider, for example, the following (fictional) response:

It corresponds to the query https://api.dailymotion.com/video/x137on6?fields=id,language%2Cowner.screenname
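
The sample response itself is not reproduced here. As a stand-in, the following Python 3 sketch builds the query URL above and parses a made-up response body. The helper name `item_url` and all the values in `sample_body` are hypothetical; the only point is the flat, one-key-per-requested-field shape described in the text.

```python
import json
from urllib.parse import urlencode

API_ROOT = "https://api.dailymotion.com"


def item_url(video_id, fields):
    """Build the URL for a single-video query (hypothetical helper)."""
    # safe="," keeps the comma-separated field list readable in the URL.
    query = urlencode({"fields": ",".join(fields)}, safe=",")
    return "%s/video/%s?%s" % (API_ROOT, video_id, query)


# Made-up response body: one key per requested field, values invented.
sample_body = '{"id": "x137on6", "language": "en", "owner.screenname": "some_user"}'
item = json.loads(sample_body)

print(item_url("x137on6", ["id", "language", "owner.screenname"]))
print(sorted(item))
```
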
The list response type, instead, is actually based on the item type, because it consists of a list of items, plus a handful of fields that help us navigate through the results:

  • page: The current page number (for the results). By default, page 1 is returned, and its max value is 100. You’ll notice that Dailymotion, while still using the pagination pattern, is much more flexible than YouTube when you have to navigate through results because you can set the page size and pass the page index to request specific pages. (The index must be in the range 1..100).
  • limit: The current (max) number of items per response page. You can change this limit using the limit parameter, but the maximum number of items per page is 100, and you must be particularly careful when using this parameter and paginating results. For example, the 101st overall result can be the 1st result of the second page, if limit is set to 100, or the 1st of the third page, if limit is 50, or yet the 11th of the 4th page, if limit is 30.
  • has_more: A boolean property that indicates whether there are more pages with results after the current one.
  • total: This is an optional property that, when present, provides an estimate of the total number of results in the query. It is not to be trusted if your goal is to check whether there is a next page of results. (You should rather use has_more for that purpose.)
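
The interplay between page and limit is plain integer arithmetic. As a quick check of the example in the limit bullet, here is a minimal sketch; the function name `locate` is made up for this illustration.

```python
def locate(rank, limit):
    """Return (page, position) of the rank-th overall result, both 1-based."""
    if rank < 1 or not 1 <= limit <= 100:
        raise ValueError("rank must be >= 1 and limit must be in 1..100")
    page, offset = divmod(rank - 1, limit)
    return page + 1, offset + 1


# Where does the 101st overall result land for each page size?
for limit in (100, 50, 30):
    print(limit, locate(101, limit))
```

With limit set to 100, 50 and 30 this gives (2, 1), (3, 1) and (4, 11) respectively, matching the cases listed above.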

  A typical list response:
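
The original sample is not reproduced here. The sketch below parses an invented body with the navigation fields described above; the item ids, titles and the total are made up, and the assumption that the items arrive under a "list" key should be checked against the current API documentation.

```python
import json

# Invented response body: navigation fields plus the items themselves.
sample_body = """
{
  "page": 2,
  "limit": 2,
  "has_more": true,
  "total": 57,
  "list": [
    {"id": "xaaaaa1", "title": "First hypothetical title"},
    {"id": "xaaaaa2", "title": "Second hypothetical title"}
  ]
}
"""

response = json.loads(sample_body)
titles = [item["title"] for item in response["list"]]

# Rely on has_more, not on total, to decide whether another page exists.
next_page = response["page"] + 1 if response["has_more"] else None
print(titles, next_page)
```
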

Now we can focus on request syntax. Dailymotion does a great job with documentation and interactive tools to help you build your own requests. Let’s first see the list of methods available--in particular, the “base” methods:


 

As you can see, the API is designed to use different HTTP methods depending on the side effect that the method produces. To perform a search on Dailymotion, we can use the /videos sub-path. By clicking on the “List videos” row on the explorer page, it is possible to expand the method and see its parameters list:

As you will see here, there are several additional fields that can be set to narrow down your query; we will be particularly interested in a few of them:

  • fields: as mentioned, we can provide a list of the fields we want to retrieve for each item--in this case, “id”, “title”, “thumbnail_url”.
  • country: the country of origin for the video (takes ISO 3166-1 alpha-2 codes, as YouTube does).
  • page: which page of results we need to retrieve.
  • limit: how many results per page are to be retrieved (default: 10).

An example URL for this request could be: https://api.dailymotion.com/videos?fields=id,thumbnail_url%2Ctitle&country=it&search=pizza&page=2&limit=50
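
A URL like this can be assembled programmatically. The following Python 3 sketch uses only the parameters discussed above; the function name `search_url` and its defaults are choices made for this illustration, not part of any SDK.

```python
from urllib.parse import urlencode

VIDEOS_ENDPOINT = "https://api.dailymotion.com/videos"


def search_url(keywords, fields=("id", "title", "thumbnail_url"),
               country=None, page=1, limit=10):
    """Build a video search URL (hypothetical helper)."""
    # Both page and limit are capped at 100 according to the text.
    if not 1 <= page <= 100 or not 1 <= limit <= 100:
        raise ValueError("page and limit must both be in 1..100")
    params = [("fields", ",".join(fields)), ("search", keywords),
              ("page", page), ("limit", limit)]
    if country:
        params.append(("country", country))
    # safe="," leaves the field separator unescaped; "%2C" would also work.
    return VIDEOS_ENDPOINT + "?" + urlencode(params, safe=",")


print(search_url("pizza", country="it", page=2, limit=50))
```
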
 
To retrieve data for a single video, instead, we can use the URL format we have examined in the previous section. (Check this video out: https://api.dailymotion.com/video/x148eg3?fields=id,language%2Cowner.screenname.)
 
If you need to find the videos related to the one above, the Graph API makes it very easy with a different method:


For instance, try this: https://api.dailymotion.com/video/x148eg3/related?fields=id,title&language=it

As with all the other methods, clicking on the relevant row presents you with a form that helps you fill in the required parameters and create the right URL to query the API.

Taking a step back, in the previous section we showed three more methods: Two of them use the POST HTTP method to create and edit a video, while one uses DELETE to remove a video from Dailymotion.

It is worth mentioning, at this point, that the Dailymotion API provides different kinds of access for the APIs:

  • Unauthenticated access, for read-only methods, has no side effects and does not access personal information.
  • For methods requiring writing privileges--like create, edit, and delete--or that access personal info, OAuth 2.0 gets an access token for Dailymotion users via a redirect to Dailymotion. After you obtain the access token for a user, you can perform authorized requests on behalf of the user by including the access token in your API requests.
     

There are two different OAuth profiles:

  • The user-agent profile, based on developer keys, for client applications residing in a user-agent (typically implemented in a browser using a scripting language such as JavaScript). Since such clients cannot keep secrets confidential, the authentication of the client is based on the user-agent’s same-origin policy.
  • The web server profile is suitable for clients capable of interacting with the end-user’s user-agent (typically, a web browser) and capable of receiving incoming requests from the authorization server (capable of acting as an HTTP server).

Other Methods
In the video-related subset of the Dailymotion Graph API, there are many more methods that tackle different aspects of the social experience for video services:

Comments, for example, can be easily inserted, deleted or retrieved. (For the first two, you’d obviously need authentication with OAuth first.)

The list above shows the REST URLs of the methods used to operate on playlists.

Moreover, you can work on groups, records, subtitles, and so on. (Take a look here to see the complete methods list.)

Let’s get started
We can now move to the next step and implement the same RESTful interface we implemented to search YouTube APIs. To do so, we have to take into account a few small differences:

  • Some parameters have different names in the two APIs:
    YouTube                             Dailymotion
    order                               sort
    maxResults (max: 50, default: 5)    limit (max: 100, default: 10)
    pageToken (alphanumeric hash)       page (integer)
    q                                   search
    part                                fields
  • The result is a JSON object in both APIs, but the structure is completely different (hierarchical in YouTube, flat in Dailymotion), and corresponding fields in the results have different names.

 

    Field in response   YouTube                                                         Dailymotion
    id                  result['id']['videoId']                                         result['id']
    url                 'https://www.youtube.com/watch?v=' + result['id']['videoId']    result['url']
    title               result['snippet']['title']                                      result['title']
    thumbnail           result['snippet']['thumbnails']['default']['url']               result['thumbnail_url']
    date                result['snippet']['publishedAt']                                result['created_time'] (it also has a different format)
  • URLs for the videos are straightforward in YouTube (just add the video ID to a common prefix); they need to be retrieved from the server in Dailymotion.
  • The sort field in Dailymotion accepts different values than the order one in YouTube. (For the moment, this is not a big deal, but if we want our API to be uniform and to work consistently, we will have to map these values onto one another somehow.)
  • The Dailymotion API to retrieve related videos won’t allow parameters like ‘search’ (the keywords for the query) or ‘sort’. While the difference is only formal for the former parameter (YouTube allows it, but its value is disregarded), the latter is the most relevant difference with YouTube API: You can’t query Dailymotion for related videos and retrieve the most recent ones instead of the most relevant ones; YouTube does provide this ability. To achieve a consistent API, we would probably have to ignore this parameter for YouTube.
    • To avoid confusion in our API due to the ‘search’ (‘q’ in YouTube) parameter in queries for related videos, we are going to slightly change our rules for the path handler, removing the :keywords parameter for related searches.
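
The differences listed above can be absorbed by a thin translation layer. What follows is only a sketch of one possible approach, not the code from the gist: the names `PARAM_NAMES`, `normalize_youtube` and `normalize_dailymotion` are invented here, the parameter names are the ones I understand each API to use, and the dates are passed through untouched because the two formats differ.

```python
# YouTube parameter name -> Dailymotion parameter name (names only; the
# accepted values of order/sort differ and need a separate mapping).
PARAM_NAMES = {
    "order": "sort",
    "maxResults": "limit",
    "q": "search",
    "part": "fields",
}


def normalize_youtube(result):
    """Flatten one YouTube search result into a provider-neutral dict."""
    video_id = result["id"]["videoId"]
    snippet = result["snippet"]
    return {
        "id": video_id,
        # YouTube URLs are just a common prefix plus the video id.
        "url": "https://www.youtube.com/watch?v=" + video_id,
        "title": snippet["title"],
        "thumbnail": snippet["thumbnails"]["default"]["url"],
        "date": snippet["publishedAt"],  # left in the provider's own format
    }


def normalize_dailymotion(result):
    """Rename the fields of one Dailymotion result to the same neutral keys."""
    return {
        "id": result["id"],
        "url": result["url"],  # must be requested from the server via fields
        "title": result["title"],
        "thumbnail": result["thumbnail_url"],
        "date": result["created_time"],  # left in the provider's own format
    }
```

Both functions return dictionaries with the same five keys, so the code that renders or merges results does not need to know which provider produced them.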

Moving through paginated results is much easier in the Dailymotion API than in YouTube. However, we can use the same mechanism for both providers. The only difference is that for YouTube we save a table on memcached with the pageToken for each page, while for Dailymotion we can just use the page number.

Speaking of memcached, we are going to cache results for the different providers individually, so that we’ll have greater freedom in combining them in a later improvement.

Google App Engine memcached quota is 32MB for the whole app, and 1MB for the value stored under each key. To simplify management and optimize execution, we are going to retrieve from the servers and store full pages of results from the providers. Since the limit in Dailymotion is flexible, we’ll use the same limit as YouTube--that is, 50 results.

Entries on memcached will be saved for each query entered, also making distinctions between queries about related videos, and queries restricted to particular countries. Since 50 (uncompressed) results will require approximately 10KB, and assuming queries won’t go past the third page of results on average (actually, they probably require less than two on average), at a single moment memcached should be able to retain the results of the thousand “hottest” queries.
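
The estimate is a back-of-the-envelope division, reproduced below with the figures quoted in the text (the constant names are made up for this sketch).

```python
CACHE_QUOTA_KB = 32 * 1024  # 32MB quota for the whole app
PAGE_SIZE_KB = 10           # roughly 10KB per page of 50 uncompressed results
PAGES_PER_QUERY = 3         # pessimistic average number of pages per query

queries_retained = CACHE_QUOTA_KB // (PAGE_SIZE_KB * PAGES_PER_QUERY)
print(queries_retained)  # 1092, i.e. roughly a thousand queries
```
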

To improve the cache hit ratio, we can zip the results before storing them to memcached and unzip them each time we retrieve them. It is very easy to make this improvement because the current design already has two functions for storing and retrieving results (using a specific pattern to avoid race conditions, so that different server instances that try to update the same key-value pair concurrently won’t conflict).
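
As an illustration of how small that change could be, here is a minimal sketch using the standard zlib and json modules. The helper names `pack` and `unpack` are hypothetical; in the design described above, they would be called from inside the existing store and retrieve functions.

```python
import json
import zlib


def pack(results):
    """Serialize a page of results to compact JSON and compress it."""
    text = json.dumps(results, separators=(",", ":"))
    return zlib.compress(text.encode("utf-8"))


def unpack(blob):
    """Reverse of pack(): decompress and parse the cached bytes."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Invented page of 50 results, just to exercise the round trip.
page = [{"id": "x%05d" % i, "title": "A hypothetical video title",
         "thumbnail_url": "http://example.com/thumbs/%d.jpg" % i}
        for i in range(50)]
blob = pack(page)
print(len(json.dumps(page)), "bytes of JSON ->", len(blob), "bytes compressed")
```

Search results tend to compress well because every item repeats the same keys and URL prefixes, so more queries fit within the same quota.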

There is, however, another catch: Memcached keys longer than 255 characters will be hashed, so there could be a small, but still not null, chance that two different (long) queries could be hashed to the same key. Since we don’t want a user searching for Kurt Cobain to get results for Miley Cyrus (bear with me, it is just to make an example--the actual queries should be much longer, and I can certainly think of more troubling situations), searches that result in memcached keys longer than 255 characters won’t be cached. (Although we expect this to be a rather uncommon situation, we need to find a balance between correctness and eventual consistency on one side and performance on the other when designing a system.)
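
A guard of this kind can live in the function that builds the cache key. The sketch below is an assumption about how such a key might be composed (the `cache_key` name, the separator and the field order are all invented); the 255 threshold is the one quoted above.

```python
MAX_KEY_LENGTH = 255  # threshold quoted in the text


def cache_key(provider, keywords="", country=None, related_to=None):
    """Return a cache key for the query, or None if it must not be cached."""
    kind = ("related:" + related_to) if related_to else "search"
    key = "|".join([provider, kind, country or "any", keywords.strip().lower()])
    # Overlong keys would be hashed and could collide, so skip caching them.
    if len(key.encode("utf-8")) > MAX_KEY_LENGTH:
        return None
    return key


print(cache_key("dailymotion", "Pizza ", country="it"))
print(cache_key("youtube", "x" * 300))
```

Callers treat a None key as "do not cache": the query still goes to the provider, only the lookup and the store are skipped.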

In practice
We are going to extend the code developed for a previous post. The code will, however, undergo a major redesign to allow us to handle different video providers. The final version of the code, including detailed explanations and documentation, can be found on this gist, but we’ll also examine part of it more closely below.

First, we need to add three new rules to our router handler. We will reserve the /videos path for the final, combined, search API, and use the /youtube path for queries on YouTube only and the /dailymotion path to query the French provider.


If you had changed the app.yaml configuration file as suggested in the previous posts, you’ll need to fix it accordingly, as well. (Go here for guidance.)

As you might notice, we also had to add several new Handlers, one for each new rule. An alternative would be capturing the first section of the path as a parameter (restricting it to be either “videos”, “youtube” or “dailymotion”), and have a single handler for the three of them. This would make the code more DRY, but the extra capturing and checking could also make it a bit slower and more complicated.
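
To make the alternative concrete, here is a framework-independent sketch of the single-handler idea using a plain regular expression. The URL shape (provider followed by keywords) and the `dispatch` name are assumptions made for this example; the real routes are the ones in the gist.

```python
import re

# One pattern for all three entry points; the first path segment is captured
# and restricted to the known providers.
ROUTE = re.compile(
    r"^/(?P<provider>videos|youtube|dailymotion)/(?P<keywords>[^/]+)/?$")


def dispatch(path):
    """Return (provider, keywords) for a matching path, or None otherwise."""
    match = ROUTE.match(path)
    if match is None:
        return None
    return match.group("provider"), match.group("keywords")


print(dispatch("/dailymotion/pizza"))
print(dispatch("/vimeo/pizza"))
```

The extra cost mentioned above shows up in the handler itself, which now has to branch on the captured provider before doing any real work.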

We also need to redesign our class hierarchy, introducing a common base class. This exposes a set of common methods and a few class parameters, so that the classes handling the specific video providers can inherit them. Probably the most interesting among these methods are the ones dealing with memcached:

While the first one is pretty straightforward (but using it would allow us to easily introduce compression with a single-place modification in the code, as we mentioned above), the store_to_memcached method handles both overlong keys and concurrency issues.
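
The actual methods are in the gist. For readers who want the general idea, here is a sketch of a compare-and-set store. It assumes a client object exposing gets, add and cas with the semantics of App Engine's memcache.Client; the signature shown here (with an `update` callback and a retry count) is invented for this illustration and is not the author's.

```python
def store_to_memcached(client, key, update, max_key_length=255, retries=5):
    """Apply update(old_value) -> new_value atomically; return True on success.

    `client` is assumed to behave like App Engine's memcache.Client:
    gets() returns the value (or None) and remembers its version,
    add() only succeeds if the key is absent, and cas() only succeeds
    if nobody changed the value since the last gets().
    """
    if key is None or len(key) > max_key_length:
        return False  # overlong keys are never cached
    for _ in range(retries):
        current = client.gets(key)
        if current is None:
            # Nothing cached yet: add() fails if another instance got there first.
            if client.add(key, update(None)):
                return True
        elif client.cas(key, update(current)):
            return True
        # Lost the race: re-read the fresh value and try again.
    return False
```

A caller would pass a function that merges the newly fetched page into whatever is already cached, so that two instances updating the same query never overwrite each other's pages.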

The base class and each of its child classes will live in separate modules; so far, the provider modules are youtube.py and dailymotion.py.

The structure of the two classes is very similar. They both have a few helper methods, but their core is the search method; for the latter, after some computation to set the parameters and figure out which pages of results should be loaded, the data is loaded within the following iteration:

As anticipated, for Dailymotion we do not have an SDK, so we will use the web API and get the results through the urlfetch.fetch method, which basically performs a synchronous HTTP GET request. If everything is successful, we can process and cache the results, and then move to the next page of results to download.
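
The loop itself is in the gist. The sketch below shows the general shape of such an iteration under a few assumptions: `fetch` is a hypothetical callable that returns a (status code, body) pair for a given page number (on App Engine it could wrap urlfetch.fetch and read the status_code and content of its result), and the items are assumed to arrive under a "list" key.

```python
import json


def load_pages(fetch, first_page, last_page):
    """Collect items from consecutive pages, stopping early when possible."""
    items = []
    for page in range(first_page, last_page + 1):
        status, body = fetch(page)
        if status != 200:
            break  # give up on errors; the caller keeps what was loaded
        data = json.loads(body)
        items.extend(data.get("list", []))
        if not data.get("has_more"):
            break  # no further pages exist for this query
    return items


def fake_fetch(page):
    """Stand-in for the real HTTP call: two pages of invented results."""
    body = {"page": page, "has_more": page < 2,
            "list": [{"id": "x%d" % page}]}
    return 200, json.dumps(body)


print(load_pages(fake_fetch, 1, 5))
```

Passing the fetch function in as a parameter keeps the loop testable without network access, which is why the example runs against a stand-in.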

You can test the result here, or check the code in more detail and see the missing pieces on this gist repository; here, instead, you can find the documentation for the API we are building, created with a powerful new tool, https://speca.io/.
 

Marcello La Rocca: As a developer I'm focusing on JavaScript, Python and Java (Android), but I have a weakness for algorithms. Lately, apparently, I also became a tech blogger! My personal blog: mlarocca.github.io
