How To Query YouTube Through Its Python APIs

Marcello La Rocca, Data Visualization Engineer, SwiftIQ
Mar. 26 2014, 09:00AM EDT

Video is becoming increasingly important to a growing number of businesses. Whether it's an internally produced video showing off your wares or the integration of customer videos in the context of social networking, video has become a key business tool. YouTube has become pretty much synonymous with video, and the YouTube APIs let you bring the YouTube experience to your webpage, application or device. In this post, we will explore how to perform queries on videos stored on the video-hosting website and retrieve feeds according to different criteria. The basic principles involved can, of course, be applied to any other hosting service and to other programming languages, but here we will evaluate the process through the lens of the YouTube Data API for Python.

First things first: The latest version of these APIs is Version 3; previous versions are still functional, but quite limited in comparison. You can find code samples in several languages, tutorials and full documentation on the Google Developers portal. YouTube makes three sets of APIs available:

  • The Data API, which is the one we are going to review in this post. “The YouTube Data API (v3) lets you incorporate YouTube functionality into your own application. You can use the API to fetch search results and to retrieve, insert, update, and delete resources like videos or playlists,” states YouTube on its Developers Page.
  • The Player API, which lets you incorporate a YouTube video player inside your page; start, pause and stop a video; and execute actions when playback of a video is completed (for example, starting another video or playing the current one in a loop).
  • The Analytics API, which lets you create applications to retrieve statistics on videos and channels.

Getting started

To use the YouTube Data API, you first need a Google account. (Register here if you haven’t done that yet.) The next step is to create a new project on Google Cloud. If you prefer, you can use an existing application of your own, but it might be easier to follow the code below step by step with a new, empty application. If you feel comfortable with Google App Engine (GAE), you might prefer to skip to the YouTube Data API-specific code.

If you go to your Cloud Console and select Create Project, you’ll be prompted with a form. Fill in all the fields, then read and agree to the terms of service. To follow the steps laid out here exactly, you will need to come up with a unique project ID; simply replace every occurrence of "pweb-14" in the code in this post with your ID. Click on "Create," and the application will be added to your panel. You’ll be brought to the page https://console.developers.google.com/project/apps~YourProjectID, where YourProjectID is a placeholder for your actual project’s ID (pweb-14 in our case).

Next, you need authorization to use the YouTube API. Just follow the instructions on the “Getting started” page. Since we aren’t planning to access private user data for this project, we don’t need OAuth 2 authentication; we can make do with just a server key to access the YouTube Data API from a server application.

From the new application’s page (if you closed it, you can get back to it by going to the Cloud Console and selecting the application), click on "APIs & auth" in the menu panel on the left to open a submenu. From this submenu, first choose "APIs," then scroll down to the end of the list of APIs, where you’ll find an item named “YouTube Data API v3.” Click on the "Off" button at the right of the same row to enable this API for your app. Then, from the same submenu, choose "Credentials" (it’s just below "APIs").

In the new frame that loads, you’ll find two sections. Click on the "Create New Key" button, then choose "Server Key" and click "Create." (If you need advanced options such as IP capping, go here for more information; you’ll be able to edit this property later anyway.) The new key will now be listed in the Public API access section. Copy the value in the API key field, because you will soon need it.
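To make the role of the server key concrete, here is a minimal sketch of how a key-authenticated search request could be formed by hand. It only builds the URL and sends nothing. The helper name build_search_url and the placeholder key are assumptions made for illustration; the endpoint and the part, q, maxResults and key parameters are those of the Data API v3 search method. The rest of this post relies on Google's client library rather than raw URLs.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7, the version this post uses on GAE

# REST endpoint of the Data API v3 search method.
SEARCH_ENDPOINT = "https://www.googleapis.com/youtube/v3/search"

# Placeholder: paste the server key copied from the Credentials page.
API_KEY = "REPLACE_ME"


def build_search_url(query, api_key=API_KEY, max_results=5):
    """Return the URL of a keyword search authenticated with a server key.

    Hypothetical helper for illustration; nothing is sent over the network.
    """
    params = [
        ("part", "snippet"),
        ("q", query),
        ("maxResults", max_results),
        ("key", api_key),
    ]
    return SEARCH_ENDPOINT + "?" + urlencode(params)


url = build_search_url("python tutorial")
```

Because the key travels with every request, it is worth keeping it in one place in your code, as this sketch does with API_KEY.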

Setting up the local environment

The next thing you need to do is download a local environment for GAE and set it up to use Python. First, install Python 2.7; then follow the instructions for downloading the latest GAE installer for your system. Once everything is installed, you can run the GAE Launcher. The first time you run it, the list of projects will be empty. Choose "Create a New Application" from the File menu to add your new app to the local environment.

When filling in the parameters, take care with the application name (you need to type in your project ID to deploy it later) and with the port and admin port: they will be needed to access your app and the admin console for your app, respectively. The URLs in this post assume port 8084 for the app and 9084 for the admin console. Click on "Create," and then select and run your new project from the GAE Launcher. Then try to connect to http://localhost:9084/ in your browser. (Just click the "SDK Console" button in the GAE Launcher, with the newly created app selected.) If everything is working correctly, you’ll have access to the local developer console for your app. The console is quite a valuable tool: it can be used, among other things, to inspect, modify and flush your local datastore and your app’s cache, check the queued tasks and set up cron jobs.

Hands-on: Setting up a GAE project

The next goal is to build an application that (for the sake of simplicity) takes a query as a GET parameter (a parameter encoded in the URL) and returns a list of results (restricted to videos) in JSON format. First, you need to define how your app behaves when its URL is visited. In our tests, we set up a handler that mapped each path we wanted to handle for our website to the proper Python method. (In other words, we needed to tell the app what actions should be taken when a certain path is accessed.)

In your favorite editor, open the main.py file (the main file for the app; the name can be changed as long as it is kept in sync with app.yaml, but that goes beyond the scope of this post). You’ll see code that imports the webapp2 module and defines a MainHandler class, which extends webapp2’s RequestHandler class and overrides its get method; instances of this class are capable of handling GET requests to our page. The last statement in this file uses WSGIApplication from the webapp2 module to associate the path ‘/’ (the base path for our app) with the class defined above. Its constructor takes a list of tuples as its first parameter, so we can define any number of classes like MainHandler and register as many path-handler associations as we like. These paths can even be expressed as regular expressions, with parameters extracted from them, which makes it possible to create RESTful APIs. Again, however, this goes beyond the scope of this post.

Next, add a new class, JsonHandler, to handle requests to the ‘/json’ path; it will extract the query text from the URL parameter and perform the search. The method JsonHandler.get will be called when you access either ‘/json’ or ‘/json/’ in your app. For the moment, this method doesn’t actually do anything.

Then extract the query parameter from the request and check it. If no such parameter is defined, you can return either an empty object or an error code. Otherwise, if the query text has a minimum length (let’s say 2 chars), you can actually perform the query on YouTube. We decided to return an empty list if the query wasn’t valid. That way, no matter what we need to return, the last lines of the handler are always the same: they set the Content-Type header to signal that we are returning data in JSON format and then write the data to the output stream. (For this we must also add import json at the top of our file.)
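The handler described in this section can be sketched as follows. This is a minimal sketch rather than the original listing: MIN_QUERY_LENGTH, the helper names and the name of the GET parameter ("q") are assumptions, and the search itself is left as a stub that returns an empty list. The webapp2 import is guarded only so that the helpers can also be tried outside the GAE environment.

```python
import json

try:
    import webapp2  # bundled with the GAE Python 2.7 runtime
except ImportError:
    webapp2 = None  # lets the helpers below run outside GAE

MIN_QUERY_LENGTH = 2  # assumed minimum length, as suggested in the text


def is_valid_query(query):
    """A query is valid if it is present and long enough once trimmed."""
    return query is not None and len(query.strip()) >= MIN_QUERY_LENGTH


def search_videos(query):
    """Stub: the actual YouTube query is added in the next section."""
    return []


def results_as_json(query):
    """Return the JSON text for a query; invalid queries yield an empty list."""
    results = search_videos(query.strip()) if is_valid_query(query) else []
    return json.dumps(results)


if webapp2 is not None:

    class JsonHandler(webapp2.RequestHandler):
        def get(self):
            # "q" is a hypothetical name for the GET parameter.
            query = self.request.get("q")
            self.response.headers["Content-Type"] = "application/json"
            self.response.write(results_as_json(query))

    # The optional trailing slash makes both /json and /json/ match.
    app = webapp2.WSGIApplication([
        (r"/json/?", JsonHandler),
    ], debug=True)
```

Keeping the validation and serialization in plain functions means they can be checked without starting the development server.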

Hands-on: Retrieve video results by keyword

At this point, the app skeleton is ready. You can check that it is working by accessing http://localhost:8084/json and verifying that a pair of square brackets (an empty JSON list) is printed. We still haven’t implemented the query on YouTube.

First, we need to add a couple of new imports from the apiclient library at the beginning of our file. The first one is a function that will be used to construct the instance of a YouTube API object through which queries can be performed. The second, HttpError, is an exception class that we are going to use to check whether the HTTP request to the YouTube servers completed successfully or whether an error prevented the app from receiving results. (In fact, we’ll see it again shortly.)

Now, apiclient is not in the list of third-party libraries provided with the Google App Engine environment. This means that if you run your application after adding those two imports, the app will stop with an import error. To solve the problem, you need to download the apiclient library from its project page (at publication time, google-api-python-client-gae-1.2.zip is the right zip file to download), then unzip it in your project’s main folder (the folder containing main.py). You shouldn’t need to, but if that doesn’t make the error message go away immediately, stop and restart your app from the GAE Launcher.

Next, you will need to set the parameters used to create the interface object mentioned above, including authentication. Remember that API key we recommended that you keep handy? It’s time to use it to replace the text REPLACE_ME (leaving the quotes in place) as the DEVELOPER_KEY.

Let’s examine the search method line by line. When we define the method, we declare a second parameter with a default value of 20. This will be used to set a cap on the number of results retrieved from the YouTube server. In theory, we could have skipped this step, but in practice there are several reasons why you want to limit the number of results you get through the API.

  • For security reasons, each front-end request on GAE servers must be completed within 60 seconds (cron jobs, within 600 seconds); this means that if you run a query (whether on YouTube servers or on your database) that dumps lots of data, it will probably exceed the time limit. The process will be killed, and your client will receive a 500 error response. In other words, your application would become completely unusable.
  • Usually GAE charges according to how much bandwidth you consume, especially on datastore queries and API calls. This means that the more data you retrieve, the more likely you are to use up the free quota and start consuming paid services. Therefore, you’d better think twice before dumping data you don’t really need from any database. This applies equally to rows (don’t request records you don’t need or more records than you can handle) and to columns (avoid ‘select *’ queries; just retrieve the fields you really need to access).
  • Even if, despite downloading useless data, you manage to stay within the limits mentioned in the previous points, processing more data means a slower response to clients. It might be only tenths of a second for a single request, but scale that to millions of requests and you’ll have a crashing website.
  • For all of the reasons above, and probably a few more, the YouTube Data API has a default value of 5 for the maxResults parameter, and it can’t be larger than 50.

Next, the method creates the object we’ll use to actually access the YouTube Data API; it is our remote interface to YouTube’s servers. Then we perform the search by keyword. The search method returns an interface for performing queries through the API. Its list method prepares the request, taking as parameters the actual query text, which part of the records we want to retrieve, and the maximum number of results we want to download; calling execute on the returned request actually runs it. It is also possible to include extra parameters, for example the age of the video or how the results should be ordered. (See here for more.)

Next comes a function definition (using lambda notation): this helper, result_transform, extracts from each result only the fields you want to keep. (It provides even finer filtering than that offered by the part parameter of the search().list method.) We then filter the results to keep only the videos. We could have obtained the same result using the type parameter of the search().list call, which has two advantages: it avoids transferring useless results that would just be discarded, and it actually retrieves as many results as we asked for (through the maxResults parameter). If we perform the filtering afterward, we will likely end up with fewer than maxResults videos. The less efficient way, however, was a good chance to show you interesting concepts like filtering and lambdas; you’ll find the most efficient code in the gist with the complete example.

Finally, apply the result_transform function to every result left after filtering, and return the resulting list. At this point, all you have to do is put it all together and test it. Here you can see the final result. This example should give you enough info to get started using the YouTube Data API and experimenting with different features.
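Putting these steps together, the search function might look like the sketch below. It is not the original listing: the names youtube_search, is_video and extract_videos, the fields kept by result_transform, and the choice to return an empty list on HttpError are all assumptions made for illustration. The apiclient imports sit inside the function only so that the filtering and transformation helpers can be tried without the library installed.

```python
DEVELOPER_KEY = "REPLACE_ME"  # paste your server key here, keeping the quotes
YOUTUBE_API_SERVICE_NAME = "youtube"
YOUTUBE_API_VERSION = "v3"

# Keep only the fields we care about from each search result.
# Which fields to keep is an assumption made for this sketch.
result_transform = lambda item: {
    "id": item["id"]["videoId"],
    "title": item["snippet"]["title"],
    "description": item["snippet"]["description"],
}


def is_video(item):
    """Search results can also be channels or playlists; keep videos only."""
    return item["id"]["kind"] == "youtube#video"


def extract_videos(search_response):
    """Filter a search().list() response, then transform what is left."""
    items = search_response.get("items", [])
    return [result_transform(item) for item in items if is_video(item)]


def youtube_search(query, max_results=20):
    """Search YouTube by keyword; returns [] if the API call fails."""
    # Imported here so the helpers above work without the library.
    from apiclient.discovery import build
    from apiclient.errors import HttpError

    try:
        # Remote interface to YouTube's servers.
        youtube = build(YOUTUBE_API_SERVICE_NAME, YOUTUBE_API_VERSION,
                        developerKey=DEVELOPER_KEY)
        # list() prepares the request; execute() sends it.
        search_response = youtube.search().list(
            q=query,
            part="id,snippet",
            maxResults=max_results,
        ).execute()
    except HttpError:
        return []
    return extract_videos(search_response)


# Offline check with a hand-written response in the shape the API returns.
sample_response = {
    "items": [
        {"id": {"kind": "youtube#video", "videoId": "abc123"},
         "snippet": {"title": "A video", "description": "Demo"}},
        {"id": {"kind": "youtube#channel", "channelId": "chan1"},
         "snippet": {"title": "A channel", "description": ""}},
    ]
}
videos = extract_videos(sample_response)
```

Passing type="video" to search().list would make the is_video filter unnecessary and, as noted above, would return up to max_results videos rather than fewer.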
In our next post, we’ll put all this knowledge to good use by showing you how to design your own RESTful API that returns video results.
Marcello La Rocca: As a developer, I'm focusing on JavaScript, Python and Java (Android), but I have a weakness for algorithms. Lately, apparently, I have also become a tech blogger! My personal blog: mlarocca.github.io - Follow me on Google+
