[Editor's Note: This is a guest post from search engine expert Vanessa Fox, who created Google’s Webmaster Central, is now contributing editor at Search Engine Land, and hosts events like the Jane and Robot Search Developer Summit.]
Search engines such as Google have been making life a lot easier for developers. They have huge development resources that startups just don't have, and when they make their APIs available, everyone benefits. Google held Google I/O a couple of weeks ago, and CEO Eric Schmidt kicked things off by saying: “My message to you is that this is the beginning of the real win of cloud computing, of applications, of the internet, which is changing the paradigm that we’ve all grown up with so that it just works … regardless of platform or hardware you’re using.”
For instance, Google’s AJAX APIs are created with, well, AJAX, which is notoriously difficult to index. A core issue with AJAX is that it dynamically changes the content of the page, but the URL doesn’t change as the new content loads. Often, a hash mark (#) and some state information are appended to the URL instead. Historically, a # in a URL has denoted a named anchor within a page, and thus search engines generally drop everything in a URL beginning with the #, so as not to index the same page multiple times.
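To make the effect of dropping the fragment concrete, here is a minimal sketch in TypeScript. It is only an illustration of the behavior described above, not any search engine's actual code; the helper name `canonicalForIndexing` and the example.com URLs are assumptions made up for this example.

```typescript
// Hypothetical helper: mimics a crawler that discards everything from the
// first "#" onward, because a fragment traditionally names an anchor
// within the same page rather than a different page.
function canonicalForIndexing(rawUrl: string): string {
  const hashIndex = rawUrl.indexOf("#");
  return hashIndex === -1 ? rawUrl : rawUrl.slice(0, hashIndex);
}

// Three AJAX "states" of a made-up photo page, as a visitor would see them.
const crawled: string[] = [
  "http://example.com/photos",
  "http://example.com/photos#photo=2",
  "http://example.com/photos#photo=3",
];

// Collect the distinct URLs that remain once fragments are dropped.
const indexed: string[] = [];
for (const url of crawled) {
  const key = canonicalForIndexing(url);
  if (indexed.indexOf(key) === -1) {
    indexed.push(key);
  }
}
// `indexed` now holds a single entry: http://example.com/photos
```

Under this assumption, every state of the page that is reachable only through a # collapses into one indexable URL, so the content behind those states has no address of its own.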
Other AJAX implementations don’t append anything to the URL, but simply dynamically load new content on the page based on clicks. Take a look at an example from the Google Code Playground.
In this example, each of the three tabs (Local, Web, and Blog) contains unique content. The URL of the page doesn’t change as the content reloads. You might expect one of two things to happen:
- search engines associate the content from the tab that appears when the page first loads with the URL
- search engines associate the content from all of the tabs with the URL
In practice, neither may happen. You can see this in an implementation showcased on the Google AJAX APIs blog. The sample site uses a tabbed architecture to organize results, but all that shows up in Google search results is “loading”.
A similar thing happens with this page that uses AJAX to provide navigation through photos:
All Google sees is “loading photos, please wait…”
There are a variety of approaches to tackling these problems: you can do away with AJAX in some instances and use CSS and divs instead; you can gracefully degrade your AJAX implementation or use progressive enhancement techniques; or you can implement a technique that Jeremy Keith describes as Hijax (returning false from the onClick handler and including a crawlable URL in the href).
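The Hijax idea can be sketched roughly as follows in TypeScript. This is an illustrative sketch, not Jeremy Keith's own code: the `hijax` helper, the `loadInPlace` callback, and the two small interfaces are hypothetical names introduced here so the example stays self-contained.

```typescript
// Minimal shapes so the sketch type-checks without browser typings.
// In a browser, a real anchor element and click event satisfy them.
interface LinkLike {
  href: string;
}
interface ClickLike {
  preventDefault(): void;
}

// Hypothetical helper: builds a click handler for a link whose href is an
// ordinary, crawlable URL. `loadInPlace` is an assumed callback that loads
// the content via AJAX and returns false if it cannot do so.
function hijax(link: LinkLike, loadInPlace: (url: string) => boolean) {
  return (event: ClickLike): boolean => {
    if (!loadInPlace(link.href)) {
      // AJAX path unavailable: let the browser follow the href as usual.
      return true;
    }
    event.preventDefault();
    // Returning false cancels navigation when the handler is assigned
    // to the link's onclick property.
    return false;
  };
}
```

In a browser, the handler would be attached with something like `anchor.onclick = hijax(anchor, loadPhoto)`, where `loadPhoto` stands in for your own loader. Crawlers and visitors without JavaScript never run the handler, so they simply follow the href and get a real page at a real URL.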
Last week at SMX Advanced in Seattle, I followed up on this discussion with a presentation about conducting a technical audit of your site, particularly for diagnosing search issues. The presentation includes technical checklists that illustrate the potential pitfalls of technologies such as Flex and AJAX.
I also did a video interview with WebPro News about the top considerations in crawling and indexing problems and how to pinpoint them.
If you're interested in making your web applications crawlable, then come out to the Jane and Robot Search Developer Summit I'm hosting in San Francisco tomorrow, June 12th, to dive into these topics a bit more. We have lots of great speakers and experts who can help drill into any issues you might be having and come up with real solutions. If you can't be there in person, you can follow along using the hashtag #janeandrobot on Twitter and Flickr. You can also follow @janeandrobot and me, @vanessafox, on Twitter.