PW Forum / Architecture for a new Search Engine



#1 2009-11-14 11:44:52

skyrenderx
New member
Registered: 2009-11-14
Posts: 1

Architecture for a new Search Engine

I plan on building a search engine using the new Bing API. Through the API, Microsoft has made its search engine results available free of charge. My idea is to combine Bing search results with cached images from the websites returned in the results. This would be somewhat similar to duckduckgo.com (although they use the Yahoo! BOSS API).

The only problem is that although I'm an excellent coder, I mostly write desktop apps. I'm trying to decide on a very high-level architecture for my system, but I'm not familiar enough with web technologies to know what tools are available and which ones would be best suited to my project. What I would like to do is describe what I currently have in mind and get feedback on the proposed architecture.

Everything will run on a single dedicated web server. There will be two components, the first being an Apache HTTP server. The Apache server will run my custom mod (written in Python), which will use the Bing API to generate web results based on the user input. The mod will take the XML output from the Bing API, parse it, and generate a new page containing the search results, which will be sent back to the user. The search results will be combined with the cached images, which are stored in a SQL database (MySQL perhaps). The XML parsing and the subsequent generation of the results page will be implemented in pure Python (i.e., no special tools: I'll just parse the XML by scanning it character by character, and generate the corresponding HTML by writing out the required characters).
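Here is a minimal sketch of the parse, look up, and render step, written under stated assumptions. Everything is hypothetical: the XML layout (`<results>/<result>/<title>,<url>`), which is a simplified stand-in and not Bing's actual response schema; the `images` table; and the helper names `parse_results`, `lookup_images`, and `render_page`. It also swaps in Python's standard-library `xml.etree.ElementTree` and `html.escape` in place of a hand-written character scanner. Both are still pure Python and ship with the interpreter, and they handle entities and escaping that a hand-rolled scanner would need to reimplement. `sqlite3` stands in for MySQL here. A MySQL driver works the same way through the DB-API, but its placeholder style differs (e.g. `%s` instead of `?`).

```python
import html
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the search API's XML response.
# The real Bing API response has its own schema and XML namespaces.
SAMPLE_XML = """<results>
  <result><title>Example Domain</title><url>http://example.com/</url></result>
  <result><title>Python &amp; XML</title><url>http://python.org/</url></result>
</results>"""


def parse_results(xml_text):
    """Return a list of (title, url) pairs from the simplified XML."""
    root = ET.fromstring(xml_text)
    return [(r.findtext("title", ""), r.findtext("url", ""))
            for r in root.iter("result")]


def lookup_images(conn, urls):
    """Map each URL to its cached image path, or None if not indexed yet."""
    found = {}
    for url in urls:
        row = conn.execute(
            "SELECT image_path FROM images WHERE url = ?", (url,)).fetchone()
        found[url] = row[0] if row else None
    return found


def render_page(results, images):
    """Build the results list, escaping all text that came from outside."""
    parts = ["<ul>"]
    for title, url in results:
        img = images.get(url)
        # Only emit a thumbnail when the indexer has already cached one.
        thumb = f'<img src="{html.escape(img)}" alt="">' if img else ""
        parts.append(f'<li>{thumb}<a href="{html.escape(url)}">'
                     f'{html.escape(title)}</a></li>')
    parts.append("</ul>")
    return "\n".join(parts)


if __name__ == "__main__":
    # In-memory database with one cached image, for demonstration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE images (url TEXT PRIMARY KEY, image_path TEXT)")
    conn.execute("INSERT INTO images VALUES (?, ?)",
                 ("http://example.com/", "/cache/1.png"))
    results = parse_results(SAMPLE_XML)
    print(render_page(results, lookup_images(conn, [u for _, u in results])))
```

Sites with no cached image simply render without a thumbnail. That fits the plan of letting the indexer fill in images asynchronously.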

The second component will be an Image Indexer, implemented as a stand-alone Python application. It will retrieve and cache images from sites that have not yet been indexed, and it will run as a separate process on the web server. Whenever the Apache mod retrieves search results containing a website that has not yet been indexed, the mod will inform the Indexer, which will then visit the website and retrieve and cache an appropriate image.
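Because the Apache mod and the Indexer are unrelated processes, one simple way for the mod to "inform" the Indexer is a shared work-queue table in the same database. Here is a minimal sketch of that hand-off, assuming a hypothetical `pending` table and hypothetical functions `request_indexing` (called by the mod) and `index_pending` / `run_indexer` (run by the Indexer). `fetch_image` is a placeholder callback for the actual download-and-choose-an-image logic, which is not shown. `sqlite3` again stands in for MySQL.

```python
import sqlite3
import time


def init_db(conn):
    """Create the (hypothetical) image cache and work-queue tables."""
    conn.execute("CREATE TABLE IF NOT EXISTS images "
                 "(url TEXT PRIMARY KEY, image_path TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS pending (url TEXT PRIMARY KEY)")
    conn.commit()


def request_indexing(conn, url):
    """Called by the web front end: queue a site unless it is already cached."""
    cached = conn.execute(
        "SELECT 1 FROM images WHERE url = ?", (url,)).fetchone()
    if cached is None:
        # The PRIMARY KEY plus OR IGNORE keeps duplicate requests out of the queue.
        conn.execute("INSERT OR IGNORE INTO pending (url) VALUES (?)", (url,))
        conn.commit()


def index_pending(conn, fetch_image, limit=10):
    """Called by the Indexer: try to cache an image for up to `limit` sites.

    fetch_image(url) is a hypothetical callback that downloads the site,
    picks an image, saves it, and returns the saved path (or None on failure).
    Returns the number of images successfully cached.
    """
    rows = conn.execute("SELECT url FROM pending LIMIT ?", (limit,)).fetchall()
    cached = 0
    for (url,) in rows:
        path = fetch_image(url)
        if path is not None:
            conn.execute("INSERT OR REPLACE INTO images VALUES (?, ?)",
                         (url, path))
            cached += 1
        # Dequeue either way; a failed site is re-queued the next time it
        # shows up in search results, since it is still missing from images.
        conn.execute("DELETE FROM pending WHERE url = ?", (url,))
        conn.commit()
    return cached


def run_indexer(db_path, fetch_image, poll_seconds=5.0):
    """Main loop of the stand-alone Indexer process (never returns)."""
    conn = sqlite3.connect(db_path)
    init_db(conn)
    while True:
        if index_pending(conn, fetch_image) == 0:
            time.sleep(poll_seconds)  # nothing cached this round; back off
```

Polling keeps the two processes fully decoupled: the web request never waits on a site download. The trade-off is a delay of up to `poll_seconds` before new images appear. A socket or message queue could push notifications instead, at the cost of more moving parts.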

Any thoughts, comments?
