soup rocks

Quite inspired by my sister writing articles on blogger.com, I too join the race and will try to write some interesting ones here. So here I go...

Saturday, May 27, 2006

Gooogling

I have been working on search engines this summer, doing a software project at IIT K. The way these engines work has impressed me a lot. I’d like to share some of my experience in this article. Mind you, this is only the way I think a search engine works.
Google has a huge database. It scans that database for your keywords and not only gives you the relevant links but also gives them to you in the correct order of preference! How does it do this?
Google searches HTML, PDF, Word (and many more) documents in its database. It possibly converts all the pages into a single document type with a single format (XML). This text may be divided into many sub-parts: title, body, references, author, etc. When keywords are searched, they are searched through each part of the text separately. The hits are graded; a hit is the presence of a keyword in the text. If the keyword is found in the title, it's graded higher; if it's found in the body, it's graded a bit lower, and so on. All the hits in a page are accumulated to form a net rating of the page. These ratings are then compared to a standard rating to decide whether the result should be displayed or not (yet we get thousands of results!). After all this, the results are displayed in the order of their ratings. Isn't this interesting?
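To make the grading idea concrete, here is a tiny Python sketch of it. Mind you, this is only my guess at the scheme described above, not how Google actually does it. The weights in FIELD_WEIGHTS, the THRESHOLD value, the function names rate_page and search, and the sample pages are all made up for illustration.

```python
# Hypothetical weights: a hit in the title counts more than a hit in the body.
FIELD_WEIGHTS = {"title": 3.0, "body": 1.0, "references": 0.5, "author": 0.5}

# Hypothetical "standard rating" a page must reach to be displayed.
THRESHOLD = 1.0


def rate_page(page, keyword):
    """Add up the weighted hits of keyword over every part of the page."""
    keyword = keyword.lower()
    rating = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        # Naive matching: split on whitespace, so punctuation is not stripped.
        words = page.get(field, "").lower().split()
        rating += weight * words.count(keyword)
    return rating


def search(pages, keyword, threshold=THRESHOLD):
    """Return the URLs of pages rated at or above threshold, best first."""
    rated = [(rate_page(page, keyword), page["url"]) for page in pages]
    kept = [(rating, url) for rating, url in rated if rating >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [url for _, url in kept]


# Made-up sample pages, already split into parts.
pages = [
    {"url": "a", "title": "search engines", "body": "how search works"},
    {"url": "b", "title": "cooking", "body": "search for recipes search"},
    {"url": "c", "title": "soup", "body": "rocks"},
]

print(search(pages, "search"))
```

Here page "a" gets one title hit and one body hit, page "b" gets two body hits, and page "c" gets none, so "c" falls below the cut-off and is never shown.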
Well, for the moment I am working on Google Scholar (scholar.google.com). I am trying to reduce my results to only the ones relevant to the user. Let's see where I land up this summer.
Keep rocking!