soup rocks

Quite inspired by my sister writing articles on blogger.com, I too join the race and will try to write some interesting ones here. So here I go...

Sunday, June 18, 2006

MINING RESEARCH PAPERS: making of an interface to use PATMAP for mining research papers

Research papers have been an important mode of research transfer.
In recent years, there has been substantial public and private interest in the concept of technology transfer, especially, but not exclusively, at universities. This is important to inventors, researchers and small entrepreneurs looking to develop innovative technology, as well as technology firms striving to create new innovations, manufacturers conducting research and development (R&D) to generate new products, investors looking for new growth companies, and government officials seeking to find ways to spur and support economic development.

There are many search engines currently providing databases of various publications, e.g. Google Scholar (www.scholar.google.com), ProQuest (www.proquest.com), etc. We have concentrated our work on Google Scholar. Google Scholar enables specific searches of scholarly literature, including peer-reviewed papers, theses, books, pre-prints, abstracts, and technical reports. Content includes a range of publishers and aggregators with whom Google already has standing arrangements, e.g., the Association for Computing Machinery, IEEE, OCLC’s Open WorldCat library locator service, etc. Result displays show different version clusters, citation analysis, and library location (currently books only). Although it claims coverage “from all broad areas of research,” early evaluation seems to show a clear emphasis on science and technology rather than the arts, humanities, or social sciences. Google does not specify the number of journals or publications it has included.

Searching these databases has been a major problem because:
1. Search engines return thousands of research papers, among which it is difficult to find the ones relevant to you.
2. Searches are performed on the basis of keywords entered by the user, or a Boolean combination of them (AND, OR, NOT, etc.), rather than on the content of the paper.

Advanced search may allow the user to specify keywords in particular sections, namely title, abstract, author, publisher, etc. Searches can also be restricted by date.

The mining process aims at extracting the technology development path and technology mapping from the database of research papers. Text mining based on co-citation, co-classification, or co-word analysis can reveal relevant links. Text mining is semi-automatic in nature and requires human intervention. Data/text mining is defined as the non-trivial, semi-automatic extraction of implicit, previously unknown, and potentially useful information from data.
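To give a feel for co-word analysis, here is a minimal sketch in Python. It is not the PATMAP code, just an illustration under simple assumptions: documents are plain strings, words are split on whitespace, and the function name `co_word_counts` and the sample titles are made up for this example.

```python
from collections import Counter
from itertools import combinations

def co_word_counts(documents, keywords):
    """Count how often each pair of keywords appears in the same document."""
    counts = Counter()
    for text in documents:
        words = set(text.lower().split())
        # Sort so that each pair is always stored in the same order.
        present = sorted(k for k in keywords if k in words)
        counts.update(combinations(present, 2))
    return counts

# Hypothetical paper titles, only for illustration.
papers = [
    "text mining of patent databases",
    "patent mapping with text mining",
    "clustering research papers",
]
pairs = co_word_counts(papers, ["text", "mining", "patent", "clustering"])
for pair, count in pairs.most_common():
    print(pair, count)
```

Pairs of words that keep turning up together hint at a link between the documents that contain them. A human still has to judge which of those links are meaningful, which is why the process stays semi-automatic.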

We have focused on reducing the irrelevant search results a user obtains after entering a set of keywords. Currently, thousands of results are displayed. We aim to create an intermediate platform that lets the user narrow the search to the areas of interest.

PATMAP: a medium to find your patents in the USPTO database
For details, refer to the article on text mining in Directions (see References). The same concept has been used to map research papers. An interface has been created to make the search more accurate and to enable the software to take in research papers instead of patents.


Vesutek: a medium to get your research paper

We have attempted to work on the results of Google Scholar and to reduce them to the research papers the user specifies.

The problems faced and their solutions:
• Papers might be present in different formats, such as text, HTML, PDF, etc. Documents in all these formats cannot be parsed directly, so all documents are reduced to HTML format, and an attempt is made to parse them through hyperlinks without storing all the data.
• A semi-automatic process may prove very tedious. If the user so desires, it can be turned into an automatic process through clustering (though with some loss of precision). The top ten or twenty results proposed by Google can be entered into the database as the relevant links. The pool to be mined then consists of both the results considered relevant and the remaining results. The process can be repeated as many times as the user requests. This can make the user's work easier.
• A research paper lying in one class should not lie in any other class. These classes can be treated as topics. The user can choose the topic for the search and thus reduce the results substantially.
• The classification step may simply be eliminated when the user interface is automated.



Work done:
1. A link grabber has been made to catch the hyperlinks given by Google. The mining is proposed to be done without downloading the HTML pages.
2. HTML documents have been entered into the database for one specific publication (Synergy Blackwell).
3. The previous work on mining patents has been studied, and an attempt has been made to apply the same concept to research papers.
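The idea behind a link grabber can be sketched with Python's standard `html.parser` module. This is not the code written for the project, only an illustration: the class name `LinkGrabber`, the helper `grab_links`, and the sample HTML are all assumptions, and a real results page would need extra filtering to keep only the links to papers.

```python
from html.parser import HTMLParser

class LinkGrabber(HTMLParser):
    """Collect the href of every <a> tag in an HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                # Skip anchors that have no href attribute or an empty one.
                if name == "href" and value:
                    self.links.append(value)

def grab_links(html):
    """Return all hyperlinks found in the given HTML text, in page order."""
    parser = LinkGrabber()
    parser.feed(html)
    return parser.links

# Hypothetical results page, only for illustration.
sample = (
    '<p><a href="http://example.org/paper1.html">Paper 1</a> '
    '<a name="top">not a link</a> '
    '<a href="http://example.org/paper2.html">Paper 2</a></p>'
)
links = grab_links(sample)
print(links)
```

Only the list of hyperlinks is kept here, not the pages themselves, which is in the spirit of mining without storing all the data.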

Work proposed:

1. To make the process quicker and to reduce its complexity (in terms of time and memory), an algorithm is to be written that picks the top ten results of Google Scholar, or of any other search engine for that matter, and performs iterations treating those results as relevant. The number of iterations can be specified by the user.
2. The work would always have higher precision with discriminant analysis than with just using vectors to represent documents (reference: Vinod's thesis).
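The proposed iteration could look roughly like the sketch below. This is only one possible reading of the idea, not the planned algorithm itself: it assumes each result is available as a plain string, represents documents as simple word-count vectors (the weaker option of the two mentioned above), and ranks by cosine similarity. The names `vectorize`, `cosine`, and `rerank` and the sample results are hypothetical.

```python
import math
from collections import Counter

def vectorize(text):
    """Represent a document as a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def rerank(results, top_k=10, iterations=1):
    """Treat the top_k results as relevant and re-rank everything against them."""
    ranked = list(results)
    for _ in range(iterations):
        # Build a profile from the results currently assumed to be relevant.
        profile = Counter()
        for text in ranked[:top_k]:
            profile.update(vectorize(text))
        ranked.sort(key=lambda text: cosine(vectorize(text), profile), reverse=True)
    return ranked

# Hypothetical search results, in the order the search engine returned them.
results = [
    "patent text mining",
    "cooking recipes for soup",
    "text mining of research papers",
]
reranked = rerank(results, top_k=1, iterations=2)
print(reranked)
```

Results that share words with the assumed relevant set move up and the rest sink to the bottom. The loss of precision shows up when the top results themselves are off topic, since the loop trusts them blindly.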

Scope for future work:

• Working on ProQuest could be much more beneficial than working on Google Scholar with the same methodology. It would enable the user to scan through the entire database, as ProQuest provides all research papers in a single format.
• Although an attempt has been made to make the software user-friendly, that area still has scope for further work using clustering.

……………x……………………x…………………………x……………

Special thanks to Dr. Veena Bansal for giving me an opportunity to explore the world of data mining. I would also like to thank Sunita Shukla for her support and for doing all the coding, without which nothing would have been possible, and Vinod Singh Rathore for sharing his experience in this field with me.
It was a pleasure to work at the IME department, IITK.

References
1. Bhuvan Mirdha, thesis, IME department, IITK.
2. Vinod Singh Rathore, thesis, IME department, IITK.
3. Dr. Veena Bansal and Dr. A.K. Mittal, an article on data mining, Directions, volume 7, no. 3.

If you have had trouble understanding PATMAP... wait for my next post.

Wednesday, June 14, 2006

Relations!

I've had pretty few relations with people that have been fun, but I have learned a good deal in the process. Expectation has been the motivation driving all relations.

This winter I was at IIT B. There I met one of my old pals, Vijay. He had written an interesting quote on his wall (with a sketch pen, I guess): "Don't expect anything great from others, and try to live up to others' expectations." I had been in his room for hardly an hour and had only glanced at those lines. But they had a certain depth, or had been charged to a certain potential, and they just struck me. Being social seems to be a mere formality for living in society. It's pretty tough to find people in this world who have the same frequency as you. It is a game of harmonics and overtones. Certain people are in phase with you while others are out of phase. It's not just a matter of finding good people, although they too are few; it's a matter of finding people who can stay with you and enjoy your company rather than just adjust to you. To understand this you must consider yourself a wave (physics, can't help it) and determine your wavelength in terms of your personality. It helps in a lot of matters beyond just building relations. It lets you determine your strengths, your weaknesses, and how flexible you are.

All alone in Kanpur! Devil hits me.

This post is not meant for everyone. You will be able to appreciate it only if you have been in touch with me.

For the last five years I have been away from my home. For the first two years (2001-03, 11th and 12th) I was physically at home but mentally lost in my PCM books. Those two years literally dried up many emotions in me. The desire to get through JEE had made me indifferent to everything happening around me except my acads. The only answer I could imagine to all my problems was getting through my pre-engineering entrance. I had lost my communication skills; I literally couldn't speak at social gatherings. I forgot what happiness actually was. For a good long period of over two years I had not laughed my heart out, and truly speaking, I couldn't even sense it at the time. I had even lost the sense of humor I used to have in my school days (which I still haven't regained). I used to be a jolly character in school, troubling teachers throughout. At home I used to tease my sister and keep giggling whenever mama was angry. My sister got married during that two-year period too. I hardly remember spending any time with her in those two years. I still feel disheartened about that. I did get through JEE; it's a three-year-old story now, but... the cost I paid was pretty high. The kind of pressure that is created on students in these competitive exams is not healthy. It actually reduces the potential of a student.

Now for the last three years I have been away from home physically. The desire to achieve no longer springs in me. The word competition has lost its meaning; I don't find anyone around competing with me in any sense. Now all my actions are concentrated on one aim: developing myself the way I want to be. To be emotionally stable and to give weightage to all the small events that pass by. Fun has, for the first time, found a place in my being. I have analyzed my actions pretty late, but now that I have, I'll stick to the job that I should do rather than doing what is idealistic. One really has small needs, and our aim should be to satisfy them. So, for a change, I have started attending social gatherings, marriages, etc. All those changes need no description.

IT IS IMPORTANT TO DETERMINE YOUR FUTURE NEEDS, AND YOU MUST ACHIEVE THEM WITHOUT SACRIFICING YOUR PRESENT.
BEING AT THE TOP IS NO ISSUE; DOING EVERYTHING WHILE REMAINING ABOVE AVERAGE IS MORE IMPORTANT.