MedSearch: A Specialized Search Engine for Medical Information

In this paper, we present MedSearch, a specialized Web search engine that helps ordinary Internet users search for medical information. Today, people of all ages seek medical information to address their healthcare needs. Existing Web search engines, however, handle medical search poorly for several reasons. First, a medical information searcher is often uncertain about his exact question, especially in the early stage of symptom development, and thus prefers search results that provide comprehensive, relevant information. Existing Web search engines, however, are optimized for precision and concentrate their search results on only a few topics. Second, an ordinary user is usually unfamiliar with medical terminology and has little medical background. A natural way for him to use a medical Web search engine is to pose long queries that describe his symptoms in detail in plain English, much as he would when talking to a doctor. However, all existing Web search engines impose limits on query length. To overcome these obstacles, MedSearch uses several key techniques to improve its usability and the quality of search results. First, it accepts queries of extended length and rewrites long queries into shorter ones by extracting a subset of important and representative words. This not only significantly increases query processing speed but also improves the quality of search results. Second, it provides diversified search results based on information extracted offline from a large medical document set crawled from high-quality Web sites. Lastly, it suggests related medical phrases, extracted from the popular MeSH ontology, to help the user quickly digest search results and refine the query. We have evaluated MedSearch using medical questions posted on medical discussion forums. The results show that MedSearch can handle various medical queries effectively and efficiently.
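The abstract does not say how the "important and representative words" are chosen. As a rough illustration only, and not the method used by MedSearch, the following sketch shortens a long plain-English query by dropping common stopwords and keeping the terms with the highest frequency-times-inverse-document-frequency score. The function name `shorten_query`, the stopword list, and the document-frequency numbers are all hypothetical and chosen for this example.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = frozenset(
    "a an and are as at be been but by for from had has have i in is it my "
    "of on or that the this to was with".split()
)


def shorten_query(query, doc_freq, num_docs, max_terms=5):
    """Return up to max_terms query words, in their original order.

    doc_freq maps a word to the number of documents containing it;
    num_docs is the collection size. Both are assumed to come from an
    offline pass over some document collection.
    """
    words = re.findall(r"[a-z]+", query.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)

    def score(word):
        # Smoothed inverse document frequency: rarer words score higher.
        idf = math.log((num_docs + 1) / (doc_freq.get(word, 0) + 1))
        return counts[word] * idf

    # Rank by score, breaking ties by first position in the query.
    ranked = sorted(counts, key=lambda w: (-score(w), words.index(w)))
    keep = set(ranked[:max_terms])

    # Emit the kept words once each, preserving the user's word order.
    shortened, seen = [], set()
    for w in words:
        if w in keep and w not in seen:
            seen.add(w)
            shortened.append(w)
    return shortened


if __name__ == "__main__":
    # Made-up document frequencies for a collection of 1000 documents.
    doc_freq = {"severe": 300, "headache": 50, "fever": 80, "three": 800, "days": 900}
    long_query = "I have had a severe headache and a fever for three days"
    print(shorten_query(long_query, doc_freq, num_docs=1000, max_terms=3))
```

A scheme like this favors words that are rare in the collection, so a word missing from `doc_freq` receives the highest score; a real system would likely need to handle misspellings and out-of-vocabulary terms separately.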

By: Gang Luo; Chunqiang Tang; Hao Yang; Xing Wei

Published in: RC24205 in 2007

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

rc24205.pdf

Questions about this service can be mailed to reports@us.ibm.com.