Search Engine Project
An offline search engine that I built in Java for my Information Retrieval & Web Search course. It parses, indexes, and ranks thousands of HTML web pages extracted from XML files, then searches across them efficiently, returning results in a few seconds.
Parsing
The XML files were parsed with Java's DocumentBuilder, and the following tags were extracted: <HTML>
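A minimal sketch of this parsing step, assuming each page's HTML sits inside an <HTML> element, possibly wrapped in CDATA. The class name XmlTagExtractor and the wrapper elements <collection> and <page> in the sample input are hypothetical, not taken from the project. Only standard JDK calls (DocumentBuilderFactory, DocumentBuilder.parse, getElementsByTagName, getTextContent) are used.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class XmlTagExtractor {

    /** Returns the trimmed text content of every element named tagName in the XML string. */
    public static List<String> extract(String xml, String tagName) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // getElementsByTagName walks the whole tree in document order.
            NodeList nodes = doc.getElementsByTagName(tagName);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                // getTextContent concatenates text and CDATA children, so HTML
                // wrapped in CDATA comes back as a raw string.
                texts.add(nodes.item(i).getTextContent().trim());
            }
            return texts;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical input: <collection> and <page> are invented wrapper elements.
        String xml = "<collection>"
                + "<page><HTML><![CDATA[<p>Java search</p>]]></HTML></page>"
                + "<page><HTML><![CDATA[<p>Inverted index</p>]]></HTML></page>"
                + "</collection>";
        System.out.println(extract(xml, "HTML")); // prints [<p>Java search</p>, <p>Inverted index</p>]
    }
}
```

For the full collection you would typically pass a File to DocumentBuilder.parse instead of a string; the string form just keeps the example self-contained.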
Indexing
The dictionary and posting lists are stored in the following data structure: Map<String, List
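The declared type above is cut off, so the sketch below simply assumes each posting list is a List<Integer> of document IDs. That element type, the class name InvertedIndex, and the simple lowercase/non-word tokenizer are all illustrative assumptions. It shows how a Map from term to posting list can be filled so that each list stays sorted and free of duplicates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class InvertedIndex {
    // Dictionary (sorted terms) mapped to posting lists of document IDs.
    // List<Integer> is an assumed posting type, for illustration only.
    private final Map<String, List<Integer>> index = new TreeMap<>();

    /** Adds a document; docIds must arrive in increasing order so postings stay sorted. */
    public void addDocument(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (token.isEmpty()) {
                continue; // split can yield a leading empty string
            }
            List<Integer> postings = index.computeIfAbsent(token, k -> new ArrayList<>());
            // Record each document at most once per term.
            if (postings.isEmpty() || postings.get(postings.size() - 1) != docId) {
                postings.add(docId);
            }
        }
    }

    /** Returns the posting list for a term, or an empty list if the term is unknown. */
    public List<Integer> postings(String term) {
        return index.getOrDefault(term.toLowerCase(), List.of());
    }

    public static void main(String[] args) {
        InvertedIndex idx = new InvertedIndex();
        idx.addDocument(1, "Java search engine");
        idx.addDocument(2, "Search the web");
        System.out.println(idx.postings("search")); // prints [1, 2]
    }
}
```

A TreeMap keeps the dictionary in sorted order; a HashMap would give faster average lookups if sorted iteration over terms is not needed.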
Search
Using the posting lists, the engine searches across 10,700 HTML documents.
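The README does not say how multi-term queries are combined. As one common approach, and purely as an assumption, the sketch below answers an AND query by merging sorted posting lists of document IDs. It processes the shortest list first so that intermediate results stay small. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class PostingIntersect {

    /** Linear-time merge of two ascending posting lists, keeping IDs present in both. */
    public static List<Integer> intersect(List<Integer> a, List<Integer> b) {
        List<Integer> result = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            int x = a.get(i);
            int y = b.get(j);
            if (x == y) {
                result.add(x);
                i++;
                j++;
            } else if (x < y) {
                i++; // advance the list holding the smaller ID
            } else {
                j++;
            }
        }
        return result;
    }

    /** AND query: intersect all lists, starting from the shortest. */
    public static List<Integer> andQuery(List<List<Integer>> postingLists) {
        if (postingLists.isEmpty()) {
            return List.of();
        }
        List<List<Integer>> ordered = new ArrayList<>(postingLists);
        ordered.sort(Comparator.comparingInt(List::size));
        List<Integer> result = new ArrayList<>(ordered.get(0));
        // Stop early once no document can match every term.
        for (int k = 1; k < ordered.size() && !result.isEmpty(); k++) {
            result = intersect(result, ordered.get(k));
        }
        return result;
    }

    public static void main(String[] args) {
        List<List<Integer>> lists = List.of(
                List.of(1, 3, 5, 7),
                List.of(3, 4, 5),
                List.of(2, 3, 5, 9));
        System.out.println(andQuery(lists)); // prints [3, 5]
    }
}
```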
What I Learnt
- How to build an inverted index.
- The importance of ranking in a search engine.
- Techniques for improving the speed and efficiency of a search engine.
- How to handle the complications of extracting searchable text from cluttered, poorly structured HTML web pages.
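Ranking is listed as a key lesson, but the scoring function the engine used is not stated. Purely as an illustration, the sketch below ranks documents with TF-IDF, a standard weighting that may differ from the project's actual method. For brevity it scans raw token lists rather than the inverted index. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TfIdfRanker {

    /** Scores each document as the sum over query terms of tf(t, d) * ln(N / df(t)). */
    public static Map<Integer, Double> score(Map<Integer, List<String>> docs, List<String> query) {
        int n = docs.size();
        Map<Integer, Double> scores = new HashMap<>();
        for (String term : query) {
            // Document frequency: how many documents contain the term at least once.
            long df = docs.values().stream().filter(tokens -> tokens.contains(term)).count();
            if (df == 0) {
                continue; // a term absent from the collection contributes nothing
            }
            double idf = Math.log((double) n / df);
            for (Map.Entry<Integer, List<String>> e : docs.entrySet()) {
                long tf = e.getValue().stream().filter(term::equals).count();
                if (tf > 0) {
                    scores.merge(e.getKey(), tf * idf, Double::sum);
                }
            }
        }
        return scores;
    }

    /** Returns matching document IDs, highest score first. */
    public static List<Integer> rank(Map<Integer, List<String>> docs, List<String> query) {
        Map<Integer, Double> scores = score(docs, query);
        List<Integer> ids = new ArrayList<>(scores.keySet());
        ids.sort((a, b) -> Double.compare(scores.get(b), scores.get(a)));
        return ids;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> docs = Map.of(
                1, List.of("java", "search", "engine"),
                2, List.of("search", "the", "web"),
                3, List.of("java", "java", "code"));
        System.out.println(rank(docs, List.of("java"))); // prints [3, 1]
    }
}
```

Document 3 ranks first because "java" occurs twice there. The IDF factor down-weights terms that appear in many documents.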