Introduction to Information Retrieval
CMPE 493
Instructor: Prof. Arzucan Özgür, Office: Computer Engineering Building BM 18
Semester: Spring 2024
Credits: (3+0+2) 4, ECTS: 6
Course Description
There has been a striking growth in text data such as web pages, news articles, e-mail messages, social media data, and scientific publications in recent years. Developing tools for accessing, managing, and utilizing this huge amount of textual information is becoming increasingly important. This course covers the basic technology underlying search engines, including methods for processing, indexing, querying, and organizing textual data, as well as methods for web search, crawling, and link analysis.
Course Objectives
- Understand how search engines work
- Learn to process, index, retrieve, and analyze textual data
- Learn to evaluate information retrieval systems
- Learn about web search, crawling and link analysis
- Develop IR systems for finding useful information on the Web and other textual collections
Course Information
- Schedule: Mondays between 13:00-14:50 and Tuesdays between 15:00-15:50
- Website: Course content will be available at Moodle
Textbook
- Introduction to Information Retrieval
- Authors: Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze
- Publisher: Cambridge University Press, 2008
- Website: http://nlp.stanford.edu/IR-book/information-retrieval-book.html
Reference Books (Optional)
- Speech and Language Processing
- Authors: Daniel Jurafsky and James H. Martin
- Edition: Draft of 3rd edition, available online
- Website: https://web.stanford.edu/~jurafsky/slp3/
Tentative List of Topics
- Boolean model; text pre-processing; inverted indices
- Approximate string matching and bi-gram retrieval
- Index construction and compression
- Vector space model; text-similarity metrics; term weighting; ranked retrieval
- Evaluating information retrieval systems
- Relevance feedback; query expansion
- Probabilistic models for information retrieval
- Text classification and clustering
- Word embeddings for IR
- Web search and crawling
- Link analysis (e.g., hubs and authorities, Google PageRank)
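To give a flavor of the first topic, the following is a minimal sketch of an inverted index with a Boolean AND query, written in Python since that is the language used for the assignments. It is an illustration only, not course material: the function names `build_inverted_index` and `boolean_and`, the whitespace tokenization, and the three sample documents are all assumptions made for this example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the sorted list of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Assumption: naive whitespace tokenization; real pre-processing
        # would also handle punctuation, stemming, stop words, etc.
        for token in text.lower().split():
            index[token].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def boolean_and(index, *terms):
    """Return the sorted IDs of documents that contain every query term."""
    postings = [set(index.get(term.lower(), [])) for term in terms]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


# Hypothetical toy collection used only for this sketch.
docs = {
    1: "new home sales top forecasts",
    2: "home sales rise in july",
    3: "increase in home sales in july",
}

index = build_inverted_index(docs)
result = boolean_and(index, "home", "july")
print(result)
```

Here the query "home AND july" is answered by intersecting the postings lists of the two terms, so only documents that contain both terms are returned.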
Course Requirements
The lectures will take place on Mondays between 13:00-14:50 and Tuesdays between 15:00-15:50. You are encouraged to attend and actively participate in the lectures.
The programming assignments will be done individually and will involve intermediate-level programming where you will implement and test some of the techniques that we cover in class using Python.
As a term project, each team will design and implement a system related to IR. Each team will consist of two or three people. The teams will give short progress and final presentations in front of the class, describing the methods used, the results obtained, and the challenges encountered.
Grading
- Programming Assignments: 50%
- Term Project: 30%
- Exam: 20%