A mod of Terrier-4.0 with some additions and modifications for running IR experiments on TREC data. These notes supplement the existing documentation and clarify a few points.
More often than not, search engines like this one are used as black boxes in experiments, and the lack of documentation describing the system internals makes it hard to interpret results or debug experiments. The notes collected here are an attempt to look under the hood and help the experimenter become a more informed user of this tool.
- Settings for indexing TREC CD 1 & 2.
- Settings for indexing TREC CD 4 & 5.
- Settings recommended for indexing all text within the DOC tag of a TREC document. See the Javadoc comment block preceding the 'TagSet' class definition.
- Stop word list.
- Stemmer implementations available.
- S-Stemmer implementation.
- A vaguely described term frequency normalization constant mentioned in the 'Weighting Models and Parameters' section.
- Terrier-4.0 weighting model list (package `org.terrier.matching.models`).
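
To make the S-Stemmer item above more concrete, here is a minimal sketch of the three plural-stripping rules usually attributed to Harman's S-stemmer. It is not the class shipped in this repository: the class name `SStemmerSketch` and method `stem` are hypothetical, the input is assumed to be already lowercased, no minimum-length guard is applied, and a word that hits a rule's exception list is assumed to fall through to the next rule.

```java
/** Illustrative sketch of the S-stemmer rules; names are hypothetical, not Terrier's API. */
final class SStemmerSketch {

    private SStemmerSketch() {}

    /** Applies the first matching rule to a single, already lowercased term. */
    static String stem(String w) {
        int n = w.length();

        // Rule 1: "ies" -> "y", unless the word ends in "eies" or "aies".
        if (w.endsWith("ies") && !w.endsWith("eies") && !w.endsWith("aies")) {
            return w.substring(0, n - 3) + "y";
        }
        // Rule 2: "es" -> "e", unless the word ends in "aes", "ees" or "oes".
        if (w.endsWith("es") && !w.endsWith("aes")
                && !w.endsWith("ees") && !w.endsWith("oes")) {
            return w.substring(0, n - 1);
        }
        // Rule 3: drop a final "s", unless the word ends in "us" or "ss".
        if (w.endsWith("s") && !w.endsWith("us") && !w.endsWith("ss")) {
            return w.substring(0, n - 1);
        }
        // No rule applied: return the term unchanged.
        return w;
    }

    public static void main(String[] args) {
        for (String t : new String[] {"queries", "horses", "cats", "glass", "corpus"}) {
            System.out.println(t + " -> " + stem(t));
        }
    }
}
```

Because at most one rule fires per term, the stemmer is much lighter than Porter-style stemmers and only conflates simple singular and plural forms.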