CH11 Reranker

The Reranker is a key component of modern two-stage retrieval systems. Such systems are designed to search large datasets both efficiently and accurately, and the Reranker's main job is to re-rank the documents returned by the first stage, the Retriever.

Summary

The Reranker operates in the second stage of the search system and aims to improve the accuracy of the initial search results. After the Retriever quickly extracts relevant candidate documents from a large collection, the Reranker analyzes those candidates in more detail to determine the final ranking.

How it works

  1. Receive the initial search results from the Retriever.

  2. Pair the query with each candidate document.

  3. Evaluate the relevance of each query-document pair using a complex model (usually transformer-based).

  4. Reorder the documents according to the evaluation scores.

  5. Output the final re-sorted result.
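The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: `rerank`, `overlap_score`, and the sample documents are hypothetical names and data introduced here, and the token-overlap scorer only stands in for the transformer model that a real Reranker would call in step 3.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every (query, document) pair and return the top_k by score."""
    # Step 2: pair the query with each candidate document.
    pairs = [(query, doc) for doc in candidates]
    # Step 3: score each pair (a real system would call a cross-encoder here).
    scored = [(doc, score_fn(q, doc)) for q, doc in pairs]
    # Steps 4-5: sort by descending score and return the final ranking.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


def overlap_score(query: str, doc: str) -> float:
    """Hypothetical stand-in scorer: fraction of query tokens found in the document."""
    q_tokens = set(query.lower().split())
    d_tokens = set(doc.lower().split())
    return len(q_tokens & d_tokens) / len(q_tokens) if q_tokens else 0.0


# Step 1: candidates as they might arrive from the Retriever (made-up data).
candidates = [
    "Bread recipes for beginners",
    "How a reranker reorders search results",
    "Search engines and ranking basics",
]
ranked = rerank("how does a reranker improve search", candidates, overlap_score, top_k=2)
for doc, score in ranked:
    print(f"{score:.2f}  {doc}")
```

Passing the scorer in as `score_fn` keeps the ranking logic independent of the model, so the toy scorer could be swapped for a real one without changing `rerank`.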

Technical features

Architecture

  • Mainly uses transformer-based models such as BERT and RoBERTa

  • Adopts a cross-encoder structure

Input format

  • Generally takes the form [CLS] Query [SEP] Document [SEP]
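To make the input format concrete, the sketch below assembles one query-document pair by hand. `build_pair_input` is a hypothetical helper, the whitespace-split tokens are a simplification of real subword tokenization, and the segment ids follow the BERT convention (0 for the first segment, 1 for the second); other models use different special tokens.

```python
from typing import List, Tuple


def build_pair_input(
    query_tokens: List[str], doc_tokens: List[str]
) -> Tuple[List[str], List[int]]:
    """Return the token sequence and BERT-style segment ids for one pair."""
    first = ["[CLS]"] + query_tokens + ["[SEP]"]
    second = doc_tokens + ["[SEP]"]
    tokens = first + second
    # Segment 0 covers [CLS], the query and the first [SEP]; segment 1 covers the rest.
    segment_ids = [0] * len(first) + [1] * len(second)
    return tokens, segment_ids


tokens, segment_ids = build_pair_input(
    ["what", "is", "reranking"],
    ["reranking", "reorders", "candidates"],
)
print(" ".join(tokens))
# [CLS] what is reranking [SEP] reranking reorders candidates [SEP]
print(segment_ids)
# [0, 0, 0, 0, 0, 1, 1, 1, 1]
```

In practice a tokenizer library builds this sequence when it is given the query and the document as a text pair, so the markers rarely need to be inserted manually.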

Learning method

  1. Pointwise: predicts the relevance score of an individual query-document pair

  2. Pairwise: compares the relative relevance of two documents

  3. Listwise: optimizes the entire ranking list at once
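The three learning methods differ mainly in what the loss function looks at. The sketch below shows one common loss for each, chosen here for illustration: binary cross-entropy (pointwise), a margin hinge loss (pairwise), and softmax cross-entropy over the candidate list (listwise). The function names and the assumption of a single relevant document per list are choices made for this example; other formulations exist.

```python
import math
from typing import List


def pointwise_loss(score: float, label: int) -> float:
    """Binary cross-entropy on one pair; score is a raw logit, label is 0 or 1."""
    prob = 1.0 / (1.0 + math.exp(-score))
    return -(label * math.log(prob) + (1 - label) * math.log(1.0 - prob))


def pairwise_loss(pos_score: float, neg_score: float, margin: float = 1.0) -> float:
    """Hinge loss: zero once the relevant document outscores the other by the margin."""
    return max(0.0, margin - (pos_score - neg_score))


def listwise_loss(scores: List[float], relevant_index: int) -> float:
    """Softmax cross-entropy over the whole candidate list."""
    max_score = max(scores)  # subtract the max for numerical stability
    log_norm = max_score + math.log(sum(math.exp(s - max_score) for s in scores))
    return log_norm - scores[relevant_index]


print(pointwise_loss(0.0, 1))            # one pair, judged on its own
print(pairwise_loss(0.5, 2.0))           # relevant document scored too low
print(listwise_loss([5.0, 0.0, 0.0], 0)) # relevant document already on top
```

The pointwise loss never sees a second document, the pairwise loss only cares about the score gap between two documents, and the listwise loss depends on every score in the list at once.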

Difference from Retriever

Characteristic          Retriever                                  Reranker
Purpose                 Quick search for related documents         Accurate ranking
Processing method       Simple similarity calculation              Complex semantic analysis
Model structure         Single encoder                             Cross-encoder
Computational cost      Low                                        High
Priority                Speed                                      Accuracy
Input form              Query and document processed separately    Query-document pair processed together
Output                  Large set of candidate documents           Exact rank and score
Scalability             High                                       Limited
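The difference in input form can be illustrated with a toy example. Everything below is an assumption made for illustration: bag-of-words counts stand in for the Retriever's embeddings, and a shared-bigram count stands in for the Reranker's joint model. The point is only the call pattern: the Retriever encodes the query and the documents separately (so document vectors can be computed ahead of time), while the Reranker reads each query-document pair together.

```python
from collections import Counter
from typing import Set, Tuple


def encode(text: str) -> Counter:
    """Toy single encoder: each text is encoded on its own."""
    return Counter(text.lower().split())


def dot(a: Counter, b: Counter) -> int:
    """Simple similarity between two independently encoded texts."""
    return sum(count * b[token] for token, count in a.items())


def bigrams(text: str) -> Set[Tuple[str, str]]:
    tokens = text.lower().split()
    return set(zip(tokens, tokens[1:]))


def pair_score(query: str, doc: str) -> int:
    """Toy pair scorer: sees both texts at once and rewards shared word order."""
    return len(bigrams(query) & bigrams(doc))


docs = ["dog bites man", "man bites dog", "cat sleeps"]
doc_vectors = [encode(d) for d in docs]  # computed once, before any query arrives

query = "man bites dog"
query_vector = encode(query)
retriever_scores = [dot(query_vector, v) for v in doc_vectors]
top = sorted(range(len(docs)), key=lambda i: retriever_scores[i], reverse=True)[:2]
candidates = [docs[i] for i in top]

# The pair scorer runs once per candidate at query time.
reranked = sorted(candidates, key=lambda d: pair_score(query, d), reverse=True)
print(retriever_scores)  # [3, 3, 0]
print(reranked)          # ['man bites dog', 'dog bites man']
```

The toy Retriever cannot separate the first two documents because they contain the same words, while the pair scorer can. The pair scorer, however, must run for every candidate on every query, which is why it is applied only to the short list the Retriever returns.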

Pros and cons

Advantages

  • Significantly improves search accuracy

  • Can model complex semantic relationships

  • Compensates for the limitations of first-stage retrieval

Disadvantages

  • Increases computational cost

  • Increases processing time

  • Difficult to apply directly to large datasets
