The paper "Precise Zero-Shot Dense Retrieval without Relevance Labels" (arXiv:2212.10496) introduces Hypothetical Document Embeddings (HyDE), a method for zero-shot dense retrieval. Zero-shot learning refers to a model's ability to handle a new task without task-specific labeled data. In information retrieval, this means the system must retrieve the documents most relevant to a query without any labeled query-document relevance pairs.
The HyDE pipeline can be broken down into the following steps:
Generating Hypothetical Documents: Given a query, HyDE first uses an instruction-following large language model, such as InstructGPT, to generate a hypothetical document. This document attempts to capture the information relevant to the query, but it is not a real document from the corpus and may contain factual errors.
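For illustration, here is a minimal sketch of the generation step using the OpenAI Python client. The model name and exact prompt wording are placeholders (the paper used InstructGPT with a "write a passage that answers the question" style prompt):

```python
# Sketch of the generation step; assumes OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()

def generate_hypothetical_document(query: str) -> str:
    # Prompt style follows the paper; the exact wording is illustrative.
    prompt = (
        "Please write a passage to answer the question.\n"
        f"Question: {query}\n"
        "Passage:"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; the paper used InstructGPT
        messages=[{"role": "user", "content": prompt}],
        temperature=0.7,      # sampling yields diverse hypothetical documents
    )
    return response.choices[0].message.content
```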
Unsupervised Encoding: Next, HyDE uses an unsupervised contrastive encoder, such as Contriever, to encode the hypothetical document into a vector. The encoding acts as a lossy compression that filters out extraneous (and possibly hallucinated) details, retaining the key relevance signal.
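Here is a sketch of the encoding step, assuming the publicly released facebook/contriever checkpoint on Hugging Face; Contriever mean-pools token embeddings into one vector per text:

```python
# Sketch of the encoding step with Contriever, an unsupervised contrastive encoder.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/contriever")
encoder = AutoModel.from_pretrained("facebook/contriever")

def encode(texts: list[str]) -> torch.Tensor:
    inputs = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        token_embeddings = encoder(**inputs).last_hidden_state
    # Mean pooling: average token embeddings, ignoring padding positions.
    mask = inputs["attention_mask"].unsqueeze(-1).float()
    return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
```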
Retrieving Similar Documents: The resulting vector is then used to retrieve the most similar real documents from the corpus. This is a standard nearest-neighbor search in embedding space, scoring documents by inner product (or, with normalized vectors, cosine similarity).
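And a sketch of the search step over a toy in-memory corpus; the random vectors stand in for encoder outputs, since in practice corpus embeddings are precomputed and indexed (e.g., with FAISS):

```python
# Sketch of the retrieval step: rank corpus vectors by inner product.
import torch

torch.manual_seed(0)
corpus = ["passage about dense retrieval",
          "passage about cooking pasta",
          "passage about zero-shot search"]
corpus_vecs = torch.randn(len(corpus), 768)  # stand-in for encode(corpus)
query_vec = torch.randn(768)                 # stand-in for the HyDE vector

# L2-normalizing both sides would make this cosine similarity instead.
scores = corpus_vecs @ query_vec
top = scores.topk(k=2)
for score, idx in zip(top.values.tolist(), top.indices.tolist()):
    print(f"{score:.3f}  {corpus[idx]}")
```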
Retrieval Results: Ultimately, the system returns a set of real documents most relevant to the original query.
The innovation lies in decomposing query-to-document retrieval into two subtasks: generation, handled by the language model, and document-document similarity, handled by the contrastive encoder. Neither subtask requires relevance labels, which is how HyDE retrieves without any labeled data while performing well across a range of tasks and languages.
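Concretely, the paper builds the final query vector by averaging the encoder outputs of N sampled hypothetical documents together with the encoding of the query itself, treating the query as one more hypothesis:

```latex
\hat{v} = \frac{1}{N+1}\left[\sum_{k=1}^{N} f(\hat{d}_k) + f(q)\right]
```

Inner products against this averaged vector then rank the corpus, as in the search sketch above; no relevance labels enter the pipeline at any point.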
To put it metaphorically, it is like a detective game: the detective (the language model) constructs a hypothetical crime scene (the hypothetical document) from the available clues (the query), and then uses that reconstruction to locate the actual scene (a relevant document). No prior knowledge of any particular scene (relevance labels) is needed; the match rests entirely on how similar the scenes are.