Reranking
- When querying the vector database we usually get back only the top-k retrieved documents (e.g., the top 6).
- But sometimes the record we actually need sits further down the ranking (e.g., item 17) and gets cut off.
- We could retrieve far more records, but LLMs have a limited context window. We also want to pass in the most relevant data possible to decrease the chances of hallucination.
- A reranker reorders the retrieved records so that the most relevant items rise to the top of the list (see the sketch after this list).
- Why would a reranker do a better job than just picking a different retrieval model?
  - The initial retrieval scores each document independently via cosine similarity between precomputed embeddings, so it has to be fast and approximate.
  - A reranker sees the full query and each candidate document together at query time, so it can judge relevance more precisely, but it is only cheap enough to run over the small retrieved set.
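A minimal sketch of query-time reranking with a cross-encoder. It assumes the sentence-transformers package and the public ms-marco-MiniLM cross-encoder checkpoint; the query and documents are made-up placeholders, not anything from these notes.

```python
# Rerank a shortlist of retrieved documents with a cross-encoder.
# Assumes: `pip install sentence-transformers` and access to the public
# "cross-encoder/ms-marco-MiniLM-L-6-v2" checkpoint (both are assumptions).
from sentence_transformers import CrossEncoder

query = "How do I rotate an API key?"
retrieved_docs = [  # e.g., top-k results from the vector database
    "API keys can be rotated from the security settings page.",
    "Our pricing tiers are described in the billing documentation.",
    "To rotate a key, revoke the old key and generate a new one.",
]

# The cross-encoder scores each (query, document) pair jointly at query time,
# which is more accurate than comparing precomputed embeddings, but too slow
# to run over the whole corpus -- so it is applied only to the shortlist.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, doc) for doc in retrieved_docs])

# Reorder the shortlist so the most relevant documents come first.
reranked = [doc for _, doc in sorted(zip(scores, retrieved_docs),
                                     key=lambda pair: pair[0], reverse=True)]
for doc in reranked:
    print(doc)
```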
Hybrid Search
- Retrieval that combines results from both semantic search (i.e., embedding similarity) and keyword search (e.g., BM25); see the fusion sketch below.
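One common way to merge the two result lists is reciprocal rank fusion. The sketch below is a plain-Python illustration; the document IDs and ranked lists are hypothetical stand-ins for real BM25 and vector-store output.

```python
# Merge keyword-search and semantic-search results with reciprocal rank fusion (RRF).
# Each input is a list of document IDs, best match first; the lists here are
# hypothetical stand-ins for real BM25 / vector-store results.
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Score each doc by the sum of 1 / (k + rank) across the lists it appears in."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

keyword_results = ["doc_7", "doc_2", "doc_9"]   # e.g., BM25 keyword search
semantic_results = ["doc_2", "doc_5", "doc_7"]  # e.g., embedding similarity search

fused = reciprocal_rank_fusion([keyword_results, semantic_results])
print(fused)  # docs ranked well in both lists (doc_2, doc_7) come out on top
```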
Vertex AI Matching Engine

Query Transform Methods
LlamaIndex Reference
A user query can be transformed and decomposed in many ways before being executed as part of a RAG query engine, agent, or any other pipeline.
- Routing
  - Keep the original query, but identify the relevant subset of tools that the query applies to.
  - Route the query to a specific vector store, database, or API to fetch the context (a minimal sketch follows this list).
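A minimal routing sketch. For illustration it uses a keyword-based selector and made-up tool names; in practice the selection is usually done by an LLM (e.g., LlamaIndex's router query engine), so treat every name and heuristic here as an assumption.

```python
# Route an unchanged user query to the most appropriate data source.
# The tools and the keyword-based selector are hypothetical stand-ins;
# a production router would typically ask an LLM to pick the tool instead.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class RoutableTool:
    name: str
    description: str
    keywords: List[str]
    fetch_context: Callable[[str], str]

TOOLS = [
    RoutableTool(
        name="product_docs_vector_store",
        description="Semantic search over product documentation.",
        keywords=["how", "configure", "error", "install"],
        fetch_context=lambda q: f"[doc chunks for: {q}]",
    ),
    RoutableTool(
        name="orders_sql_db",
        description="Structured order and billing records.",
        keywords=["order", "invoice", "refund", "billing"],
        fetch_context=lambda q: f"[SQL rows for: {q}]",
    ),
]

def route(query: str) -> RoutableTool:
    """Pick the tool whose keywords overlap most with the query (an LLM selector in practice)."""
    words = set(query.lower().split())
    return max(TOOLS, key=lambda tool: len(words & set(tool.keywords)))

query = "Why was my last invoice charged twice?"
tool = route(query)
print(tool.name)                  # -> orders_sql_db
print(tool.fetch_context(query))  # context handed to the RAG pipeline; query itself is unchanged
```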