Notes compiled from a recent read-through of several RAG-related surveys; the text itself was generated by Claude Code from those notes.
Information Retrieval, RAG, and Intelligent Literature Systems: A Technical Survey
1. A Brief Review of Traditional Information Retrieval
Modern large-scale information retrieval (IR) systems almost universally adopt a retrieval + ranking two-stage pipeline (retrieve-then-rerank architecture), achieving a fundamental balance between efficiency and effectiveness [Hambarde et al., 2023; Xu et al., 2025].
1.1 The Evolution of Retrieval Methods
Retrieval methods have evolved from term-based matching to semantic vector-based approaches, which can be summarized into four categories:
| Category | Core Idea | Representative Methods |
|---|---|---|
| Traditional | Term matching, query expansion, topic models | TF-IDF, BM25, Query Expansion, Topic Model |
| Sparse | Represent documents/queries as sparse vectors, activating few dimensions | Neural weight prediction, document expansion, BM25 variants |
| Dense | Encode queries and documents as dense vectors, retrieve via vector similarity | Word2Vec, Sentence-BERT, DPR, ColBERT |
| Hybrid | Combine the strengths of sparse and dense representations | SPLADE, ColBERT, and other late-fusion schemes |
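
As a concrete reference point for the Traditional row, the following is a minimal sketch of Okapi BM25 scoring in plain Python. The whitespace tokenizer, the toy corpus, and the defaults `k1=1.5` and `b=0.75` are illustrative assumptions, and the IDF uses a common smoothed variant rather than the formula of any particular library.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase whitespace tokenization is enough for the toy corpus.
    return text.lower().split()

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 score per document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents need more hits for the same score.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "sparse retrieval with bm25 term matching",
    "dense retrieval encodes queries as vectors",
    "graph neural networks for molecules",
]
scores = bm25_scores("bm25 retrieval", docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

The third document shares no term with the query and therefore scores zero, which is exactly the exact-match ceiling that dense methods try to lift.
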
To enhance retrieval effectiveness, researchers have proposed two augmentation strategies. Query Augmentation enriches the query representation on the query side through expansion, reformulation, and feedback, but is subject to query drift and overfitting risks [Hambarde et al., 2023]. Document Augmentation augments or semantically rewrites indexed content on the document side to narrow the semantic gap between queries and documents.
1.2 The Evolution of Ranking Methods
The ranking stage is responsible for fine-grained relevance scoring of candidate documents returned by retrieval, and has similarly undergone a transformation from traditional methods to deep learning approaches:
| Category | Description | Representative Methods |
|---|---|---|
| Learning-to-Rank | Classified by loss function into pointwise / pairwise / listwise | LambdaRank, LambdaMART |
| Representation-based | Encode query and document separately, compute global similarity | DSSM, Siamese Network |
| Interaction-based | Model query-document term-level interactions before scoring | DRMM, BERT-based Re-ranker |
| Hybrid | Combine the efficiency of representation-based models with the accuracy of interaction-based models by running two networks in parallel | DUET |
The introduction of continuous vector representations enabled text ranking to transcend the ceiling of exact term matching, while neural networks eliminated the need for manual feature engineering [Hambarde et al., 2023]. In practical systems, representation-based models typically pre-encode documents into vectors during offline stages, making them suitable for efficient initial retrieval; interaction-based models jointly process queries and documents during the online stage, achieving deeper and more precise relevance matching but at higher computational cost, making them better suited for re-ranking a smaller set of candidate results [Xu et al., 2025]. Recognizing the complementary strengths of these two architectures, researchers further proposed hybrid models that combine the efficiency of representation-based methods with the effectiveness of interaction-based methods [Xu et al., 2025].
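
The division of labor described above can be sketched as a toy two-stage pipeline. Both scorers are hypothetical stand-ins: `encode` builds a bag-of-words vector in place of a trained bi-encoder, and `interaction_score` is a term-overlap heuristic in place of a cross-encoder. Only the control flow (encode documents offline, retrieve cheaply, rerank a short list) is the point.

```python
import math
from collections import Counter

def encode(text):
    # Stand-in for a bi-encoder: a bag-of-words count vector.
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u if t in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def interaction_score(query, doc):
    # Stand-in for a cross-encoder: it sees query and document together and
    # rewards adjacent query terms that also appear adjacent in the document.
    q, d = query.lower().split(), doc.lower().split()
    doc_bigrams = set(zip(d, d[1:]))
    ordered = sum(1 for pair in zip(q, q[1:]) if pair in doc_bigrams)
    overlap = len(set(q) & set(d))
    return overlap + 2 * ordered

def build_index(docs):
    # Offline stage: every document is encoded once, before any query arrives.
    return [(doc, encode(doc)) for doc in docs]

def search(query, index, k_retrieve=3, k_final=1):
    q_vec = encode(query)
    # Stage 1: cheap vector similarity over the whole index.
    candidates = sorted(index, key=lambda p: cosine(q_vec, p[1]), reverse=True)[:k_retrieve]
    # Stage 2: costlier joint scoring, applied to the short candidate list only.
    reranked = sorted(candidates, key=lambda p: interaction_score(query, p[0]), reverse=True)
    return [doc for doc, _ in reranked[:k_final]]

index = build_index([
    "learning rank models for search",
    "rank learning is hard",
    "learning to rank with gradient boosted trees",
    "cooking pasta at home",
])
top = search("learning to rank", index)
```
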
1.3 Pre-trained Transformers and the Representation Learning Revolution in IR
The advent of pre-trained Transformers has further reshaped IR model architectures. By network structure, they can be broadly categorized into three types:
| Architecture | Representative Models | Characteristics |
|---|---|---|
| Encoder-only | BERT (Cross-Encoder) | Deep interaction, high ranking quality, but computationally expensive |
| Encoder-decoder | T5, BART | Balances understanding and generation capabilities |
| Decoder-only | GPT series | Strong generation capability |
To narrow the effectiveness gap between efficient Bi-Encoders and more powerful Cross-Encoders, Knowledge Distillation (KD) has been widely adopted in IR. Regardless of architecture, however, retrieval systems face a fundamental trade-off between effectiveness (recall, MRR, nDCG) and resource efficiency (latency, throughput, memory footprint, indexing and update costs): neural retrievers can significantly improve effectiveness, but they demand additional compute and indexing engineering. Calibration mechanisms, in turn, aim to align model output scores with true relevance probabilities, so that confidence scores faithfully reflect correctness.
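
For readers less familiar with the effectiveness metrics named above, here is a small sketch of MRR and nDCG@k. It assumes the linear-gain form of DCG (gain divided by log2 of rank plus one); other gain functions exist, and the document IDs are made up for illustration.

```python
import math

def reciprocal_rank(ranked_ids, relevant_ids):
    # 1 / rank of the first relevant result, or 0 if none was returned.
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

def mean_reciprocal_rank(runs):
    # runs: list of (ranked_ids, relevant_ids) pairs, one per query.
    return sum(reciprocal_rank(r, rel) for r, rel in runs) / len(runs)

def dcg(gains):
    # Position i (0-based) is discounted by log2(i + 2).
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg_at_k(ranked_ids, gains_by_id, k):
    gains = [gains_by_id.get(d, 0) for d in ranked_ids[:k]]
    ideal = sorted(gains_by_id.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0

runs = [(["a", "b", "c"], {"b"}), (["x", "y"], {"x"})]
mrr = mean_reciprocal_rank(runs)              # (1/2 + 1/1) / 2
ndcg = ndcg_at_k(["a", "b", "c"], {"b": 1}, k=3)
```
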
At this point, information retrieval has fully entered the representation learning era of deep learning. Pre-trained models have given retrieval systems powerful semantic understanding, yet the basic paradigm is unchanged: the user issues a query and the system returns a ranked list of relevant documents. The emergence of large language models is poised to fundamentally alter this paradigm.
2. IR Systems in the LLM Era: The Technical Evolution Path of RAG
The fundamental paradigm of traditional information retrieval systems is to return a ranked list of relevant documents in response to user queries; the system itself is not responsible for understanding document content and generating coherent answers. Large Language Models (LLMs), while possessing powerful language understanding and generation capabilities, exhibit significant limitations when relying solely on their internal parametric knowledge. This complementarity gave rise to Retrieval-Augmented Generation (RAG), which combines the external knowledge acquisition capability of IR with the text generation capability of LLMs, forming an entirely new information processing paradigm.
2.1 Why RAG Is Needed
Before RAG, the field of information access went through two major phases, each with fundamental shortcomings.
Limitations of Pure LLMs (Chatbot Mode). When LLMs rely solely on parametric knowledge learned during pre-training to answer user queries, they face three core challenges: first, hallucination, where the model may generate plausible but inaccurate content; second, lack of timeliness, as the model has no knowledge of events beyond its training data cutoff and cannot provide up-to-date information; and third, context window limitations, where even with massive parameter counts, the context length the model can process in a single pass remains limited (typically 2K–32K tokens), making it difficult to comprehensively understand complex queries requiring integration of extensive background knowledge [Zhang et al., 2025b].
Limitations of Traditional IR Systems. While traditional search engines or retrieval systems can return large volumes of relevant documents based on indexing, their responsibility ends there — they are designed to “find” information, not to “understand” and “answer.” After obtaining a document list, users must still read, screen, and synthesize on their own to form a complete answer. For complex questions requiring cross-document correlation analysis or synthetic reasoning, the cognitive burden of this process is extremely high.
The Core Motivation of RAG. The central idea of RAG is Retrieve-then-Read: first use a retrieval system to retrieve relevant document fragments from an external knowledge base, then concatenate these retrieved texts as context into the LLM’s input prompt, guiding the model to generate responses based on these external factual inputs. This approach leverages the powerful language understanding and generation capabilities of LLMs while harnessing the retrieval system to ensure the factuality and timeliness of responses, simultaneously alleviating hallucination to some extent [Zhang et al., 2025b; Gao et al., 2024].
2.2 Naive RAG: The Simplest Form
The most rudimentary implementation of RAG is exceedingly straightforward: a user query enters the system, relevant fragments are retrieved from the document collection via keyword matching (such as TF-IDF or BM25), these fragments are concatenated with the original query to form a prompt, which is then passed to the LLM to generate a final answer. This design is conceptually clear and easy to implement, and can achieve reasonable results for simple factual question answering.
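
The pipeline just described fits in a few lines. In this sketch the keyword-overlap scorer stands in for TF-IDF or BM25, the prompt wording is an arbitrary choice, and `fake_llm` is a hypothetical placeholder for a real model call (it simply echoes the top-ranked context line).

```python
def keyword_score(query, chunk):
    # Number of distinct query terms that also occur in the chunk.
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query, chunks, k=2):
    return sorted(chunks, key=lambda c: keyword_score(query, c), reverse=True)[:k]

def build_prompt(query, contexts):
    numbered = "\n".join(f"[{i}] {c}" for i, c in enumerate(contexts, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{numbered}\n\n"
        f"Question: {query}\nAnswer:"
    )

def naive_rag(query, chunks, llm, k=2):
    # One-shot retrieval, direct concatenation, single generation call.
    return llm(build_prompt(query, retrieve(query, chunks, k)))

def fake_llm(prompt):
    # Hypothetical stand-in for a model: returns the first context line verbatim.
    for line in prompt.splitlines():
        if line.startswith("[1] "):
            return line[4:]
    return ""

chunks = [
    "BM25 is a term-based ranking function",
    "Paris is the capital of France",
    "Dense retrievers use vector similarity",
]
answer = naive_rag("what is the capital of France", chunks, fake_llm)
```
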
However, this naive “one-shot retrieval + direct generation” architecture quickly reveals its fragility in complex scenarios: the retrieval stage may return insufficiently relevant fragments due to inherent limitations in similarity computation; the generation stage struggles to deeply integrate retrieved content with the query, as multi-source similar information tends to produce redundancy or incoherent content; and the entire pipeline is static and linear, lacking the ability to adjust based on intermediate results [Gao et al., 2024]. These limitations prompted researchers to explore more advanced RAG variants.
2.3 The Technical Evolution Path of RAG
From the historical perspective of information retrieval, the way humans access information has evolved from Web Search (static keyword-based search) to LLM as Chatbot (pure parametric knowledge generation) to LLM with RAG (retrieval-augmented generation), and is now advancing toward Multi-hop Retrieval (iterative multi-hop retrieval) and Deep Research [Zhang et al., 2025b]. Deep Research emphasizes a dynamic feedback loop between reasoning and search: the reasoning process actively influences search strategies (e.g., refining queries based on intermediate deductions), while retrieved information in turn recursively refines the reasoning process.
From a technical perspective, the evolution of RAG can be summarized as a clear five-stage progression [Gao et al., 2024]:
Naive RAG → Advanced RAG → Modular RAG → Graph RAG → Agentic RAG
| Stage | Key Characteristics | Core Improvements | Limitations |
|---|---|---|---|
| Naive RAG | Keyword retrieval (TF-IDF, BM25), results directly concatenated to prompt | Simple architecture, easy to implement | Lacks context awareness, fragmented output, limited scalability |
| Advanced RAG | Introduces dense vector retrieval models, vector search, context re-ranking, multi-hop iterative retrieval | Significantly improves retrieval quality and context relevance | Increased computational overhead, scalability still limited |
| Modular RAG | Decouples retrieval, ranking, and generation into independent reusable modules, supports hybrid retrieval strategies and external tool integration | Substantially improved system flexibility and scalability | Inter-module collaborative optimization still requires manual design |
| Graph RAG | Leverages graph node connectivity and edge relations to organize knowledge, enabling hierarchical knowledge management and structured navigation | Supports cross-document correlation, multi-hop reasoning, and global context understanding | High graph construction cost, scalability limited by graph scale, reliance on high-quality data |
| Agentic RAG | Introduces autonomous agents for iterative query refinement, adaptive retrieval strategy selection, and dynamic workflow orchestration | Possesses multi-step reasoning and autonomous decision-making capabilities | High coordination complexity, significant computational overhead, increased latency |
Meanwhile, Li et al. (2025), from the perspective of the interplay between RAG and Reasoning, proposed a higher-level framework that categorizes their integration into three paradigms:
| Paradigm | Core Idea | Direction |
|---|---|---|
| Reasoning-Enhanced RAG | Leverages reasoning capabilities to optimize RAG’s retrieval, integration, and generation stages | Reasoning → RAG |
| RAG-Enhanced Reasoning | Leverages external knowledge from retrieval to enhance LLM reasoning capabilities | RAG → Reasoning |
| Synergized RAG-Reasoning | Deep coupling of retrieval and reasoning, iterative alternating execution, forming a dynamic feedback loop | RAG ↔ Reasoning |
These three paradigms show that RAG is not merely about “attaching an external retriever to an LLM,” but a process of continuous integration and mutual enhancement between retrieval and reasoning. Early RAG systems mostly followed the first two, unidirectional modes, while the current frontier is moving toward the third mode of deep synergy.
2.4 Core Limitations of RAG
Despite continuous advances in RAG technology, traditional RAG systems (primarily Naive and Advanced RAG) face fundamental challenges across multiple dimensions. Synthesizing findings from multiple surveys, these limitations can be summarized along three dimensions:
Retrieval Dimension. A single retrieval pass cannot guarantee that all information needed to resolve a query is retrieved, and assessing the relevance of retrieval results is itself challenging. When a query requires synthesizing knowledge across multiple data sources, it becomes difficult to make retrieval both sufficient and accurate [Li et al., 2025; Gao et al., 2024]. Embedding-level limitations further cap retrieval quality: training-data biases that favor high-frequency tasks, languages, and domains; poor performance of general-purpose models in specialized domains (e.g., biomedicine, scientific literature); and insufficient capture of long-range structural information [Zhang et al., 2025c].
Reasoning Dimension. Traditional RAG lacks genuine multi-step reasoning capabilities. The system cannot dynamically refine retrieval strategies based on intermediate insights or user feedback, making it difficult to handle tasks requiring complex logical chains and deep contextual understanding [Singh et al., 2026; Li et al., 2025]. Errors in early reasoning paths may propagate through subsequent steps, affecting the completeness of the final output [Zhang et al., 2025b]. Additionally, models struggle to maintain fidelity to retrieved evidence when conflicts arise between external retrieved evidence and internal parametric knowledge.
System Dimension. RAG pipelines are typically static and linear, lacking adaptive adjustment mechanisms, making it difficult to accommodate queries of varying complexity and dynamically updating knowledge bases [Li et al., 2025; Singh et al., 2026]. The entire workflow — from preprocessing and index construction to real-time retrieval and generation — faces efficiency bottlenecks in large-scale deployment [Zhang et al., 2025b]. Moreover, effectively integrating similar information retrieved from multiple sources, avoiding redundant output while ensuring stylistic and tonal consistency of generated content, remains a persistent practical challenge [Gao et al., 2024].
The Emergence of Enhanced Retrieval Modes. To overcome the limitations of single-pass retrieval, researchers proposed three enhanced retrieval modes [Gao et al., 2024]: Iterative Retrieval gradually enriches context through multiple retrieval-generation cycles; Recursive Retrieval progressively decomposes complex problems for deeper retrieval; and Adaptive Retrieval dynamically controls retrieval and generation behavior on demand. While these ideas had preliminary exploration in the first two generations of RAG, their potential was far from fully realized due to the static nature of system architecture. True breakthroughs came from two directions: first, using graph structures to explicitly model knowledge associations (Chapter 3, Graph-based RAG); and second, using autonomous agents to dynamically orchestrate retrieval and reasoning processes (Chapter 4, Agentic Search).
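
Iterative and adaptive retrieval can be sketched as one loop with a stopping check. Everything below the `iterative_retrieve` function is a toy assumption: in a real system `next_query` and `is_sufficient` would typically be LLM calls, whereas here they are hand-written heuristics chosen so the example runs on a three-sentence corpus.

```python
STOPWORDS = {"is", "the", "of", "was", "by", "in"}

def terms(text):
    return {w for w in text.lower().split() if w not in STOPWORDS}

def iterative_retrieve(question, retrieve, next_query, is_sufficient, max_rounds=3):
    context, query = [], question
    for _ in range(max_rounds):
        for passage in retrieve(query):
            if passage not in context:
                context.append(passage)
        if is_sufficient(question, context):   # adaptive stop
            break
        query = next_query(question, context)  # refine the query from what was found
    return context

corpus = [
    "Inception was directed by Christopher Nolan",
    "Christopher Nolan was born in London",
    "Titanic was directed by James Cameron",
]

def retrieve(query):
    # Top-1 passage by content-word overlap.
    return sorted(corpus, key=lambda p: len(terms(query) & terms(p)), reverse=True)[:1]

def is_sufficient(question, context):
    # Toy check: a birthplace question is answerable once a "born" fact is present.
    return any("born" in terms(p) for p in context)

def next_query(question, context):
    # Toy refinement: follow the entity named at the end of the latest passage.
    return " ".join(context[-1].split()[-2:]) + " born"

context = iterative_retrieve(
    "Which city is the birthplace of the director of Inception",
    retrieve, next_query, is_sufficient,
)
```

The first round finds only the director; the second query is built from that intermediate result, which is what a single-pass retriever cannot do.
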
3. Graph-based RAG
When RAG needs to handle cross-document correlation, multi-hop reasoning, and global knowledge integration, traditional text-fragment-based indexing and retrieval methods face structural limitations. Graph-based RAG elevates RAG from “retrieving text fragments” to “navigating and reasoning within structured knowledge networks” by introducing graph structures to explicitly model inter-knowledge associations.
3.1 Definitions and Taxonomy of Graph-based RAG
3.1.1 Text-Attributed Graphs and Formal Definition
Graph-based RAG uniformly represents graph data as Text-Attributed Graphs (TAGs) [Peng et al., 2024]:
\[G = (V, E, A, \{x_v\}_{v \in V}, \{e_{i,j}\}_{(i,j) \in E})\]
where $V$ is the node set, $E \subseteq V \times V$ is the edge set, $A$ is the adjacency matrix, and $\{x_v\}$ and $\{e_{i,j}\}$ are the text attributes of nodes and edges, respectively. The objective of Graph-based RAG is to find the optimal answer $a^*$ given a query $q$ and a TAG $G$:
\[a^* = \arg\max_{a} p(a|q, G)\]
Through joint modeling, the generation probability of the answer can be decomposed as the product of the probability of retrieving a subgraph and the probability of generating an answer based on that subgraph:
\[p(a|q, G) \approx p_\phi(a|q, G^*)p_\theta(G^*|q, G)\]
where $G^*$ denotes the retrieved subgraph, $p_\theta$ the retriever, and $p_\phi$ the generator.
3.1.2 Three-Category Taxonomy
Based on the role the graph plays in the RAG system, Graph-based RAG can be classified into three types [Zhang et al., 2025a]:
| Type | Core Idea | Characteristics |
|---|---|---|
| Knowledge-based | Graph as knowledge carrier | Explicitly models domain knowledge and semantic relations; understands complex relations through graph transformations |
| Index-based | Graph as indexing tool | Organizes raw text through graphs; optimizes retrieval and global navigation |
| Hybrid | Combines strengths of both | Provides more advanced solutions for complex reasoning tasks |
Graph-based RAG systems can reduce token usage by 26% to 97% compared to conventional methods when generating answers, with significant improvements in both speed and resource utilization [Zhang et al., 2025a].
3.2 The Three-Stage Workflow of Graph-based RAG
The workflow of Graph-based RAG can be summarized as three core stages: G-Indexing (graph-based index construction), G-Retrieval (graph-guided retrieval), and G-Generation (graph-enhanced generation) [Peng et al., 2024; Zhang et al., 2025a; Huang et al., 2026]. Rich optimization methods exist before and after each stage, enabling systematic improvement of retrieval and generation quality.
3.2.1 G-Indexing: Graph-based Index Construction
Pre-indexing: Data Preprocessing and Index Optimization
Index quality is directly determined by the effectiveness of raw data processing. Before constructing the graph index, systematic preprocessing and optimization of the data are required [Huang et al., 2026; Gao et al., 2024]:
| Optimization Direction | Specific Methods | Description |
|---|---|---|
| Data Cleaning | Remove irrelevant/redundant information, supplement additional information | Improves index purity, reduces noise interference |
| Document Segmentation | Sliding Window, Fine-grained Segmentation | Balances context completeness with index granularity |
| Metadata Augmentation | Metadata Incorporation | Enriches index information with metadata, supporting subsequent filtering and routing |
These preprocessing techniques establish the data foundation for subsequent graph index construction.
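
A minimal sketch of sliding-window segmentation with attached metadata follows. The whitespace tokenization, the field names `doc_id` and `start`, and the default window sizes are assumptions made for illustration; a stride smaller than the window size yields overlapping chunks.

```python
def sliding_window_chunks(doc_id, text, size=4, stride=2):
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    tokens = text.split()
    chunks = []
    for start in range(0, len(tokens), stride):
        chunks.append({
            "doc_id": doc_id,   # metadata kept for later filtering or routing
            "start": start,     # token offset of the window in the source document
            "text": " ".join(tokens[start:start + size]),
        })
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks

chunks = sliding_window_chunks("doc-1", "a b c d e f g", size=3, stride=2)
```
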
Index Method Taxonomy
Based on the degree of graph structure preservation, indexing methods can be classified into four categories [Peng et al., 2024; Zhang et al., 2025a]:
| Method | Characteristics | Algorithms |
|---|---|---|
| Graph Indexing | Preserves complete graph structure | BFS, Shortest Path |
| Text Indexing | Converts graph data into text descriptions | Sparse Retrieval, Dense Retrieval |
| Vector Indexing | Converts to vector representations for efficiency | LSH (Locality Sensitive Hashing) |
| Hybrid Indexing | Combines all three approaches | — |
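
To illustrate the Graph Indexing row, the sketch below keeps the graph as an adjacency list and uses breadth-first search to pull a k-hop subgraph around seed nodes. The tiny graph, its relation names, and the function name `k_hop_subgraph` are illustrative assumptions.

```python
from collections import deque

def k_hop_subgraph(adjacency, seeds, max_hops):
    """Breadth-first expansion from the seeds; returns visited nodes and traversed triples."""
    depth = {s: 0 for s in seeds}
    queue = deque(seeds)
    triples = []
    while queue:
        node = queue.popleft()
        if depth[node] == max_hops:
            continue  # nodes on the frontier are kept but not expanded further
        for relation, neighbor in adjacency.get(node, []):
            triples.append((node, relation, neighbor))
            if neighbor not in depth:
                depth[neighbor] = depth[node] + 1
                queue.append(neighbor)
    return set(depth), triples

graph = {
    "RAG": [("uses", "Retriever"), ("uses", "LLM")],
    "Retriever": [("variant", "BM25"), ("variant", "DPR")],
    "DPR": [("based_on", "BERT")],
}
nodes_1, triples_1 = k_hop_subgraph(graph, ["RAG"], max_hops=1)
nodes_2, triples_2 = k_hop_subgraph(graph, ["RAG"], max_hops=2)
```

The returned triples can be serialized to text for the generator, while the node set can seed a further vector or text lookup.
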
Beyond graph indexing, Graph-based RAG can additionally leverage hierarchical index structures to establish multi-granularity retrieval support for documents, as well as knowledge graph indexing that uses knowledge graphs to organize document relations and enable structured navigation [Gao et al., 2024].
Knowledge Graphs as Structural Indices
In Graph-based RAG, knowledge graphs serve not merely as a knowledge representation form but as a powerful structural indexing mechanism for organizing inter-document relations and supporting structured navigation [Peng et al., 2024; Zhang et al., 2025a].
KG Construction Challenges. At the knowledge graph construction level, different scenarios face distinct challenges. Domain-specific corpora face a triple challenge: complex knowledge dependencies (domain knowledge progresses from foundational to advanced concepts layer by layer, requiring cross-reference analysis), domain specificity (dense technical terminology and abbreviations), and limited reference knowledge (private technical documents are difficult to obtain externally) [Sun et al., 2025]. For general-purpose knowledge graphs, Li et al. (2026) note that while LLM pipelines can extract entities and relations at scale, the resulting graphs often lack a shared schema, with entity types and relation vocabularies being ad hoc. To address this, ontology-oriented construction methods emphasize building schema as a first-class resource for downstream tasks from the outset, rather than as a byproduct of graph construction.
Two-View KG: The Ontological Dual-Layer Architecture
The Two-View KG concept proposed by Hao et al. (2019) reveals the ontological dual-layer architecture of knowledge graphs, simultaneously representing two complementary views:
- Ontology View: The abstract concept layer, defining types, relation schemas, and other meta-knowledge
- Instance View: The concrete entity layer, containing actual fact triples
Between the two views exist cross-view links that connect ontological concepts with their instantiated entities, while satisfying mutual disjointness constraints (the entity vocabulary set and concept vocabulary set are disjoint, as are the relation set and meta-relation set). This dual-layer structure provides Graph-based RAG systems with navigation paths from abstract concepts to concrete instances, enabling the system to both understand high-level semantic patterns and locate specific knowledge details.
Yeom et al. (2024) further note that a large number of knowledge graphs are essentially Two-View KGs: abstract classes in the ontology view form tree-like hierarchical structures through class inheritance, while concrete entities in the instance view are instantiated from ontological classes. This structured dual-layer representation holds significant value for Graph-based RAG tasks that must simultaneously leverage abstract concept reasoning and concrete fact verification.
3.2.2 G-Retrieval: Graph-Guided Retrieval
Pre-retrieval: Query Optimization
Before formal retrieval, the system can optimize queries through various means to improve retrieval effectiveness [Gao et al., 2024; Huang et al., 2026]:
| Category | Sub-category | Description |
|---|---|---|
| Query Expansion | Multi-Query | Generates multiple related queries to expand retrieval coverage |
| Query Expansion | Sub-Query | Decomposes complex queries into sub-queries |
| Query Expansion | CoVe (Chain-of-Verification) | Verification chain expansion, progressively confirms query completeness |
| Query Transformation | Query Rewrite | Rewrites queries to improve retrieval effectiveness |
| Query Routing | Metadata- or similarity-based | Routes queries to different processing paths based on query characteristics |
Additionally, RRR (Rewrite-Retrieve-Read) uses dedicated small language models for query rewriting, with some methods further introducing external adapters to assist retriever-generator alignment [Gao et al., 2024]. Query optimization essentially clarifies and expands information needs before retrieval, reducing the semantic gap between queries and the index.
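
As a rough illustration of Multi-Query expansion, the sketch below generates query variants, retrieves for each, and keeps the best score per document. The synonym table is a hypothetical stand-in for an LLM that would normally propose the variants, and the overlap scorer is a placeholder for a real retriever.

```python
def expand_query(query, synonyms):
    # The original query always comes first; each synonym yields one extra variant.
    words = query.split()
    variants = [query]
    for term, alternatives in synonyms.items():
        if term in words:
            for alt in alternatives:
                variants.append(" ".join(alt if w == term else w for w in words))
    return variants

def overlap(query, doc):
    return len(set(query.lower().split()) & set(doc.lower().split()))

def multi_query_retrieve(query, docs, synonyms, k=2):
    best = {}
    for variant in expand_query(query, synonyms):
        for doc in docs:
            score = overlap(variant, doc)
            if score > best.get(doc, 0):
                best[doc] = score  # keep each document once, with its best score
    return sorted(best, key=best.get, reverse=True)[:k]

docs = [
    "llm hallucination mitigation",
    "reducing fabricated answers in language models",
    "graph indexing",
]
synonyms = {"hallucination": ["fabricated"], "llm": ["language"]}
expanded = multi_query_retrieve("llm hallucination", docs, synonyms)
plain = multi_query_retrieve("llm hallucination", docs, {})
```

Without expansion only the first document matches; the variants surface the second one, which uses different vocabulary for the same need.
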
Retriever Types
By degree of parameterization, retrievers can be classified into three categories [Peng et al., 2024]:
- Non-parametric Retriever: High retrieval efficiency, no training required
- LM-based Retriever: E.g., fine-tuned RoBERTa models, balancing semantic understanding with efficiency
- GNN-based Retriever: Leverages graph neural networks to capture structural information, but with higher computational cost
Retrieval Techniques
By the mode of interaction between queries and graph data, retrieval techniques can be classified into three categories [Zhang et al., 2025a]:
| Type | Core Idea | Representative Methods |
|---|---|---|
| Semantic Similarity-based | Models similarity in discrete space (substring matching, regex) or embedding space (TF-IDF, Word2Vec) | Substring matching, TF-IDF, Word2Vec |
| Logical Reasoning-based | Uses rule mining, inductive logic programming, constraint satisfaction to reveal implicit insights | Rule Mining, ILP, Constraint Satisfaction |
| GNN-based | Uses graph neural networks for graph modeling and mining | GCN, GAT |
Semantic similarity methods are simple to implement, but they cannot fully exploit graph structure, which leaves the inherent advantages of graph databases largely underused [Zhang et al., 2025a].
Retrieval Paradigms and Granularity
Retrieval paradigms include Once Retrieval, Iterative Retrieval (with non-adaptive and adaptive variants), and Multi-Stage Retrieval [Huang et al., 2026]. Retrieval granularity spans four levels: Nodes, Triplets, Paths, and Subgraphs. The goal of granularity optimization is to balance relevance with efficiency: coarse-grained units provide richer context but may introduce redundancy and noise; fine-grained units are more semantically focused but may lack completeness and increase retrieval burden [Zhang et al., 2025a].
Retrieval Augmentation
To enhance retrieval, the system can perform Query Enhancement (query expansion, query decomposition) before retrieval or Knowledge Enhancement (knowledge merging, knowledge pruning) after retrieval.
Post-retrieval: Re-ranking and Context Compression
After retrieval and before generation, the retrieval results typically require processing to improve final generation quality [Gao et al., 2024]:
- Reranking: Fine-grained relevance scoring of retrieved candidate results
- Context Compression: Removing redundant information and compressing retrieval results to a context length suitable for LLM processing
Redundant information disrupts LLM generation quality, while excessively long contexts may cause the LLM to exhibit the “Lost in the Middle” problem — difficulty effectively utilizing information in the middle portions of long contexts [Gao et al., 2024].
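
One simple way to combine the two post-retrieval steps is greedy packing under a token budget. This is a sketch under stated assumptions: scores are taken as given from a reranker, whitespace word counts approximate LLM tokens, and duplicates are detected only by case-insensitive exact match.

```python
def compress_context(scored_passages, token_budget):
    """Greedy packing: best score first, skip duplicates and anything that would exceed the budget."""
    kept, seen, used = [], set(), 0
    for score, text in sorted(scored_passages, key=lambda p: p[0], reverse=True):
        key = " ".join(text.lower().split())  # normalized form for duplicate detection
        cost = len(text.split())              # assumption: words approximate tokens
        if key in seen or used + cost > token_budget:
            continue
        kept.append(text)
        seen.add(key)
        used += cost
    return kept

passages = [
    (0.9, "graph rag builds a knowledge graph"),
    (0.8, "Graph RAG builds a knowledge graph"),
    (0.5, "bm25 is a sparse retriever"),
    (0.4, "a very long passage " + "x " * 50),
]
kept = compress_context(passages, token_budget=12)
```
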
3.2.3 G-Generation: Graph-enhanced Generation and Knowledge Integration
Knowledge Integration
Retrieved graph data must be transformed into natural language responses through appropriate generation strategies. G-Integration (knowledge integration) is the process of effectively fusing retrieval results with LLM generation capabilities. Zhang et al. (2025a) summarize the conventional RAG pipeline as comprising three core components: knowledge organization, knowledge retrieval, and knowledge integration. Knowledge integration occupies the pre-generation stage, and its quality directly impacts the accuracy and coherence of the final answer.
Generation Paradigms
Graph-enhanced generation can be categorized into three types [Peng et al., 2024]:
| Type | Paradigm | Description |
|---|---|---|
| GNNs | — | First processes graph data with GNNs, encapsulating structural and relational information for LM comprehension |
| LMs | — | Language models generate the final text response |
| Hybrid | Cascaded Paradigm | Models sequentially process different aspects of the data |
| Hybrid | Parallel Paradigm | Models simultaneously receive inputs, process collaboratively; outputs merged via rules or another model |
Generation Control: Context-Aware and Grounded Constraints
During generation, the system can actively control output quality through reasoning capabilities [Li et al., 2025]:
| Control Direction | Description | Representative Methods |
|---|---|---|
| Context-Aware Generation | Selectively utilizes context, avoids interference from irrelevant information | Open-RAG, RARE, Self-Reasoning |
| Grounded Generation Control | Fact verification, citation generation, ensures output fidelity to retrieval evidence | RARR, TRACE, AlignRAG |
Post-generation: Iterative Refinement and Verification
Generation is not the endpoint. Through Test-Time Scaling, the system can further refine output quality after generation [Zhang et al., 2025a]:
- Iterative Self-Refinement: Multiple rounds of self-improvement, progressively correcting errors in generated content [Madaan et al., 2023]
- Self-Consistency Decoding: Consistency verification across multiple decoding paths to select the most reliable answer [Hao et al., 2023]
These methods transform generation from a one-shot output into an iteratively optimizable process, which is particularly important in complex reasoning tasks.
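
Self-consistency reduces to a majority vote over several sampled answers. In this sketch the samples are hard-coded strings standing in for multiple decoding runs, and the normalization (strip and lowercase) is an assumption; real systems usually need task-specific answer matching.

```python
from collections import Counter

def self_consistent_answer(samples, normalize=lambda s: s.strip().lower()):
    # Vote over normalized answers; also report the share of samples that agree.
    votes = Counter(normalize(s) for s in samples)
    answer, count = votes.most_common(1)[0]
    return answer, count / len(samples)

samples = ["Paris", "paris ", "Lyon", "Paris", "Marseille"]
answer, agreement = self_consistent_answer(samples)
```

The agreement ratio can double as a rough confidence signal, for example to decide whether another retrieval round is worth its cost.
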
3.3 Applicability of Graph-based RAG: When Are Graph Structures Needed?
Not all tasks require introducing graph structures. Xiang et al. (2026) systematically compared vanilla RAG and Graph-based RAG across different task complexity levels through GraphRAG-Bench, proposing a series of key observations.
3.3.1 Task Complexity and the Graph-based RAG Advantage Threshold
| Observation | Core Conclusion | Task Type |
|---|---|---|
| Obs.1 | Basic RAG and Graph-based RAG perform comparably on simple factual retrieval tasks | Simple factual retrieval |
| Obs.2 | Graph-based RAG excels in complex tasks | Complex reasoning |
| Obs.3 | Graph-based RAG ensures higher factual reliability in creative tasks | Creative generation |
| Obs.4 | RAG is adept at extracting discrete facts from simple questions not requiring complex logic | Simple QA |
| Obs.5 | As questions grow increasingly complex, the advantage of Graph-based RAG becomes evident | Increasing complexity |
RAG performs excellently in scenarios requiring rapid access to discrete information, while Graph-based RAG excels at tasks requiring nuanced understanding of interconnected data [Xiang et al., 2026].
In terms of retrieval performance, a trade-off exists: Global Graph-based RAG achieves superior Evidence Recall (83.1%), accessing more relevant information; whereas RAG achieves superior Context Relevance (78.8%), with more focused retrieval results and less redundancy. This indicates that while Graph-based RAG retrieves broader information, its retrieval method inevitably introduces some redundancy [Xiang et al., 2026].
Additionally, different Graph-based RAG implementations produce index graphs with significant structural differences; compared to vanilla RAG, Graph-based RAG substantially increases prompt length, with prompt length exhibiting a clear upward trend as task complexity increases.
3.3.2 Practical Implications
Based on the above empirical findings, four scenario-specific recommendations can be summarized:
- Simple Factual Retrieval: Vanilla RAG is sufficient; there is no need to incur the additional overhead of Graph-based RAG.
- Complex Reasoning / Multi-hop Queries: Graph-based RAG offers clear advantages; graph structures can explicitly model cross-document associations.
- Creative Generation: Graph-based RAG provides higher factual reliability, but attention must be paid to the trade-off between retrieval redundancy and context relevance.
- Efficiency-Sensitive Scenarios: Prompt inflation in Graph-based RAG is a significant constraint, particularly in long-context or token-constrained environments.
4. Agentic Search
Core Paradigm Shift: From Fixed Pipelines to Autonomous Multi-turn RAG
| Dimension | Traditional RAG | Agentic Search |
|---|---|---|
| Retrieval Turns | Single turn | Multi-turn |
| Retrieval Timing | Fixed (one retrieval before generation) | Dynamic (triggered on-demand during reasoning) |
| Query Construction | Raw query used directly | Agent dynamically constructs based on intermediate reasoning |
| Decision-Making Entity | Predefined workflow | LLM autonomous decision-making |
| Typical Paradigm | Retrieve → Generate | Reason ⟷ Retrieve ⟷ Reason ⟷ … → Generate |
4.1 Definition and Core Components
The core characteristics of Agentic Search include: autonomous reasoning — the agent dynamically plans and adjusts its approach based on intermediate results rather than following preset patterns; on-demand retrieval — dynamically triggered based on uncertainty or information needs during the reasoning process; and iterative synthesis — retrieved information recursively refines reasoning, forming a feedback loop where reasoning and retrieval mutually reinforce each other.
The system comprises four core components:
| Component | Function |
|---|---|
| LLM | Core reasoning engine, providing role definition and task understanding capabilities |
| Memory | Short-term memory maintains current reasoning context; long-term memory stores cross-session knowledge and preferences |
| Planning | Dynamically plans task step sequences through Reflection and Self-Critique |
| Tools | Retrieval backends (Dense RAG, GraphRAG, web search, etc.) form core infrastructure; additionally includes external capabilities such as API calls |
Planning is the core differentiating component of Agentic Search. Whereas traditional RAG follows a fixed “retrieve-then-generate” pattern, an agent with planning capability can autonomously decompose complex problems, prioritize information needs, and adjust its strategy based on feedback during execution. This turns retrieval from a passive data supply into an active reasoning resource.
4.2 Workflow Patterns: From Linear Reasoning to Graph-structured Exploration
The workflow design of Agentic Search determines how the system handles complex queries. From the perspective of macro-level control logic, Singh et al. (2026) summarize five general patterns: Prompt Chaining (improving accuracy through sequential processing), Routing (dispatching inputs to different processing strategies based on their characteristics), Parallelization (processing independent sub-tasks in parallel), Orchestrator-Workers (a central orchestrator dynamically assigning sub-tasks to worker agents), and Evaluator-Optimizer (iteratively evaluating and optimizing outputs).
Li et al. (2025) instead classify workflows by reasoning structure into chain-based, tree-based, and graph-based types:
| Workflow Type | Structural Characteristics | Representative Methods | Advantages | Limitations | Applicable Scenarios |
|---|---|---|---|---|---|
| Chain-based | Linear sequence, one retrieval per reasoning step | IRCoT, RAT, CoV-RAG, RAFT | Low latency, low token cost, easy caching | Error propagation, rapid context growth | Single-hop or short multi-hop QA |
| Tree-based (ToT) | Parallel exploration of multiple branches to hedge early errors | RATT, Tree of Clarifications, AirRAG | High recall, transparent hypothesis analysis | Quadratic cost, multiple retrieval calls | Ambiguous or multi-path tasks |
| Tree-based (MCTS) | Budget-aware exploration, focusing on promising branches | MCTS-RAG, SeRTS | Graceful anytime stopping | Parameter-dependent, may converge to suboptima | Deep search under strict budgets |
| Graph-based (Walk-on-Graph) | Efficient walks on explicit KG/document graphs | QA-GNN, LightRAG | Efficient on KGs, short paths | Requires high-quality KG, limited flexibility | Domain QA with existing KGs |
| Graph-based (Think-on-Graph) | LLM dynamically updates evidence graph, adaptive and verifiable | ToG, ToG-2.0, Graph-CoT | Node-level citation checking, high accuracy | High latency, search space explosion risk | Open-domain deep research |
Chain-based methods, exemplified by Chain-of-Thought (CoT), structure reasoning into linear sequences of intermediate steps. However, relying solely on LLM parametric knowledge readily leads to error propagation, where small deviations at each step are amplified in subsequent steps. Tree-based methods hedge early errors through parallel exploration of multiple branches; Tree-of-Thought (ToT) allows multiple hypotheses to coexist and be evaluated simultaneously, while Monte Carlo Tree Search (MCTS) focuses exploration on the most promising branches through budget-aware strategies.
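To make "focusing exploration on the most promising branches" concrete, here is a minimal sketch of the textbook UCT selection rule used in many MCTS implementations. It is an illustration only: the source does not state which selection rule MCTS-RAG or SeRTS use, and the node fields (`value`, `visits`) and the exploration constant `c` are assumptions.

```python
import math
from typing import Dict


def uct_score(total_value: float, visits: int, parent_visits: int, c: float = 1.4) -> float:
    """Average value (exploitation) plus a bonus for rarely visited branches (exploration)."""
    if visits == 0:
        return math.inf  # always try an unvisited branch first
    return total_value / visits + c * math.sqrt(math.log(parent_visits) / visits)


def select_branch(children: Dict[str, Dict[str, float]], c: float = 1.4) -> str:
    """Pick the child branch with the highest UCT score."""
    parent_visits = sum(int(ch["visits"]) for ch in children.values())
    return max(
        children,
        key=lambda name: uct_score(
            children[name]["value"], int(children[name]["visits"]), parent_visits, c
        ),
    )


# Hypothetical branch statistics after a few simulated rollouts.
branches = {
    "rewrite_query": {"value": 3.0, "visits": 4},
    "follow_citation": {"value": 1.0, "visits": 1},
}
print(select_branch(branches))
```

In this toy setting the less-visited branch wins because its exploration bonus outweighs the other branch's slightly lower average value; a search budget is then enforced simply by capping the number of selection steps.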
Graph-based methods couple reasoning and retrieval more tightly. Walk-on-Graph methods rely primarily on graph learning techniques for retrieval and reasoning, including GNNs (which model the graph and reason over it with graph neural networks) and lightweight graph techniques (vector indexing, PageRank, and other link-structure-based ranking methods). Think-on-Graph methods embed the graph directly into the LLM reasoning loop: the LLM acts as the reasoning agent on the graph, deciding at each step which connected entity or relation to explore next and progressively constructing a path to the answer. The main advantages of this approach are node-level citation checking and higher accuracy, at the cost of higher latency and a search space that can grow explosively.
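The Think-on-Graph idea of choosing the next entity or relation step by step can be sketched as a small beam search over a toy knowledge graph. This is not the ToG implementation: the `score` and `is_answer` callables stand in for LLM judgments, and the graph, relation names, and beam parameters are all hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]                 # (head entity, relation, tail entity)
Graph = Dict[str, List[Tuple[str, str]]]      # entity -> [(relation, neighbor)]


def think_on_graph(graph: Graph, start: str,
                   score: Callable[[List[Triple]], float],
                   is_answer: Callable[[str], bool],
                   beam_width: int = 2, max_depth: int = 3) -> Optional[List[Triple]]:
    """Expand the best-scoring paths hop by hop; return the first path that reaches an answer."""
    beams: List[Tuple[str, List[Triple]]] = [(start, [])]
    for _ in range(max_depth):
        candidates = []
        for entity, path in beams:
            for relation, neighbor in graph.get(entity, []):
                new_path = path + [(entity, relation, neighbor)]
                if is_answer(neighbor):
                    return new_path  # the path doubles as node-level evidence
                candidates.append((score(new_path), neighbor, new_path))
        if not candidates:
            break
        # Keep only the most promising partial paths to limit search-space growth.
        candidates.sort(key=lambda c: c[0], reverse=True)
        beams = [(neighbor, path) for _, neighbor, path in candidates[:beam_width]]
    return None


toy_graph: Graph = {
    "Marie Curie": [("born_in", "Warsaw"), ("field", "Physics")],
    "Warsaw": [("capital_of", "Poland")],
}
useful_relations = {"born_in", "capital_of"}
path = think_on_graph(
    toy_graph,
    start="Marie Curie",
    score=lambda p: sum(rel in useful_relations for _, rel, _ in p),
    is_answer=lambda entity: entity == "Poland",
)
print(path)
```

The beam width and depth cap are the knobs that trade accuracy against the latency and search-space growth noted above; the returned triple path is what makes each hop checkable.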
4.3 Agent Orchestration and Training Paradigms
4.3.1 System Architecture Taxonomy
Based on the comprehensive classification of Singh et al. (2026) and Li et al. (2025), Agentic Search systems can be categorized by architectural complexity into multiple levels:
| Type | Core Characteristics | Representative Methods | Applicable Scenarios |
|---|---|---|---|
| Single-Agent (Prompt-only) | ReAct loop, simple implementation | ReAct, Search-o1 | Prototype demonstrations, simple queries |
| Single-Agent (SFT/RL) | Fine-tuning or reinforcement learning enhances retrieval and reasoning capabilities | Toolformer, Search-R1 | Production systems, open-domain research |
| Multi-Agent (Decentralized) | Parallel expert collaboration, high recall | M-RAG, MDocAgent | Large-scale evidence aggregation across heterogeneous sources |
| Multi-Agent (Centralized) | Hierarchical manager coordinates sub-tasks | Chain of Agents | Complex tasks under strict budgets |
| Hierarchical | Strategic decision-making → delegation → aggregation | — | Scenarios requiring multi-level task decomposition |
| Adaptive | Dynamically selects strategies based on query complexity | — | Systems with diverse query types |
Corrective and Adaptive are two behavioral enhancement modes: Corrective introduces self-correction mechanisms to improve document utilization; Adaptive uses a classifier to assess query complexity and dynamically switches between single-step, multi-step, or skip-retrieval modes.
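The Adaptive mode can be sketched as a router that picks a retrieval strategy per query. In the sketch below the "classifier" is a keyword heuristic standing in for a trained model, and the cue words, mode names, and the `retrieve` / `generate` callables are all hypothetical.

```python
from typing import Callable, List


def classify_complexity(query: str) -> str:
    """Toy stand-in for a trained query-complexity classifier."""
    words = query.lower().split()
    multi_hop_cues = ("compare", "versus", "both", "before", "after")
    if any(cue in words for cue in multi_hop_cues):
        return "multi_step"
    if len(words) <= 3:
        return "skip_retrieval"
    return "single_step"


def answer_adaptively(query: str,
                      retrieve: Callable[[str], List[str]],
                      generate: Callable[[str, List[str]], str],
                      max_steps: int = 2) -> str:
    """Run zero, one, or several retrieval steps depending on the predicted complexity."""
    mode = classify_complexity(query)
    steps = {"skip_retrieval": 0, "single_step": 1, "multi_step": max_steps}[mode]
    context: List[str] = []
    for _ in range(steps):
        # Naive follow-up query: the original question plus the latest evidence.
        follow_up = query if not context else f"{query} {context[-1]}"
        context.extend(retrieve(follow_up))
    return generate(query, context)
```

A real system would replace the heuristic with a learned classifier and a proper query-rewriting step; the point here is only that the control flow branches on predicted complexity.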
4.3.2 Evolution of Training Paradigms
Agent capability acquisition has undergone a three-stage evolution:
| Paradigm | Core Mechanism | Advantages | Limitations |
|---|---|---|---|
| Prompt-based | Prompt engineering defines retrieval and reasoning behavior | Simple, no training required | Constrained by fixed instruction patterns |
| SFT | Fine-tuning on reasoning-retrieval joint data | Higher precision than prompting | Requires large amounts of synthetic data, prone to overfitting |
| RL | Reward functions incentivize strategy discovery and adaptive optimization | Genuine agentic behavior | Difficult to define reward signals, expensive training |
The fundamental distinction: Prompt-based and SFT rely on offline supervision and fixed patterns; RL-trained agents are incentivized to autonomously discover search strategies rather than being told how to search.
4.4 Are Graph Structures Necessary for Agentic Search? — RAGSearch Empirical Evidence
The core question posed by Fan et al. (2026) addresses a key debate in Agentic Search: Do we still need GraphRAG? — that is, can Agentic Search compensate for the absence of explicit graph structures through dynamic multi-turn retrieval and reasoning, thereby reducing dependence on high-cost GraphRAG?
Core Conclusion: Agentic search can partially compensate for missing structural information in dense RAG through iterative retrieval, but explicit graph retrieval remains essential for robust multi-hop reasoning. GraphRAG consistently provides stronger performance and greater stability in complex settings, while dense RAG, with its lower construction cost, remains a practical choice for general-purpose QA.
The following analysis supports this conclusion across three dimensions: the formal framework, eight empirical findings, and system case studies.
4.4.1 Formal Framework
RAGSearch formalizes Agentic Search as follows: given a query $q$, an LLM-equipped agent interacts with a retrieval backend $B$ (dense RAG or GraphRAG) over multiple turns. At each step, the agent decides whether to trigger retrieval or generate an answer based on reasoning history, with retrieved information appended to the reasoning sequence. Its core characteristics are: retrieval is executed dynamically rather than as one-time preprocessing; the same control logic can operate across different backends.
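The formalization above maps onto a short control loop. The sketch below is a minimal illustration rather than the RAGSearch code: `policy` stands in for the LLM agent, `backend` for $B$, and the `("search", ...)` / `("answer", ...)` action encoding and the `max_turns` budget are assumptions.

```python
from typing import Callable, List, Tuple

# An action is ("search", query) or ("answer", text); this encoding is an assumption.
Action = Tuple[str, str]
Policy = Callable[[str, List[str]], Action]   # stands in for the LLM agent
Backend = Callable[[str], List[str]]          # stands in for B (dense RAG or GraphRAG)


def agentic_search(q: str, policy: Policy, backend: Backend,
                   max_turns: int = 4) -> Tuple[str, List[str]]:
    """Run the reason/retrieve loop until the policy answers or the turn budget ends."""
    history: List[str] = []
    for _ in range(max_turns):
        kind, payload = policy(q, history)
        if kind == "answer":
            return payload, history
        # Retrieval was triggered: append the evidence to the reasoning sequence.
        history.extend(backend(payload))
    # Turn budget exhausted without an answer action.
    return "", history


def toy_backend(query: str) -> List[str]:
    corpus = {"capital of France": ["Paris is the capital of France."]}
    return corpus.get(query, [])


def toy_policy(q: str, history: List[str]) -> Action:
    # Answer as soon as any evidence is available; otherwise search with the raw query.
    if history:
        return ("answer", history[-1])
    return ("search", q)


answer, evidence = agentic_search("capital of France", toy_policy, toy_backend)
print(answer)
```

Because the loop only sees `backend` as a function from query to passages, swapping a dense retriever for a graph retriever leaves the control logic unchanged, which is the backend-agnostic property the framework relies on.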
Two Implementation Paths:
| Path | Mechanism | Representative Methods |
|---|---|---|
| Training-Free | Reasoning-driven on-demand search or Orchestrated multi-agent workflows | Search-o1, GraphSearch |
| RL-Based | GRPO training, Outcome-based + Format-based reward design | Search-R1, Graph-R1 |
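As a rough illustration of the RL-based path, the sketch below combines an outcome reward with a format reward and then computes group-relative advantages in the way GRPO normalizes rewards within a group of rollouts. The `<answer>` tag convention, exact-match scoring, and the `format_weight` value are assumptions, not the reward definitions used by Search-R1 or Graph-R1.

```python
import re
from statistics import mean, pstdev
from typing import List

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)  # hypothetical output format


def format_reward(rollout: str) -> float:
    """1.0 if the rollout contains exactly one well-formed answer block."""
    return 1.0 if len(ANSWER_RE.findall(rollout)) == 1 else 0.0


def outcome_reward(rollout: str, gold: str) -> float:
    """1.0 if the extracted answer matches the gold answer (case-insensitive exact match)."""
    match = ANSWER_RE.search(rollout)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip().lower() == gold.strip().lower() else 0.0


def total_reward(rollout: str, gold: str, format_weight: float = 0.2) -> float:
    return outcome_reward(rollout, gold) + format_weight * format_reward(rollout)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize each reward against the mean and spread of its own rollout group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


rollouts = ["<answer>Paris</answer>", "<answer>Lyon</answer>", "Paris"]
rewards = [total_reward(r, gold="paris") for r in rollouts]
advantages = group_relative_advantages(rewards)
```

Rollouts that are both correct and well-formed receive a positive advantage relative to their group, so no separate value network is needed to judge them.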
4.4.2 Eight Core Findings
RAGSearch revealed the relationship between Agentic Search and retrieval backends through systematic experiments:
| Finding | Conclusion | Practical Implication |
|---|---|---|
| Obs.1 | With single-pass inference, dense RAG is effective for general QA; GraphRAG provides decisive improvements primarily in multi-hop QA | Task complexity determines GraphRAG necessity |
| Obs.2 | Agentic search can enhance dense RAG and partially close the gap with GraphRAG, but effectiveness depends on agent design | Structured agentic design is key, not simply increasing interaction turns |
| Obs.3 | RL-based training generally improves performance, but well-designed training-free pipelines remain competitive | Training cost and performance require trade-off |
| Obs.4 | In training-free workflows, explicit graph structures provide consistent and significant benefits for multi-hop QA | The robust advantage of graph structures in zero-training settings cannot be overlooked |
| Obs.5 | RL-based agentic performance is highly backend-dependent: graph retrievers gain larger improvements on multi-hop QA | RL + GraphRAG exhibits synergistic effects |
| Obs.6 | GraphRAG is more robust and stable than dense RAG in agentic search | Explicit structures reduce agentic control uncertainty |
| Obs.7 | GRPO is a favorable training paradigm for RL-based agentic systems | Validates GRPO’s effectiveness in retrieval augmentation |
| Obs.8 | Larger backbones not only improve reasoning performance but also narrow the performance gap between GraphRAG and dense RAG | Larger models may reduce the marginal benefit of graph structures |
4.4.3 System Case Studies
This conclusion is corroborated in concrete system designs. Agent-G dynamically assigns retrieval tasks to specialized agents, simultaneously leveraging both graph knowledge bases and text documents. GeAR enhances conventional retrievers through graph expansion and introduces an agent framework for managing graph-structured data retrieval tasks. These systems demonstrate the deep synergy between graph structures and Agentic Search: explicit graph structures not only provide higher-quality retrieval results but also offer interpretable knowledge relation paths for agent decision-making, reducing the uncertainty of agentic control.
4.5 Open Challenges and Future Directions
4.5.1 Core Challenges
Agentic Search should not be viewed as a universal replacement for traditional RAG. While it provides superior adaptability and multi-step reasoning capabilities, it also introduces coordination complexity, latency, and computational costs. Core challenges include:
- Evaluation Difficulty — Output-level metrics are insufficient to measure Agentic system quality; multi-dimensional evaluation of reasoning trajectories, planning depth, adaptability, robustness, and cost-effectiveness is needed.
- Long-term Memory Design — Knowledge drift, bias reinforcement, and frequent updates may amplify hallucination risks.
- Coordination Complexity — Multi-agent collaboration introduces communication and consensus overhead.
- Computational Overhead — Additional latency from agent reasoning is non-negligible in practical deployment.
Furthermore, a fundamental constraint is that agentic reasoning cannot compensate for persistently poor retrieval. Failures often stem from insufficient retrieval coverage, poorly constructed indices, or inadequate integration of structured and unstructured knowledge.
Optimal Application Domains: Agentic Search gains the strongest benefits in domains with structured knowledge and explicit constraints. Healthcare, finance, and legal analysis particularly benefit from combining retrieval with rule-based reasoning and graph-structured knowledge.
4.5.2 Efficiency and Latency Optimization
While Synergized RAG-Reasoning systems excel in complex reasoning, their iterative retrieval and multi-step reasoning loops may cause significant latency. Optimization directions include: budget-aware query planning — optimizing query strategies under strict API call or token budgets; memory-aware mechanisms — caching prior evidence or belief states to reduce redundant access.
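Both optimization directions can be combined in a thin wrapper around the retrieval backend. The sketch below is a minimal illustration under stated assumptions: the class name `BudgetedRetriever`, the call-count budget, and the whitespace/case query normalization are all hypothetical choices.

```python
from typing import Callable, Dict, List


class BudgetedRetriever:
    """Wrap a retrieval backend with a call budget and a cache of prior evidence."""

    def __init__(self, backend: Callable[[str], List[str]], max_calls: int) -> None:
        self.backend = backend
        self.max_calls = max_calls
        self.calls = 0
        self.cache: Dict[str, List[str]] = {}

    def retrieve(self, query: str) -> List[str]:
        key = " ".join(query.lower().split())  # normalize case and whitespace
        if key in self.cache:
            return self.cache[key]             # cache hit: no budget spent
        if self.calls >= self.max_calls:
            return []                          # budget exhausted: the agent must answer from what it has
        self.calls += 1
        result = self.backend(query)
        self.cache[key] = result
        return result


retriever = BudgetedRetriever(backend=lambda q: [q.upper()], max_calls=1)
first = retriever.retrieve("graph rag")
repeat = retriever.retrieve("Graph  RAG")   # served from cache
blocked = retriever.retrieve("dense rag")   # over budget
```

A token budget could be enforced the same way by counting the length of returned passages instead of the number of calls.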
4.5.3 Trustworthiness and Adversarial Robustness
Agentic Search systems remain vulnerable to adversarial attacks through poisoned or misleading external knowledge sources. Ensuring the trustworthiness of retrieved content is critical for keeping downstream reasoning reliable. Systems need to establish credibility verification mechanisms for retrieved content, particularly in high-stakes scientific research and legal analysis scenarios.
4.5.4 Structured Data and Multi-agent Deep Research
Key future directions include: iterative data organization, which structures the intermediate search and reasoning content produced during agentic retrieval so that agents stay coherent and relevant over long contexts; and multi-agent deep research, which uses graph structures to model task requirements, roles, and relationships, supporting complex task decomposition and more effective task assignment and coordination among agents.
5. Literature Retrieval Systems in Scientific Research
The volume of scientific literature is growing exponentially: by one reported estimate, the number of scientific papers doubles roughly every 17 years. This growth has rendered traditional literature retrieval methods increasingly inadequate, giving rise to a new generation of AI-driven academic search platforms and research intelligence frameworks.
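As a quick check on what a 17-year doubling time implies, assume smooth exponential growth, with $N_0$ the paper count at a reference time and $t$ measured in years (both symbols are introduced here for illustration). The equivalent annual growth rate $r$ is then:

$$
N(t) = N_0 \cdot 2^{t/17}, \qquad r = 2^{1/17} - 1 \approx 0.042
$$

That is, roughly 4.2% more papers each year, compounding.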
5.1 Academic Search Platforms and Tools
The current academic search ecosystem can be categorized by function into two types: search and synthesis platforms (focused on literature discovery and content comprehension) and recommendation systems (focused on personalized delivery and trend tracking).
| Platform | Core Functionality | Technical Characteristics |
|---|---|---|
| Elicit | Semantic search, paper summary extraction | AI-enhanced academic search |
| Consensus | Evidence synthesis, trend analysis | LLM-based scientific question answering |
| OpenScholar | Large-scale academic literature retrieval | Semantic search + open access |
| SciSpace | Paper summarization, multi-document information synthesis | Cross-document understanding and synthesis |
| Connected Papers | Visual literature graph exploration | Visualization based on bibliographic coupling and co-citation |
| ORKG ASK | Structured knowledge access | KG-organized structured retrieval, more interpretable than conventional LLM QA |
| arXiv Sanity | Paper recommendation | Personalized delivery based on ML and IR techniques |
| Scholar Inbox | Personalized academic information subscription | Interest-customized literature streams |
| ResearchTrend.ai | Research trend discovery | Trend analysis and emerging direction identification |
| Research Rabbit | Visual literature exploration | Similarity-based literature network mapping |
Mainstream technical approaches for recommendation systems include content-based filtering, collaborative filtering, and hybrid approaches. Notably, graph-structured systems such as ORKG ASK organize research contributions as structured data rather than unstructured text, offering unique advantages in interpretability.
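Of these approaches, content-based filtering is the simplest to illustrate: build a profile from papers the user already liked and rank candidates by textual similarity. The sketch below uses bag-of-words cosine similarity as a stand-in for whatever representation a real platform uses; the paper IDs and texts are made up for the example.

```python
import math
from collections import Counter
from typing import Dict, List


def bag_of_words(text: str) -> Counter:
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(liked_texts: List[str], candidates: Dict[str, str], k: int = 2) -> List[str]:
    """Rank candidate papers by similarity to the aggregate profile of liked papers."""
    profile: Counter = Counter()
    for text in liked_texts:
        profile.update(bag_of_words(text))
    ranked = sorted(
        candidates,
        key=lambda pid: cosine(profile, bag_of_words(candidates[pid])),
        reverse=True,
    )
    return ranked[:k]


papers = {
    "p1": "graph retrieval for question answering",
    "p2": "protein folding with diffusion",
    "p3": "retrieval augmented generation survey",
}
top = recommend(["graph retrieval augmented generation"], papers)
print(top)
```

Collaborative filtering would instead compare users' interaction histories, and a hybrid system would combine both signals.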
5.2 Four Core Limitations of Academic Search
Despite the important role these platforms play in literature discovery, academic search still faces systemic challenges:
| Limitation | Description |
|---|---|
| Data Quality and Coverage Gaps | Incomplete, non-standard, or outdated data sources lead to inaccurate and inconsistent retrieval information |
| Model Bias | Search and ranking algorithms inherit biases from training data, affecting the visibility of certain research domains |
| Scalability and Real-time Processing | Efficiently processing large-scale datasets while maintaining low latency and high retrieval accuracy is challenging |
| Matthew Effect Reinforcement | Established researchers receive disproportionate attention; algorithms may exacerbate academic inequality |
Additionally, existing systems generally lack rigorous filtering options and advanced relevance ranking mechanisms. Many AI-assisted research tools rely on proprietary data, closed APIs, or evolving LLM backends, making strict reproducibility and long-term comparability difficult to ensure.
5.3 Research AI Frameworks and Evaluation Benchmarks
To address the above challenges, researchers have proposed a series of AI frameworks targeting the full research pipeline, covering literature retrieval, survey generation, hypothesis generation, and experimental automation:
| Framework | Core Functionality | Technical Characteristics |
|---|---|---|
| LitSearch | Literature retrieval evaluation benchmark | Evaluates complex literature retrieval queries in ML and NLP domains |
| ResearchArena | Academic survey LLM Agent evaluation | Three-stage: Information Discovery → Selection → Organization |
| SciLitLLM | Scientific literature understanding enhancement | CPT + SFT hybrid strategy, domain knowledge injection |
| CiteME | Citation attribution benchmark | Tests whether language models can identify the paper cited in a text excerpt |
| ResearchAgent | Research hypothesis generation | Multi-hop reasoning-assisted research ideation |
| Agent Laboratory | End-to-end research automation | High success rates in data preparation, experimentation, and report writing; weak in literature review |
The central tension facing current research AI frameworks is the trade-off between end-to-end automation capability and domain depth. Systems such as Agent Laboratory perform excellently in data preparation, experiment execution, and report writing, but exhibit significant performance degradation during the literature review stage — precisely the phase requiring the most structured evaluation and domain expertise. While SciLitLLM and ResearchArena demonstrate promising results, they remain insufficient for tasks demanding deep domain knowledge and nuanced understanding. These limitations indicate that automated literature review remains a far-from-solved challenge, requiring better balance between structured evaluation, domain expertise, and reproducibility.
6. References
[1] Abou Ali, et al. (2025). Agentic AI: a comprehensive survey of architectures, applications, and future directions.
[2] Bai, et al. (2023). Advancing abductive reasoning in knowledge graphs through complex logical hypothesis generation.
[3] Borrego, et al. (2025). Research hypothesis generation over scientific knowledge graphs.
[4] Eger, et al. (2025). Transforming science with large language models: a survey on AI-assisted scientific discovery, experimentation, content generation, and evaluation.
[5] Fan, et al. (2026). Do we still need GraphRAG? Benchmarking RAG and GraphRAG for agentic search systems.
[6] Gao, et al. (2024). Retrieval-augmented generation for large language models: a survey.
[7] Gridach, et al. (2025). Agentic AI for scientific discovery: a survey of progress, challenges, and future directions.
[8] Hambarde, et al. (2023). Information retrieval: recent advances and beyond.
[9] Hao, et al. (2019). Universal representation learning of knowledge bases by jointly embedding instances and ontological concepts.
[10] Huang, et al. (2026). A survey on retrieval-augmented text generation for large language models.
[11] Li, et al. (2025). Towards agentic RAG with deep reasoning: a survey of RAG-reasoning systems in LLMs.
[12] Li, et al. (2026). OntoKG: ontology-oriented knowledge graph construction with intrinsic-relational routing.
[13] Niu, et al. (2026). A comprehensive survey of knowledge graph reasoning: approaches and applications.
[14] Peng, et al. (2024). Graph retrieval-augmented generation: a survey.
[15] Singh, et al. (2026). Agentic retrieval-augmented generation: a survey on agentic RAG.
[16] Sun, et al. (2025). LKD-KGC: domain-specific KG construction via LLM-driven knowledge dependency parsing.
[17] Thakur, et al. (2021). BEIR: a heterogenous benchmark for zero-shot evaluation of information retrieval models.
[18] Xiang, et al. (2026). When to use graphs in RAG: a comprehensive analysis for graph retrieval-augmented generation.
[19] Xu, et al. (2025). A survey of model architectures in information retrieval.
[20] Yeom, et al. (2024). Embedding two-view knowledge graphs with class inheritance and structural similarity.
[21] Zhang, et al. (2025a). A survey of graph retrieval-augmented generation for customized large language models.
[22] Zhang, et al. (2025b). From web search towards agentic deep research: incentivizing search with reasoning agents.
[23] Zhang, et al. (2025c). On the role of pretrained language models in general-purpose text embeddings: a survey.