Weekly Reading Picks, 2026-04-27 → 2026-05-03

Academic


Some survey notes on RAG that I recently organized. The content below was generated by Claude Code from those notes.

Information Retrieval, RAG, and Intelligent Literature Systems: A Technical Survey

1. A Brief Review of Traditional Information Retrieval

Modern large-scale information retrieval (IR) systems almost universally use a two-stage retrieve-then-rerank pipeline. A fast retriever narrows the corpus to a small candidate set, and a more expensive ranker rescores only those candidates. This split is the basic mechanism by which IR systems balance efficiency against effectiveness [Hambarde et al., 2023; Xu et al., 2025].

1.1 The Evolution of Retrieval Methods

Retrieval methods have evolved from term-based matching to semantic vector-based approaches, which can be summarized into four categories:

Category	Core Idea	Representative Methods
Traditional	Term matching, query expansion, topic models	TF-IDF, BM25, Query Expansion, Topic Model
Sparse	Represent documents/queries as sparse vectors, activating few dimensions	Neural weight prediction, document expansion, BM25 variants
Dense	Encode queries and documents as dense vectors, retrieve via vector similarity	Word2Vec, Sentence-BERT, DPR, ColBERT
Hybrid	Combine the strengths of sparse and dense representations	SPLADE, ColBERT, and other late-fusion schemes

To improve retrieval effectiveness, researchers have proposed two augmentation strategies. Query Augmentation enriches the query representation through expansion, reformulation, and feedback. It carries a risk of query drift, where the expanded query moves away from the user's original intent, and a risk of overfitting [Hambarde et al., 2023]. Document Augmentation expands or semantically rewrites the indexed content to narrow the semantic gap between queries and documents.
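The classic Rocchio update is one well-known way to do query augmentation through feedback. It is used here as an illustration, not as the specific method of the cited survey. This is a minimal sketch that assumes bag-of-words term-weight dictionaries and conventional default weights (`alpha`, `beta`, and `gamma` are illustrative choices). The update moves the query toward documents judged relevant and away from non-relevant ones. The same mechanism explains query drift: misjudged feedback pulls the query off-topic.

```python
from collections import Counter

def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Rocchio feedback on sparse term-weight vectors (dicts term -> weight):
    q' = alpha * q + beta * mean(relevant) - gamma * mean(nonrelevant).
    Terms whose final weight is not positive are dropped."""
    updated = Counter({t: alpha * w for t, w in query.items()})
    for docs, coef in ((relevant, beta), (nonrelevant, -gamma)):
        for doc in docs:  # an empty feedback set simply contributes nothing
            for term, weight in doc.items():
                updated[term] += coef * weight / len(docs)
    return {t: w for t, w in updated.items() if w > 0}

query = {"graph": 1.0, "rag": 1.0}
relevant = [{"graph": 1.0, "knowledge": 2.0}, {"rag": 1.0, "knowledge": 1.0}]
nonrelevant = [{"graph": 1.0, "plot": 3.0}]
expanded = rocchio(query, relevant, nonrelevant)
print(expanded)
```

In this example, `knowledge` enters the query from the relevant documents. `plot` is pushed below zero and dropped.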

1.2 The Evolution of Ranking Methods

The ranking stage is responsible for fine-grained relevance scoring of candidate documents returned by retrieval, and has similarly undergone a transformation from traditional methods to deep learning approaches:

Category	Description	Representative Methods
Learning-to-Rank	Classified by loss function into pointwise / pairwise / listwise	LambdaRank, LambdaMART
Representation-based	Encode query and document separately, compute global similarity	DSSM, Siamese Network
Interaction-based	Model query-document term-level interactions before scoring	DRMM, BERT-based Re-ranker
Hybrid	Combine the efficiency of representation-based models with the accuracy of interaction-based models, using two networks in parallel	DUET

Continuous vector representations allowed text ranking to move beyond the ceiling of exact term matching, and neural networks removed the need for manual feature engineering [Hambarde et al., 2023]. In practical systems, representation-based models usually pre-encode documents into vectors offline, which makes them suitable for efficient first-stage retrieval. Interaction-based models process the query and document jointly at query time. They achieve deeper and more precise relevance matching, but at a higher computational cost, so they are better suited to re-ranking a small candidate set [Xu et al., 2025]. Because the two architectures are complementary, researchers further proposed hybrid models that combine the efficiency of the former with the effectiveness of the latter [Xu et al., 2025].
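The division of labor between the two model families can be sketched as a toy two-stage pipeline. Every component here is a hypothetical stand-in:

- Hand-written 3-dimensional word vectors replace a trained encoder.
- Mean pooling plays the representation-based model.
- A max-similarity interaction matrix plays the interaction-based re-ranker. Real cross-encoders such as BERT re-rankers are far richer.

```python
import math

# Hypothetical 3-d "word embeddings" standing in for a trained encoder.
EMB = {
    "graph": (1.0, 0.0, 0.0), "network": (0.9, 0.1, 0.0),
    "retrieval": (0.0, 1.0, 0.0), "search": (0.1, 0.9, 0.0),
    "cooking": (0.0, 0.0, 1.0), "recipe": (0.0, 0.1, 0.9),
}

def token_vectors(text):
    return [EMB[t] for t in text.split() if t in EMB]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def encode(text):
    """Representation-based: the whole text becomes one mean-pooled vector."""
    vecs = token_vectors(text)
    if not vecs:
        return (0.0, 0.0, 0.0)
    return tuple(sum(col) / len(vecs) for col in zip(*vecs))

def interaction_score(query, doc):
    """Interaction-based: score every query-term/doc-term pair, keep each
    query term's best match, and average the results (max pooling)."""
    q, d = token_vectors(query), token_vectors(doc)
    if not q or not d:
        return 0.0
    return sum(max(cosine(qt, dt) for dt in d) for qt in q) / len(q)

def retrieve_then_rerank(query, corpus, doc_index, k=2):
    qv = encode(query)
    # Stage 1: cheap similarity against document vectors computed offline.
    candidates = sorted(range(len(corpus)),
                        key=lambda i: cosine(qv, doc_index[i]), reverse=True)[:k]
    # Stage 2: expensive term-level scoring, applied to the k candidates only.
    return sorted(candidates, key=lambda i: interaction_score(query, corpus[i]),
                  reverse=True)

corpus = ["network search", "graph retrieval cooking", "cooking recipe"]
doc_index = [encode(doc) for doc in corpus]  # offline indexing stage
print(retrieve_then_rerank("graph retrieval", corpus, doc_index))  # [1, 0]
```

In stage 1, the mean-pooled vector of `network search` matches the query almost perfectly. In stage 2, term-level interaction promotes the document that contains the exact query terms. Only the `k` candidates pay the cost of the second scorer.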

1.3 Pre-trained Transformers and the Representation Learning Revolution in IR

The advent of pre-trained Transformers has further reshaped IR model architectures. By network structure, they can be broadly categorized into three types:

Architecture	Representative Models	Characteristics
Encoder-only	BERT (Cross-Encoder)	Deep interaction, high ranking quality, but computationally expensive
Encoder-decoder	T5, BART	Balances understanding and generation capabilities
Decoder-only	GPT series	Strong generation capability

Knowledge Distillation (KD) has been widely adopted in IR to close the effectiveness gap between efficient Bi-Encoders and more accurate Cross-Encoders: a Cross-Encoder teacher supervises a Bi-Encoder student. Whatever the architecture, retrieval systems face a fundamental trade-off between effectiveness (recall, MRR, nDCG) and resource efficiency (latency, throughput, memory footprint, indexing and update costs). Neural retrievers can improve effectiveness substantially, but they require additional compute and indexing engineering. A related concern is calibration, which aligns model output scores with true relevance probabilities so that confidence scores faithfully reflect correctness.
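Calibration is commonly measured with expected calibration error (ECE). The text does not name this metric, so using it here is an assumption about how calibration is quantified. The sketch sorts relevance scores into equal-width bins. For each bin, it compares the mean score with the fraction of items that are actually relevant. The bin count is an illustrative choice.

```python
def expected_calibration_error(confidences, labels, n_bins=5):
    """ECE = sum over bins b of (|B_b| / N) * |accuracy(B_b) - mean_confidence(B_b)|.

    confidences: scores in [0, 1] read as P(relevant); labels: 1 if relevant, else 0.
    """
    if not confidences or len(confidences) != len(labels):
        raise ValueError("need non-empty inputs of equal length")
    bins = [[] for _ in range(n_bins)]
    for conf, label in zip(confidences, labels):
        # Equal-width bins; a score of exactly 1.0 falls into the last bin.
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, label))
    n = len(confidences)
    ece = 0.0
    for bucket in bins:
        if not bucket:
            continue
        mean_conf = sum(c for c, _ in bucket) / len(bucket)
        accuracy = sum(y for _, y in bucket) / len(bucket)
        ece += len(bucket) / n * abs(accuracy - mean_conf)
    return ece

overconfident = expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 0, 1, 0])
calibrated = expected_calibration_error([0.9] * 10, [1] * 9 + [0])
print(round(overconfident, 3), round(calibrated, 3))  # 0.4 0.0
```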

At this point, information retrieval had fully entered the representation-learning era of deep learning. Pre-trained models gave retrieval systems powerful semantic understanding, yet the fundamental paradigm stayed the same: the user issues a query, and the system returns a ranked list of relevant documents. The emergence of large language models is poised to fundamentally alter this paradigm.

2. IR Systems in the LLM Era: The Technical Evolution Path of RAG

The fundamental paradigm of traditional information retrieval systems is to return a ranked list of relevant documents in response to user queries; the system itself is not responsible for understanding document content and generating coherent answers. Large Language Models (LLMs), while possessing powerful language understanding and generation capabilities, exhibit significant limitations when relying solely on their internal parametric knowledge. This complementarity gave rise to Retrieval-Augmented Generation (RAG), which combines the external knowledge acquisition capability of IR with the text generation capability of LLMs, forming an entirely new information processing paradigm.

2.1 Why RAG Is Needed

Before RAG, the field of information access went through two major phases, each with fundamental shortcomings.

Limitations of Pure LLMs (Chatbot Mode). When LLMs rely solely on parametric knowledge learned during pre-training to answer user queries, they face three core challenges: first, hallucination, where the model may generate plausible but inaccurate content; second, lack of timeliness, as the model has no knowledge of events beyond its training data cutoff and cannot provide up-to-date information; and third, context window limitations, where even with massive parameter counts, the context length the model can process in a single pass remains limited (typically 2K–32K tokens), making it difficult to comprehensively understand complex queries requiring integration of extensive background knowledge [Zhang et al., 2025b].

Limitations of Traditional IR Systems. While traditional search engines or retrieval systems can return large volumes of relevant documents based on indexing, their responsibility ends there — they are designed to “find” information, not to “understand” and “answer.” After obtaining a document list, users must still read, screen, and synthesize on their own to form a complete answer. For complex questions requiring cross-document correlation analysis or synthetic reasoning, the cognitive burden of this process is extremely high.

The Core Motivation of RAG. The central idea of RAG is Retrieve-then-Read: first use a retrieval system to retrieve relevant document fragments from an external knowledge base, then concatenate these retrieved texts as context into the LLM’s input prompt, guiding the model to generate responses based on these external factual inputs. This approach leverages the powerful language understanding and generation capabilities of LLMs while harnessing the retrieval system to ensure the factuality and timeliness of responses, simultaneously alleviating hallucination to some extent [Zhang et al., 2025b; Gao et al., 2024].

2.2 Naive RAG: The Simplest Form

The most rudimentary implementation of RAG is exceedingly straightforward: a user query enters the system, relevant fragments are retrieved from the document collection via keyword matching (such as TF-IDF or BM25), these fragments are concatenated with the original query to form a prompt, which is then passed to the LLM to generate a final answer. This design is conceptually clear and easy to implement, and can achieve reasonable results for simple factual question answering.
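The naive pipeline fits in a few lines. This sketch assumes:

- whitespace tokenization;
- Okapi BM25 with common default parameters (`k1=1.5`, `b=0.75`) and a common smoothed IDF;
- a hypothetical prompt template.

The final LLM call is left out on purpose, because naming any specific client API would be an assumption.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 over whitespace tokens, with the smoothed IDF
    log(1 + (N - df + 0.5) / (df + 0.5))."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n_docs
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def naive_rag_prompt(query, docs, top_k=2):
    """Retrieve the top_k passages by BM25 and concatenate them with the query."""
    scores = bm25_scores(query, docs)
    top = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)[:top_k]
    context = "\n".join(f"[{n}] {docs[i]}" for n, i in enumerate(top, start=1))
    return ("Answer the question using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")

docs = [
    "BM25 is a ranking function used by search engines",
    "Graph RAG organizes knowledge as a graph",
    "Paris is the capital of France",
]
prompt = naive_rag_prompt("what is BM25 ranking", docs)
print(prompt)  # this string would then be sent to an LLM (call omitted)
```

The second slot goes to the Paris passage only because it shares the word "is" with the query. This illustrates the "insufficiently relevant fragments" problem of pure keyword matching.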

However, this naive “one-shot retrieval + direct generation” architecture quickly reveals its fragility in complex scenarios: the retrieval stage may return insufficiently relevant fragments due to inherent limitations in similarity computation; the generation stage struggles to deeply integrate retrieved content with the query, as multi-source similar information tends to produce redundancy or incoherent content; and the entire pipeline is static and linear, lacking the ability to adjust based on intermediate results [Gao et al., 2024]. These limitations prompted researchers to explore more advanced RAG variants.

2.3 The Technical Evolution Path of RAG

From the historical perspective of information retrieval, the way humans access information has evolved from Web Search (static keyword-based search) to LLM as Chatbot (pure parametric knowledge generation) to LLM with RAG (retrieval-augmented generation), and is now advancing toward Multi-hop Retrieval (iterative multi-hop retrieval) and Deep Research [Zhang et al., 2025b]. Deep Research emphasizes a dynamic feedback loop between reasoning and search: the reasoning process actively influences search strategies (e.g., refining queries based on intermediate deductions), while retrieved information in turn recursively refines the reasoning process.

From a technical perspective, the evolution of RAG can be summarized as a clear five-stage progression [Gao et al., 2024]:

Naive RAG → Advanced RAG → Modular RAG → Graph RAG → Agentic RAG

Stage	Key Characteristics	Core Improvements	Limitations
Naive RAG	Keyword retrieval (TF-IDF, BM25), results directly concatenated to prompt	Simple architecture, easy to implement	Lacks context awareness, fragmented output, limited scalability
Advanced RAG	Introduces dense vector retrieval models, vector search, context re-ranking, multi-hop iterative retrieval	Significantly improves retrieval quality and context relevance	Increased computational overhead, scalability still limited
Modular RAG	Decouples retrieval, ranking, and generation into independent reusable modules, supports hybrid retrieval strategies and external tool integration	Substantially improved system flexibility and scalability	Inter-module collaborative optimization still requires manual design
Graph RAG	Leverages graph node connectivity and edge relations to organize knowledge, enabling hierarchical knowledge management and structured navigation	Supports cross-document correlation, multi-hop reasoning, and global context understanding	High graph construction cost, scalability limited by graph scale, reliance on high-quality data
Agentic RAG	Introduces autonomous agents for iterative query refinement, adaptive retrieval strategy selection, and dynamic workflow orchestration	Possesses multi-step reasoning and autonomous decision-making capabilities	High coordination complexity, significant computational overhead, increased latency

Meanwhile, Li et al. (2025), from the perspective of the interplay between RAG and Reasoning, proposed a higher-level framework that categorizes their integration into three paradigms:

Paradigm	Core Idea	Direction
Reasoning-Enhanced RAG	Leverages reasoning capabilities to optimize RAG’s retrieval, integration, and generation stages	Reasoning → RAG
RAG-Enhanced Reasoning	Leverages external knowledge from retrieval to enhance LLM reasoning capabilities	RAG → Reasoning
Synergized RAG-Reasoning	Deep coupling of retrieval and reasoning, iterative alternating execution, forming a dynamic feedback loop	RAG ↔ Reasoning

These three paradigms reveal that RAG is not merely about “attaching an external retriever to an LLM,” but rather a process of continuous integration and mutual enhancement between retrieval and reasoning capabilities. Early RAG systems primarily belonged to the first two unidirectional enhancement modes, while current frontier trends are advancing toward the third mode of deep synergization.

2.4 Core Limitations of RAG

Despite continuous advances in RAG technology, traditional RAG systems (primarily Naive and Advanced RAG) still face fundamental challenges. Synthesizing the findings of several surveys, these limitations fall along three dimensions:

Retrieval Dimension. A single retrieval pass cannot guarantee acquisition of all relevant information needed to resolve a query, and assessing the relevance of retrieval results is itself challenging. When queries involve complex knowledge requiring synthesis across multiple data sources, simultaneously satisfying the sufficiency and accuracy of retrieval becomes difficult [Li et al., 2025; Gao et al., 2024]. Furthermore, deep-seated limitations at the embedding level — including biases in training data leading to high-frequency task/language/domain preferences, poor performance of general-purpose corpus models in specialized domains (e.g., biomedicine, scientific literature), and insufficient capture of long-range structural information — further constrain the upper bound of retrieval quality [Zhang et al., 2025c].

Reasoning Dimension. Traditional RAG lacks genuine multi-step reasoning capabilities. The system cannot dynamically refine retrieval strategies based on intermediate insights or user feedback, making it difficult to handle tasks requiring complex logical chains and deep contextual understanding [Singh et al., 2026; Li et al., 2025]. Errors in early reasoning paths may propagate through subsequent steps, affecting the completeness of the final output [Zhang et al., 2025b]. Additionally, models struggle to maintain fidelity to retrieved evidence when conflicts arise between external retrieved evidence and internal parametric knowledge.

System Dimension. RAG pipelines are typically static and linear, lacking adaptive adjustment mechanisms, making it difficult to accommodate queries of varying complexity and dynamically updating knowledge bases [Li et al., 2025; Singh et al., 2026]. The entire workflow — from preprocessing and index construction to real-time retrieval and generation — faces efficiency bottlenecks in large-scale deployment [Zhang et al., 2025b]. Moreover, effectively integrating similar information retrieved from multiple sources, avoiding redundant output while ensuring stylistic and tonal consistency of generated content, remains a persistent practical challenge [Gao et al., 2024].

The Emergence of Enhanced Retrieval Modes. To overcome the limitations of single-pass retrieval, researchers proposed three enhanced retrieval modes [Gao et al., 2024]: Iterative Retrieval gradually enriches context through multiple retrieval-generation cycles; Recursive Retrieval progressively decomposes complex problems for deeper retrieval; and Adaptive Retrieval dynamically controls retrieval and generation behavior on demand. While these ideas had preliminary exploration in the first two generations of RAG, their potential was far from fully realized due to the static nature of system architecture. True breakthroughs came from two directions: first, using graph structures to explicitly model knowledge associations (Chapter 3, Graph-based RAG); and second, using autonomous agents to dynamically orchestrate retrieval and reasoning processes (Chapter 4, Agentic Search).
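Iterative retrieval with an adaptive stopping rule can be sketched as a loop. Each round's query comes from the previous round's evidence, and retrieval stops as soon as the reader can answer. The corpus, `retrieve`, and `toy_reader` are hypothetical stand-ins. In a real system, an LLM would decide whether to answer or to issue a follow-up query.

```python
# Hypothetical toy corpus keyed by entity name; a real system would query an index.
CORPUS = {
    "dune": "Dune was written by Frank Herbert.",
    "frank herbert": "Frank Herbert was born in Tacoma.",
}

def retrieve(query):
    """Stand-in retriever: exact lookup by entity name (None if nothing matches)."""
    return CORPUS.get(query.lower().strip())

def toy_reader(context):
    """Stand-in for an LLM step: answer if the latest evidence suffices,
    otherwise propose the next query. Returns (action, value)."""
    latest = context[-1]
    if " was born in " in latest:
        return "answer", latest.split(" was born in ")[1].rstrip(".")
    if " was written by " in latest:
        return "search", latest.split(" was written by ")[1].rstrip(".")
    return "stop", ""

def iterative_rag(first_query, max_hops=3):
    """Retrieve, read, and retrieve again until the reader can answer or the
    hop budget runs out; each new retrieval happens only on the reader's request."""
    context, query = [], first_query
    for _ in range(max_hops):
        passage = retrieve(query)
        if passage is None:
            break
        context.append(passage)
        action, value = toy_reader(context)
        if action != "search":
            return (value if action == "answer" else None), context
        query = value
    return None, context

answer, trail = iterative_rag("Dune")  # "Where was the author of Dune born?"
print(answer)      # Tacoma
print(len(trail))  # 2
```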

3. Graph-based RAG

When RAG needs to handle cross-document correlation, multi-hop reasoning, and global knowledge integration, traditional text-fragment-based indexing and retrieval methods face structural limitations. Graph-based RAG elevates RAG from “retrieving text fragments” to “navigating and reasoning within structured knowledge networks” by introducing graph structures to explicitly model inter-knowledge associations.

3.1 Definitions and Taxonomy of Graph-based RAG

3.1.1 Text-Attributed Graphs and Formal Definition

Graph-based RAG uniformly represents graph data as Text-Attributed Graphs (TAGs) [Peng et al., 2024]:

\[G = (V, E, A, \{x_v\}_{v \in V}, \{e_{i,j}\}_{(i,j) \in E})\]

where $V$ is the node set, $E \subseteq V \times V$ is the edge set, $A$ is the adjacency matrix, and $\{x_v\}$ and $\{e_{i,j}\}$ are the text attributes of nodes and edges, respectively. The objective of Graph-based RAG is to find the optimal answer $a^*$ given a query $q$ and a TAG $G$:

\[a^* = \arg\max_{a} p(a \mid q, G)\]

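Marginalizing over candidate subgraphs $G' \subseteq G$ splits the answer probability into two terms: a retrieval term $p_\theta$, which selects a subgraph, and a generation term $p_\phi$, which answers from it. Approximating the sum by the single optimal subgraph $G^*$ reduces it to a product of the two:

\[p(a \mid q, G) = \sum_{G' \subseteq G} p_\phi(a \mid q, G')\, p_\theta(G' \mid q, G) \approx p_\phi(a \mid q, G^*)\, p_\theta(G^* \mid q, G)\]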
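A toy calculation makes the decomposition concrete. It compares the full marginal over candidate subgraphs with the single-subgraph approximation. The probabilities are invented, and the candidate set is limited to three subgraphs. $G^*$ is taken to be the retriever's top choice, which is one reasonable reading of "optimal subgraph" rather than a definition from the cited survey.

```python
# Hypothetical probabilities for three candidate subgraphs G' of G.
p_retrieve = {"G1": 0.6, "G2": 0.3, "G3": 0.1}  # p_theta(G' | q, G)
p_generate = {"G1": 0.9, "G2": 0.4, "G3": 0.2}  # p_phi(a | q, G')

# Exact marginal: sum over subgraphs of p_phi(a | q, G') * p_theta(G' | q, G).
exact = sum(p_generate[g] * p_retrieve[g] for g in p_retrieve)

# Single-subgraph approximation, with G* taken as the retriever's top choice.
g_star = max(p_retrieve, key=p_retrieve.get)
approx = p_generate[g_star] * p_retrieve[g_star]

print(g_star, round(exact, 2), round(approx, 2))  # G1 0.68 0.54
```

The gap between the two numbers is the probability mass carried by the discarded subgraphs. The approximation tightens as the retriever concentrates its probability on a single subgraph.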

3.1.2 Three-Category Taxonomy

Based on the role the graph plays in the RAG system, Graph-based RAG can be classified into three types [Zhang et al., 2025a]:

Type	Core Idea	Characteristics
Knowledge-based	Graph as knowledge carrier	Explicitly models domain knowledge and semantic relations; understands complex relations through graph transformations
Index-based	Graph as indexing tool	Organizes raw text through graphs; optimizes retrieval and global navigation
Hybrid	Combines strengths of both	Provides more advanced solutions for complex reasoning tasks

Zhang et al. (2025a) report that Graph-based RAG systems can reduce token usage during answer generation by 26% to 97% compared to conventional methods, with significant gains in both speed and resource utilization.

3.2 The Three-Stage Workflow of Graph-based RAG

The workflow of Graph-based RAG can be summarized as three core stages: G-Indexing (graph-based index construction), G-Retrieval (graph-guided retrieval), and G-Generation (graph-enhanced generation) [Peng et al., 2024; Zhang et al., 2025a; Huang et al., 2026]. Rich optimization methods exist before and after each stage, enabling systematic improvement of retrieval and generation quality.

3.2.1 G-Indexing: Graph-based Index Construction

Pre-indexing: Data Preprocessing and Index Optimization

Index quality is directly determined by the effectiveness of raw data processing. Before constructing the graph index, systematic preprocessing and optimization of the data are required [Huang et al., 2026; Gao et al., 2024]:

Optimization Direction	Specific Methods	Description
Data Cleaning	Remove irrelevant/redundant information, supplement additional information	Improves index purity, reduces noise interference
Document Segmentation	Sliding Window, Fine-grained Segmentation	Balances context completeness with index granularity
Metadata Augmentation	Metadata Incorporation	Enriches index information with metadata, supporting subsequent filtering and routing

These preprocessing techniques establish the data foundation for subsequent graph index construction.
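Sliding-window segmentation and metadata incorporation can be combined in one small helper. This is a minimal sketch. It assumes word-level windows, whereas production systems usually count tokens. It also uses a hypothetical metadata schema with `source`, `chunk_id`, and `start_word` fields.

```python
def sliding_window_chunks(text, source, window=8, stride=6):
    """Split text into overlapping word windows and attach metadata to each chunk.

    Consecutive chunks share `window - stride` words, so content cut at one
    boundary still appears intact in the neighboring chunk.
    """
    if not 0 < stride <= window:
        raise ValueError("require 0 < stride <= window")
    words = text.split()
    chunks, start = [], 0
    while words:
        chunks.append({
            "text": " ".join(words[start:start + window]),
            "source": source,        # metadata usable for filtering and routing
            "chunk_id": len(chunks),
            "start_word": start,
        })
        if start + window >= len(words):
            break
        start += stride
    return chunks

text = " ".join(f"w{i}" for i in range(20))
chunks = sliding_window_chunks(text, source="notes.md")  # hypothetical file name
for chunk in chunks:
    print(chunk["chunk_id"], chunk["start_word"], chunk["text"])
```

Larger overlaps preserve more context across boundaries, at the cost of a bigger index with more near-duplicate chunks.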

Index Method Taxonomy

Based on the degree of graph structure preservation, indexing methods can be classified into four categories [Peng et al., 2024; Zhang et al., 2025a]:

Method	Characteristics	Algorithms
Graph Indexing	Preserves complete graph structure	BFS, Shortest Path
Text Indexing	Converts graph data into text descriptions	Sparse Retrieval, Dense Retrieval
Vector Indexing	Converts to vector representations for efficiency	LSH (Locality Sensitive Hashing)
Hybrid Indexing	Combines all three approaches	—

Beyond graph indexing, Graph-based RAG can additionally leverage hierarchical index structures to establish multi-granularity retrieval support for documents, as well as knowledge graph indexing that uses knowledge graphs to organize document relations and enable structured navigation [Gao et al., 2024].
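Graph Indexing with BFS can be sketched as extracting the k-hop neighborhood of the query's seed entities. The sketch makes three assumptions:

- The graph is a list of (head, relation, tail) triples.
- Edges are treated as undirected during traversal.
- Linking the query to its seed entities has already happened.

The function returns the triples induced by the visited entities.

```python
from collections import deque

def k_hop_subgraph(triples, seeds, k=2):
    """Collect every entity within k hops of the seed entities via BFS
    (edges treated as undirected) and return the triples induced by that set."""
    neighbors = {}
    for head, _, tail in triples:
        neighbors.setdefault(head, []).append(tail)
        neighbors.setdefault(tail, []).append(head)
    depth = {s: 0 for s in seeds if s in neighbors}
    queue = deque(depth)
    while queue:
        node = queue.popleft()
        if depth[node] == k:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors[node]:
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    return [(h, r, t) for h, r, t in triples if h in depth and t in depth]

triples = [
    ("GraphRAG", "uses", "knowledge graph"),
    ("knowledge graph", "stores", "triples"),
    ("triples", "contain", "entities"),
    ("BM25", "is_a", "ranking function"),
]
print(k_hop_subgraph(triples, ["GraphRAG"], k=2))
```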

Knowledge Graphs as Structural Indices

In Graph-based RAG, knowledge graphs serve not merely as a knowledge representation form but as a powerful structural indexing mechanism for organizing inter-document relations and supporting structured navigation [Peng et al., 2024; Zhang et al., 2025a].

KG Construction Challenges. At the knowledge graph construction level, different scenarios face distinct challenges. Domain-specific corpora face a triple challenge: complex knowledge dependencies (domain knowledge progresses from foundational to advanced concepts layer by layer, requiring cross-reference analysis), domain specificity (dense technical terminology and abbreviations), and limited reference knowledge (private technical documents are difficult to obtain externally) [Sun et al., 2025]. For general-purpose knowledge graphs, Li et al. (2026) note that while LLM pipelines can extract entities and relations at scale, the resulting graphs often lack a shared schema, with entity types and relation vocabularies being ad hoc. To address this, ontology-oriented construction methods emphasize building schema as a first-class resource for downstream tasks from the outset, rather than as a byproduct of graph construction.

Two-View KG: The Ontological Dual-Layer Architecture

The Two-View KG concept proposed by Hao et al. (2019) reveals the ontological dual-layer architecture of knowledge graphs, simultaneously representing two complementary views:

Ontology View: The abstract concept layer, defining types, relation schemas, and other meta-knowledge
Instance View: The concrete entity layer, containing actual fact triples

Between the two views exist cross-view links that connect ontological concepts with their instantiated entities, while satisfying mutual disjointness constraints (the entity vocabulary set and concept vocabulary set are disjoint, as are the relation set and meta-relation set). This dual-layer structure provides Graph-based RAG systems with navigation paths from abstract concepts to concrete instances, enabling the system to both understand high-level semantic patterns and locate specific knowledge details.

Yeom et al. (2024) further note that a large number of knowledge graphs are essentially Two-View KGs: abstract classes in the ontology view form tree-like hierarchical structures through class inheritance, while concrete entities in the instance view are instantiated from ontological classes. This structured dual-layer representation holds significant value for Graph-based RAG tasks that must simultaneously leverage abstract concept reasoning and concrete fact verification.
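The dual-layer structure can be sketched as a small data structure. The field names, the `subclass_of` meta-relation, and the example facts are hypothetical. The code checks the two disjointness constraints described above. It also walks from an abstract concept through its subclass tree down to concrete instances.

```python
from dataclasses import dataclass

@dataclass
class TwoViewKG:
    ontology: set    # (concept, meta_relation, concept) triples, e.g. subclass_of
    instances: set   # (entity, relation, entity) fact triples
    type_links: set  # cross-view links: (entity, concept)

    def validate(self):
        """Check both disjointness constraints and that every link joins the views."""
        concepts = {x for h, _, t in self.ontology for x in (h, t)}
        entities = {x for h, _, t in self.instances for x in (h, t)}
        meta_relations = {r for _, r, _ in self.ontology}
        relations = {r for _, r, _ in self.instances}
        if concepts & entities:
            raise ValueError(f"entity/concept overlap: {concepts & entities}")
        if meta_relations & relations:
            raise ValueError(f"relation/meta-relation overlap: {meta_relations & relations}")
        for entity, concept in self.type_links:
            if entity not in entities or concept not in concepts:
                raise ValueError(f"dangling cross-view link: {(entity, concept)}")

    def instances_of(self, concept):
        """Navigate from an abstract concept down its subclass tree to entities."""
        covered, frontier = {concept}, [concept]
        while frontier:
            parent = frontier.pop()
            for child, rel, sup in self.ontology:
                if rel == "subclass_of" and sup == parent and child not in covered:
                    covered.add(child)
                    frontier.append(child)
        return {e for e, c in self.type_links if c in covered}

kg = TwoViewKG(
    ontology={("Scientist", "subclass_of", "Person"), ("City", "subclass_of", "Place")},
    instances={("Marie Curie", "born_in", "Warsaw")},
    type_links={("Marie Curie", "Scientist"), ("Warsaw", "City")},
)
kg.validate()
print(kg.instances_of("Person"))  # {'Marie Curie'}
```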

3.2.2 G-Retrieval: Graph-Guided Retrieval

Pre-retrieval: Query Optimization

Before formal retrieval, the system can optimize queries through various means to improve retrieval effectiveness [Gao et al., 2024; Huang et al., 2026]:

Category	Sub-category	Description
Query Expansion	Multi-Query	Generates multiple related queries to expand retrieval coverage
	Sub-Query	Decomposes complex queries into sub-queries
	CoVe (Chain-of-Verification)	Verification chain expansion, progressively confirms query completeness
Query Transformation	Query Rewrite	Rewrites queries to improve retrieval effectiveness
	Query Routing	Routes queries to different processing paths based on query characteristics (via metadata filtering or semantic similarity)

Additionally, RRR (Rewrite-Retrieve-Read) uses dedicated small language models for query rewriting, with some methods further introducing external adapters to assist retriever-generator alignment [Gao et al., 2024]. Query optimization essentially clarifies and expands information needs before retrieval, reducing the semantic gap between queries and the index.
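Multi-Query expansion produces one result list per query variant, and these lists must be merged. The sketch merges them with Reciprocal Rank Fusion (RRF), a common fusion method that the text does not name; it is swapped in here as an assumption. `k=60` is the constant usually quoted for RRF. In a real system an LLM would generate the query variants; here a dictionary fakes the retriever output.

```python
from collections import defaultdict

def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse ranked lists of doc ids: score(d) = sum over lists of 1 / (k + rank(d)).
    Documents ranked well by several query variants rise to the top."""
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))  # ties broken by id

def multi_query_retrieve(query_variants, search):
    """Run every query variant through the same retriever, then fuse the lists."""
    return reciprocal_rank_fusion([search(v) for v in query_variants])

# Hypothetical retriever output for three variants of "what is graph-based RAG?"
FAKE_RESULTS = {
    "graph rag survey": ["d1", "d2", "d3"],
    "knowledge graph retrieval": ["d2", "d4"],
    "graphrag overview": ["d2", "d1"],
}
fused = multi_query_retrieve(list(FAKE_RESULTS), lambda q: FAKE_RESULTS.get(q, []))
print(fused)  # ['d2', 'd1', 'd4', 'd3']
```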

Retriever Types

By degree of parameterization, retrievers can be classified into three categories [Peng et al., 2024]:

Non-parametric Retriever: High retrieval efficiency, no training required
LM-based Retriever: E.g., fine-tuned RoBERTa models, balancing semantic understanding with efficiency
GNN-based Retriever: Leverages graph neural networks to capture structural information, but with higher computational cost

Retrieval Techniques

By the mode of interaction between queries and graph data, retrieval techniques can be classified into three categories [Zhang et al., 2025a]:

Type	Core Idea	Representative Methods
Semantics Similarity-based	Models similarity in discrete space (substring matching, regex) or embedding space (TF-IDF, Word2Vec)	Substring matching, TF-IDF, Word2Vec
Logical Reasoning-based	Uses rule mining, inductive logic programming, constraint satisfaction to reveal implicit insights	Rule Mining, ILP, Constraint Satisfaction
GNN-based	Uses graph neural networks for graph modeling and mining	GCN, GAT

Semantic similarity methods are simple to implement, but they cannot fully exploit graph structure, so the inherent advantages of graph databases remain largely untapped [Zhang et al., 2025a].

Retrieval Paradigms and Granularity

Retrieval paradigms include Once Retrieval, Iterative Retrieval (with non-adaptive and adaptive variants), and Multi-Stage Retrieval [Huang et al., 2026]. Retrieval granularity spans four levels: Nodes, Triplets, Paths, and Subgraphs. The goal of granularity optimization is to balance relevance with efficiency: coarse-grained units provide richer context but may introduce redundancy and noise; fine-grained units are more semantically focused but may lack completeness and increase retrieval burden [Zhang et al., 2025a].

Retrieval Augmentation

To enhance retrieval, the system can perform Query Enhancement (query expansion, query decomposition) before retrieval or Knowledge Enhancement (knowledge merging, knowledge pruning) after retrieval.

Post-retrieval: Re-ranking and Context Compression

After retrieval and before generation, the retrieval results typically require processing to improve final generation quality [Gao et al., 2024]:

Reranking: Fine-grained relevance scoring of retrieved candidate results
Context Compression: Removing redundant information and compressing retrieval results to a context length suitable for LLM processing

Redundant information disrupts LLM generation quality, while excessively long contexts may cause the LLM to exhibit the “Lost in the Middle” problem — difficulty effectively utilizing information in the middle portions of long contexts [Gao et al., 2024].
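Both post-retrieval steps can be sketched in a few lines. The reordering heuristic places the strongest passages at the start and end of the context. It is one commonly used mitigation for "Lost in the Middle", not a method from the cited survey. In the compression step, word counts stand in for token counts when checking the budget.

```python
def reorder_for_long_context(passages):
    """`passages` is sorted most-relevant first. Alternate them between the front
    and the back of the context so the strongest evidence sits at both ends and
    the weakest lands in the middle."""
    front, back = [], []
    for i, passage in enumerate(passages):
        (front if i % 2 == 0 else back).append(passage)
    return front + back[::-1]

def compress_to_budget(passages, max_words):
    """Crude compression: drop exact duplicates, then greedily keep passages in
    relevance order while they fit a word budget (a rough proxy for tokens)."""
    kept, seen, used = [], set(), 0
    for passage in passages:
        size = len(passage.split())
        if passage in seen or used + size > max_words:
            continue
        kept.append(passage)
        seen.add(passage)
        used += size
    return kept

ranked = ["p1 best", "p2 good", "p2 good", "p3 fair", "p4 weak", "p5 worst"]
context = reorder_for_long_context(compress_to_budget(ranked, max_words=10))
print(context)  # ['p1 best', 'p3 fair', 'p5 worst', 'p4 weak', 'p2 good']
```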

3.2.3 G-Generation: Graph-enhanced Generation and Knowledge Integration

Knowledge Integration

Retrieved graph data must be transformed into natural language responses through appropriate generation strategies. G-Integration (knowledge integration) is the process of effectively fusing retrieval results with LLM generation capabilities. Zhang et al. (2025a) summarize the conventional RAG pipeline as comprising three core components: knowledge organization, knowledge retrieval, and knowledge integration. Knowledge integration occupies the pre-generation stage, and its quality directly impacts the accuracy and coherence of the final answer.

Generation Paradigms

Graph-enhanced generation can be categorized into three types [Peng et al., 2024]:

Type	Paradigm	Description
GNNs	—	First processes graph data with GNNs, encapsulating structural and relational information for LM comprehension
LMs	—	Language models generate the final text response
Hybrid	Cascaded Paradigm	Models sequentially process different aspects of the data
Hybrid	Parallel Paradigm	Models simultaneously receive inputs, process collaboratively; outputs merged via rules or another model

Generation Control: Context-Aware and Grounded Constraints

During generation, the system can actively control output quality through reasoning capabilities [Li et al., 2025]:

Control Direction	Description	Representative Methods
Context-Aware Generation	Selectively utilizes context, avoids interference from irrelevant information	Open-RAG, RARE, Self-Reasoning
Grounded Generation Control	Fact verification, citation generation, ensures output fidelity to retrieval evidence	RARR, TRACE, AlignRAG

Post-generation: Iterative Refinement and Verification

Generation is not the endpoint. Through Test-Time Scaling, the system can further refine output quality after generation [Zhang et al., 2025a]:

Iterative Self-Refinement: Multiple rounds of self-improvement, progressively correcting errors in generated content [Madaan et al., 2023]
Self-Consistency Decoding: Consistency verification across multiple decoding paths to select the most reliable answer [Hao et al., 2023]

These methods transform generation from a one-shot output into an iteratively optimizable process, which is particularly important in complex reasoning tasks.
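Self-consistency decoding reduces to a majority vote over sampled answers. In this sketch, `sample_answer` is a hypothetical callable that would normally query an LLM with a sampling temperature above zero. A canned list stands in for the LLM, and answer normalization is limited to trimming whitespace and lowercasing.

```python
from collections import Counter

def self_consistency(sample_answer, n_samples=5):
    """Draw several independent reasoning paths, normalize each final answer,
    and return the majority answer with its vote share."""
    answers = [sample_answer(i).strip().lower() for i in range(n_samples)]
    answer, votes = Counter(answers).most_common(1)[0]
    return answer, votes / n_samples

# Hypothetical stand-in for an LLM sampled at temperature > 0.
CANNED = ["Tacoma", "Seattle", "tacoma", "Tacoma ", "Spokane"]
print(self_consistency(lambda i: CANNED[i]))  # ('tacoma', 0.6)
```

The vote share doubles as a rough confidence signal: a low share suggests the reasoning paths disagree and the answer may need verification.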

3.3 Applicability of Graph-based RAG: When Are Graph Structures Needed?

Not all tasks require introducing graph structures. Xiang et al. (2026) systematically compared vanilla RAG and Graph-based RAG across different task complexity levels through GraphRAG-Bench, proposing a series of key observations.

3.3.1 Task Complexity and the Graph-based RAG Advantage Threshold

Observation	Core Conclusion	Task Type
Obs.1	Basic RAG and Graph-based RAG perform comparably on simple factual retrieval tasks	Simple factual retrieval
Obs.2	Graph-based RAG excels in complex tasks	Complex reasoning
Obs.3	Graph-based RAG ensures higher factual reliability in creative tasks	Creative generation
Obs.4	RAG is adept at extracting discrete facts from simple questions not requiring complex logic	Simple QA
Obs.5	As questions grow increasingly complex, the advantage of Graph-based RAG becomes evident	Increasing complexity

RAG performs excellently in scenarios requiring rapid access to discrete information, while Graph-based RAG excels at tasks requiring nuanced understanding of interconnected data [Xiang et al., 2026].

In terms of retrieval performance, a trade-off exists: Global Graph-based RAG achieves superior Evidence Recall (83.1%), accessing more relevant information; whereas RAG achieves superior Context Relevance (78.8%), with more focused retrieval results and less redundancy. This indicates that while Graph-based RAG retrieves broader information, its retrieval method inevitably introduces some redundancy [Xiang et al., 2026].

Additionally, different Graph-based RAG implementations produce index graphs with significant structural differences; compared to vanilla RAG, Graph-based RAG substantially increases prompt length, with prompt length exhibiting a clear upward trend as task complexity increases.

3.3.2 Practical Implications

Based on the above empirical findings, four scenario-specific recommendations can be summarized:

Simple Factual Retrieval: Vanilla RAG is sufficient; there is no need to incur the additional overhead of Graph-based RAG.
Complex Reasoning / Multi-hop Queries: Graph-based RAG offers clear advantages; graph structures can explicitly model cross-document associations.
Creative Generation: Graph-based RAG provides higher factual reliability, but attention must be paid to the trade-off between retrieval redundancy and context relevance.
Efficiency-Sensitive Scenarios: Prompt inflation in Graph-based RAG is a significant constraint, particularly in long-context or token-constrained environments.

4. Agentic Search

Core Paradigm Shift: From Fixed Pipelines to Autonomous Multi-turn RAG

	Traditional RAG	Agentic Search
Retrieval Turns	Single turn	Multi-turn
Retrieval Timing	Fixed (one retrieval before generation)	Dynamic (triggered on-demand during reasoning)
Query Construction	Raw query used directly	Agent dynamically constructs based on intermediate reasoning
Decision-Making Entity	Predefined workflow	LLM autonomous decision-making
Typical Paradigm	Retrieve → Generate	Reason ⟷ Retrieve ⟷ Reason ⟷ … → Generate

4.1 Definition and Core Components

The core characteristics of Agentic Search include: autonomous reasoning — the agent dynamically plans and adjusts its approach based on intermediate results rather than following preset patterns; on-demand retrieval — dynamically triggered based on uncertainty or information needs during the reasoning process; and iterative synthesis — retrieved information recursively refines reasoning, forming a feedback loop where reasoning and retrieval mutually reinforce each other.

The system comprises four core components:

Component	Function
LLM	Core reasoning engine, providing role definition and task understanding capabilities
Memory	Short-term memory maintains current reasoning context; long-term memory stores cross-session knowledge and preferences
Planning	Dynamically plans task step sequences through Reflection and Self-Critique
Tools	Retrieval backends (Dense RAG, GraphRAG, web search, etc.) form core infrastructure; additionally includes external capabilities such as API calls
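The split between short-term and long-term memory in the table can be illustrated with a minimal data structure. This is a sketch under assumptions: `AgentMemory`, its methods, and the choice to place persisted facts ahead of recent turns in the prompt are illustrative, not a specific framework's API.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, List


@dataclass
class AgentMemory:
    """Illustrative split between short-term and long-term agent memory."""
    max_turns: int = 4
    long_term: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Short-term memory: a bounded deque that drops the oldest turn once max_turns is exceeded.
        self.short_term: Deque[str] = deque(maxlen=self.max_turns)

    def add_turn(self, text: str) -> None:
        self.short_term.append(text)

    def remember(self, key: str, value: str) -> None:
        # Long-term memory: knowledge or preferences meant to persist across sessions.
        self.long_term[key] = value

    def context(self) -> List[str]:
        # Prompt context: persisted facts first, then the most recent reasoning turns.
        return [f"{k}: {v}" for k, v in self.long_term.items()] + list(self.short_term)


mem = AgentMemory(max_turns=2)
for turn in ["plan: find GraphRAG benchmarks", "search: RAGSearch", "read: Fan et al. (2026)"]:
    mem.add_turn(turn)
mem.remember("preferred_venue", "ACL")
print(mem.context())
```

A bounded deque keeps the short-term context from growing without limit; real systems often summarize or re-rank older turns rather than dropping them outright.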

Planning is the core differentiating component of Agentic Search. Whereas traditional RAG follows a fixed “retrieve-then-generate” pattern, an agent with Planning capability can autonomously decompose complex problems, prioritize information needs, and adjust its strategy based on feedback during execution, turning retrieval from a passive data supply into an active reasoning resource.

4.2 Workflow Patterns: From Linear Reasoning to Graph-structured Exploration

The workflow design of Agentic Search determines how the system handles complex queries. From the perspective of macro control logic, Singh et al. (2026) summarize five general patterns: Prompt Chaining (improving accuracy through sequential processing), Routing (sending inputs to different processing strategies based on their characteristics), Parallelization (processing independent sub-tasks in parallel), Orchestrator-Workers (a central orchestrator dynamically assigning sub-tasks to worker agents), and Evaluator-Optimizer (iteratively evaluating and refining outputs).
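Of the five patterns, Evaluator-Optimizer is the one whose control flow is least obvious from its name. A minimal sketch, assuming the generator and evaluator are two separate LLM calls (replaced here by hypothetical toy functions), shows the loop: generate, score, feed the critique back, and stop once a quality threshold or a round budget is reached.

```python
from typing import Callable, Tuple


def evaluator_optimizer(
    generate: Callable[[str, str], str],
    evaluate: Callable[[str], Tuple[float, str]],
    task: str,
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Generate, score, and revise until the evaluator is satisfied or the round budget is spent."""
    feedback = ""
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = generate(task, feedback)   # optimizer: write or revise a draft using the last critique
        score, feedback = evaluate(draft)  # evaluator: score the draft and say what is missing
        if score >= threshold:
            return draft, round_no
    return draft, max_rounds


# Hypothetical stand-ins for two LLM calls: this evaluator only accepts drafts with a citation.
def toy_generate(task: str, feedback: str) -> str:
    return task + (" [cited: Fan et al., 2026]" if "citation" in feedback else "")


def toy_evaluate(draft: str) -> Tuple[float, str]:
    return (1.0, "ok") if "[cited:" in draft else (0.3, "add a citation")


answer, rounds = evaluator_optimizer(toy_generate, toy_evaluate, "GraphRAG helps multi-hop QA.")
print(rounds, answer)
```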

Li et al. (2025) instead classify workflows by reasoning structure into chain-, tree-, and graph-based families, with the tree- and graph-based families each split into two variants:

Workflow Type	Structural Characteristics	Representative Methods	Advantages	Limitations	Applicable Scenarios
Chain-based	Linear sequence, one retrieval per reasoning step	IRCoT, RAT, CoV-RAG, RAFT	Low latency, low token cost, easy caching	Error propagation, rapid context growth	Single-hop or short multi-hop QA
Tree-based (ToT)	Parallel exploration of multiple branches to hedge early errors	RATT, Tree of Clarifications, AirRAG	High recall, transparent hypothesis analysis	Quadratic cost, multiple retrieval calls	Ambiguous or multi-path tasks
Tree-based (MCTS)	Budget-aware exploration, focusing on promising branches	MCTS-RAG, SeRTS	Graceful anytime stopping	Parameter-dependent, may converge to suboptima	Deep search under strict budgets
Graph-based (Walk-on-Graph)	Efficient walks on explicit KG/document graphs	QA-GNN, LightRAG	Efficient on KGs, short paths	Requires high-quality KG, limited flexibility	Domain QA with existing KGs
Graph-based (Think-on-Graph)	LLM dynamically updates evidence graph, adaptive and verifiable	ToG, ToG-2.0, Graph-CoT	Node-level citation checking, high accuracy	High latency, search space explosion risk	Open-domain deep research

Chain-based methods, exemplified by Chain-of-Thought (CoT), structure reasoning into linear sequences of intermediate steps. However, relying solely on LLM parametric knowledge readily leads to error propagation, where small deviations at each step are amplified in subsequent steps. Tree-based methods hedge early errors through parallel exploration of multiple branches; Tree-of-Thought (ToT) allows multiple hypotheses to coexist and be evaluated simultaneously, while Monte Carlo Tree Search (MCTS) focuses exploration on the most promising branches through budget-aware strategies.
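The budget-aware branch selection attributed to MCTS above is commonly implemented with the UCT rule, which adds an exploration bonus to each child's mean reward. The sketch below applies that standard rule to reasoning-step nodes; whether MCTS-RAG or SeRTS use exactly this variant is not claimed here, and `Node`, the action labels, and the exploration constant c = 1.41 are illustrative assumptions.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    action: str              # a reasoning step, e.g. "retrieve", "decompose", "answer"
    visits: int = 0
    value_sum: float = 0.0   # sum of rollout rewards (e.g. answer correctness) through this node
    children: List["Node"] = field(default_factory=list)


def uct_score(child: Node, parent_visits: int, c: float = 1.41) -> float:
    if child.visits == 0:
        return math.inf                                              # try every child at least once
    exploit = child.value_sum / child.visits                         # mean reward so far
    explore = c * math.sqrt(math.log(parent_visits) / child.visits)  # bonus for rarely tried branches
    return exploit + explore


def select_child(parent: Node, c: float = 1.41) -> Optional[Node]:
    if not parent.children:
        return None
    return max(parent.children, key=lambda ch: uct_score(ch, parent.visits, c))


def backpropagate(path: List[Node], reward: float) -> None:
    # Credit the rollout's reward to every node on the selected path.
    for node in path:
        node.visits += 1
        node.value_sum += reward


root = Node("root", visits=10)
retrieve = Node("retrieve", visits=6, value_sum=3.0)    # mean reward 0.5
decompose = Node("decompose", visits=3, value_sum=2.4)  # mean reward 0.8
root.children = [retrieve, decompose]
print(select_child(root).action)
```

Because every completed iteration leaves the visit statistics in a usable state, the search can be halted whenever the retrieval or token budget runs out and the most-visited child returned, which is one way to read the "graceful anytime stopping" entry in the table.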

Graph-based methods couple reasoning and retrieval more deeply. Walk-on-Graph methods rely primarily on graph learning techniques for retrieval and reasoning, including GNNs (using graph neural networks to model the graph and reason over it during retrieval) and lightweight graph techniques (vector indexing, PageRank, and other link-structure-based ranking methods). Think-on-Graph methods embed the graph directly into the LLM reasoning loop: the LLM acts as the reasoning agent on the graph, deciding at each step which connected entity or relation to explore next and progressively constructing paths to the answer. The main advantages of this approach are node-level citation checking and higher accuracy, at the cost of higher latency and a potentially exploding search space.
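The Think-on-Graph loop described above can be approximated as a beam search over knowledge-graph triples in which an LLM scores candidate paths. In the sketch below the LLM judgment is replaced by a hypothetical keyword-overlap scorer and the graph is a toy dictionary; `think_on_graph`, `keyword_score`, and the beam and depth settings are assumptions, not the ToG implementation.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]
Path = List[Triple]


def think_on_graph(
    graph: Dict[str, List[Tuple[str, str]]],  # entity -> [(relation, neighbour)]
    start: str,
    question: str,
    score: Callable[[str, Path], float],      # stand-in for the LLM judging path relevance
    beam_width: int = 2,
    depth: int = 2,
) -> List[Path]:
    """Beam search: each hop extends every kept path by one edge and keeps the best-scoring paths."""
    beams: List[Path] = [[]]
    for _ in range(depth):
        candidates: List[Path] = []
        for path in beams:
            tail = path[-1][2] if path else start
            for relation, neighbour in graph.get(tail, []):
                candidates.append(path + [(tail, relation, neighbour)])
        if not candidates:
            break
        candidates.sort(key=lambda p: score(question, p), reverse=True)
        beams = candidates[:beam_width]
    return beams


def keyword_score(question: str, path: Path) -> float:
    # Toy scorer: count question words that appear in the path's relation names.
    words = set(question.lower().replace("?", "").split())
    return float(sum(len(words & set(rel.split("_"))) for _, rel, _ in path))


kg = {
    "Marie Curie": [("born_in", "Warsaw"), ("spouse", "Pierre Curie"), ("field", "Physics")],
    "Warsaw": [("capital_of", "Poland")],
    "Pierre Curie": [("born_in", "Paris")],
}
question = "In which country is the city where Marie Curie was born the capital?"
paths = think_on_graph(kg, "Marie Curie", question, keyword_score)
for p in paths:
    print(" -> ".join(f"({h}, {r}, {t})" for h, r, t in p))
```

Each returned path is an explicit chain of triples, which is what makes node-level citation checking possible; the cost is that candidate paths multiply with the branching factor at every hop, which is why the beam width matters.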

4.3 Agent Orchestration and Training Paradigms

4.3.1 System Architecture Taxonomy

Based on the comprehensive classification of Singh et al. (2026) and Li et al. (2025), Agentic Search systems can be categorized by architectural complexity into multiple levels:

Type	Core Characteristics	Representative Methods	Applicable Scenarios
Single-Agent (Prompt-only)	ReAct loop, simple implementation	ReAct, Search-o1	Prototype demonstrations, simple queries
Single-Agent (SFT/RL)	Fine-tuning or reinforcement learning enhances retrieval and reasoning capabilities	Toolformer, Search-R1	Production systems, open-domain research
Multi-Agent (Decentralized)	Parallel expert collaboration, high recall	M-RAG, MDocAgent	Large-scale evidence aggregation across heterogeneous sources
Multi-Agent (Centralized)	Hierarchical manager coordinates sub-tasks	Chain of Agents	Complex tasks under strict budgets
Hierarchical	Strategic decision-making → delegation → aggregation	—	Scenarios requiring multi-level task decomposition
Adaptive	Dynamically selects strategies based on query complexity	—	Systems with diverse query types

Corrective and Adaptive are two behavioral enhancement modes: Corrective introduces self-correction mechanisms to improve document utilization; Adaptive uses a classifier to assess query complexity and dynamically switches between single-step, multi-step, or skip-retrieval modes.
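The Adaptive mode can be sketched as a router placed in front of the retrieval pipeline. The original Adaptive approach uses a trained classifier; here a keyword heuristic stands in for it, so `classify_complexity`, the cue lists, and the `Mode` labels are all hypothetical.

```python
from enum import Enum


class Mode(Enum):
    NO_RETRIEVAL = "skip retrieval"
    SINGLE_STEP = "single-step retrieval"
    MULTI_STEP = "multi-step retrieval"


# Hypothetical cue lists; a real system would use a trained query-complexity classifier instead.
MULTI_HOP_CUES = ("compare", "both", "before", "after", "whose", "which of")
LOOKUP_CUES = ("who", "when", "where", "what is")


def classify_complexity(query: str) -> Mode:
    q = query.lower()
    if any(cue in q for cue in MULTI_HOP_CUES):
        return Mode.MULTI_STEP      # likely needs several retrieval-reasoning rounds
    if any(q.startswith(cue) for cue in LOOKUP_CUES):
        return Mode.SINGLE_STEP     # a single factual lookup should suffice
    return Mode.NO_RETRIEVAL        # e.g. rewriting tasks the LLM can handle directly


for q in ["Who wrote BEIR?",
          "Compare the citation counts of DPR and ColBERT",
          "Rewrite this sentence in passive voice"]:
    print(f"{q!r} -> {classify_complexity(q).value}")
```

The design point is that the expensive multi-step loop runs only for queries the router flags as complex, while simple queries take the cheap path.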

4.3.2 Evolution of Training Paradigms

Agent capability acquisition has undergone a three-stage evolution:

Paradigm	Core Mechanism	Advantages	Limitations
Prompt-based	Prompt engineering defines retrieval and reasoning behavior	Simple, no training required	Constrained by fixed instruction patterns
SFT	Fine-tuning on reasoning-retrieval joint data	Higher precision than prompting	Requires large amounts of synthetic data, prone to overfitting
RL	Reward functions incentivize strategy discovery and adaptive optimization	Genuine agentic behavior	Difficult to define reward signals, expensive training

The fundamental distinction: Prompt-based and SFT rely on offline supervision and fixed patterns; RL-trained agents are incentivized to autonomously discover search strategies rather than being told how to search.

4.4 Are Graph Structures Necessary for Agentic Search? — RAGSearch Empirical Evidence

The core question posed by Fan et al. (2026), “Do we still need GraphRAG?”, addresses a key debate in Agentic Search: can Agentic Search compensate for the absence of explicit graph structures through dynamic multi-turn retrieval and reasoning, and thereby reduce dependence on high-cost GraphRAG?

Core Conclusion: Agentic search can partially compensate for missing structural information in dense RAG through iterative retrieval, but explicit graph retrieval remains essential for robust multi-hop reasoning. GraphRAG consistently provides stronger performance and greater stability in complex settings, while dense RAG, with its lower construction cost, remains a practical choice for general-purpose QA.

The following analysis supports this conclusion across three dimensions: the formal framework, eight empirical findings, and system case studies.

4.4.1 Formal Framework

RAGSearch formalizes Agentic Search as follows: given a query $q$, an LLM-equipped agent interacts with a retrieval backend $B$ (dense RAG or GraphRAG) over multiple turns. At each step, the agent decides whether to trigger retrieval or generate an answer based on reasoning history, with retrieved information appended to the reasoning sequence. Its core characteristics are: retrieval is executed dynamically rather than as one-time preprocessing; the same control logic can operate across different backends.
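The formalization above maps onto a loop in which the LLM policy chooses between searching and answering, and the backend B is any object exposing a retrieval method. This is a minimal sketch: `Backend`, `agentic_search`, the `<search>`/`<result>` tags, and the scripted toy policy are assumptions for illustration, not the RAGSearch code.

```python
from typing import Callable, Dict, List, Protocol, Tuple


class Backend(Protocol):
    """Any retrieval backend B (dense RAG, GraphRAG, ...) that maps a query to passages."""
    def retrieve(self, query: str) -> List[str]: ...


# The policy plays the LLM: given q and the reasoning history it returns
# ("search", sub_query) to trigger retrieval or ("answer", text) to stop.
Policy = Callable[[str, List[str]], Tuple[str, str]]


def agentic_search(q: str, policy: Policy, backend: Backend, max_turns: int = 5) -> Tuple[str, List[str]]:
    history: List[str] = []
    for _ in range(max_turns):
        action, content = policy(q, history)
        if action == "answer":
            return content, history
        docs = backend.retrieve(content)  # retrieval happens on demand, not as preprocessing
        history.append(f"<search>{content}</search>")
        history.append("<result>" + " | ".join(docs) + "</result>")
    return "", history  # turn budget exhausted without an answer


class DictBackend:
    """Toy backend standing in for a dense or graph retriever."""
    def __init__(self, index: Dict[str, List[str]]) -> None:
        self.index = index

    def retrieve(self, query: str) -> List[str]:
        return self.index.get(query, [])


def scripted_policy(q: str, history: List[str]) -> Tuple[str, str]:
    # Hypothetical two-hop plan; a real agent would generate these decisions with the LLM.
    searches = sum(h.startswith("<search>") for h in history)
    if searches == 0:
        return "search", "director of Inception"
    if searches == 1:
        return "search", "birthplace of Christopher Nolan"
    return "answer", "London"


backend = DictBackend({
    "director of Inception": ["Inception was directed by Christopher Nolan."],
    "birthplace of Christopher Nolan": ["Christopher Nolan was born in London."],
})
answer, trace = agentic_search("Where was the director of Inception born?", scripted_policy, backend)
print(answer, len(trace))
```

Because `agentic_search` depends only on the `retrieve` method, a dense retriever and a GraphRAG retriever can be swapped without touching the control logic, which mirrors the backend-independence property stated above.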

Two Implementation Paths:

Path	Mechanism	Representative Methods
Training-Free	Reasoning-driven on-demand search or orchestrated multi-agent workflows	Search-o1, GraphSearch
RL-Based	GRPO training, Outcome-based + Format-based reward design	Search-R1, Graph-R1
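The RL-based path combines an outcome-based reward with a format-based reward and optimizes with GRPO, which replaces a learned value function with advantages normalized within a group of completions sampled for the same query. The sketch below computes both pieces; the `<answer>` tag, exact-match scoring, and the 0.5 format weight are assumptions, and the policy-gradient update itself is omitted.

```python
import re
import statistics
from typing import List

ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def normalize(text: str) -> str:
    return " ".join(text.lower().strip().split())


def format_reward(completion: str) -> float:
    # Format-based reward: exactly one well-formed <answer>...</answer> span.
    return 1.0 if len(ANSWER_RE.findall(completion)) == 1 else 0.0


def outcome_reward(completion: str, gold: str) -> float:
    # Outcome-based reward: exact match between the extracted answer and the gold answer.
    match = ANSWER_RE.search(completion)
    return 1.0 if match and normalize(match.group(1)) == normalize(gold) else 0.0


def total_reward(completion: str, gold: str, format_weight: float = 0.5) -> float:
    return outcome_reward(completion, gold) + format_weight * format_reward(completion)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    # GRPO-style: each sample is scored against its own group's mean and standard deviation.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


group = [
    "<search>Inception director</search> <answer>Christopher Nolan</answer>",
    "<answer>Steven Spielberg</answer>",
    "Christopher Nolan",
]
rewards = [total_reward(c, "Christopher Nolan") for c in group]
advantages = group_relative_advantages(rewards)
print(rewards, [round(a, 3) for a in advantages])
```

If every completion in a group receives the same reward, all advantages are zero and the group contributes no learning signal, which is one reason reward design is flagged above as difficult.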

4.4.2 Eight Core Findings

Through systematic experiments, RAGSearch reveals how the behavior of Agentic Search depends on the choice of retrieval backend:

Finding	Conclusion	Practical Implication
Obs.1	Under single inference, dense RAG is effective for general QA; GraphRAG primarily provides decisive improvements in multi-hop QA	Task complexity determines GraphRAG necessity
Obs.2	Agentic search can enhance dense RAG and partially close the gap with GraphRAG, but effectiveness depends on agent design	Structured agentic design is key, not simply increasing interaction turns
Obs.3	RL-based training generally improves performance, but well-designed training-free pipelines remain competitive	Training cost and performance require trade-off
Obs.4	In training-free workflows, explicit graph structures provide consistent and significant benefits for multi-hop QA	The robust advantage of graph structures in zero-training settings cannot be overlooked
Obs.5	RL-based agentic performance is highly backend-dependent: graph retrievers gain larger improvements on multi-hop QA	RL + GraphRAG exhibits synergistic effects
Obs.6	GraphRAG is more robust and stable than dense RAG in agentic search	Explicit structures reduce agentic control uncertainty
Obs.7	GRPO is a favorable training paradigm for RL-based agentic systems	Validates GRPO’s effectiveness in retrieval augmentation
Obs.8	Larger backbones not only improve reasoning performance but also narrow the performance gap between GraphRAG and dense RAG	Larger models may reduce the marginal benefit of graph structures

4.4.3 System Case Studies

This conclusion is corroborated in concrete system designs. Agent-G dynamically assigns retrieval tasks to specialized agents, simultaneously leveraging both graph knowledge bases and text documents. GeAR enhances conventional retrievers through graph expansion and introduces an agent framework for managing graph-structured data retrieval tasks. These systems demonstrate the deep synergy between graph structures and Agentic Search: explicit graph structures not only provide higher-quality retrieval results but also offer interpretable knowledge relation paths for agent decision-making, reducing the uncertainty of agentic control.
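The graph-expansion idea mentioned for GeAR, enriching a conventional retriever's hits with their graph neighbours, can be sketched generically as a breadth-first expansion. This is not GeAR's actual algorithm: `graph_expand`, the neighbour map, and the `hops` and `limit` parameters are hypothetical, and a real system would also re-score the expanded candidates.

```python
from typing import Dict, List, Set


def graph_expand(
    seed_hits: List[str],
    neighbors: Dict[str, List[str]],
    hops: int = 1,
    limit: int = 5,
) -> List[str]:
    """Expand a base retriever's hits with graph neighbours (breadth-first), seeds first."""
    results: List[str] = list(dict.fromkeys(seed_hits))  # de-duplicate while keeping order
    seen: Set[str] = set(results)
    frontier = list(results)
    for _ in range(hops):
        next_frontier: List[str] = []
        for node in frontier:
            for nb in neighbors.get(node, []):
                if nb not in seen:
                    seen.add(nb)
                    results.append(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return results[:limit]


# Seeds would come from BM25 or a dense retriever; edges could be entity links or citations.
nbrs = {"doc_A": ["doc_B", "doc_C"], "doc_B": ["doc_D"], "doc_C": ["doc_A"]}
print(graph_expand(["doc_A"], nbrs, hops=2))
```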

4.5 Open Challenges and Future Directions

4.5.1 Core Challenges

Agentic Search should not be viewed as a universal replacement for traditional RAG. While it provides superior adaptability and multi-step reasoning capabilities, it also introduces coordination complexity, latency, and computational costs. Core challenges include:

Evaluation Difficulty — Output-level metrics are insufficient to measure Agentic system quality; multi-dimensional evaluation of reasoning trajectories, planning depth, adaptability, robustness, and cost-effectiveness is needed.
Long-term Memory Design — Knowledge drift, bias reinforcement, and frequent updates may amplify hallucination risks.
Coordination Complexity — Multi-agent collaboration introduces communication and consensus overhead.
Computational Overhead — Additional latency from agent reasoning is non-negligible in practical deployment.

Furthermore, a fundamental constraint is that agentic reasoning cannot compensate for persistently poor retrieval. Failures often stem from insufficient retrieval coverage, poorly constructed indices, or inadequate integration of structured and unstructured knowledge.

Optimal Application Domains: Agentic Search gains the strongest benefits in domains with structured knowledge and explicit constraints. Healthcare, finance, and legal analysis particularly benefit from combining retrieval with rule-based reasoning and graph-structured knowledge.

4.5.2 Efficiency and Latency Optimization

While Synergized RAG-Reasoning systems excel in complex reasoning, their iterative retrieval and multi-step reasoning loops may cause significant latency. Optimization directions include: budget-aware query planning — optimizing query strategies under strict API call or token budgets; memory-aware mechanisms — caching prior evidence or belief states to reduce redundant access.
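The two optimization directions can be combined in a thin wrapper around any retriever: a call budget enforces budget-aware planning, and a cache of prior evidence implements a simple memory-aware mechanism. This is a sketch under assumptions: `BudgetedRetriever` and its normalization rule (case and whitespace only) are hypothetical, and real systems may cache at the level of sub-questions or belief states instead.

```python
from typing import Callable, Dict, List, Optional


class BudgetedRetriever:
    """Wraps a retrieval function with a call budget and an evidence cache (illustrative only)."""

    def __init__(self, retrieve: Callable[[str], List[str]], max_calls: int) -> None:
        self._retrieve = retrieve
        self.max_calls = max_calls
        self.calls = 0
        self.cache: Dict[str, List[str]] = {}

    def search(self, query: str) -> Optional[List[str]]:
        key = " ".join(query.lower().split())  # queries differing only in case/whitespace share an entry
        if key in self.cache:                  # memory-aware: reuse prior evidence, no backend call
            return self.cache[key]
        if self.calls >= self.max_calls:       # budget-aware: refuse once the budget is spent
            return None
        self.calls += 1
        docs = self._retrieve(query)
        self.cache[key] = docs
        return docs


backend_calls: List[str] = []


def fake_retrieve(q: str) -> List[str]:
    backend_calls.append(q)
    return [f"doc about {q}"]


r = BudgetedRetriever(fake_retrieve, max_calls=2)
r.search("GraphRAG cost")
r.search("graphrag  COST")   # cache hit
r.search("dense RAG")
third = r.search("agentic search")  # budget exhausted
print(len(backend_calls), third)
```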

4.5.3 Trustworthiness and Adversarial Robustness

Agentic Search systems remain vulnerable to adversarial attacks through poisoned or misleading external knowledge sources. Because downstream reasoning is only as reliable as the content it retrieves, systems need mechanisms to verify the credibility of retrieved content, particularly in high-stakes settings such as scientific research and legal analysis.

4.5.4 Structured Data and Multi-agent Deep Research

Key future development directions include: iterative data organization, which structures the intermediate search and reasoning content produced during agentic retrieval so that agents can maintain coherence and relevance over long contexts; and multi-agent deep research, which uses graph structures to represent task requirements, agent roles, and their relationships. Such graphs support complex task decomposition and make task assignment and coordination among agents more efficient.

5. Literature Retrieval Systems in Scientific Research

The volume of scientific literature is growing exponentially; by one estimate, the number of scientific papers doubles every 17 years. This growth has made traditional literature retrieval methods increasingly inadequate and has given rise to a new generation of AI-driven academic search platforms and research intelligence frameworks.

5.1 Academic Search Platforms and Tools

The current academic search ecosystem can be categorized by function into two types: search and synthesis platforms (focused on literature discovery and content comprehension) and recommendation systems (focused on personalized delivery and trend tracking).

Platform	Core Functionality	Technical Characteristics
Elicit	Semantic search, paper summary extraction	AI-enhanced academic search
Consensus	Evidence synthesis, trend analysis	LLM-based scientific question answering
OpenScholar	Large-scale academic literature retrieval	Semantic search + open access
SciSpace	Paper summarization, multi-document information synthesis	Cross-document understanding and synthesis
Connected Papers	Visual literature graph exploration	Visualization based on bibliographic coupling and co-citation
ORKG ASK	Structured knowledge access	KG-organized structured retrieval, more interpretable than conventional LLM QA
Arxiv Sanity	Paper recommendation	Personalized delivery based on ML and IR techniques
Scholar Inbox	Personalized academic information subscription	Interest-customized literature streams
ResearchTrend.ai	Research trend discovery	Trend analysis and emerging direction identification
Research Rabbit	Visual literature exploration	Similarity-based literature network mapping
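Connected Papers is listed above as relying on bibliographic coupling and co-citation. Both measures are simple to state: two papers are bibliographically coupled once for each reference they share, and co-cited once for each paper that cites both. The sketch below computes them on toy data and sums them into edge weights; the simple sum is an assumption for illustration, not Connected Papers' actual weighting.

```python
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple

Citations = Dict[str, Set[str]]  # paper -> set of papers it cites


def bibliographic_coupling(cites: Citations, a: str, b: str) -> int:
    # A and B are coupled once for every reference they share.
    return len(cites.get(a, set()) & cites.get(b, set()))


def co_citation(cites: Citations, a: str, b: str) -> int:
    # A and B are co-cited once for every paper whose reference list contains both.
    return sum(1 for refs in cites.values() if a in refs and b in refs)


def similarity_edges(cites: Citations, papers: Iterable[str]) -> Dict[Tuple[str, str], int]:
    # Edge weight = coupling + co-citation (the simple sum is an illustrative assumption).
    edges: Dict[Tuple[str, str], int] = {}
    for a, b in combinations(sorted(papers), 2):
        weight = bibliographic_coupling(cites, a, b) + co_citation(cites, a, b)
        if weight > 0:
            edges[(a, b)] = weight
    return edges


cites = {
    "P1": {"R1", "R2", "R3"},
    "P2": {"R2", "R3"},
    "P3": {"P1", "P2"},
    "P4": {"P1", "P2", "R1"},
}
print(similarity_edges(cites, cites.keys()))
```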

Mainstream technical approaches for recommendation systems include content-based filtering, collaborative filtering, and hybrid approaches. Notably, graph-structured systems such as ORKG ASK organize research contributions as structured data rather than unstructured text, offering unique advantages in interpretability.
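Content-based filtering, the first approach named above, can be sketched with TF-IDF vectors and cosine similarity: a user profile is built from papers the user liked, and unseen papers are ranked by similarity to it. This is a minimal standard-library sketch; the toy corpus, whitespace tokenization, and smoothed IDF formula are assumptions.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tfidf_vectors(docs: Dict[str, str]) -> Dict[str, Vector]:
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n = len(tokenized)
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    vectors: Dict[str, Vector] = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        # Term frequency times a smoothed inverse document frequency.
        vectors[doc_id] = {
            t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()
        }
    return vectors


def cosine(u: Vector, v: Vector) -> float:
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def recommend(vectors: Dict[str, Vector], liked: List[str], k: int = 2) -> List[str]:
    # User profile = sum of the vectors of papers the user liked.
    profile: Counter = Counter()
    for pid in liked:
        profile.update(vectors[pid])
    ranked = sorted(
        (pid for pid in vectors if pid not in liked),
        key=lambda pid: cosine(profile, vectors[pid]),
        reverse=True,
    )
    return ranked[:k]


papers = {
    "dpr": "dense passage retrieval with dual encoders for open domain question answering",
    "colbert": "late interaction dense retrieval with contextualized token embeddings",
    "graphrag": "graph based retrieval augmented generation over knowledge graphs",
    "tot": "tree of thoughts deliberate problem solving with language models",
}
vectors = tfidf_vectors(papers)
print(recommend(vectors, liked=["dpr"], k=1))
```

Collaborative filtering would replace the text vectors with user-item interaction vectors, and a hybrid system would blend the two similarity scores.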

5.2 Four Core Limitations of Academic Search

Despite the important role these platforms play in literature discovery, academic search still faces systemic challenges:

Limitation	Description
Data Quality and Coverage Gaps	Incomplete, non-standard, or outdated data sources lead to inaccurate and inconsistent retrieval information
Model Bias	Search and ranking algorithms inherit biases from training data, affecting the visibility of certain research domains
Scalability and Real-time Processing	Efficiently processing large-scale datasets while maintaining low latency and high retrieval accuracy is challenging
Matthew Effect Reinforcement	Established researchers receive disproportionate attention; algorithms may exacerbate academic inequality

Additionally, existing systems generally lack rigorous filtering options and advanced relevance ranking mechanisms. Many AI-assisted research tools rely on proprietary data, closed APIs, or evolving LLM backends, making strict reproducibility and long-term comparability difficult to ensure.

5.3 Research AI Frameworks and Evaluation Benchmarks

To address the above challenges, researchers have proposed a series of AI frameworks targeting the full research pipeline, covering literature retrieval, survey generation, hypothesis generation, and experimental automation:

Framework	Core Functionality	Technical Characteristics
LitSearch	Literature retrieval evaluation benchmark	Evaluates complex literature retrieval queries in ML and NLP domains
ResearchArena	Academic survey LLM Agent evaluation	Three-stage: Information Discovery → Selection → Organization
SciLitLLM	Scientific literature understanding enhancement	CPT + SFT hybrid strategy, domain knowledge injection
CiteME	Citation attribution benchmark	Evaluates whether language models can identify the paper cited by a given text excerpt
ResearchAgent	Research hypothesis generation	Multi-hop reasoning-assisted research ideation
Agent Laboratory	End-to-end research automation	High success rates in data preparation, experimentation, and report writing; weak in literature review

The central tension facing current research AI frameworks is the trade-off between end-to-end automation capability and domain depth. Systems such as Agent Laboratory perform excellently in data preparation, experiment execution, and report writing, but exhibit significant performance degradation during the literature review stage — precisely the phase requiring the most structured evaluation and domain expertise. While SciLitLLM and ResearchArena demonstrate promising results, they remain insufficient for tasks demanding deep domain knowledge and nuanced understanding. These limitations indicate that automated literature review remains a far-from-solved challenge, requiring better balance between structured evaluation, domain expertise, and reproducibility.

6. References

[1] Abou Ali, et al. (2025). Agentic AI: a comprehensive survey of architectures, applications, and future directions.

[2] Bai, et al. (2023). Advancing abductive reasoning in knowledge graphs through complex logical hypothesis generation.

[3] Borrego, et al. (2025). Research hypothesis generation over scientific knowledge graphs.

[4] Eger, et al. (2025). Transforming science with large language models: a survey on AI-assisted scientific discovery, experimentation, content generation, and evaluation.

[5] Fan, et al. (2026). Do we still need GraphRAG? Benchmarking RAG and GraphRAG for agentic search systems.

[6] Gao, et al. (2024). Retrieval-augmented generation for large language models: a survey.

[7] Gridach, et al. (2025). Agentic AI for scientific discovery: a survey of progress, challenges, and future directions.

[8] Hambarde, et al. (2023). Information retrieval: recent advances and beyond.

[9] Hao, et al. (2019). Universal representation learning of knowledge bases by jointly embedding instances and ontological concepts.

[10] Huang, et al. (2026). A survey on retrieval-augmented text generation for large language models.

[11] Li, et al. (2025). Towards agentic RAG with deep reasoning: a survey of RAG-reasoning systems in LLMs.

[12] Li, et al. (2026). OntoKG: ontology-oriented knowledge graph construction with intrinsic-relational routing.

[13] Niu, et al. (2026). A comprehensive survey of knowledge graph reasoning: approaches and applications.

[14] Peng, et al. (2024). Graph retrieval-augmented generation: a survey.

[15] Singh, et al. (2026). Agentic retrieval-augmented generation: a survey on agentic RAG.

[16] Sun, et al. (2025). LKD-KGC: domain-specific KG construction via LLM-driven knowledge dependency parsing.

[17] Thakur, et al. (2021). BEIR: a heterogenous benchmark for zero-shot evaluation of information retrieval models.

[18] Xiang, et al. (2026). When to use graphs in RAG: a comprehensive analysis for graph retrieval-augmented generation.

[19] Xu, et al. (2025). A survey of model architectures in information retrieval.

[20] Yeom, et al. (2024). Embedding two-view knowledge graphs with class inheritance and structural similarity.

[21] Zhang, et al. (2025a). A survey of graph retrieval-augmented generation for customized large language models.

[22] Zhang, et al. (2025b). From web search towards agentic deep research: incentivizing search with reasoning agents.

[23] Zhang, et al. (2025c). On the role of pretrained language models in general-purpose text embeddings: a survey.