NeuroLit Navigator: A Neurosymbolic Approach to Scholarly Article Searches for Systematic Reviews

Published in arXiv preprint arXiv:2503.00278, 2025

This paper presents NeuroLit Navigator, a neurosymbolic workflow that supports faster and more reproducible first-iteration article retrieval for systematic reviews.

Why it matters

Developing a systematic review search query takes 8-15 hours on average, which makes the first search iteration labor-intensive.
Systematic review publications are growing at 26% per year, increasing demand for scalable and reproducible search workflows.
General LLM tools struggle with domain-specific vocabulary, controlled terminology, and citation reliability.
Keyword-based and LLM-only retrieval systems often trade precision for recall (or vice versa), and their results are frequently not reproducible.
The first iteration of article retrieval strongly influences downstream screening effort and overall systematic review quality.

What we did

Introduced NeuroLit Navigator, a neurosymbolic system combining domain-specific LLMs with biomedical knowledge graphs (MeSH, UMLS).
Used named entity recognition, knowledge graph expansion (up to two hops), and ClinicalBERT-based query expansion.
Applied semantic re-ranking with MPNet embeddings, which achieved 36% relevance, the highest among tested embedding choices.
Retrieved and presented the top k = 5 articles for structured librarian feedback.
Demonstrated a 90% reduction in initial search time while supporting controlled vocabulary, interpretability, and reproducibility.
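
The two-hop knowledge graph expansion listed above can be illustrated with a small sketch. It is not the paper's implementation: the graph `TOY_GRAPH`, the scores in `PLACEHOLDER_SCORES`, and the function names `expand_terms` and `filter_by_similarity` are all hypothetical, and the edges are made-up examples rather than actual MeSH or UMLS relations. The sketch only shows the general idea of a breadth-first walk limited to two hops, followed by a similarity filter.

```python
from collections import deque

# Toy term graph. These edges are illustrative placeholders,
# not relations taken from MeSH or UMLS.
TOY_GRAPH = {
    "hypertension": ["blood pressure", "cardiovascular disease"],
    "blood pressure": ["hypertension", "vital signs"],
    "cardiovascular disease": ["hypertension", "heart disease"],
    "heart disease": ["cardiovascular disease", "myocardial infarction"],
    "vital signs": ["blood pressure"],
    "myocardial infarction": ["heart disease"],
}

# Made-up similarity scores to the seed term, standing in for
# whatever embedding-based similarity a real system would compute.
PLACEHOLDER_SCORES = {
    "blood pressure": 0.8,
    "cardiovascular disease": 0.7,
    "heart disease": 0.6,
    "vital signs": 0.3,
}


def expand_terms(seed, graph, max_hops=2):
    """Map every term within `max_hops` edges of `seed` to its hop distance."""
    distances = {seed: 0}
    queue = deque([seed])
    while queue:
        term = queue.popleft()
        if distances[term] == max_hops:
            continue  # do not walk past the hop limit
        for neighbor in graph.get(term, []):
            if neighbor not in distances:
                distances[neighbor] = distances[term] + 1
                queue.append(neighbor)
    del distances[seed]  # the seed itself is not an expansion
    return distances


def filter_by_similarity(seed, candidates, similarity, threshold):
    """Keep candidates whose similarity to `seed` meets the threshold."""
    return sorted(t for t in candidates if similarity(seed, t) >= threshold)


def placeholder_similarity(seed, term):
    # Assumption: a lookup table replaces a real similarity model.
    return PLACEHOLDER_SCORES.get(term, 0.0)


expanded = expand_terms("hypertension", TOY_GRAPH)
kept = filter_by_similarity("hypertension", expanded, placeholder_similarity, 0.5)
print(expanded)
print(kept)
```

In this toy graph, "myocardial infarction" sits three hops from the seed and is never reached, while "vital signs" is reached but dropped by the similarity filter.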

How it works

User input + sentinel article: extract key biomedical entities via SciSpacy NER.
Vocabulary extension: expand entities using MeSH/UMLS terms within two hops, filtered by semantic similarity.
Query expansion: mask and substitute terms using ClinicalBERT to generate related variants.
Iterative query refinement: start specific and progressively relax constraints until a minimum result set is retrieved.
Retrieval + re-ranking: query PubMed via the Entrez API and re-rank with MPNet embeddings, then return the top k = 5 articles for librarian review.
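
The iterative query refinement step above can be sketched as a loop that starts with the most specific Boolean query and relaxes it until enough results come back. The summary does not say how constraints are relaxed, so dropping the lowest-priority concept group is an assumption made here for illustration. The names `build_query`, `refine_until_enough`, and `fake_search` are hypothetical, and `fake_search` returns canned placeholder IDs instead of calling PubMed.

```python
def build_query(concept_groups):
    """AND the concept groups together; OR the synonyms inside each group."""
    clauses = []
    for synonyms in concept_groups:
        joined = " OR ".join(f'"{term}"' for term in synonyms)
        clauses.append(f"({joined})")
    return " AND ".join(clauses)


def refine_until_enough(concept_groups, search, min_results=5):
    """Relax the query until `search` returns at least `min_results` IDs.

    Assumption: groups are ordered from most to least important, and
    relaxing means dropping the last group. `search` is any callable
    that takes a query string and returns a list of article IDs.
    """
    groups = list(concept_groups)
    while groups:
        query = build_query(groups)
        ids = search(query)
        # Stop when enough results exist or nothing is left to relax.
        if len(ids) >= min_results or len(groups) == 1:
            return query, ids
        groups.pop()
    return "", []


def fake_search(query):
    # Stand-in for a real search backend: fewer AND clauses, more hits.
    hits_by_and_count = {2: 1, 1: 3, 0: 8}
    n_hits = hits_by_and_count.get(query.count(" AND "), 0)
    return [f"placeholder-id-{i}" for i in range(n_hits)]


groups = [
    ["hypertension", "high blood pressure"],
    ["exercise"],
    ["older adults"],
]
final_query, final_ids = refine_until_enough(groups, fake_search, min_results=5)
print(final_query)
print(len(final_ids))
```

In a real setting, `search` could wrap a PubMed client, for example Biopython's `Bio.Entrez.esearch`. Keeping it as a plain callable means the relaxation logic can be tested without network access.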

Figure 1. Three-step neurosymbolic pipeline combining NER, KG-based vocabulary extension, LLM query expansion, and semantic re-ranking.

The full neurosymbolic pipeline is illustrated in Figure 1.
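
The final re-ranking stage of the pipeline can be sketched as a cosine-similarity sort that keeps the top k = 5 candidates. This is a minimal illustration, not the system's code: the two-dimensional vectors below are made-up stand-ins for MPNet embeddings, the article labels are placeholders, and the function names `cosine` and `rerank` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def rerank(query_vec, article_vecs, k=5):
    """Return the k article labels most similar to the query, best first."""
    scored = [(cosine(query_vec, vec), label) for label, vec in article_vecs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in scored[:k]]


# Toy 2-D vectors; a real system would use sentence embeddings here.
query_vec = [1.0, 0.0]
article_vecs = {
    "art-A": [1.0, 0.0],
    "art-B": [0.0, 1.0],
    "art-C": [1.0, 1.0],
    "art-D": [-1.0, 0.0],
    "art-E": [2.0, 1.0],
    "art-F": [1.0, 3.0],
    "art-G": [3.0, 1.0],
}

top_five = rerank(query_vec, article_vecs, k=5)
print(top_five)
```

Returning only five articles keeps the librarian's feedback step small. The cutoff is exposed as the parameter `k`, so it can be changed without touching the scoring.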

Key contributions

A neurosymbolic SR retrieval pipeline integrating controlled vocabulary and LLM-based expansion.
A zero-shot query formulation approach that eliminates manual setup for first-iteration searches.
Empirical comparison of biomedical embedding models showing 36% relevance with MPNet.
Demonstrated 90% reduction in initial search time in librarian-facing deployment.
Comparative evaluation showing NeuroLit Navigator uniquely combines relevance, reproducibility, interpretability, and controlled vocabulary support.

System	Relevance %	R	I	CV
Scite	33%	No	No	No
Consensus	38%	No	No	No
Perplexity	33%	No	No	No
GEAR-Up	26.6%	No	Yes	No
NeuroLit Navigator	36%	Yes	Yes	Yes

Table 1. Comparison of relevance %, reproducibility (R), interpretability (I), and controlled vocabulary (CV) across SR tools.

As shown in Table 1, the system uniquely combines reproducibility and controlled vocabulary with competitive relevance.

Recommended citation: Vedant Khandelwal, Kaushik Roy, Valerie Lookingbill, Ritvik Garimella, Harshul Surana, Heather Heckman, and Amit Sheth. (2025). "NeuroLit Navigator: A Neurosymbolic Approach to Scholarly Article Searches for Systematic Reviews." arXiv preprint arXiv:2503.00278.