NeuroLit Navigator: A Neurosymbolic Approach to Scholarly Article Searches for Systematic Reviews

Published in arXiv preprint arXiv:2503.00278, 2025

This paper presents NeuroLit Navigator, a neurosymbolic workflow that supports faster and more reproducible first-iteration article retrieval for systematic reviews.

Why it matters

  • Developing a systematic review search query typically takes 8-15 hours, so the first search iteration is labor-intensive.
  • Systematic review publications are growing at 26% per year, increasing demand for scalable and reproducible search workflows.
  • General LLM tools struggle with domain-specific vocabulary, controlled terminology, and citation reliability.
  • Keyword-based and LLM-only retrieval systems tend to trade precision for recall (or vice versa), and their results are often not reproducible.
  • The first iteration of article retrieval strongly influences downstream screening effort and overall systematic review quality.

What we did

  • Introduced NeuroLit Navigator, a neurosymbolic system combining domain-specific LLMs with biomedical knowledge graphs (MeSH, UMLS).
  • Used named entity recognition, knowledge graph expansion (up to two hops), and ClinicalBERT-based query expansion.
  • Applied semantic re-ranking with MPNet embeddings, which achieved 36% relevance, the highest among tested embedding choices.
  • Retrieved and presented the top k = 5 articles for structured librarian feedback.
  • Demonstrated a 90% reduction in initial search time while supporting controlled vocabulary, interpretability, and reproducibility.
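
As a rough illustration of the knowledge graph expansion step, the sketch below runs a breadth-first expansion of up to two hops over a toy adjacency map. The graph contents, the function name `expand_terms`, and the `max_hops` parameter are illustrative assumptions, not the paper's implementation. The actual system draws on MeSH and UMLS and also filters candidates by semantic similarity, which is omitted here.

```python
from collections import deque

# Toy adjacency map standing in for MeSH/UMLS relations (hypothetical data).
TOY_GRAPH = {
    "myocardial infarction": ["heart attack", "coronary artery disease"],
    "coronary artery disease": ["atherosclerosis"],
    "atherosclerosis": ["plaque"],
}


def expand_terms(seed, graph, max_hops=2):
    """Map every term reachable from `seed` within `max_hops` edges to its hop distance."""
    distances = {seed: 0}
    queue = deque([seed])
    while queue:
        term = queue.popleft()
        if distances[term] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph.get(term, []):
            if neighbor not in distances:
                distances[neighbor] = distances[term] + 1
                queue.append(neighbor)
    return distances


expanded = expand_terms("myocardial infarction", TOY_GRAPH)
print(expanded)
```

In this toy graph, "plaque" sits three hops from the seed term and is therefore left out of the expansion.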

How it works

  • User input + sentinel article: extract key biomedical entities via SciSpacy NER.
  • Vocabulary extension: expand entities using MeSH/UMLS terms within two hops, filtered by semantic similarity.
  • Query expansion: mask and substitute terms using ClinicalBERT to generate related variants.
  • Iterative query refinement: start specific and progressively relax constraints until a minimum result set is retrieved.
  • Retrieval + re-ranking: query PubMed via Entrez API and re-rank with MPNet embeddings, then return top k = 5 articles for librarian review.
Figure 1. Three-step neurosymbolic pipeline combining NER, KG-based vocabulary extension, LLM query expansion, and semantic re-ranking.

The full neurosymbolic pipeline is illustrated in Figure 1.
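
The last two steps, iterative query refinement followed by re-ranking, can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions: `relax_until_enough`, `rerank`, and `fake_search` are hypothetical names, the relaxation rule (drop the last AND-ed term) is one simple choice rather than the paper's rule, and the two-dimensional vectors stand in for MPNet embeddings. A real deployment would call PubMed through the Entrez API instead of `fake_search`.

```python
import math


def relax_until_enough(terms, search, min_results=5):
    """AND all terms together, then drop the last term one at a time
    until `search` returns at least `min_results` hits."""
    hits = []
    for n_terms in range(len(terms), 0, -1):
        query = " AND ".join(terms[:n_terms])
        hits = search(query)
        if len(hits) >= min_results:
            return query, hits
    # Fall back to the broadest query tried, even if it is below the minimum.
    return (terms[0] if terms else ""), hits


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(query_vec, doc_vecs, k=5):
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]


def fake_search(query):
    # Stand-in for a PubMed call (hypothetical): fewer AND-ed terms give more hits.
    n_terms = query.count(" AND ") + 1
    return [f"pmid{i}" for i in range(8 // n_terms)]


query, hits = relax_until_enough(["statins", "elderly", "stroke"], fake_search, min_results=3)

# Toy embeddings for the retrieved ids (hypothetical values, not MPNet output).
toy_vecs = {"pmid0": [1.0, 0.0], "pmid1": [0.0, 1.0], "pmid2": [1.0, 1.0], "pmid3": [0.9, 0.1]}
top = rerank([1.0, 0.0], {pmid: toy_vecs[pmid] for pmid in hits}, k=2)
print(query, top)
```

Starting from the most specific query and relaxing it keeps the first result set as narrow as the minimum size allows, and the re-ranking step then orders that set before the top k are handed to the librarian.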

Key contributions

  • A neurosymbolic SR retrieval pipeline integrating controlled vocabulary and LLM-based expansion.
  • A zero-shot query formulation approach that eliminates manual setup for first-iteration searches.
  • Empirical comparison of biomedical embedding models showing 36% relevance with MPNet.
  • Demonstrated 90% reduction in initial search time in librarian-facing deployment.
  • Comparative evaluation showing NeuroLit Navigator uniquely combines relevance, reproducibility, interpretability, and controlled vocabulary support.

System              Relevance %   R     I     CV
Scite               33%           No    No    No
Consensus           38%           No    No    No
Perplexity          33%           No    No    No
GEAR-Up             26.6%         No    Yes   No
NeuroLit Navigator  36%           Yes   Yes   Yes

Table 1. Comparison of relevance %, reproducibility (R), interpretability (I), and controlled vocabulary (CV) across SR tools.

As shown in Table 1, NeuroLit Navigator is the only tool compared that offers reproducibility, interpretability, and controlled vocabulary support together, and its 36% relevance is second only to Consensus (38%).

Recommended citation: Vedant Khandelwal, Kaushik Roy, Valerie Lookingbill, Ritvik Garimella, Harshul Surana, Heather Heckman, and Amit Sheth. (2025). "NeuroLit Navigator: A Neurosymbolic Approach to Scholarly Article Searches for Systematic Reviews." arXiv preprint arXiv:2503.00278.