NeuroLit Navigator: A Neurosymbolic Approach to Scholarly Article Searches for Systematic Reviews
Published in arXiv preprint arXiv:2503.00278, 2025
This paper presents NeuroLit Navigator, a neurosymbolic workflow that supports faster and more reproducible first-iteration article retrieval for systematic reviews.
Why it matters
- Developing a systematic review query typically takes 8-15 hours, which makes the first search iteration labor-intensive.
- Systematic review publications are growing at 26% per year, increasing demand for scalable and reproducible search workflows.
- General LLM tools struggle with domain-specific vocabulary, controlled terminology, and citation reliability.
- Keyword-based and LLM-only retrieval systems tend to trade precision for recall (or vice versa), and their results are frequently not reproducible.
- The first iteration of article retrieval strongly influences downstream screening effort and overall systematic review quality.
What we did
- Introduced NeuroLit Navigator, a neurosymbolic system combining domain-specific LLMs with biomedical knowledge graphs (MeSH, UMLS).
- Used named entity recognition, knowledge graph expansion (up to two hops), and ClinicalBERT-based query expansion.
- Applied semantic re-ranking with MPNet embeddings, which achieved 36% relevance, the highest among tested embedding choices.
- Retrieved and presented the top k = 5 articles for structured librarian feedback.
- Demonstrated a 90% reduction in initial search time while supporting controlled vocabulary, interpretability, and reproducibility.
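
The two-hop knowledge graph expansion mentioned above can be pictured as a bounded breadth-first search. The sketch below is only an illustration under stated assumptions: `TOY_GRAPH` is a hypothetical hand-written adjacency list standing in for MeSH/UMLS relations, `expand_terms` is an illustrative name, and the semantic-similarity filter the paper describes is left out.

```python
from collections import deque

# Hypothetical toy adjacency list standing in for MeSH/UMLS relations.
TOY_GRAPH = {
    "hypertension": ["blood pressure", "cardiovascular disease"],
    "blood pressure": ["hypertension", "antihypertensive agents"],
    "cardiovascular disease": ["hypertension", "heart failure"],
    "heart failure": ["cardiovascular disease", "cardiomyopathy"],
    "antihypertensive agents": ["blood pressure"],
    "cardiomyopathy": ["heart failure"],
}


def expand_terms(seed, graph, max_hops=2):
    """Return {term: hop_distance} for terms within max_hops of seed (seed excluded)."""
    distances = {seed: 0}
    queue = deque([seed])
    while queue:
        term = queue.popleft()
        if distances[term] == max_hops:
            continue  # do not walk past the hop limit
        for neighbor in graph.get(term, []):
            if neighbor not in distances:
                distances[neighbor] = distances[term] + 1
                queue.append(neighbor)
    del distances[seed]
    return distances


expanded = expand_terms("hypertension", TOY_GRAPH)
for term, hops in sorted(expanded.items(), key=lambda item: (item[1], item[0])):
    print(f"{hops} hop(s): {term}")
```

In this toy graph, "cardiomyopathy" sits three hops from the seed and is therefore excluded, which is the effect the hop limit is meant to have.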
How it works
- User input + sentinel article: extract key biomedical entities via SciSpacy NER.
- Vocabulary extension: expand entities using MeSH/UMLS terms within two hops, filtered by semantic similarity.
- Query expansion: mask and substitute terms using ClinicalBERT to generate related variants.
- Iterative query refinement: start specific and progressively relax constraints until a minimum result set is retrieved.
- Retrieval + re-ranking: query PubMed via the Entrez API, re-rank the results with MPNet embeddings, and return the top k = 5 articles for librarian review.
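
The iterative query refinement step can be sketched as a loop that starts with every constraint and drops one per round until enough results come back. This is a minimal sketch, not the paper's implementation: `build_query`, `refine_until_enough`, and `fake_search` are hypothetical names, the drop-the-last-group relaxation order is an assumption, and `fake_search` is a stub standing in for a real PubMed call.

```python
def build_query(required_terms, optional_groups):
    """AND together the required terms plus one OR-group per remaining constraint."""
    clauses = [f'"{term}"' for term in required_terms]
    for group in optional_groups:
        clauses.append("(" + " OR ".join(f'"{term}"' for term in group) + ")")
    return " AND ".join(clauses)


def refine_until_enough(required_terms, optional_groups, search_fn, min_results=5):
    """Relax the query one constraint group at a time until search_fn returns enough hits."""
    groups = list(optional_groups)
    while True:
        query = build_query(required_terms, groups)
        hits = search_fn(query)
        if len(hits) >= min_results or not groups:
            return query, hits
        groups.pop()  # relax: drop the last (assumed least important) constraint


def fake_search(query):
    """Stub search: fewer AND clauses means more hits (illustrative only)."""
    n_clauses = query.count(" AND ") + 1
    return [f"PMID{i}" for i in range(max(0, 15 - 5 * n_clauses))]


query, hits = refine_until_enough(
    ["hypertension"],
    [["adult", "aged"], ["randomized controlled trial"]],
    fake_search,
)
print(query)
print(len(hits), "hits")
```

With the stub above, the fully constrained query returns nothing, so the loop drops the study-type group and stops at the first query that reaches the minimum result count.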

The full neurosymbolic pipeline is illustrated in Figure 1.
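
The final re-ranking step amounts to scoring each retrieved article against the query by embedding similarity and keeping the top k = 5. The sketch below assumes cosine similarity as the scoring function and uses hypothetical two-dimensional toy vectors in place of real MPNet embeddings; `cosine` and `rerank` are illustrative names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rerank(query_vec, candidates, k=5):
    """candidates: list of (article_id, embedding). Return the top-k ids by similarity."""
    scored = sorted(candidates, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [article_id for article_id, _ in scored[:k]]


# Toy 2-D vectors; in practice these would come from an MPNet encoder.
query_vec = [1.0, 0.0]
candidates = [
    ("A", [1.0, 0.0]),
    ("B", [0.0, 1.0]),
    ("C", [1.0, 1.0]),
    ("D", [-1.0, 0.0]),
    ("E", [2.0, 0.1]),
    ("G", [1.0, 0.5]),
]

top_articles = rerank(query_vec, candidates)
print(top_articles)
```

Article "D" points away from the query vector and is the only candidate dropped when k = 5.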
Key contributions
- A neurosymbolic SR retrieval pipeline integrating controlled vocabulary and LLM-based expansion.
- A zero-shot query formulation approach that eliminates manual setup for first-iteration searches.
- Empirical comparison of biomedical embedding models showing 36% relevance with MPNet.
- Demonstrated 90% reduction in initial search time in librarian-facing deployment.
- Comparative evaluation showing NeuroLit Navigator uniquely combines relevance, reproducibility, interpretability, and controlled vocabulary support.
| System | Relevance (%) | Reproducibility | Interpretability | Controlled vocabulary |
|---|---|---|---|---|
| Scite | 33% | No | No | No |
| Consensus | 38% | No | No | No |
| Perplexity | 33% | No | No | No |
| GEAR-Up | 26.6% | No | Yes | No |
| NeuroLit Navigator | 36% | Yes | Yes | Yes |
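As shown in Table 1, NeuroLit Navigator is the only system that offers reproducibility, interpretability, and controlled vocabulary support together, while its 36% relevance remains competitive with the best-scoring tool (Consensus, 38%).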
Recommended citation: Vedant Khandelwal, Kaushik Roy, Valerie Lookingbill, Ritvik Garimella, Harshul Surana, Heather Heckman, and Amit Sheth. (2025). "NeuroLit Navigator: A Neurosymbolic Approach to Scholarly Article Searches for Systematic Reviews." arXiv preprint arXiv:2503.00278.
