Retrieval-Augmented Generation Assistant for Anatomical Pathology Laboratories
Accurate and efficient access to laboratory protocols is essential in Anatomical Pathology (AP), where up to 70% of medical decisions depend on laboratory diagnoses. However, static documentation such as printed manuals or PDFs is often outdated, fragmented, and difficult to search, which creates risks of workflow errors and diagnostic delays. This study proposes and evaluates a Retrieval-Augmented Generation (RAG) assistant tailored to AP laboratories and designed to give technicians context-grounded answers to protocol-related queries. We curated a novel corpus of 99 AP protocols from a Portuguese healthcare institution and constructed 323 question-answer pairs for systematic evaluation. Ten experiments were conducted, varying the chunking strategy, retrieval method, and embedding model. Performance was assessed with the RAGAS framework (faithfulness, answer relevance, and context recall) alongside top-k retrieval metrics. Recursive chunking combined with hybrid retrieval delivered the strongest baseline performance. Incorporating a biomedical-specific embedding model (MedEmbed) further improved results, reaching an answer relevance of 0.74, a faithfulness of 0.70, and a context recall of 0.77, which underscores the value of domain-specialized embeddings. Top-k analysis showed that retrieving only the single top-ranked chunk (k=1) maximized both efficiency and accuracy, reflecting the modular structure of AP protocols. These findings highlight key design considerations for deploying RAG systems in healthcare and demonstrate their potential to turn static documentation into dynamic, reliable knowledge assistants that can improve laboratory workflow efficiency and support patient safety.
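The abstract reports that hybrid retrieval performed best but does not state how the lexical and semantic results are combined. Purely as an illustration, the sketch below assumes one common scheme: chunks are ranked separately by BM25 and by cosine similarity between embeddings, and the two rankings are merged with reciprocal rank fusion (RRF) using the conventional constant k = 60. The three-dimensional vectors are hypothetical stand-ins for the output of an embedding model such as MedEmbed, and all function names and parameter values (`hybrid_search`, `top_k`, `k1=1.5`, `b=0.75`) are illustrative assumptions, not the study's code. Setting `top_k=1` mirrors the single-chunk configuration the abstract reports as most effective.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive lowercase whitespace tokenizer; a real pipeline would also strip punctuation.
    return text.lower().split()


def bm25_scores(query, chunks, k1=1.5, b=0.75):
    """Okapi BM25 score of every chunk for the query (sparse, lexical signal)."""
    docs = [tokenize(c) for c in chunks]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter()
    for d in docs:
        df.update(set(d))  # document frequency: count each term once per chunk
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Lucene-style IDF, which stays positive even for very common terms.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            f = tf[term]
            score += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two embedding vectors (dense, semantic signal)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def ranking(scores):
    """Chunk indices ordered from highest to lowest score (ties keep input order)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def reciprocal_rank_fusion(rankings, k=60):
    """Merge rankings: each ranking adds 1 / (k + rank) to a chunk's fused score."""
    fused = defaultdict(float)
    for order in rankings:
        for rank, idx in enumerate(order, start=1):
            fused[idx] += 1.0 / (k + rank)
    return sorted(fused, key=lambda i: fused[i], reverse=True)


def hybrid_search(query, query_vec, chunks, chunk_vecs, top_k=1):
    """Indices of the top_k chunks under BM25 + dense retrieval.

    Assumption: fusion is done with RRF; the study's actual fusion method may differ.
    """
    sparse = ranking(bm25_scores(query, chunks))
    dense = ranking([cosine(query_vec, v) for v in chunk_vecs])
    return reciprocal_rank_fusion([sparse, dense])[:top_k]


if __name__ == "__main__":
    chunks = [
        "Fix tissue in 10% neutral buffered formalin for 24 hours.",
        "Stain slides with hematoxylin for 5 minutes then rinse.",
        "Store paraffin blocks at room temperature.",
    ]
    # Hypothetical 3-d vectors standing in for real embedding-model output.
    chunk_vecs = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.2, 0.9]]
    query = "how long to fix tissue in formalin"
    query_vec = [0.8, 0.2, 0.1]
    print(hybrid_search(query, query_vec, chunks, chunk_vecs, top_k=1))  # [0]
```

RRF combines rank positions rather than raw scores, so BM25 scores and cosine similarities never need to be normalized onto a common scale. A weighted sum of normalized scores is an equally plausible alternative, and the study may have used either.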
This work (including HTML and PDF files) is licensed under a Creative Commons Attribution 4.0 International License.