C. Zongjie, L. Chunming
Radiology reports often include a detailed findings section and a concise impression that conveys the main clinical interpretation. This study evaluates whether transformer-based language models can generate impression-style summaries from the findings sections of chest X-ray reports. We fine-tuned BERT-to-BERT (BERT2BERT), GPT-2, and FLAN-T5 on the Indiana University Chest X-ray (IU X-Ray) collection, using findings as source texts and impressions as reference summaries. After preprocessing, 3,108 report pairs were retained and split into training, validation, and test sets. Generated summaries were evaluated with BLEU-4, ROUGE-1, ROUGE-2, ROUGE-L, and BERTScore. Among the individual models, FLAN-T5 performed best on the reported aggregate metrics, with a ROUGE-L of 0.375 and a BERTScore F1 of 0.879. A consensus re-ranking ensemble, which selected the candidate summary most similar to the other models' outputs, achieved the highest overall scores, including a ROUGE-L of 0.403 and a BERTScore F1 of 0.891. These results suggest that model complementarity can improve automatic radiology report summarization, although clinical deployment would require radiologist review, prospective evaluation, and safeguards against factual omissions.
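The consensus re-ranking idea can be illustrated with a minimal sketch. The abstract does not say which similarity function compared the candidates, so this sketch assumes ROUGE-L F1 computed on lowercased whitespace tokens. The function names (`lcs_length`, `rouge_l_f1`, `consensus_select`) and the example impressions are hypothetical. Each candidate is scored by its mean similarity to the other models' outputs, and the index of the highest-scoring candidate is returned.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence of two token lists (dynamic programming)."""
    # prev[j] holds the LCS length for the tokens of `a` seen so far and b[:j]
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0] * (len(b) + 1)
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate: str, reference: str) -> float:
    """ROUGE-L F1 (beta = 1) on lowercased whitespace tokens; assumption: no stemming."""
    c, r = candidate.lower().split(), reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def consensus_select(candidates: List[str]) -> int:
    """Hypothetical helper: index of the candidate with the highest mean similarity to the others."""
    if len(candidates) < 2:
        raise ValueError("consensus re-ranking needs at least two candidates")
    best_idx, best_score = 0, float("-inf")
    for i, cand in enumerate(candidates):
        sims = [rouge_l_f1(cand, other) for j, other in enumerate(candidates) if j != i]
        score = sum(sims) / len(sims)
        if score > best_score:  # on ties, the earlier candidate is kept
            best_idx, best_score = i, score
    return best_idx


if __name__ == "__main__":
    # Hypothetical outputs from three fine-tuned models for one findings section
    outputs = [
        "no acute cardiopulmonary abnormality",
        "no acute cardiopulmonary disease",
        "no acute disease",
    ]
    print(consensus_select(outputs))  # 1
```

For these made-up candidates, the second output shares the most tokens in order with both alternatives, so its mean similarity is highest and it is selected. Library ROUGE implementations apply their own tokenization and optional stemming, so their scores can differ from this simplified version. A semantic similarity such as BERTScore could be substituted for `rouge_l_f1` without changing the selection logic.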
Pearl Academic Publishing. All rights reserved.
Content is licensed under a Creative Commons Attribution 4.0 License (CC-BY).