Thyro-LMD: A Benchmark Dataset and Sample-Driven Data Loading, Attention, and Regularization for Long-Tailed Multi-Label Thyroid Ultrasound Diagnosis.
Developing robust and effective computer-aided diagnostic (CAD) methods for thyroid ultrasound (TUS) remains a key challenge in medical imaging. Prior work has largely focused on binary or multi-class lesion classification, whereas real-world diagnosis follows standardized guidelines based on combinations of lexicon-level descriptors. Because of epidemiological patterns, these combinations naturally follow long-tailed distributions, which limits the robustness and generalizability of existing methods. Motivated by this, we introduce Thyro-LMD, the first long-tailed multi-label dataset for TUS. Using histopathology as the reference, Thyro-LMD provides retrospective, fine-grained annotations aligned with the ACR TI-RADS lexicon and reveals a highly imbalanced label distribution. We benchmark representative methods, including end-to-end models, general-purpose multimodal large models (e.g., GPT-4o), and pretrained foundation models. While some methods show reasonable head-class performance, they struggle with body and tail classes. We therefore propose SynTUS-Net, a purpose-built baseline whose collaborative modules address long-tailed multi-label challenges at three stages: data loading, feature encoding, and prediction regularization. SynTUS-Net achieves leading performance on Thyro-LMD, outperforming conventional SOTA models by 5.3 points in Micro-F1 and 11.83 points in Macro-F1, and exceeding GPT-4o by 42.76 points in Tail-F1. Extensive ablation studies confirm the contribution of each module. We believe Thyro-LMD and SynTUS-Net establish a clinically grounded benchmark and a new paradigm for interpretable and generalizable AI in ultrasound. Code and data will be released.