ESSAY: Artificial Intelligence in Veterinary Radiology – Development, Performance and Ethics

Introduction

Artificial intelligence (AI) refers to the ability of machines to perform tasks that typically require human intelligence, such as learning, reasoning, and adapting. Common examples of AI include facial recognition software and the natural language processing algorithms that power large language models. A branch of AI known as machine learning allows systems to improve performance over time based on data. Among the various machine learning techniques, convolutional neural networks (CNNs) stand out for their ability to loosely mimic the neural pathways of the human brain when analysing images.1

AI, and CNNs in particular, has seen rapid growth in veterinary radiology for dogs and cats over the past decade. These tools are increasingly used by veterinarians to assist radiographic analysis, acting as a “second pair of eyes”. This rise in AI use brings key concerns regarding training, validation, performance, clinical utility and, ultimately, our ethical obligations as client-facing clinicians.

In this article, we discuss the development and performance of AI applications in veterinary radiology, along with the challenges and opportunities presented by commercial services.

Training of AI algorithms

CNNs function by learning and processing data through interconnected layers of nodes, loosely analogous to the interconnected layers of neurons in the brain. When presented with a set of radiographs, the typical role of AI is to provide a diagnosis based on the probability of disease being present or absent; this type of machine learning task is called classification. Another function is segmentation, which automatically defines regions of interest (ROIs), highlighting areas of concern on a radiograph. The learning process and dataset are therefore of vital importance, because an algorithm will only perform as well as the data it is trained on.
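
To make the classification task concrete, the sketch below shows a toy CNN in PyTorch that maps a single radiograph to two class probabilities (disease present or absent). The architecture, layer sizes, and image dimensions are illustrative assumptions chosen for brevity; they do not reflect any published or commercial model.

```python
# A minimal, illustrative CNN classifier sketch in PyTorch. All names and
# sizes here are hypothetical, chosen only to demonstrate the idea of
# classification discussed above.
import torch
import torch.nn as nn

class TinyRadiographCNN(nn.Module):
    def __init__(self, n_classes: int = 2):
        super().__init__()
        # Convolutional layers learn local image features (edges, textures)
        # that deeper layers combine into larger, more abstract patterns.
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        # A fully connected "head" maps the pooled features to class scores,
        # e.g. disease present vs disease absent.
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, n_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# One greyscale 256x256 "radiograph" as a batch of size 1.
model = TinyRadiographCNN()
dummy_image = torch.randn(1, 1, 256, 256)
probabilities = torch.softmax(model(dummy_image), dim=1)
print(probabilities)  # e.g. tensor([[0.48, 0.52]]) -- one probability per class
```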

Training AI for radiology involves exposing the algorithm to a very large number of images labelled with ground truth data that accurately represent the true pathology present. Ideally, ground truths are derived from confirmatory tests, such as histopathology, necropsy, surgery, and advanced imaging. If the dataset is too small or the model is overly simplistic, the algorithm may not capture the underlying patterns in the data, which is known as ‘underfitting’;2 that is, it has not learned enough from the training set. Conversely, ‘overfitting’ happens when the model is too complex or trained for too long, causing it to memorise the training data, including noise and outliers. This results in excellent performance on the training data but poor generalisation to new data. Both underfitting and overfitting reduce the algorithm’s ability to perform on different datasets, e.g. real clinical cases. Once trained, algorithms are tested on a smaller, held-out subset of images and their performance measured with several metrics, including sensitivity, specificity, predictive values, accuracy, and area under the receiver operating characteristic (ROC) curve.1
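
As a brief illustration of these metrics, the Python snippet below computes them from a hypothetical confusion matrix; the counts are invented purely for demonstration and do not come from any study cited here.

```python
# A worked sketch of the common test-set metrics, computed from an
# invented confusion matrix (counts are illustrative only).
TP, FN = 45, 5    # diseased radiographs: correctly flagged / missed
TN, FP = 90, 10   # normal radiographs: correctly cleared / falsely flagged

sensitivity = TP / (TP + FN)           # proportion of diseased cases detected
specificity = TN / (TN + FP)           # proportion of normals correctly cleared
ppv = TP / (TP + FP)                   # positive predictive value
npv = TN / (TN + FN)                   # negative predictive value
accuracy = (TP + TN) / (TP + TN + FP + FN)

print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
print(f"PPV={ppv:.2f}, NPV={npv:.2f}, accuracy={accuracy:.2f}")
# The area under the ROC curve (AUC) extends this idea by sweeping the
# model's decision threshold and plotting sensitivity against 1-specificity.
```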

Development of AI in veterinary radiology

The last decade has seen a surge of research around CNN development. Early studies in veterinary medicine were relatively simple: in 2013, McEvoy and Amigo developed and compared two algorithms to recognise canine pelvic radiographs.3 This was one of the first examples of classification using an ROI for anatomical detection. Two hundred images were used for training, comprising 120 “hip” and 80 “non-hip” radiographs, and 56 images were used for testing. Although these numbers are far too few to draw generalisable conclusions, simple metrics, namely classification error rate, sensitivity and specificity, revealed high performance for both algorithms. Despite its limitations, the study introduced the concept that AI could aid veterinarians in radiology.

It was not for another five years that AI was applied to canine thoracic radiographs. Yoon and colleagues showed that their algorithms could predict the presence of select pathologies (cardiomegaly, lung patterns, mediastinal shift, pleural effusion, pneumothorax) with accuracy and sensitivity of up to 96.9 per cent and 100 per cent, respectively.4 A larger dataset was used for training and testing, comprising 3,142 orthogonal images from 1,143 dogs, half showing cardiomegaly and half a normal cardiac silhouette. The reported performance metrics are attractive, but care must be taken before presuming the findings can be directly extrapolated to daily general practice, for a few reasons. Firstly, cases were retrospectively collected from a single institution and consisted only of well-positioned, good quality radiographs. This markedly reduces generalisability, because the algorithm is trained to recognise pathology only on high quality images obtained with specific radiographic settings and in a specific patient population. Secondly, the ground truth labels used during training were the consensus opinions of three radiologists at the institution; no sampling, surgery or advanced imaging was performed for confirmation. This means the algorithm can only perform as well as the radiologists, who may have variable levels of expertise and are subject to human bias and error. Furthermore, cases where there was no consensus were removed. These cases were most likely difficult to interpret, meaning the AI would probably also perform poorly on ambiguous radiographs.

Further studies that have emerged over the last five years have also reported high performance, but share drawbacks similar to those discussed above. For veterinary radiography, these include CNNs created and tested to detect specific thoracic and musculoskeletal lesions in dogs, thoracic pathologies in cats, and dental disease.1

As well as research comparing AI with radiologists, the performance of AI has also been compared with that of veterinarians in general practice. Boissady and colleagues developed a CNN to detect 15 thoracic lesions in dogs and cats.5 Almost 16,000 radiographs from approximately 6,600 dogs were used to train the algorithm, with radiologist reports used for ground truth labelling. The algorithm’s error rate was compared with those of veterinarians with AI assistance and veterinarians alone. AI made fewer errors (10.7 per cent) than AI-assisted veterinarians (16.8 per cent) and unassisted veterinarians (17.2 per cent), who performed similarly. The authors suggested that the slightly higher error rate of AI-assisted veterinarians may reflect a lack of trust in AI.

Kim and colleagues were the first to compare a commercial AI software with radiologists in the diagnosis of canine cardiogenic pulmonary oedema.6 The software had high negative predictive values and low positive predictive values, suggesting it could be used for screening, but that further investigation is warranted when a positive diagnosis of pulmonary oedema is made. Subsequent studies have similarly indicated that AI has potential for lesion screening but cannot replace radiologists. Of particular note is AI’s inability to make a global diagnosis for a patient, such as congestive heart failure, which might comprise cardiomegaly, engorged pulmonary vasculature and pulmonary patterns. However, it is likely that in future AI will be able to reliably augment clinician interpretation and aid triage when a radiologist is not readily available.
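
To illustrate why this pattern points towards a screening role, the short sketch below applies Bayes’ theorem to show how positive and negative predictive values shift with disease prevalence. The sensitivity, specificity and prevalence figures are assumptions for illustration only, not values reported by Kim and colleagues.

```python
# A sketch of how predictive values depend on prevalence as well as on
# sensitivity and specificity. All numbers below are illustrative
# assumptions, not figures from the cited study.
def predictive_values(sens: float, spec: float, prevalence: float):
    ppv = (sens * prevalence) / (sens * prevalence + (1 - spec) * (1 - prevalence))
    npv = (spec * (1 - prevalence)) / (spec * (1 - prevalence) + (1 - sens) * prevalence)
    return ppv, npv

for prev in (0.05, 0.20, 0.50):
    ppv, npv = predictive_values(sens=0.90, spec=0.80, prevalence=prev)
    print(f"prevalence={prev:.0%}: PPV={ppv:.2f}, NPV={npv:.2f}")
# At low prevalence the NPV stays high (a negative result reliably rules
# the disease out) while the PPV drops, so positives warrant follow-up.
```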

Current challenges of using commercial AI software in veterinary medicine

There are currently at least five companies offering commercially available AI software. These services are attractive because clinicians can submit radiographs and receive an AI-generated written report within minutes, usually at a much lower cost than radiologist reports. However, current research exposes several potential pitfalls to their use.

In addition to the lack of generalisability, which is likely to result in lower performance than advertised, clinicians risk facing the ethical and legal consequences of an incorrect diagnosis that negatively impacts patient welfare. CNNs have been referred to as a “black box” because their complexity renders the exact process by which they learn elusive, even to their manufacturers.7 This means that mistakes made by the machine may be unknown and unfixable. Errors are especially detrimental when patient welfare is compromised, such as AI falsely diagnosing a foreign body obstruction, leading to unnecessary surgery, or conversely missing a foreign body obstruction that requires immediate surgery. In these situations, the question of who bears responsibility for the misdiagnosis becomes very important. Commercial AI companies state that they are not responsible, so the onus lies with the client-facing clinician. For this reason, veterinarians need to make informed decisions when integrating AI into their workflow.

In human medicine, AI is termed Software as a Medical Device (SaMD) and there are strict regulations for the creation, testing, and maintenance of these products, such as those of the US Food and Drug Administration (FDA).8,9 In contrast, there is currently no regulatory body for the commercialisation of AI in veterinary medicine, and these products are released by companies without oversight. Additionally, there is no central or public database of the veterinary radiographs used to train AI, and companies lack data transparency. These gaps preclude assessment of an algorithm’s applicability to realistic caseloads and limit our ability to improve the data.

Opportunities in the veterinary profession should revolve around tackling these challenges; in particular, regulations and performance standards should be developed to ensure the reliability of AI algorithms.7 At a minimum, external validation of existing commercial applications should be performed before their widespread use. Finally, a multi-institutional, public database of radiographs is vital for rapid AI improvement.7 This would also increase the transparency and trustworthiness of commercial applications.

Conclusions

AI has made its way into the veterinary industry, with services now available for radiology of dogs and cats offering tools that could enhance radiographic interpretation for clinicians. However, there are serious ethical and legal considerations that veterinarians must address, as they serve as the bridge between AI technology, patients and clients. The primary challenges involve accountability, responsibility, transparency and the absence of regulatory guidelines for AI use in veterinary medicine. It is essential for the veterinary community to develop a foundational understanding of AI and to exercise caution when integrating these tools into clinical practice, as we would with other new diagnostic tests or pharmaceutical treatments. By proactively engaging with these technologies and advocating for the establishment of regulations and performance standards, we can ensure that AI is used responsibly to improve patient care while upholding our ethical obligations.

References 

1. A. M. Hespel, Y. Zhang, P. S. Basran, Vet. Radiol. Ultrasound 63(S1), 817-827 (2022).

2. E. Hennessey, M. DiFazio, R. Hennessey, N. Cassel, Vet. Radiol. Ultrasound 63(S1), 851-870 (2022).

3. F. J. McEvoy, J. M. Amigo, Vet. Radiol. Ultrasound 54(2), 122-126 (2013).  

4. Y. Yoon, T. Hwang, H. Lee, Vet. J. 237, 43-48 (2018).

5. E. Boissady, A. de La Comble, X. Zhu, A. M. Hespel, Vet. Radiol. Ultrasound 61(6), 619-627 (2020).

6. E. Kim, A. J. Fischetti, P. Sreetharan, J. G. Weltman, P. R. Fox, Vet. Radiol. Ultrasound 63(3), 292-297 (2022).

7. E. B. Cohen, I. K. Gordon, Vet. Radiol. Ultrasound 63(S1), 840-850 (2022).

8. M. T. Contaldo, G. Pasceri, G. Vignati, L. Bracchi, S. Triggiani, G. Carrafiello, Diagnostics 14(14), 1506 (2024).

9. H. B. Harvey, V. Gowda, Acad. Radiol. 27(1), 58-61 (2020).

Doris Ma BVSc(dist) MVetClinStud – Radiology Resident – The Animal Hospital at Murdoch University


Josephine Faulkner MA VetMB PhD DipECVDI – Senior Lecturer Veterinary Diagnostic Imaging

School of Veterinary Medicine – Murdoch University

and
Steve Joslyn BSc BVMS DipECVDI MRCVS MANZCVS – Radiologist – The Animal Hospital at Murdoch University
