Original Article|Articles in Press

Clinical Implementation of an Artificial Intelligence Algorithm for Magnetic Resonance–Derived Measurement of Total Kidney Volume

Open Access. Published: March 16, 2023. DOI: https://doi.org/10.1016/j.mayocp.2022.12.019

      Abstract

      Objective

      To evaluate the performance of an internally developed and previously validated artificial intelligence (AI) algorithm for magnetic resonance (MR)–derived total kidney volume (TKV) in autosomal dominant polycystic kidney disease (ADPKD) when implemented in clinical practice.

      Patients and Methods

      The study included adult patients with ADPKD seen by a nephrologist at our institution between November 2019 and January 2021 and undergoing an MR imaging examination as part of standard clinical care. Thirty-three nephrologists ordered MR imaging, requesting AI-based TKV calculation for 170 cases in these 161 unique patients. We tracked implementation and performance of the algorithm over 1 year. A radiologist and a radiology technologist reviewed all cases (N=170) for quality and accuracy. Manual editing of algorithm output occurred at radiologist or radiology technologist discretion. Performance was assessed by comparing AI-based and manually edited segmentations via measures of similarity and dissimilarity to ensure expected performance. We analyzed ADPKD severity class assignment of algorithm-derived vs manually edited TKV to assess impact.

      Results

      Clinical implementation was successful. Artificial intelligence algorithm–based segmentation showed high levels of agreement and was noninferior to interobserver variability and other methods for determining TKV. Of manually edited cases (n=84), the AI-algorithm TKV output showed a small mean volume difference of –3.3%. Agreement for disease class between AI-based and manually edited segmentation was high (five cases differed).

      Conclusion

      Performance of an AI algorithm in real-life clinical practice can be preserved if there is careful development and validation and if the implementation environment closely matches the development conditions.

      Abbreviations and Acronyms:

      AI (artificial intelligence), ADPKD (autosomal dominant polycystic kidney disease), eGFR (estimated glomerular filtration rate), TKV (total kidney volume)
      With the rapid advancement and increasing availability of artificial intelligence (AI) algorithms in medicine and in radiology specifically, there has been growing interest and investigation into their potential clinical implementation. Much of the literature to date is focused on pre-implementation topics, including algorithm development and validation, usually in a controlled setting far removed from the clinical workflow. Full clinical implementation has not yet been widely achieved among radiology practices as it requires not only algorithm development and validation, but also integration into an already complex clinical imaging environment. Process evaluations regarding translating AI innovations from discovery and validation to an integrated component of the clinical workflow are currently lacking. This process involves new challenges, including how the algorithm is ordered, how it is triggered, how it is routed, how it is monitored, and how to educate all those who will be involved at various stages of the workflow. It is important that real-life performance, which exposes the process to a myriad of unpredictable variables, matches that of a more controlled pre-implementation environment.
      At our institution we have investigated a previously validated AI algorithm for magnetic resonance (MR)–derived measurement of total kidney volume (TKV) in autosomal dominant polycystic kidney disease (ADPKD) in clinical practice. Autosomal dominant polycystic kidney disease is the most common genetic cause of chronic kidney disease and TKV is an important prognostic biomarker.
      • Gabow P.A. Autosomal dominant polycystic kidney disease.
      • Harris P.C.; Torres V.E. Polycystic kidney disease.
      • Torres V.E.; Harris P.C.; Pirson Y. Autosomal dominant polycystic kidney disease.
      • Grantham J.J.; Chapman A.B.; Torres V.E. Volume progression in autosomal dominant polycystic kidney disease: the major factor determining clinical outcomes.
      • Fick-Brosnahan G.M.; Belz M.M.; McFann K.K.; Johnson A.M.; Schrier R.W. Relationship between renal volume growth and renal function in autosomal dominant polycystic kidney disease: a longitudinal study.
      • Grantham J.J.; Torres V.E.; Chapman A.B.; et al. Volume progression in polycystic kidney disease.
      • Bae K.T.; Shi T.; Tao C.; et al. Expanded imaging classification of autosomal dominant polycystic kidney disease.
      • Bae K.T.; Tao C.; Wang J.; et al. Novel approach to estimate kidney and cyst volumes using mid-slice magnetic resonance images in polycystic kidney disease.
      • Kistler A.D.; Poster D.; Krauer F.; et al. Increases in kidney volume in autosomal dominant polycystic kidney disease can be detected within 6 months.
      • King B.F.; Reed J.E.; Bergstralh E.J.; Sheedy 2nd, P.F.; Torres V.E. Quantification and longitudinal trends of kidney, renal cyst, and renal parenchyma volumes in autosomal dominant polycystic kidney disease.
      Along with age, TKV reliably predicts estimated glomerular filtration rate (eGFR) decline and is used to identify patients who would benefit from specific novel therapies.
      • Fick-Brosnahan G.M.; Belz M.M.; McFann K.K.; Johnson A.M.; Schrier R.W. Relationship between renal volume growth and renal function in autosomal dominant polycystic kidney disease: a longitudinal study.
      • Tangri N.; Hougen I.; Alam A.; Perrone R.; McFarlane P.; Pei Y. Total kidney volume as a biomarker of disease progression in autosomal dominant polycystic kidney disease.
      The process of clinical implementation of an AI algorithm, such as MR-derived measurement of TKV in ADPKD, involves multiple intersecting systems and people, including but not limited to patients, imaging equipment, technologists, digital data, radiologists, and referring clinicians. Successful exam ordering, image acquisition, algorithm processing, output reporting, and continuous quality assurance are all necessary for successful execution of the AI-assisted workflow.
      The potential for clinical implementation of AI algorithms is what drives scientific inquiry in this field but remains an understudied step. The purpose of this study is to evaluate the performance of an internally developed and previously validated AI algorithm for TKV in ADPKD when implemented in clinical practice.

      Patients And Methods

      The study was performed with institutional review board approval. Details regarding this AI algorithm have been published previously.
      • Kline T.L.; Edwards M.E.; Korfiatis P.; Akkus Z.; Torres V.E.; Erickson B.J. Semiautomated segmentation of polycystic kidneys in T2-weighted MR images.
      • Kline T.L.; Korfiatis P.; Edwards M.E.; et al. Image texture features predict renal function decline in patients with autosomal dominant polycystic kidney disease.
      • Kline T.L.; Korfiatis P.; Edwards M.E.; et al. Automatic total kidney volume measurement on follow-up magnetic resonance images to facilitate monitoring of autosomal dominant polycystic kidney disease progression.
      • Gregory A.V.; Anaam D.A.; Vercnocke A.J.; et al. Semantic instance segmentation of kidney cysts in MR images: a fully automated 3D approach developed through active learning.
      • Edwards M.E.; Blais J.D.; Czerwiec F.S.; Erickson B.J.; Torres V.E.; Kline T.L. Standardizing total kidney volume measurements for clinical trials of autosomal dominant polycystic kidney disease.
      • Edwards M.E.; Periyanan S.; Anaam D.; Gregory A.V.; Kline T.L. Automated total kidney volume measurements in pre-clinical magnetic resonance imaging for resourcing imaging data, annotations, and source code.
      • Kline T.L.; Edwards M.E.; Fetzer J.; et al. Automatic semantic segmentation of kidney cysts in MR images of patients affected by autosomal-dominant polycystic kidney disease.
      • Edwards M.E.; Chebib F.T.; Irazabal M.V.; et al. Long-term administration of tolvaptan in autosomal dominant polycystic kidney disease.
      Referring providers had the option to order AI-based TKV measurements when placing an abdominal imaging exam order (Table 1). One sequence in the exam, a routine clinical single-shot fast spin echo coronal sequence, was used by the AI algorithm (Siemens HASTE or GE SSFSE with fat saturation) for TKV calculation. AI-based segmented images were first reviewed by a medical image analyst (a certified computed tomography or MR technologist with extra training and expertise in three-dimensional image analysis and anatomic segmentation) and either accepted without any manual editing if AI segmentation was deemed to be optimal by visually comparing the output segmentation overlay to the organ borders slice-by-slice (“pass”) or manually edited (“rework”). This step was performed despite prior algorithm validation due to our commitment to extract and evaluate real-life performance metrics. A second quality control (QC) check of the output was performed by the reading radiologist. This radiologist could trigger the manual rework pathway if the radiology technologist had not, or the radiologist could accept the algorithm output or the radiology technologist–triggered manually edited segmentation if it had already been reworked. Segmentations were then approved and used to provide a report of right, left, and total kidney volumes.
      Table 1. Scanner, Location, and Demographic Information (a, b)
      Information                     Pass (n=86)     Rework (n=84)
      Scanner
       Manufacturer and model (c)
        GE Medical Systems            48 (55.8)       56 (66.7)
         Optima MR450w                27 (31.4)       23 (27.4)
         Signa HDxt                   12 (14)         20 (23.8)
         Discovery MR750w             6 (7)           12 (14.3)
         Discovery MR450              1 (1.2)         1 (1.2)
         Discovery MR750              2 (2.3)         0 (0)
        Siemens                       38 (44.2)       28 (33.3)
         Skyra                        19 (22.1)       14 (16.7)
         MAGNETOM Vida                11 (12.8)       9 (10.7)
         Aera                         7 (8.1)         4 (4.8)
         MAGNETOM Sola                1 (1.2)         1 (1.2)
       Field strength, T
         1.5                          48 (55.8)       49 (58.3)
         3                            38 (44.2)       35 (41.7)
       Slice thickness, mm
         4                            72 (83.7)       71 (84.5)
         5                            14 (16.3)       13 (15.5)
       Location, Mayo Clinic
         Rochester                    67 (77.9)       67 (79.8)
         Arizona                      8 (9.3)         11 (13.1)
         Florida                      11 (12.8)       6 (7.1)
      Demographics
       Sex
         F                            49 (57)         62 (73.8)
         M                            37 (43)         22 (26.2)
       Age, y                         43.4±13.5       47±15.4
       Race
         White                        78 (90.7)       79 (94)
         Asian                        3 (3.5)         2 (2.4)
         Black or African American    0 (0)           1 (1.2)
         Other                        3 (3.5)         1 (1.2)
         Unknown                      2 (2.3)         1 (1.2)
       Height, cm                     173±9.9         169.7±9.5
       Weight, kg                     82.3±19.6       76.7±17.8
       BMI, kg/m²                     23.7±2          23±2
       Kidney disease subtype
         Typical                      80 (93)         70 (83.3)
         Atypical                     2 (2.3)         6 (7.1)
         Unknown                      4 (4.7)         8 (9.5)
      a BMI, body mass index; F, female; M, male.
      b Values shown are n (%) and means ± SD as appropriate.
      c Percentages for the manufacturer model are of the whole not the specific manufacturer.
      For inclusion, patients were required to be older than 18 years of age, have a previous diagnosis of ADPKD, and have an MR imaging examination ordered as part of standard clinical care. International Classification of Diseases (ICD)-10 and ICD-9 diagnosis codes were extracted from a Mayo Clinic internal database to confirm ADPKD diagnosis. A small subset of patients in whom ADPKD diagnosis could not be confirmed were grouped as “other,” including cystic and noncystic kidney disease, non–polycystic kidney disease (PKD) patients, as well as kidney transplant patients and those with autosomal recessive PKD diagnoses. There are two main subclassifications for ADPKD based on presentation: typical and atypical.
      • Bae K.T.; Shi T.; Tao C.; et al. Expanded imaging classification of autosomal dominant polycystic kidney disease.
      • Schönauer R.; Baatz S.; Nemitz-Kliemchen M.; et al. Matching clinical and genetic diagnoses in autosomal dominant polycystic kidney disease reveals novel phenocopies and potential candidate genes.
      Typical diffuse cystic ADPKD is classified by using height-adjusted TKV and age to identify patients with the highest risk of disease progression.
      • Irazabal M.V.; Rangel L.J.; Bergstralh E.J.; et al. Imaging classification of autosomal dominant polycystic kidney disease: a simple model for selecting patients for clinical trials.
      The five-group classification scale ranges from least severe (class 1A) to most severe (class 1E). Subtype and classification of ADPKD were assigned by a trained observer according to previous criteria.
      • Irazabal M.V.; Rangel L.J.; Bergstralh E.J.; et al. Imaging classification of autosomal dominant polycystic kidney disease: a simple model for selecting patients for clinical trials.
      Demographic information, including age, sex, race, and ethnicity, was collected from Digital Imaging and Communications in Medicine metadata and/or an internal patient database. Patient-related kidney function data, including eGFR, serum creatinine, blood urea nitrogen (BUN), and albumin/creatinine ratio, were also extracted. All patient research authorizations were confirmed before inclusion in the study.

      Statistical Analysis

      Statistical analyses were performed to determine both the performance of the AI-based segmentation tool compared with manually edited AI segmentation and any potential variables which may have been associated with a manually edited segmentation. The Shapiro-Wilk test (SciPy v1.5.4) was used to determine if data were normally distributed. All statistical analyses were performed using Python (v3.8.3) and the following modules: SciPy (v1.5.4), statsmodels (v0.12.2), pydicom (v2.1.1), SimpleITK (v2.0.2), seaborn (v0.11.0), and matplotlib (v3.2.2).

      Algorithm Performance

      Algorithm performance was determined via comparison of AI-based and manually edited AI segmentations for the manually edited data only. Common image metrics of similarity (Dice coefficient [two times the area of overlap divided by the total number of pixels in both segmentations; minimum value, 0; maximum value, 1] and Jaccard index [size of the intersection divided by size of the union; minimum value, 0; maximum value, 1]) and dissimilarity (volume difference, percent volume difference, surface distance [mean of all distances between every surface voxel across segmentations; values close to zero represent perfect overlap], and Hausdorff distance [greatest of all distances between all points between segmentations; values close to zero represent perfect overlap]) were computed (SimpleITK v2.0.2). Bland-Altman plots (pingouin v0.4.12) were constructed to examine agreement, fixed bias, and any outliers, whereas linear regression (SciPy v1.5.4) assessed correlation between AI and manually edited AI TKV measurements. A scatter plot of the Dice coefficient vs corrected AI TKV was constructed to determine if kidney volume was related to AI-based segmentation performance. Finally, a one-sided Welch’s t-test was computed to determine if the AI-based segmentation was noninferior to manually edited AI segmentation (SciPy v1.5.4). Power calculations were performed to determine the sample size needed to observe a delta value for the noninferiority test. Tests were run across a range of clinically relevant delta values to arrive at a minimum significant delta of noninferiority.
      • Ahn S.; Park S.H.; Lee K.H. How to demonstrate similarity by using noninferiority and equivalence statistical testing in radiology research.
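      The overlap and volume metrics defined above can be illustrated with a minimal sketch. It is not the study's SimpleITK pipeline: the function name `overlap_metrics`, the flat 0/1 mask representation, the `voxel_volume_ml` parameter, and the sign convention (AI minus manually edited) are all assumptions made for illustration.

```python
def overlap_metrics(ai_mask, edited_mask, voxel_volume_ml=1.0):
    """Compare two binary segmentations given as flat sequences of 0/1 voxels.

    Hypothetical helper: masks are assumed to be aligned voxel-by-voxel.
    Differences are reported as AI minus manually edited (an assumption).
    """
    if len(ai_mask) != len(edited_mask):
        raise ValueError("masks must have the same number of voxels")
    ai_n = sum(1 for v in ai_mask if v)
    edited_n = sum(1 for v in edited_mask if v)
    if edited_n == 0:
        raise ValueError("edited mask is empty; percent difference undefined")
    overlap = sum(1 for a, e in zip(ai_mask, edited_mask) if a and e)
    union = ai_n + edited_n - overlap

    vol_ai = ai_n * voxel_volume_ml
    vol_edited = edited_n * voxel_volume_ml
    return {
        # Dice: twice the overlap divided by the total size of both masks
        "dice": 2 * overlap / (ai_n + edited_n),
        # Jaccard: intersection divided by union
        "jaccard": overlap / union,
        "volume_difference_ml": vol_ai - vol_edited,
        "percent_difference": 100 * (vol_ai - vol_edited) / vol_edited,
    }


# Toy example: the AI mask misses one of four voxels kept by the analyst.
ai = [1, 1, 1, 0, 0, 0, 0, 0]
edited = [1, 1, 1, 1, 0, 0, 0, 0]
metrics = overlap_metrics(ai, edited)
print(metrics)
```

      In this toy case the AI mask undersegments by one voxel, so the percent difference is negative while Dice and Jaccard stay below 1.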

      Pass vs Rework Comparisons

      Scanner characteristics, patient demographics, and disease severity markers were investigated for association with either an AI-based segmentation accept or manually edited rework pathway. A χ2 test of independence compared distributions across accept and rework workflows for discrete variables (SciPy v1.5.4). Additionally, a two-sided Kolmogorov-Smirnov test was used to test for distributional differences between images which were accepted or sent for rework for continuous variables (SciPy v1.5.4). No adjustment for multiple comparisons was performed.
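      As a rough illustration of these two comparisons, the sketch below implements a 2×2 χ2 test with Yates continuity correction (the default behavior of SciPy's `chi2_contingency` for 2×2 tables) and the two-sample Kolmogorov-Smirnov statistic using only the Python standard library. The function names are hypothetical and this is not the study code; the sex counts are taken from Table 1, and whether the authors kept the default correction is an assumption.

```python
import bisect
import math


def chi2_2x2(table, yates=True):
    """Chi-square test of independence for a 2x2 table (1 degree of freedom)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            deviation = abs(observed - expected)
            if yates:
                # Continuity correction: shrink each deviation by 0.5, not below 0
                deviation = max(deviation - 0.5, 0.0)
            stat += deviation ** 2 / expected
    # Survival function of chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def ks_statistic(x, y):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for v in xs + ys:
        fx = bisect.bisect_right(xs, v) / len(xs)
        fy = bisect.bisect_right(ys, v) / len(ys)
        d = max(d, abs(fx - fy))
    return d


# Sex by pathway from Table 1: rows are pass and rework, columns are F and M.
stat, p = chi2_2x2([[49, 37], [62, 22]])
print(round(stat, 2), round(p, 3))
```

      With the Table 1 sex counts, the corrected test gives a P value of roughly .03. The KS function returns only the statistic; a P value would normally come from a statistics library.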

      Results

      Participants and Imaging

      From November 2019 to January 2021, a total of 33 nephrologists across three sites within our institution ordered MR imaging, requesting AI-based TKV calculation for 170 cases in 161 unique patients. There were seven patients who were imaged at different times throughout the study. Two patients were imaged three times, whereas the remaining five were imaged twice. For these cases, the time span between exams was 184±82 days (minimum was 105 days). Of the total 170 cases, output of AI-based segmentation in 86 cases was accepted without manual editing (pass), whereas 84 cases were manually edited (rework). The workflow diagram is shown in Figure 1. In total, 12 medical image analysts and 49 radiologists were involved in this study. The mean patient age was 45.2±14.5 years, and 65.3% (N=105) were female. Nephrologist-confirmed ADPKD subtype was typical in 88.2% (N=142) of patients and atypical in 4.7% (N=8). The remaining patients (7.1%, N=11) were excluded from classification for non-PKD, kidney transplant, or autosomal recessive–PKD. Images were acquired across two scanner manufacturers, GE Medical (61%, N=104) and Siemens (39%, N=66), and nine different models in total. Coronal Half-Fourier Acquisition Single-shot Turbo spin Echo (HASTE, Siemens) or Single-Shot Fast Spin Echo (SSFSE, GE) scan protocols were used. Images were collected across two field strengths, 1.5 T (57.1%, N=97) and 3 T (42.9%, N=73), and two different slice thicknesses, 4 mm (84.1%, N=143) and 5 mm (15.9%, N=27). Breakdown of scanner, site location, and demographics across pathways is shown in Table 1.
      Figure 1. Workflow diagram showing the roles the clinicians (green), radiologists (blue), magnetic resonance (MR) technologists (brown), and image analysts (yellow) played in the study. An arrow indicates the sequence of steps and the direction of the workflow. The clinician is positioned at the start and end of the workflow. PACS, picture archiving and communications system; QC, quality control; TKV, total kidney volume.

      Algorithm Performance

      To determine how well the AI algorithm for TKV performed, AI- and manually edited AI segmentations were compared. Most commonly, these corrections were minor segmentation alterations. Figure 2 presents exemplar MR images with TKV segmentation overlays (AI or manually edited) of maximum (Dice = 0.99) (Figure 2A), minimum (Dice = 0.77) (Figure 2B), and median (Dice = 0.98) (Figure 2C). Dice coefficients are shown in Table 2. The mean TKV difference was –34.0 mL (range, –413.8 to 415.4 mL) and the mean percent difference was –3.3% (range, –41.0% to 22.2%) (Table 2, Figures 3A and 3B). AI and manually edited TKVs (mL) were highly correlated with a small volumetric offset, suggesting that most rework cases involved very minor corrections (slope = 1.0, intercept = –41.08, r² = 0.99, P<.0001) (Figure 3C). Furthermore, the intraclass correlation coefficient between AI and manually edited TKV (mL) indicated excellent agreement (inter-rater intraclass correlation coefficient = 0.997). Dice scores were more variable with smaller corrected AI TKVs (Figure 3D). The mean Jaccard index was 0.926 (range, 0.63-0.99) (Table 2), the mean Hausdorff distance was 30.51 mm (range, 5.27-174.29 mm), and the mean surface distance was 1.68 mm (range, 0.06-18.43 mm) (Figure 3E). Finally, to confirm the AI approach was noninferior to previous non-AI–assisted segmentation approaches,
      • Kline T.L.; Edwards M.E.; Korfiatis P.; Akkus Z.; Torres V.E.; Erickson B.J. Semiautomated segmentation of polycystic kidneys in T2-weighted MR images.
      a noninferiority test was conducted. The noninferiority test was powered to a percent delta of 4.97% (2.5% one-sided type 1 error: 80% power). The percent TKV difference between AI and manually edited TKV was noninferior at a minimum percent delta of 4.80% and noninferior to previously determined inter-rater percent delta (6.21%), stereology percent delta (9.12%), and ellipsoid percent delta (22.27%) values (Figure 3F, inter-rater P<.001, stereology P<.0001, ellipsoid P<.0001).
      • Kline T.L.; Edwards M.E.; Korfiatis P.; Akkus Z.; Torres V.E.; Erickson B.J. Semiautomated segmentation of polycystic kidneys in T2-weighted MR images.
      Only 7.05% (12 of 170) of the total cases recorded differences outside the inter-rater delta (6.21%) range, yielding an approximation of the performance of the algorithm without a rework pathway. No increase in number of rework cases was seen over time (Figure 4A). In addition, only a small number of cases (N=7) changed image class pre/post rework (Figure 4B). These results indicate that our algorithm performs well and is noninferior to manual medical image analyst–corrected segmentations at an experimentally derived and clinically relevant delta value.
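      The logic of the noninferiority comparison can be sketched as follows. This is a simplified stand-in, not the study's analysis: it tests only the lower margin (H0: mean percent difference ≤ –delta), uses a large-sample normal approximation in place of the one-sided Welch's t-test run in SciPy, and the percent differences are hypothetical values generated for illustration. The function name `noninferiority_p` is likewise an assumption.

```python
import math
from statistics import NormalDist, mean, stdev


def noninferiority_p(pct_diffs, delta_pct):
    """One-sided P value for H0: true mean difference <= -delta_pct.

    Normal (z) approximation to a one-sample, one-sided t-test; reasonable
    only for fairly large samples (an assumption of this sketch).
    """
    n = len(pct_diffs)
    standard_error = stdev(pct_diffs) / math.sqrt(n)
    z = (mean(pct_diffs) + delta_pct) / standard_error
    return 1.0 - NormalDist().cdf(z)


# Hypothetical percent differences for 84 reworked cases, centered near -3.3%.
pct_diffs = [-3.3 + 6.0 * math.sin(i) for i in range(84)]

# Scan candidate margins: a tight one, then the 4.97% and 6.21% deltas in the text.
for delta in (3.0, 4.97, 6.21):
    print(delta, noninferiority_p(pct_diffs, delta))
```

      A small P value at a given delta supports noninferiority at that margin. Scanning a range of deltas, as in the loop above, mirrors the idea of locating the minimum delta at which noninferiority holds.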
      Figure 2. Example images with artificial intelligence (AI)–generated and medical image analyst–corrected segmentations. A, The original computer tomography image (left), original image with AI-generated total kidney volume segmentation overlay (middle), and the original image plus medical image analyst–corrected AI overlay (right) from a case with max Dice score (0.99). Left kidney segmentation is shown in yellow and right kidney segmentation is shown in green. B, The minimum Dice score (0.77) example. C, The median Dice score (0.98) example.
      Table 2. Similarity and Dissimilarity Metrics Between Initial Artificial Intelligence Segmentation Image and Reworked Image
      Metric                       Mean       Minimum     Median     Maximum
      Dice                         0.959      0.774       0.977      0.999
      Jaccard                      0.926      0.631       0.956      0.997
      Difference                   –34.004    –413.843    –13.201    415.419
      Percent difference           –3.318     –41.003     –1.445     22.151
      Hausdorff distance, mm       30.512     5.265       20.258     174.289
      Mean surface distance, mm    1.679      0.061       0.750      18.433
      Figure 3. Overall performance of artificial intelligence (AI)–generated total kidney volume (TKV) segmentation compared with medical image analyst–corrected AI-generated TKV segmentation. A, Bland-Altman plots to evaluate absolute agreement between AI-generated segmentation and medical image analyst–corrected AI-generated segmentation. Mean difference between measures (blue dashed line); 95% CI for mean difference (shaded blue band); 95% limits of agreement (green dashed line; average ± 1.96 standard deviation of difference); 95% CI for limits of agreement (shaded green band). B, Same plots as A, but for percent difference between AI-generated TKV and medical image analyst–corrected AI-generated TKV. C, A linear regression of highly correlated AI-generated TKV, and medical image analyst–corrected AI-generated TKV (slope = 1.00, intercept = –41.08, r² = 0.99, P<.0001). D, A scatter plot of medical image analyst–corrected AI-generated TKV (cc) by Dice score. E, Box plots with individual case scatter of similarity and dissimilarity metrics including Dice, Jaccard, Hausdorff distance (mm), mean surface distance (mm), and surface distance standard deviation. F, A noninferiority plot of the mean percent difference (±95% CI) between AI TKV and corrected AI TKV (gray dashed line = zero difference between methods; dark blue dashed line represents delta acquired from prior inter-rater agreement study; teal dashed line represents delta acquired from stereology measurements; pink dashed line represents delta acquired from ellipsoid measurements). Mean AI TKV and corrected AI TKV difference is noninferior to inter-rater, stereology, and ellipsoid deltas (one-sided t test; inter-rater P<.0001, stereology P<.0001, ellipsoid P<.0001).
      • Kline T.L.; Edwards M.E.; Korfiatis P.; Akkus Z.; Torres V.E.; Erickson B.J. Semiautomated segmentation of polycystic kidneys in T2-weighted MR images.
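      The Bland-Altman quantities named in the Figure 3 caption (mean difference and 95% limits of agreement as the mean ± 1.96 standard deviations of the differences) can be computed with a short sketch. The TKV values below are hypothetical, the function name is an assumption, and differences are taken as AI minus manually edited; the study itself used the pingouin package for these plots.

```python
from statistics import mean, stdev


def bland_altman(ai_tkv, edited_tkv):
    """Return bias and 95% limits of agreement for paired TKV measurements."""
    if len(ai_tkv) != len(edited_tkv):
        raise ValueError("inputs must be paired")
    diffs = [a - e for a, e in zip(ai_tkv, edited_tkv)]
    bias = mean(diffs)        # fixed bias (mean difference)
    sd = stdev(diffs)         # sample standard deviation of the differences
    return {
        "bias": bias,
        "lower_loa": bias - 1.96 * sd,
        "upper_loa": bias + 1.96 * sd,
    }


# Hypothetical paired TKVs in mL (AI output, then analyst-corrected output).
ai_tkv = [980.0, 1510.0, 2040.0, 640.0]
edited_tkv = [1000.0, 1500.0, 2100.0, 650.0]
result = bland_altman(ai_tkv, edited_tkv)
print(result)
```

      Points falling outside the limits of agreement would be the outliers that such a plot is meant to expose.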
      Figure 4. Comparison of study date distributions between pass and rework and classification of typical autosomal-dominant polycystic kidney disease (ADPKD) pre- or post-rework pathway. A, Kernel density estimated distributions of study dates between pass (light blue) and rework (light green) pathways that are not significantly different (two-sample Kolmogorov-Smirnov test, P=.08). B, An agreement heatmap between artificial intelligence (AI) (pre-rework) and corrected AI (post-rework) typical ADPKD classification for all patients (weighted Cohen’s kappa = 0.86). Diagonal represents perfect agreement. Darker shades of blue represent greater counts. QC, quality control.

      Determining Factors Associated With Rework Pathway

      To identify factors associated with a case being sent for rework, we compared scanner information across pass and rework pathways. No significant differences in scanner manufacturer (P=.20), manufacturer model (P=.43), field strength (P=.86), slice thickness (P>.99), and pixel spacing (P=.25) were observed (Supplemental Table 1, found online at http://www.mayoclinicproceedings.org). Comparison of additional imaging parameters, including repetition time, echo time/train length, flip angle, percent sampling, image size, number of images in acquisition, field of view, and patient position were all not significantly different (Supplemental Table 2, found online at http://www.mayoclinicproceedings.org).
      Furthermore, patient demographic factors across pass and rework pathways were compared. Age (P=.26), body mass index (BMI) (P=.06), race/ethnicity (P=.64), and study imaging date (P=.08) (Supplemental Figure A, found online at http://www.mayoclinicproceedings.org) were all not significantly different across AI (pass) and corrected AI (rework) pathways (Supplemental Table 1). Sex was the only measure we found that was significantly different across AI (pass) and corrected AI (rework) pathways (P=.03) (Supplemental Table 1). Females were overrepresented in the corrected AI pathway (73.8%) vs the AI pathway (57.0%) with significantly lower BMI (F [mean ± SD] = 22.42 ± 1.68; M [mean ± SD] = 25.11 ± 1.42; KS test, stat = 0.779, P<.0001) (Supplemental Figure A) and smaller total kidney volumes (F [mean ± SD] = 1299.69 ± 1072.60; M [mean ± SD] = 2153.25 ± 1835.19; KS test, stat = 0.26, P=.008) (Supplemental Figure B) compared with males.
      Typical ADPKD is classified at Mayo Clinic using height-adjusted TKV and age to identify patients with the highest risk of disease progression.
      • Irazabal M.V.; Rangel L.J.; Bergstralh E.J.; et al. Imaging classification of autosomal dominant polycystic kidney disease: a simple model for selecting patients for clinical trials.
      The five-group classification scale ranges from least severe (class 1A) to most severe (class 1E). The pre-rework and post-rework classifications were compared to determine changes in classification and degree of change. Only a small percentage of rework cases changed classification assignment after rework (10.4%). Agreement between pre-rework and post-rework classification across all cases was high (weighted Cohen’s kappa = 0.86) (Supplemental Figure C). Pre-rework and post-rework classification agreement was higher in females (weighted Cohen’s kappa = 0.90) (Supplemental Figure C) than males (weighted Cohen’s kappa = 0.74) (Supplemental Figure D). Overall, no reclassification changes of greater than one class were observed (Supplemental Figure B).
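      Weighted Cohen's kappa for ordered classes such as 1A through 1E can be sketched as below. The text does not state which weighting scheme was used, so linear weights are an assumption here (quadratic weights are offered as an option), and the class assignments are hypothetical.

```python
def weighted_kappa(rater_a, rater_b, categories, weights="linear"):
    """Weighted Cohen's kappa for two paired lists of ordinal labels."""
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        d = abs(i - j) / (k - 1)
        return d if weights == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    num = sum(disagreement(i, j) * observed[i][j] for i, j in cells)
    den = sum(disagreement(i, j) * row[i] * col[j] for i, j in cells)
    if den == 0:
        raise ValueError("kappa undefined: all ratings fall in one category")
    return 1.0 - num / den


classes = ["1A", "1B", "1C", "1D", "1E"]
# Hypothetical pre- and post-rework classes; one case shifts from 1C to 1D.
pre = ["1A", "1B", "1C", "1C", "1D", "1E"]
post = ["1A", "1B", "1C", "1D", "1D", "1E"]
print(weighted_kappa(pre, post, classes))
```

      Because the weights grow with the distance between classes, a one-class shift is penalized less than a jump across several classes, which suits an ordinal severity scale.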
      Kidney function was assessed by eGFR, serum creatinine, BUN, and albumin/creatinine ratio.
      • Grantham J.J.; Torres V.E.; Chapman A.B.; et al. Volume progression in polycystic kidney disease.
      • Tangri N.; Hougen I.; Alam A.; Perrone R.; McFarlane P.; Pei Y. Total kidney volume as a biomarker of disease progression in autosomal dominant polycystic kidney disease.
      We evaluated whether kidney disease severity was associated with images routing to the rework pathway (Supplemental Table 3, found online at http://www.mayoclinicproceedings.org). Total kidney volume distributions were not significantly different between pass and rework groups (P=.23). Measurements of eGFR (P=.87), creatinine (P=.56) (Supplemental Table 3), BUN (P=.81), and albumin/creatinine ratio (P=.45) were not significantly different between pass and rework pathways.

      Discussion

      Advances in AI in medicine remain weighted toward algorithm development and validation with large-scale clinical implementation still unrealized. Barriers to broad clinical adoption of AI algorithms include poor understanding of the steps involved in their implementation within a practice and a lack of data on their real-world performance. Coordinated interdisciplinary efforts to integrate algorithms into clinical workflows are necessary to drive the work of AI scientists to their full potential and to use algorithms for their intended purpose.
      We have shown the potential for successful clinical implementation of an AI algorithm into a complex radiology practice which required coordination of technical deployment, education of interdisciplinary stakeholders, extraction of real-life performance metrics, and analysis of impact on the intended clinical question. Our internally developed algorithm for MR-derived measurement of TKV in ADPKD was effectively integrated and performed as expected in the real-life clinical setting, proving to be noninferior to non-AI-assisted segmentation. In addition, without the AI tool, manual processing takes 60 to 90 minutes. Even in cases needing editing, the final metrics were now obtained in only a few minutes.

      Technical Deployment

      Technical deployment of the algorithm into the clinical workflow relied upon an integrated information technology team that could set up image filtering and routing rules based on specific inclusion criteria. In this study, routing rules were set up based on the MR series description, thereby only sending a single series for AI processing. Images moved downstream through our institutional orchestration engine,
      • Erickson B.J.; Langer S.G.; Blezek D.J.; Ryan W.J.; French T.L. DEWEY: the DICOM-enabled workflow engine system.
      and eventually to the medical image analysts for review before output routing to the radiologist and the picture archiving and communication system.
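      A routing rule keyed on the MR series description might look like the sketch below. It is purely illustrative: the keyword sets, function names, and series descriptions are hypothetical and do not reflect the actual rules configured in the institutional orchestration engine.

```python
import re

# Hypothetical keyword sets; real series naming conventions vary by site.
SINGLE_SHOT = {"HASTE", "SSFSE"}
CORONAL = {"COR", "CORONAL"}
FAT_SAT = {"FS", "FATSAT", "FAT"}


def should_route_to_tkv(series_description, ai_tkv_ordered):
    """Return True if a series matches the (assumed) AI TKV routing rule."""
    # Split on non-alphanumerics so "FS" is not matched inside "SSFSE".
    tokens = set(re.split(r"[^A-Z0-9]+", series_description.upper()))
    return bool(
        ai_tkv_ordered
        and tokens & SINGLE_SHOT
        and tokens & CORONAL
        and tokens & FAT_SAT
    )


def pick_series(series_descriptions, ai_tkv_ordered):
    """Select the single series to send for AI processing, or None."""
    matches = [s for s in series_descriptions
               if should_route_to_tkv(s, ai_tkv_ordered)]
    # Route only when exactly one series qualifies; otherwise flag for review.
    return matches[0] if len(matches) == 1 else None


study = ["AX T2 HASTE FS", "COR T2 HASTE FS", "COR T1 VIBE"]
print(pick_series(study, ai_tkv_ordered=True))
```

      Matching on whole tokens rather than substrings is a deliberate choice here, since short codes such as "FS" can otherwise appear inside longer sequence names.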

      Education of Stakeholders

      Communication and education for those involved in the AI algorithm clinical implementation are critical to success, both before any change and throughout implementation. For our algorithm, those primarily involved in the clinical workflow are the MR image–ordering clinician (nephrologist), the radiologist protocolling and interpreting the exam (including report of algorithm output), the MR technologist acquiring the images, and the medical image analysts responsible for review and possible segmentation editing.
      Educational materials were developed for each role. Learning modules were available electronically and included both text and graphic presentation of the background, rationale, and steps involved for algorithm implementation. Leaders from each stakeholder group (physicians and technologists) were identified to disseminate the information and act as resources for questions. For example, the radiologist proponent sent informational emails with links to modules, presented information at divisional meetings (including history of pre-implementation algorithm validation), communicated with residents and fellows, and fielded inquiries from radiologists and trainees in real time as cases arose in the clinical practice. Throughout the educational efforts, two messages were critical to adoption of this initiative: an emphasis on the real-world patient benefits of this algorithm’s implementation, and a reassurance that it would not be onerous for the radiologist, despite the inherent discomfort that accompanies workflow change.

      Performance Metrics

      Our extraction of real-life performance metrics relied on review of each AI-based segmentation by a medical image analyst and a radiologist. Although approximately half of the cases (84 of 170, 49.4%) during the study period were manually edited, the mean percent volume difference was only -3.3%, indicating that corrections were minor. It also suggests that the technologists had a very low threshold for editing, so the roughly 50% of cases that were not reworked were accepted against a very high standard. The percent TKV difference between AI-based segmentation and manually edited segmentation was noninferior to the previously determined inter-rater difference and to other clinically accepted methods for determining TKV (eg, stereology-based and/or ellipsoid-based measurements).
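
      As an illustration of the kind of similarity and dissimilarity measures referred to above, the following sketch computes a percent volume difference and a Dice similarity coefficient. The sign convention (AI minus edited, relative to the edited volume) is an assumption chosen so that a negative value means the algorithm measured less than the edited result. The function names and the representation of a mask as a collection of voxel coordinates are hypothetical, and the numbers in the usage lines are made up.

```python
def percent_volume_difference(ai_ml, edited_ml):
    """Signed difference of the AI volume relative to the edited volume, in percent.

    Assumed convention: negative means the AI volume is smaller.
    """
    return 100.0 * (ai_ml - edited_ml) / edited_ml


def dice(mask_a, mask_b):
    """Dice similarity of two masks given as iterables of voxel coordinates."""
    a, b = set(mask_a), set(mask_b)
    if not a and not b:
        return 1.0  # two empty masks are treated as identical
    return 2.0 * len(a & b) / (len(a) + len(b))


# Made-up example values, not study data.
diff = percent_volume_difference(967.0, 1000.0)
overlap = dice(
    [(0, 0, 0), (0, 0, 1), (0, 1, 0)],
    [(0, 0, 1), (0, 1, 0), (1, 0, 0)],
)
```

      In this toy example the AI volume is 3.3% below the edited volume, and the two three-voxel masks share two voxels, giving a Dice value of two-thirds.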

      Bias Analysis

      We investigated the rework cases in which the class changed between pre- and post-rework to determine whether an underlying characteristic led to the class change. The variables investigated included manufacturer, scanner model, field strength, location, sex, age, race, height, weight, continuous BMI, discrete BMI interpretation, algorithm TKV value (mL), eGFR (mL/min per body surface area), creatinine (mg/dL), BUN (mg/dL), and presence of polycystic liver disease (PLD). Because rework caused a shift in class for 7 of 67 reworked cases, we did not expect this investigation to yield firm conclusions. In all cases, histograms were generated. For the continuous variables, the values observed for the reworked cases in which a class change occurred tended to be distributed throughout the range, without obvious clustering in a given region.
      We investigated the influence of PLD in more detail. In particular, of the seven cases that switched imaging class, four had PLD (two with severe PLD), and three did not have PLD. For comparison, PLD prevalence in patients affected by PKD is approximately 70%. We believe that severe PLD can complicate segmentation, for example when deciding whether adjacent cysts belong to the right kidney or to the liver.

      Impact on Intended Clinical Question

      Another critical step in assessing the success of algorithm implementation is the analysis of its impact on the intended clinical question. Total kidney volume as an imaging biomarker in ADPKD is a major variable for assignment of a disease severity class, a reliable and widely used predictor of future eGFR decline, and an important determinant of eligibility for certain therapies. In our study, the agreement for disease class assignment between AI-based segmentation and manually edited segmentation was high (with only five cases being assigned a different class). In the few cases of reclassification from manual editing, no changes greater than one class occurred. Given that the AI-based segmentations were shown to be noninferior to inter-rater difference and other methods of TKV calculation, we would expect a similar rate of reclassification if those methods were similarly investigated.
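
      To show how a small TKV difference can move a patient across a class boundary, the sketch below assigns an imaging class from TKV, height, and age. It assumes the published imaging classification model (Irazabal et al, 2015) as commonly described: height-adjusted TKV grows exponentially from a theoretical 150 mL/m at birth, and classes 1A through 1E are separated at estimated yearly growth rates of 1.5%, 3%, 4.5%, and 6%. The handling of values exactly on a boundary, the function names, and the example inputs are assumptions, and the sketch applies only to typical (class 1) presentations. It is not the code used in the study.

```python
BASELINE_HTTKV = 150.0  # mL/m at age 0, assumed from the published model

# Upper bound of yearly growth (percent) for each class; above the last is 1E.
CLASS_UPPER_BOUNDS = [(1.5, "1A"), (3.0, "1B"), (4.5, "1C"), (6.0, "1D")]


def yearly_growth_percent(tkv_ml, height_m, age_years):
    """Estimated yearly growth of height-adjusted TKV, in percent."""
    httkv = tkv_ml / height_m
    return 100.0 * ((httkv / BASELINE_HTTKV) ** (1.0 / age_years) - 1.0)


def imaging_class(tkv_ml, height_m, age_years):
    """Map TKV (mL), height (m), and age (years) to an assumed class label."""
    rate = yearly_growth_percent(tkv_ml, height_m, age_years)
    for upper, label in CLASS_UPPER_BOUNDS:
        if rate < upper:
            return label
    return "1E"


# Made-up patient: same height and age, TKV before and after manual editing.
class_ai = imaging_class(1440.0, 1.70, 40)
class_edited = imaging_class(1500.0, 1.70, 40)
```

      In this made-up example the two volumes differ by 4% yet fall on opposite sides of the 4.5% growth boundary, so the class differs by one step. This is consistent with the observation above that reclassification, when it occurred, never exceeded one class.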

      Next Steps

      Whereas AI algorithm discovery, development, and initial validation can occur in isolation from a practice’s clinical workflow and real-time patient care, the application of these algorithms for true clinical impact cannot. Future work will include implementation of a workflow in which the radiologist first reviews the cases and then triggers a pass or rework pathway, as well as the incorporation of additional analytics (eg, liver segmentation for total liver volume assessment).

      Conclusion

      Performance of an AI algorithm in a large radiology clinical practice can be preserved if careful attention is paid to validation of the algorithm during development and if the implementation environment closely matches the development conditions.

      Potential Competing Interests

      Drs Harris and Torres have received research support from Otsuka. The remaining authors report no potential competing interests.

      Acknowledgments

      The authors thank Lucy Bahn, PhD, for her assistance in the preparation of this manuscript.

      Supplemental Online Material

      References

        • Gabow P.A.
        Autosomal dominant polycystic kidney disease.
        N Engl J Med. 1993; 329: 332-342
        • Harris P.C.
        • Torres V.E.
        Polycystic kidney disease.
        Annu Rev Med. 2009; 60: 321-337
        • Torres V.E.
        • Harris P.C.
        • Pirson Y.
        Autosomal dominant polycystic kidney disease.
        Lancet. 2007; 369: 1287-1301
        • Grantham J.J.
        • Chapman A.B.
        • Torres V.E.
        Volume progression in autosomal dominant polycystic kidney disease: the major factor determining clinical outcomes.
        Clin J Am Soc Nephrol. 2006; 1: 148-157
        • Fick-Brosnahan G.M.
        • Belz M.M.
        • McFann K.K.
        • Johnson A.M.
        • Schrier R.W.
        Relationship between renal volume growth and renal function in autosomal dominant polycystic kidney disease: a longitudinal study.
        Am J Kidney Dis. 2002; 39: 1127-1134
        • Grantham J.J.
        • Torres V.E.
        • Chapman A.B.
        • et al.
        Volume progression in polycystic kidney disease.
        N Engl J Med. 2006; 354: 2122-2130
        • Bae K.T.
        • Shi T.
        • Tao C.
        • et al.
        Expanded imaging classification of autosomal dominant polycystic kidney disease.
        J Am Soc Nephrol. 2020; 31: 1640-1651
        • Bae K.T.
        • Tao C.
        • Wang J.
        • et al.
        Novel approach to estimate kidney and cyst volumes using mid-slice magnetic resonance images in polycystic kidney disease.
        Am J Nephrol. 2013; 38: 333-341
        • Kistler A.D.
        • Poster D.
        • Krauer F.
        • et al.
        Increases in kidney volume in autosomal dominant polycystic kidney disease can be detected within 6 months.
        Kidney Int. 2009; 75: 235-241
        • King B.F.
        • Reed J.E.
        • Bergstralh E.J.
        • Sheedy 2nd, P.F.
        • Torres V.E.
        Quantification and longitudinal trends of kidney, renal cyst, and renal parenchyma volumes in autosomal dominant polycystic kidney disease.
        J Am Soc Nephrol. 2000; 11: 1505-1511
        • Tangri N.
        • Hougen I.
        • Alam A.
        • Perrone R.
        • McFarlane P.
        • Pei Y.
        Total kidney volume as a biomarker of disease progression in autosomal dominant polycystic kidney disease.
        Can J Kidney Health Dis. 2017; 4: 2054358117693355
        • Kline T.L.
        • Edwards M.E.
        • Korfiatis P.
        • Akkus Z.
        • Torres V.E.
        • Erickson B.J.
        Semiautomated segmentation of polycystic kidneys in T2-weighted MR images.
        AJR Am J Roentgenol. 2016; 207: 605-613
        • Kline T.L.
        • Korfiatis P.
        • Edwards M.E.
        • et al.
        Image texture features predict renal function decline in patients with autosomal dominant polycystic kidney disease.
        Kidney Int. 2017; 92: 1206-1216
        • Kline T.L.
        • Korfiatis P.
        • Edwards M.E.
        • et al.
        Automatic total kidney volume measurement on follow-up magnetic resonance images to facilitate monitoring of autosomal dominant polycystic kidney disease progression.
        Nephrol Dial Transplant. 2016; 31: 241-248
        • Gregory A.V.
        • Anaam D.A.
        • Vercnocke A.J.
        • et al.
        Semantic instance segmentation of kidney cysts in MR images: a fully automated 3D approach developed through active learning.
        J Digit Imaging. 2021; 34: 773-787
        • Edwards M.E.
        • Blais J.D.
        • Czerwiec F.S.
        • Erickson B.J.
        • Torres V.E.
        • Kline T.L.
        Standardizing total kidney volume measurements for clinical trials of autosomal dominant polycystic kidney disease.
        Clin Kidney J. 2019; 12: 71-77
        • Edwards M.E.
        • Periyanan S.
        • Anaam D.
        • Gregory A.V.
        • Kline T.L.
        Automated total kidney volume measurements in pre-clinical magnetic resonance imaging for resourcing imaging data, annotations, and source code.
        Kidney Int. 2021; 99: 763-766
        • Kline T.L.
        • Edwards M.E.
        • Fetzer J.
        • et al.
        Automatic semantic segmentation of kidney cysts in MR images of patients affected by autosomal-dominant polycystic kidney disease.
        Abdom Radiol (NY). 2021; 46: 1053-1061
        • Edwards M.E.
        • Chebib F.T.
        • Irazabal M.V.
        • et al.
        Long-term administration of tolvaptan in autosomal dominant polycystic kidney disease.
        Clin J Am Soc Nephrol. 2018; 13: 1153-1161
        • Schönauer R.
        • Baatz S.
        • Nemitz-Kliemchen M.
        • et al.
        Matching clinical and genetic diagnoses in autosomal dominant polycystic kidney disease reveals novel phenocopies and potential candidate genes.
        Genet Med. 2020; 22: 1374-1383
        • Irazabal M.V.
        • Rangel L.J.
        • Bergstralh E.J.
        • et al.
        Imaging classification of autosomal dominant polycystic kidney disease: a simple model for selecting patients for clinical trials.
        J Am Soc Nephrol. 2015; 26: 160-172
        • Ahn S.
        • Park S.H.
        • Lee K.H.
        How to demonstrate similarity by using noninferiority and equivalence statistical testing in radiology research.
        Radiology. 2013; 267: 328-338
        • Erickson B.J.
        • Langer S.G.
        • Blezek D.J.
        • Ryan W.J.
        • French T.L.
        DEWEY: the DICOM-enabled workflow engine system.
        J Digit Imaging. 2014; 27: 309-313