Contrastive Language-Image Models in Medical Image Understanding



Presentation Transcript

1. Contrastive Language-Image Models in Medical Image Understanding
Deep Learning for Medical Applications (IN2107)
Student: Kristina Diery
Tutor: Chantal Pellegrini

2. Agenda
1. Introduction
1.1 Problem Statement
1.2 Contrastive Learning
2. Applications
2.1 Classification, Retrieval
2.2 Detection, Segmentation
2.3 Report Generation
3. Review

3. Introduction

4. Problem Statement
Medical image understanding: accurate diagnosis prediction, treatment planning, etc.
Data with human annotations is rare.
Models often depend on fine-tuning weights from ImageNet pretraining.
Contrastive image-only learning is not enough: high inter-class similarity.
Contrastive image-language learning:
- exploits the naturally occurring pairing of an image with the physician's report
- learns detailed image representations

5. Contrastive Learning
Self-supervised learning: minimize the distance between positive pairs and maximize the distance between negative pairs.
The model learns which attributes are shared within a data class and which attributes set one class apart from another.
[Figure: two unlabeled images are passed through a shared encoder and the distance between their representations is compared.]
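To make the pairwise objective concrete, here is a minimal PyTorch sketch of an InfoNCE-style contrastive loss between two batches of embeddings (e.g., encoder outputs for two augmented views of the same unlabeled images). The function name, temperature value, and the assumption that positives sit at matching batch indices are illustrative choices, not taken from any of the cited papers.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(z_a, z_b, temperature=0.1):
    """Minimal InfoNCE-style contrastive loss: (z_a[i], z_b[i]) are positive
    pairs, all other combinations in the batch act as negatives."""
    z_a = F.normalize(z_a, dim=-1)            # unit-length embeddings
    z_b = F.normalize(z_b, dim=-1)
    logits = z_a @ z_b.t() / temperature      # pairwise cosine similarities
    targets = torch.arange(z_a.size(0), device=z_a.device)
    return F.cross_entropy(logits, targets)   # positives sit on the diagonal
```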

6. Contrastive Learning – Image-Language
Input: an image with its corresponding text (e.g., an X-ray image with the radiology report).
Contrastive loss with cosine similarity; minimizing it maximally preserves the mutual information between true pairs under the representation functions.
Two symmetric contrastive losses: an image-to-text loss and a text-to-image loss.
Training loss = weighted combination of the two losses, averaged over all positive image-text pairs in each minibatch [1].
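The weighted bidirectional loss described above can be sketched as follows. This is a simplified illustration in the spirit of ConVIRT [1]; the temperature `tau`, the weighting factor `lam`, and the omission of the projection heads are placeholder simplifications, not the paper's exact setup.

```python
import torch
import torch.nn.functional as F

def image_text_contrastive_loss(img_emb, txt_emb, tau=0.1, lam=0.75):
    """Weighted combination of an image-to-text and a text-to-image InfoNCE
    term, averaged over the positive image-text pairs in the minibatch."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    sim = img_emb @ txt_emb.t() / tau                 # cosine similarities
    targets = torch.arange(sim.size(0), device=sim.device)
    loss_i2t = F.cross_entropy(sim, targets)          # image -> matching report
    loss_t2i = F.cross_entropy(sim.t(), targets)      # report -> matching image
    return lam * loss_i2t + (1.0 - lam) * loss_t2i
```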

7. Applications: Classification, Retrieval

8. ConVIRT [1]
Transfers the learned image encoder to classification and retrieval tasks.
Pretraining on two different datasets: a chest image encoder and a bone image encoder.
Evaluation on four medical classification tasks: RSNA Pneumonia [2], CheXpert [3], COVIDx [4], MURA [5].
Evaluation on two zero-shot retrieval tasks: image-image retrieval and text-image retrieval.
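As a rough illustration of the transfer setting, the sketch below freezes a backbone and trains only a linear classifier on top (linear evaluation). The ResNet-50 architecture, the commented-out checkpoint path, and the 14 output classes are placeholders, not the exact ConVIRT configuration.

```python
import torch.nn as nn
from torchvision.models import resnet50

# Hypothetical contrastively pretrained image encoder.
encoder = resnet50(weights=None)
# encoder.load_state_dict(torch.load("convirt_chest_encoder.pt"))  # hypothetical checkpoint
encoder.fc = nn.Identity()                 # expose the 2048-d pooled features

for p in encoder.parameters():             # frozen backbone: linear evaluation
    p.requires_grad = False

linear_head = nn.Linear(2048, 14)          # trainable linear classifier
model = nn.Sequential(encoder, linear_head)
```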

9. ConVIRT [1] – Image Classification

10. ConVIRT [1] – Zero-Shot Image Retrieval
Search for images of a particular category using a query image or a query text.
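A possible implementation of such zero-shot retrieval is to rank candidates by cosine similarity in the shared embedding space; the sketch below is a generic illustration and does not reproduce ConVIRT's exact evaluation protocol.

```python
import torch
import torch.nn.functional as F

def retrieve(query_emb, candidate_embs, k=5):
    """Rank candidate images by cosine similarity to a query embedding.
    The query can come from the image encoder (image-image retrieval) or
    the text encoder (text-image retrieval); no fine-tuning is involved."""
    query_emb = F.normalize(query_emb, dim=-1)            # (dim,)
    candidate_embs = F.normalize(candidate_embs, dim=-1)  # (num_candidates, dim)
    scores = candidate_embs @ query_emb                   # similarity per candidate
    return torch.topk(scores, k=k).indices                # indices of the top-k matches
```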

11. ConVIRT [1] – Image-Only Comparison
Saliency maps on sampled images, comparing against image-only contrastive learning baselines [6][7].

12. GLoRIA [8]
GLoRIA: Global-Local Representations for Images using Attention.
Learns global and local representations; attention weights map words to image regions.
Trained on the CheXpert dataset.
Tasks: supervised image classification, zero-shot classification, image-text retrieval, segmentation.
Evaluation on: CheXpert, RSNA Pneumonia, SIIM Pneumothorax.
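The word-to-region attention behind GLoRIA's local representations can be sketched roughly as follows. The shapes, the temperature, and the plain dot-product attention are simplifying assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def word_region_attention(word_feats, region_feats, temperature=1.0):
    """Each report word attends over image sub-regions, producing a
    word-specific weighted sum of region features.

    word_feats:   (num_words, dim) token embeddings of a report
    region_feats: (num_regions, dim) spatial features from the image encoder
    """
    attn = word_feats @ region_feats.t() / temperature   # (words, regions)
    attn = F.softmax(attn, dim=-1)                       # each word attends over regions
    attended = attn @ region_feats                       # (words, dim) word-grounded image features
    return attended, attn
```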

13. GLoRIA – Supervised Classification [8]
Results of fine-tuned image classification.
GLoRIA with a linear classifier trained on 1% of the dataset performs better than ImageNet pretraining with a linear classifier trained on 100% of the dataset.
Training with global and local representations yields better global representations for label-efficient classification.

14. GLoRIA – Zero-Shot Classification [8]
Zero-shot image classification on CheXpert.
CheXpert: better F1 score than fine-tuned models.
RSNA: comparable to supervised models fine-tuned with 1% of the training data.
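Zero-shot classification in this setting typically compares the image embedding against text-prompt embeddings, one per class; the sketch below illustrates the idea with placeholder prompts and does not reproduce GLoRIA's exact prompt construction.

```python
import torch
import torch.nn.functional as F

def zero_shot_classify(image_emb, class_prompt_embs):
    """Predict the class whose text-prompt embedding (e.g., an encoded
    sentence such as "Findings suggesting pneumonia") is most similar to
    the image embedding; no labeled training images are needed."""
    image_emb = F.normalize(image_emb, dim=-1)                   # (dim,)
    class_prompt_embs = F.normalize(class_prompt_embs, dim=-1)   # (num_classes, dim)
    scores = class_prompt_embs @ image_emb                       # similarity per class
    return torch.argmax(scores).item(), scores
```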

15. GLoRIA [8] – Retrieval
An image is given as the input query; target reports are retrieved by computing the similarity between the query image and all candidate reports.
Comparable to ConVIRT when using only global representations.
Best overall: leveraging both global and local representations.

16. Conclusion
No additional expert input necessary.
Higher-quality in-domain image representations.
Global and local representations provide complementary semantic information.
Local contrastive representations better capture subtle visual features.
Data efficiency: the same level of accuracy with less data.

17. Applications: Detection, Segmentation

18. Radiological Reports Improve Pre-Training for Localized Imaging Tasks [9]
Comparison of different pre-training methods on 18 localized tasks: supervised, image-only contrastive learning, and image-text contrastive learning.
CLIP [10]: same setup as ConVIRT, but with attention pooling instead of global average pooling.
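To illustrate the difference from global average pooling, here is a minimal attention-pooling module loosely following the idea in CLIP's modified ResNet [10]: the mean feature acts as a query that attends over all spatial positions. The dimensions are arbitrary and positional embeddings are omitted, so this is a sketch of the idea rather than CLIP's exact layer.

```python
import torch
import torch.nn as nn

class AttentionPool(nn.Module):
    """Pool spatial features with attention instead of a plain average."""

    def __init__(self, dim, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, feats):                      # feats: (batch, positions, dim)
        query = feats.mean(dim=1, keepdim=True)    # (batch, 1, dim) global query
        pooled, _ = self.attn(query, feats, feats) # attend over spatial positions
        return pooled.squeeze(1)                   # (batch, dim) pooled representation

feats = torch.randn(2, 49, 512)                    # e.g., a flattened 7x7 feature grid
pooled = AttentionPool(dim=512)(feats)             # (2, 512)
```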

19. Object Detection Comparison [9]
ConVIRT and CLIP perform best on linear evaluation tasks.
ConVIRT and CLIP perform well under the frozen evaluation protocol.
Advantageous when annotated data is not available.
Image-language is better than image-only contrastive learning.

20. Segmentation Comparison [9]
Supervised pre-training on CheXpert [3] is outperformed by non-supervised methods: contrastive learning should be preferred.
Image-only is a great baseline: good if no text is available.
Image-language with 100% of the pre-training data outperforms contrastive image-only.
Contrastive image-language learning can reduce the number of training samples needed for the same results.

21. GLoRIA [8] – Segmentation Adaptation
U-Net architecture: the encoder is initialized with weights from the pretrained image encoder.
The learned representations are effective for the segmentation task.
Dice score on the SIIM dataset.
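A rough sketch of this adaptation: initialize the encoder of a segmentation network from the pretrained image encoder and train a decoder on top. The checkpoint path is hypothetical, and the toy decoder below omits the skip connections of a full U-Net, so it only illustrates the weight-transfer step rather than the GLoRIA setup.

```python
import torch.nn as nn
from torchvision.models import resnet50

encoder = resnet50(weights=None)
# encoder.load_state_dict(torch.load("gloria_image_encoder.pt"))  # hypothetical checkpoint

class SegmentationNet(nn.Module):
    def __init__(self, backbone):
        super().__init__()
        # Reuse the pretrained backbone up to the last convolutional stage.
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])
        # Toy decoder: upsample back to a 1-channel segmentation mask.
        self.decoder = nn.Sequential(
            nn.Conv2d(2048, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=32, mode="bilinear", align_corners=False),
            nn.Conv2d(256, 1, kernel_size=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = SegmentationNet(encoder)   # encoder pretrained, decoder trained from scratch
```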

22. Conclusion
No single best pre-training method for all localized tasks.
In-domain pre-training always outperforms learning from natural images (even for smaller datasets).
Contrastive learning usually outperforms other methods.
Image-language pre-training can reduce the number of required training samples.
Image-only is a good baseline, but image-language should be preferred.

23. Applications: Report Generation

24. RepsNet [11]
Encoder-decoder model.
Visual question answering formulation: answers are categorized via classification.
Uses pre-trained image-language models to interpret images.
Handles close-ended and open-ended answers.
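The close-ended branch of such a VQA formulation can be sketched as a classifier over a fixed answer vocabulary; the fusion by concatenation and all dimensions below are illustrative assumptions, not the RepsNet architecture.

```python
import torch
import torch.nn as nn

class ClosedEndedVQAHead(nn.Module):
    """Fuse image and question embeddings (e.g., from pretrained
    image-language encoders) and score a fixed set of candidate answers."""

    def __init__(self, img_dim=512, txt_dim=512, num_answers=100):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(img_dim + txt_dim, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, num_answers),   # one logit per candidate answer
        )

    def forward(self, img_emb, question_emb):
        fused = torch.cat([img_emb, question_emb], dim=-1)
        return self.classifier(fused)      # answer scores
```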

25. RepsNet [11]
BLEU scores for medical report generation on the IU-Xray dataset.
Outperforms competing methods across all datasets.
Performance increases with the use of pretrained models.
Performs better than state-of-the-art methods across all BLEU scores.

26. CXR-RePaiR [12]
Contrastive X-ray-Report Pair Retrieval.
Retrieval-based report generation: no natural language generation.
Chooses from a corpus of reports or a corpus of report sentences.
Uses contrastive language-image pre-training (CLIP [10]).
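The retrieval step can be sketched as selecting the corpus sentences most similar to the image in the shared embedding space; the function below is a generic illustration, with the corpus construction and `k` as placeholders rather than CXR-RePaiR's exact setup.

```python
import torch
import torch.nn.functional as F

def retrieve_report(image_emb, sentence_embs, sentences, k=3):
    """Build a "report" by concatenating the k corpus sentences whose
    embeddings are most similar to the image embedding; no text is
    generated, only retrieved."""
    image_emb = F.normalize(image_emb, dim=-1)            # (dim,)
    sentence_embs = F.normalize(sentence_embs, dim=-1)    # (corpus_size, dim)
    scores = sentence_embs @ image_emb                    # similarity per sentence
    top = torch.topk(scores, k=k).indices
    return " ".join(sentences[i] for i in top.tolist())
```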

27. CXR-RePaiR – Results [12]
Prone to repeating information.
Using sentences from reports works better than using whole reports.
Pre-training on natural images and then on radiology image-report pairs works best.

28. Conclusion
A good baseline for further improvement of report generation.
Contrastive image-language learning boosts the performance of VQA and natural language generation tasks.
Transferable representation learning is powerful for generating accurate, clear reports.
Provides state-of-the-art results for report generation.

29. Review

30. Review
Strong in all application fields, outperforming state-of-the-art models.
Pretrained contrastive image-language models always boost performance.
Text gives important context:
- representations of higher quality
- better focus on relevant areas
Data efficiency:
- effective when annotated data is rare or unavailable
- less data needed than in supervised models
Shows efficient use of multi-modal data.
Further improvements to contrastive image-language pre-training methods may have an even more positive impact on end-task performance.

31. Thank you for listening. Any questions?

32. References
[1] Zhang, Yuhao & Jiang, Hang & Miura, Yasuhide & Manning, Christopher & Langlotz, Curtis. (2020). Contrastive Learning of Medical Visual Representations from Paired Images and Text.
[2] Wang, Xiaosong & Peng, Yifan & Lu, Le & Lu, Zhiyong & Bagheri, Mohammadhadi & Summers, Ronald. (2017). ChestX-Ray8: Hospital-Scale Chest X-Ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases.
[3] Irvin, Jeremy & Rajpurkar, Pranav & Ko, Michael & Yu, Yifan & Ciurea-Ilcus, Silviana & Chute, Chris & Marklund, Henrik & Haghgoo, Behzad & Ball, Robyn & Shpanskaya, Katie & Seekins, Jayne & Mong, David & Halabi, Safwan & Sandberg, Jesse & Jones, Ricky & Larson, David & Langlotz, Curtis & Patel, Bhavik & Lungren, Matthew & Ng, Andrew. (2019). CheXpert: A Large Chest Radiograph Dataset with Uncertainty Labels and Expert Comparison. Proceedings of the AAAI Conference on Artificial Intelligence, 33, 590-597.
[4] Wang, Linda & Lin, Zhong & Wong, Alexander. (2020). COVID-Net: A Tailored Deep Convolutional Neural Network Design for Detection of COVID-19 Cases from Chest X-ray Images. Scientific Reports, 10.
[5] Rajpurkar, Pranav & Irvin, Jeremy & Bagul, Aarti & Ding, Daisy & Duan, Tony & Mehta, Hershel & Yang, Brandon & Zhu, Kaylie & Laird, Dillon & Ball, Robyn L. & Langlotz, Curtis & Shpanskaya, Katie & Lungren, Matthew P. & Ng, Andrew Y. (2018). MURA: Large Dataset for Abnormality Detection in Musculoskeletal Radiographs.

33. References
[6] Chen, Ting & Kornblith, Simon & Norouzi, Mohammad & Hinton, Geoffrey. (2020). A Simple Framework for Contrastive Learning of Visual Representations.
[7] Chen, Xinlei & Fan, Haoqi & Girshick, Ross & He, Kaiming. (2020). Improved Baselines with Momentum Contrastive Learning.
[8] Huang, Shih-Cheng & Shen, Liyue & Lungren, Matthew & Yeung, Serena. (2021). GLoRIA: A Multimodal Global-Local Representation Learning Framework for Label-efficient Medical Image Recognition.
[9] Müller, Philip & Kaissis, Georgios & Zou, Congyu & Rueckert, Daniel. (2022). Radiological Reports Improve Pre-training for Localized Imaging Tasks on Chest X-Rays.
[10] Radford, Alec & Kim, Jong & Hallacy, Chris & Ramesh, Aditya & Goh, Gabriel & Agarwal, Sandhini & Sastry, Girish & Askell, Amanda & Mishkin, Pamela & Clark, Jack & Krueger, Gretchen & Sutskever, Ilya. (2021). Learning Transferable Visual Models From Natural Language Supervision.
[11] Tanwani, Ajay & Barral, Joelle & Freedman, Daniel. (2022). RepsNet: Combining Vision with Language for Automated Medical Reports.
[12] Endo, M., Krishnan, R., Krishna, V., Ng, A. Y. & Rajpurkar, P. (2021). Retrieval-Based Chest X-Ray Report Generation Using a Pre-trained Contrastive Language-Image Model.