ModelChain: Decentralized Privacy-Preserving Healthcare Predictive Modeling Framework





Presentation Transcript

1. ModelChain: Decentralized Privacy-Preserving Healthcare Predictive Modeling Framework on Private Blockchain Networks
Tsung-Ting Kuo, PhD (1); Chun-Nan Hsu, PhD (1); and Lucila Ohno-Machado, MD, PhD (1,2)
(1) Health System Department of Biomedical Informatics, UC San Diego, La Jolla, CA
(2) Division of Health Services Research & Development, VA San Diego Healthcare System
9/27/2016, Gaithersburg, MD
The Office of the National Coordinator for Health Information Technology (ONC)
National Institute of Standards and Technology (NIST)
Use of Blockchain for Healthcare and Research Workshop
Slide footer: 9/19/16, ModelChain, UCSD DBMI, 2016

2. Agenda
- Introduction
- Brief Review of Blockchain
- The ModelChain Framework
- Discussion and Conclusion

3. Agenda
- Introduction
- Brief Review of Blockchain
- The ModelChain Framework
- Discussion and Conclusion

4. Cross-institutional Predictive Modeling
- Healthcare predictive modeling
  - Machine learning from healthcare data to predict outcomes
- Cross-institutional healthcare predictive modeling
  - Advance research and facilitate quality improvement initiatives
  - Comparative effectiveness research, biomedical discovery, patient-care, etc.
[Figure: Site 1 alone (records = 10): hard to predict outcomes. Sites 1-4 together (records = 10, 760, 1,500, and 380): feasible to predict outcomes.]

5. Protecting Privacy of Individuals
- Challenge of cross-institutional predictive modeling
  - Improper disclosure of personal health information (PHI)
- Privacy-preserving algorithms transfer models but not PHI
  - Many methods exist [Wu et al. 2012] [Li et al. 2015] [Wang et al. 2013] [Yan et al. 2013]
[Figure: Sites 1-4 sharing observation-level patient healthcare data, versus Sites 1-4 sharing partially-trained machine learning models (privacy-preserving predictive modeling algorithms).]

6. Risk for Existing Algorithms
- Centralized architecture
  - Institutional policies
  - Single-point-of-failure/breach
  - Sites cannot join/leave at any time
  - Mutable data and records
  - Consensus/synchronization issues
[Figure: Sites 1-4 and a central server.]

7. The Blockchain Technology
- Desirable features
  - Decentralized architecture
    - Peer-to-peer
    - Sites keep full control of resources
  - No risk of single-point-of-failure
    - Sites can join/leave freely
    - No central server overhead
    - No disruption of learning process
  - Immutable audit trail
    - Tampering with data/records is difficult
[Figure: Sites 1-4.]

8. ModelChain: Combining Two Technologies
- Privacy-preserving online machine learning on blockchains
- Transaction metadata to transfer partial models and info
- Proof-of-information algorithm to decide order of learning
[Figure: Sites 1-4 with blocks carrying models M11, M22, and M33. Mts = model at time t on site s.]

9. Advancing Interoperability Needs
ONC Nationwide Interoperability Roadmap
- Build upon the existing health IT infrastructure
  - Uses existing healthcare data in Clinical Data Networks (CDNs)
  - Leverages all existing infrastructures while improving prediction power
- Maintain modularity
  - Keeps up with institutional policies because each site has control over data
  - Automatically coordinates the joining/leaving of each site
- Protect privacy and security for all aspects of interoperability
  - Avoids single-point-of-failure
  - Provides immutable audit trails
  - Mitigates synchronization issues
- Facilitates pursuit of national healthcare delivery priorities
  - Ex. Patient-Centered Outcomes Research (PCOR)

10. Agenda
- Introduction
- Brief Review of Blockchain
- The ModelChain Framework
- Discussion and Conclusion

11. Blockchain 1.0: Bitcoin [Nakamoto 2008] (1/3)
- Electronic coin (e.g., Bitcoin) = a chain of transactions
- However, the double-spending problem needs to be solved
  - We need a timestamp mechanism to decide the “first” spending
  - That is, we need to be aware of all transactions
  - For centralized architecture, a “mint” (or central server) is utilized
  - For decentralized architecture, we use “blocks” as a solution
[Figure: chains of transactions among parties A, B, C, and D, each labeled 10.]

12. Blockchain 1.0: Bitcoin [Nakamoto 2008] (2/3)
- A block mainly contains the following 3 parts
  - Multiple transactions (each transaction belongs to one block)
  - Hash of previous block (thus forming a “blockchain”)
  - Nonce (a “counter” that serves as one of the inputs of the hash function)
- Each block represents a unique time point
  - The “first” spending can be identified to prevent double-spending
  - About 430K blocks for Bitcoin as of September 2016
[Figure: Block B1 (hash of Block B0, nonce N1, transactions T11, T12, ...) followed by Block B2 (hash of Block B1, nonce N2, transactions T21, T22, ...).]
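The three-part block described on this slide can be sketched in a few lines. This is a minimal illustration, not Bitcoin's actual format: the `Block` class, the JSON serialization, and the single SHA-256 pass are all assumptions made for readability (real Bitcoin hashes a binary header that summarizes transactions through a Merkle root).

```python
import hashlib
import json
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical block holding the three parts named on the slide."""
    transactions: List[str]  # multiple transactions
    prev_hash: str           # hash of the previous block
    nonce: int = 0           # counter fed into the hash function

    def digest(self) -> str:
        # Serialize all three parts deterministically, then hash them.
        payload = json.dumps(
            {"tx": self.transactions, "prev": self.prev_hash, "nonce": self.nonce},
            sort_keys=True,
        ).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def chain_is_valid(chain: List[Block]) -> bool:
    # Each block must store the digest of the block right before it.
    return all(
        chain[i].prev_hash == chain[i - 1].digest() for i in range(1, len(chain))
    )


# Build B1 -> B2 as in the figure (B0's hash is replaced by a placeholder).
b1 = Block(transactions=["T11", "T12"], prev_hash="0" * 64, nonce=1)
b2 = Block(transactions=["T21", "T22"], prev_hash=b1.digest(), nonce=2)
chain = [b1, b2]
valid_before = chain_is_valid(chain)

# Tampering with an earlier block breaks the link stored in the next block.
b1.transactions[0] = "T11-tampered"
valid_after = chain_is_valid(chain)
```

Because B2 stores the digest of B1, editing B1 after the fact is detectable without any central record keeper.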

13. Blockchain 1.0: Bitcoin [Nakamoto 2008] (3/3)
- Proof-of-work consensus protocol
  - Each site “mines” blocks by solving a difficult hashing problem
    - Increments nonce until hashed value contains specified leading zero bits
  - The first site which completes the work has “decision power”
    - Verifies the transactions in the block (currently ~2K transactions/block)
    - Adds confirmed block with verified transactions to the end of blockchain
- Benefits of proof-of-work
  - Immutable: changing confirmed blocks is very difficult
    - To change a block, all blocks after that block need to be re-computed
    - If honest CPUs > malicious CPUs, the probability of changing is very small
  - Majority voting: the longest chain represents the majority decision
  - No central server required and resistance to attacks
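The nonce search described above can be sketched as follows. This is a toy version under stated assumptions: the function names (`leading_zero_bits`, `mine`, `verify`), the 8-byte nonce encoding, and the single SHA-256 pass are illustrative choices, not Bitcoin's real header layout or difficulty encoding.

```python
import hashlib


def leading_zero_bits(digest: bytes) -> int:
    """Count the zero bits at the start of a digest."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()  # zeros before the first 1 bit
        break
    return bits


def block_hash(block_data: bytes, nonce: int) -> bytes:
    # Assumed layout: block contents followed by an 8-byte big-endian nonce.
    return hashlib.sha256(block_data + nonce.to_bytes(8, "big")).digest()


def mine(block_data: bytes, difficulty_bits: int) -> int:
    """Increment the nonce until the hash has enough leading zero bits."""
    nonce = 0
    while leading_zero_bits(block_hash(block_data, nonce)) < difficulty_bits:
        nonce += 1
    return nonce


def verify(block_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Checking a claimed nonce costs one hash, unlike finding it."""
    return leading_zero_bits(block_hash(block_data, nonce)) >= difficulty_bits


# A small difficulty keeps the demo fast (about 2**12 hashes on average).
found_nonce = mine(b"Block B1: T11, T12", difficulty_bits=12)
```

The asymmetry is the point: `mine` needs many hash evaluations on average, while `verify` needs exactly one, which is why other sites can cheaply confirm the winner's work.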

14. Blockchain 1.0: Alternatives
- Alternative blockchains (“altchains”)
  - Colored coins [Rosenfeld 2012]
    - Support Bitcoin in different colors as different crypto-currencies
  - Side-chains [Back et al. 2014] [Bonneau et al. 2015]
    - Allow Bitcoin to be transferred between multiple blockchain networks
- Avoiding monopoly by protocols on top of proof-of-work
  - Proof-of-stake [King et al. 2012] [Bentov et al. 2014]
    - “Decision power” = the ages of the owned bitcoins
  - Proof-of-burn [Steward 2012] [Bonneau et al. 2015]
    - “Decision power” = the destroying of the owned bitcoins
- ModelChain includes a new proof-of-information protocol
  - “Decision power” = the amount of information in the data

15. Blockchain 2.0: Smart Property/Contract
- Blockchain-based distributed database
  - Arbitrary data can be stored in the transaction metadata
    - Original Bitcoin [Nakamoto 2008] only supports 80 bytes of metadata
    - MultiChain [Greenspan 2015] supports adjustable maximum metadata size
    - BigchainDB [McConaghy et al. 2016] has no hard size limit
  - ModelChain uses transaction metadata to transfer partial models
- Smart property/contract
  - Smart property: blockchain-controlled ownership (“data entries”)
  - Smart contract: computer management programs (“stored procedures”)
  - Ethereum [Buterin 2014] [Wood 2014] is a decentralized platform that runs smart contracts
  - ModelChain may be implemented using smart property/contract

16. Blockchain 3.0: Non-financial Applications
- Applications in healthcare systems
  - Blockchain as a tamper-proof public ledger [Irving et al. 2016]
    - To improve trustworthiness in clinical trials
  - Blockchain as genomic data storage [McKernan 2015]
    - To provide open-access publishing and anti-fragile distributed data sources
  - Blockchain as multi-factor authentication mechanism [Jenkins et al. 2016]
    - To increase data security
- Other healthcare applications
  - Storing electronic health records [Baxendale 2016]
  - Recording health transactions [Witchey 2015]
- To the best of our knowledge, ModelChain is the first to adopt Blockchain for privacy-preserving healthcare predictive modeling

17. Agenda
- Introduction
- Brief Review of Blockchain
- The ModelChain Framework
- Discussion and Conclusion

18. The ModelChain Framework
- Privacy-preserving online machine learning on blockchains
- Transaction metadata to transfer partial models and info
- Proof-of-information algorithm to decide order of learning
[Figure: Sites 1-4 with blocks carrying models M11, M22, and M33. Mts = model at time t on site s.]

19. 1. Online Machine Learning on Blockchain
- Privacy-preserving machine learning
  - Batch: model updating using all data at a time
    - GLORE: logistic regression (LR) with horizontally partitioned data [Wu et al. 2012]
    - VERTIGO: LR with vertically partitioned data [Li et al. 2015]
  - Online: model updating using partial data in sequential order
    - EXPLORER: expectation propagation LR version of GLORE [Wang et al. 2013]
    - Distributed Autonomous Online Learning [Yan et al. 2013]
- ModelChain uses online machine learning to update the model
  - Each site updates the model in a sequential order
  - Focus on privacy-preserving instead of efficiency issues
  - Different from distributed data-parallelism methods (e.g., MapReduce)
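To make "each site updates the model in a sequential order" concrete, here is a small sketch in which only model weights move between sites while each site's records stay local. It deliberately swaps in plain stochastic-gradient logistic regression rather than EXPLORER's expectation propagation, and the site names, toy data, learning rate, and epoch count are all invented for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (features incl. bias term, label 0/1)


def predict(weights: Sequence[float], x: Sequence[float]) -> float:
    z = sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def site_update(weights: Sequence[float], local_data: List[Example],
                lr: float = 0.5, epochs: int = 50) -> List[float]:
    """One site's turn: refine the received model on local data only."""
    w = list(weights)  # copy so the received model is not modified in place
    for _ in range(epochs):
        for x, y in local_data:
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                w[j] -= lr * err * xj
    return w


def error_rate(weights: Sequence[float], data: List[Example]) -> float:
    wrong = sum((predict(weights, x) >= 0.5) != bool(y) for x, y in data)
    return wrong / len(data)


# Hypothetical toy data: first feature is a constant bias term.
sites: Dict[str, List[Example]] = {
    "Site 1": [([1.0, 0.0], 0), ([1.0, 1.0], 1)],
    "Site 2": [([1.0, 0.2], 0), ([1.0, 0.9], 1)],
}

weights = [0.0, 0.0]
for name, data in sites.items():
    # Only the weights cross site boundaries, never the patient-level rows.
    weights = site_update(weights, data)

# Pooling is done here only to score the demo; real sites would not pool.
pooled = [example for data in sites.values() for example in data]
final_error = error_rate(weights, pooled)
```

The visiting order in this sketch is simply dictionary order; the proof-of-information slides describe how ModelChain chooses that order instead.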

20. 2. Transaction Metadata
- ModelChain utilizes transaction metadata to disseminate
  - The partial models
  - The meta information related to the partial model
    - Flag: actions to a model (INITIALIZE, UPDATE, EVALUATE, and TRANSFER)
    - Hash: save blockchain storage space (only UPDATE includes models)
    - Error: indicate the error (or information) of current model on the site
- ModelChain runs on private blockchains
  - Without transaction fee/amount and mining rewards
  - Incentive: improved predictive model accuracy using cross-institution data
[Figure: Block B1 (hash of Block B0, nonce N1) contains transaction T112 with Flag = TRANSFER, Model = NULL, Hash = HASH(M11), Error = E12. Block B2 (hash of Block B1, nonce N2) contains transaction T222 with Flag = UPDATE, Model = M22, Hash = HASH(M22), Error = E22.]
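The metadata fields listed on this slide (flag, model, hash, error) might be laid out as below. This is a sketch under assumptions: the `Metadata` class, the JSON encoding, the use of SHA-256, and representing a model as a list of weights are illustrative choices that the slides do not specify.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import List, Optional


class Flag(str, Enum):
    """The four actions named on the slide."""
    INITIALIZE = "INITIALIZE"
    UPDATE = "UPDATE"
    EVALUATE = "EVALUATE"
    TRANSFER = "TRANSFER"


def hash_model(weights: List[float]) -> str:
    # Assumed: a model is a list of weights, hashed via its JSON form.
    return hashlib.sha256(json.dumps(weights).encode("utf-8")).hexdigest()


@dataclass
class Metadata:
    flag: Flag
    model: Optional[List[float]]  # None plays the role of NULL on the slide
    model_digest: str             # always present, so the model is identifiable
    error: float                  # error of the current model on this site

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


def make_metadata(flag: Flag, weights: List[float], error: float) -> Metadata:
    # Per the slide, only UPDATE carries the model itself; every other
    # flag refers to the model by hash to save blockchain storage.
    carried = list(weights) if flag is Flag.UPDATE else None
    return Metadata(flag, carried, hash_model(weights), error)


m22 = [0.4, -1.2, 0.7]  # hypothetical partial model
update_tx = make_metadata(Flag.UPDATE, m22, error=0.3)
transfer_tx = make_metadata(Flag.TRANSFER, m22, error=0.6)
```

Both transactions carry the same digest, so a TRANSFER can point at a model that an earlier UPDATE already stored on the chain.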

21. 3. Proof-of-Information (1/7)
- The order is important for online machine learning
  - Good learning order can increase learning efficiency and accuracy
- To determine the order, we use an idea similar to Boosting
  - The site with hard-to-predict data contains more information
  - That site should be assigned a higher priority to update the model next
  - Thus, the site with highest error wins the “information bid” (“decision power”)
  - We can conceptually “transfer” the model to the next updating site
- However, we should start with the best model (with lowest error)
  - To prevent propagation of error
- Three execution scenarios
  - Initialize new network to find consensus model
  - New data is added to a site (new site = a site with all new data)
  - A site leaves the network

22. 3. Proof-of-Information (2/7)
- Scenario A: initialize new network to find consensus model
- In this example, M44 is the final consensus model
[Figure: errors of each model on Sites 1-4. Mts = model at time t on site s; Ets = error at time t on site s.]

    t  Model               Site 1      Site 2      Site 3      Site 4
    0  M01, M02, M03, M04  E01 = 0.2   E02 = 0.3   E03 = 0.5   E04 = 0.4
    1  M11                 E11 = 0.2   E12 = 0.7   E13 = 0.4   E14 = 0.6
    2  M22                 E21 = 0.1   E22 = 0.3   E23 = 0.6   E24 = 0.5
    3  M33                 E31 = 0.1   E32 = 0.2   E33 = 0.2   E34 = 0.3
    4  M44                 E41 = 0.1   E42 = 0.1   E43 = 0.1   E44 = 0.2
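The Scenario A numbers can be replayed with the two rules from the previous slide: start from the site whose local model has the lowest error, then hand the model to whichever site reports the highest error. The stopping rule used here (stop when the highest-error site is the one that already holds the model) is an assumption; it is consistent with the slide's numbers, but the slide does not state it. The function and variable names are hypothetical.

```python
from typing import Dict, List, Tuple

# Errors read off the slide. INITIAL[s] = E0s; EVALS[t][s] = Ets.
INITIAL: Dict[int, float] = {1: 0.2, 2: 0.3, 3: 0.5, 4: 0.4}
EVALS: Dict[int, Dict[int, float]] = {
    1: {1: 0.2, 2: 0.7, 3: 0.4, 4: 0.6},
    2: {1: 0.1, 2: 0.3, 3: 0.6, 4: 0.5},
    3: {1: 0.1, 2: 0.2, 3: 0.2, 4: 0.3},
    4: {1: 0.1, 2: 0.1, 3: 0.1, 4: 0.2},
}


def replay(initial: Dict[int, float],
           evals: Dict[int, Dict[int, float]]) -> Tuple[List[int], str]:
    """Return the order of updating sites and the final model label."""
    # Start with the best local model (lowest error) to limit error propagation.
    current = min(initial, key=initial.get)
    order = [current]
    t = 0
    for t in sorted(evals):
        errors = evals[t]
        # The site with the highest error wins the "information bid".
        winner = max(errors, key=errors.get)
        if winner == current:
            # Assumed stopping rule: no other site has more to teach the model.
            break
        current = winner
        order.append(current)
    return order, f"M{t}{current}"


update_order, consensus_model = replay(INITIAL, EVALS)
```

Under these rules the replay visits Sites 1, 2, 3, and 4 in turn and ends on M44, matching the model sequence M11, M22, M33, M44 shown on the slide.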

23. 3. Proof-of-Information (3/7)
- Scenario B: new data are added to a site
- Start with previous model (M44); no need to re-train the whole model
[Figure: Sites 1-4 with models M11, M22, M33, and M44, and a “New Data” label.]

    t  Model  Site 1      Site 2      Site 3      Site 4
    4  M44    E41 = 0.1   E42 = 0.1   E43 = 0.1   E44 = 0.2
    5  M54    E51 = 0.4   E52 = 0.1   E53 = 0.1   E54 = 0.2

24. 3. Proof-of-Information (4/7)
- Scenario C: a site leaves the network
- Based on the Blockchain mechanism, we can ignore the departure
[Figure: Sites 1-4 with models M11, M22, M33, and M44, and a “New Data” label.]

    t  Model  Site 1      Site 2      Site 3      Site 4
    4  M44    E41 = 0.1   E42 = 0.1   E43 = 0.1   E44 = 0.2
    5  M54    E51 = 0.4   E52 = 0.1   E53 = 0.1   E54 = 0.2

25. 3. Proof-of-Information (5/7)
- Algorithm 1. Proof-of-information-Iteration
- Core algorithm to repeat learning until consensus model is found

26. 3. Proof-of-Information (6/7)
- Algorithm 2. Proof-of-information-Initialize
- For a new Blockchain network, each site executes this algorithm
- This algorithm in turn executes Algorithm 1

27. 3. Proof-of-Information (7/7)
- Algorithm 3. Proof-of-information-New
- Site with new data or new site executes this algorithm
- This algorithm also leverages Algorithm 1

28. Agenda
- Introduction
- Brief Review of Blockchain
- The ModelChain Framework
- Discussion and Conclusion

29. Discussion
- Limitations of Blockchain are less important for ModelChain
  - Confidentiality: we only learn model but do not transfer any PHI
  - Transaction time: small compared to the machine learning process
  - 51% attack: private network minimizes the risk
- Iterations of proof-of-information algorithm
  - Issue: it might run too many iterations before finding the best model
  - Solutions: error-threshold (“good”), max-iteration (“old”), or both
- Implementation issues
  - Parameters: consider CPU, network, required accuracy/efficiency
  - Size of metadata: 1K features ~= 8MB for EXPLORER [Wang et al. 2013]
  - Security: VPN, HIPAA-certified cloud such as iDASH [Ohno-Machado et al. 2012]
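The two proposed safeguards against running too many iterations can be combined in one check. This is a sketch only: the function name, parameter names, and the choice of "less than or equal" and "greater than or equal" comparisons are assumptions, since the slide names the criteria without defining them.

```python
from typing import Optional


def should_stop(iteration: int, error: float,
                error_threshold: Optional[float] = None,
                max_iterations: Optional[int] = None) -> bool:
    """Stop when the model is "good" enough, "old" enough, or both.

    Either criterion may be disabled by leaving it as None.
    """
    # "Good": the current error is already at or below the threshold.
    good = error_threshold is not None and error <= error_threshold
    # "Old": the iteration count has reached the allowed maximum.
    old = max_iterations is not None and iteration >= max_iterations
    return good or old


# Hypothetical settings: accept 10% error, or give up after 20 rounds.
stop_early = should_stop(iteration=3, error=0.05,
                         error_threshold=0.10, max_iterations=20)
keep_going = should_stop(iteration=3, error=0.30,
                         error_threshold=0.10, max_iterations=20)
```

Using both criteria together bounds the run time even when the error threshold is never reached.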

30. Conclusion
- The capability to securely and robustly construct privacy-preserving models on healthcare data is essential to support
  - ONC Nationwide Interoperability Roadmap
  - National healthcare delivery priorities such as PCOR
- ModelChain: improves security and robustness
  - Online privacy-preserving machine learning + Blockchain network
  - Transaction metadata for model dissemination
  - Proof-of-information algorithm to determine the order of learning
- Future work
  - Evaluate trade-offs in real-world settings like pSCANNER [Ohno-Machado et al. 2014]
  - Improve the efficiency and scalability of proof-of-information

31. Acknowledgement
- The authors would like to thank Xiaoqian Jiang, PhD, and Shuang Wang, PhD, for very helpful discussions
- All authors are funded by PCORI CDRN-1306-04819
- LO-M is funded by NIH U54HL108460, UL1TR001442, and VA I01HX000982
[Photos: Tsung-Ting Kuo, PhD; Chun-Nan Hsu, PhD; Lucila Ohno-Machado, MD, PhD; Xiaoqian Jiang, PhD; Shuang Wang, PhD.]

32. Thank You!
Department of Biomedical Informatics
University of California, San Diego