Challenges in Using ML for Networking Research: How to Label If You Must
Author: tatyana-admore | Published Date: 2025-05-22
Description: Challenges in Using ML for Networking Research: How to Label If You Must. Yukhe Lavinia (University of Oregon), Ramakrishnan Durairajan (University of Oregon), Reza Rejaie (University of Oregon), Walter Willinger (NIKSUN, Inc.). Contact: ylavinia@uoregon.edu
Transcript: Challenges in Using ML for Networking Research
Challenges in Using ML for Networking Research: How to Label If You Must
Yukhe Lavinia, University of Oregon
Ramakrishnan Durairajan, University of Oregon
Reza Rejaie, University of Oregon
Walter Willinger, NIKSUN, Inc.
ylavinia@uoregon.edu

Introduction
The fuel for machine learning (ML) research is data, and specifically labeled data.

Outline
- Challenges
- Contributions
- Building blocks
- Evaluation
- Conclusion
- Future work

Challenges in using ML in networking
(Figure: networking data whose labels are unknown.)
Challenge 1: Lack of labeled networking data
- Lack of agreement in the community: what are the features of good data, and of bad data?
- Difficulty in labeling at scale: a limited number of experts, a large amount of data, and a high human cost of labeling.
Challenge 2: Privacy concerns in network data
- Collaborating on ML in networking means sharing raw or labeled data, or sharing learning models.
- Safest option: avoid any possibility of privacy leaks.
Challenge 3: Hidden biases in data
- Inherent in ML, and made more complicated by the nature of network data.
- Minority groups lack representation, producing a model that does not generalize well.

Contributions
Challenge → Solution
- Lack of labeled networking data → Create high-quality labels at scale, in a programmable fashion, and at low human labor cost.
- Privacy concerns in network data → Share only learning algorithms.
- Hidden biases in data → Implement multi-task learning (MTL) (future work). Training on Task 1 and Task 2 jointly yields a more generalized data representation, which leads to bias reduction.
EMERGE: a framework to dEmocratize the use of ML for nEtwoRkinG rEsearch.

EMERGE
- Creates high-quality networking data labels at scale, in a programmable fashion, and at low human labor cost.
- Promotes privacy-preserving collaboration among research groups (Research Groups 1 through 4 in the diagram).

Building Blocks
Weak supervision turns unlabeled data into labeled data.
Data programming [1]: a domain expert writes labeling functions. Applying them to unlabeled data through weak supervision yields probabilistic labels (low-quality labeled data), which are then used to train a model in a supervised setting.
Data programming framework: Snorkel [2]. Limitations:
- Not specific to networking.
- Scalability issues, in both data amount and data diversity.
- Human labor cost, because a domain expert must write the labeling functions.
[1] Ratner et al., "Data programming: Creating large training sets, quickly," Advances in Neural Information Processing Systems (2016).
[2] Ratner et al., "Snorkel: Rapid training data creation with weak supervision," Proceedings of the VLDB Endowment (2017).

Building Blocks
Snuba [1]: simple ML classifiers (e.g., logistic regression, decision tree, nearest neighbor) take the place of hand-written labeling functions. They turn unlabeled data into probabilistic labels (low-quality labeled data).
[1] Varma et al., "Snuba: Automating weak supervision to label training data," Proceedings of the VLDB Endowment (2018).
[2] Muthukumar et al., "Denoising internet delay measurements using weak supervision," ICMLA (2019).
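The data-programming pipeline on the Building Blocks slides can be illustrated with a small sketch. It assumes a hypothetical task: flagging noisy delay measurements. The record type, labeling functions and thresholds are all made up for illustration and do not come from the slides or from Snorkel's API.

Each labeling function votes CLEAN or NOISY, or abstains. The votes form a label matrix, which is reduced to a probability that each example is noisy. For simplicity the votes are combined by an unweighted vote rather than by Snorkel's learned label model.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Label values emitted by labeling functions (LFs); ABSTAIN means "no opinion".
ABSTAIN, CLEAN, NOISY = -1, 0, 1


@dataclass
class Measurement:
    """Hypothetical delay-measurement record; the fields are illustrative."""
    rtt_ms: float
    hop_count: int


LabelingFunction = Callable[[Measurement], int]


# Hypothetical expert heuristics; the thresholds are made up for illustration.
def lf_extreme_rtt(m: Measurement) -> int:
    return NOISY if m.rtt_ms > 1000 else ABSTAIN


def lf_negative_rtt(m: Measurement) -> int:
    return NOISY if m.rtt_ms < 0 else ABSTAIN


def lf_plausible_rtt(m: Measurement) -> int:
    # Treat an RTT below 50 ms per hop as plausible (assumed rule of thumb).
    return CLEAN if 0 < m.rtt_ms < 50 * max(m.hop_count, 1) else ABSTAIN


def apply_lfs(data: Sequence[Measurement],
              lfs: Sequence[LabelingFunction]) -> List[List[int]]:
    """Label matrix: one row per example, one column per LF."""
    return [[lf(m) for lf in lfs] for m in data]


def probabilistic_labels(label_matrix: List[List[int]]) -> List[float]:
    """P(NOISY) per example from an unweighted vote of non-abstaining LFs.

    Examples that no LF covers get 0.5 (no information). Snorkel instead fits
    a generative label model that weights each LF by its estimated accuracy.
    """
    probs = []
    for row in label_matrix:
        votes = [v for v in row if v != ABSTAIN]
        probs.append(sum(v == NOISY for v in votes) / len(votes) if votes else 0.5)
    return probs


if __name__ == "__main__":
    data = [Measurement(12.0, 8), Measurement(2500.0, 10),
            Measurement(-3.0, 5), Measurement(900.0, 4)]
    matrix = apply_lfs(data, [lf_extreme_rtt, lf_negative_rtt, lf_plausible_rtt])
    print(probabilistic_labels(matrix))  # [0.0, 1.0, 1.0, 0.5]
```

The last measurement gets 0.5 because none of the three heuristics covers it. Gaps like this are one reason hand-written labeling functions scale poorly with data diversity.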
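The "train in a supervised setting" step can consume probabilistic labels directly instead of rounding them to hard labels. The minimal sketch below fits logistic regression to soft labels by minimizing the expected cross-entropy under those labels. This is in the spirit of the noise-aware loss in Ratner et al.'s data programming paper, not their exact model.

It assumes NumPy, a tiny made-up dataset and plain gradient descent. The function names are hypothetical.

```python
import numpy as np


def _with_bias(X: np.ndarray) -> np.ndarray:
    """Append a constant column so the model learns an intercept."""
    return np.hstack([X, np.ones((X.shape[0], 1))])


def train_noise_aware_logreg(X, soft_labels, lr=0.5, epochs=2000):
    """Logistic regression fit to probabilistic labels q in [0, 1].

    Minimizes the expected cross-entropy -mean(q*log p + (1-q)*log(1-p)),
    whose gradient with respect to the weights is X^T (p - q) / n.
    """
    Xb = _with_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - soft_labels) / len(soft_labels)
    return w


def predict_proba(w, X):
    """Predicted probability of the positive class for each row of X."""
    return 1.0 / (1.0 + np.exp(-(_with_bias(X) @ w)))


if __name__ == "__main__":
    X = np.array([[0.1], [0.2], [2.5], [3.0]])  # one numeric feature
    q = np.array([0.0, 0.1, 0.9, 1.0])          # probabilistic labels
    w = train_noise_aware_logreg(X, q)
    print(predict_proba(w, np.array([[0.0], [3.5]])))
```

Because the loss uses q rather than a thresholded label, an example labeled 0.9 pulls the model less strongly than one labeled 1.0. This is how label uncertainty from weak supervision carries into training.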
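Snuba removes the domain expert from the loop by generating heuristics automatically from simple classifiers. The sketch below shows only a single-heuristic step, on one numeric feature and a tiny made-up labeled set:

- A one-feature decision stump is fit to the labeled set.
- The stump is wrapped as a labeling heuristic that abstains near its decision boundary.

Snuba's actual procedure also considers logistic regression and nearest-neighbor heuristics, and iteratively selects a diverse committee of them. The fixed `margin` here is an assumed stand-in for Snuba's confidence-based abstention, and all names are hypothetical.

```python
from typing import Callable, Sequence, Tuple

ABSTAIN = -1


def fit_stump(xs: Sequence[float], ys: Sequence[int]) -> Tuple[float, int]:
    """Fit a one-feature decision stump on a small, non-empty labeled set.

    Returns (threshold, polarity); the stump predicts 1 when
    polarity * (x - threshold) > 0 and 0 otherwise. Candidate thresholds
    are midpoints between consecutive distinct feature values.
    """
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    best_t, best_pol, best_acc = candidates[0], 1, -1.0
    for t in candidates:
        for pol in (1, -1):
            preds = [1 if pol * (x - t) > 0 else 0 for x in xs]
            acc = sum(p == y for p, y in zip(preds, ys)) / len(ys)
            if acc > best_acc:  # keep the most accurate (threshold, polarity)
                best_t, best_pol, best_acc = t, pol, acc
    return best_t, best_pol


def make_heuristic(threshold: float, polarity: int,
                   margin: float) -> Callable[[float], int]:
    """Turn a stump into a labeling heuristic that abstains near its boundary."""
    def heuristic(x: float) -> int:
        if abs(x - threshold) < margin:
            return ABSTAIN  # too close to the decision boundary to trust
        return 1 if polarity * (x - threshold) > 0 else 0
    return heuristic


if __name__ == "__main__":
    small_x = [5.0, 10.0, 20.0, 800.0, 1500.0]  # small labeled set
    small_y = [0, 0, 0, 1, 1]
    t, pol = fit_stump(small_x, small_y)        # t == 410.0, pol == 1
    h = make_heuristic(t, pol, margin=100.0)
    print([h(x) for x in [30.0, 450.0, 2000.0]])  # [0, -1, 1]
```

Heuristics produced this way play the same role as hand-written labeling functions. Their votes on unlabeled data can be combined into probabilistic labels in the same manner.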
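The Contributions slide lists multi-task learning (MTL) as future work for reducing hidden biases, but it does not specify an architecture. One common MTL design is hard parameter sharing: several tasks are trained jointly through a shared layer, so the learned representation must serve all of them.

The minimal NumPy sketch below uses two synthetic binary tasks. The task definitions, layer sizes and learning rate are assumptions for illustration only, not the authors' design.

```python
import numpy as np


def sigmoid(z: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-z))


def bce(p: np.ndarray, y: np.ndarray, eps: float = 1e-12) -> float:
    """Mean binary cross-entropy; eps guards against log(0)."""
    return float(-np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))


def train_mtl(X, y1, y2, hidden=8, lr=0.5, epochs=500, seed=0):
    """Two binary tasks with hard parameter sharing.

    A shared tanh layer W feeds two task-specific logistic heads v1 and v2.
    The training loss is the sum of both tasks' cross-entropies, so both
    tasks shape the shared representation.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.normal(scale=0.1, size=(d, hidden))  # shared layer
    v1 = rng.normal(scale=0.1, size=hidden)      # task-1 head
    v2 = rng.normal(scale=0.1, size=hidden)      # task-2 head
    losses = []
    for _ in range(epochs):
        H = np.tanh(X @ W)                       # shared representation, (n, hidden)
        p1, p2 = sigmoid(H @ v1), sigmoid(H @ v2)
        losses.append(bce(p1, y1) + bce(p2, y2))
        # Gradient of each task's mean cross-entropy w.r.t. its logits.
        g1, g2 = (p1 - y1) / n, (p2 - y2) / n
        grad_v1, grad_v2 = H.T @ g1, H.T @ g2
        # Both tasks' errors flow back into the shared layer.
        dH = np.outer(g1, v1) + np.outer(g2, v2)
        grad_W = X.T @ (dH * (1.0 - H ** 2))     # tanh'(z) = 1 - tanh(z)^2
        W -= lr * grad_W
        v1 -= lr * grad_v1
        v2 -= lr * grad_v2
    return W, v1, v2, losses


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 4))
    y1 = (X[:, 0] > 0).astype(float)             # synthetic task 1
    y2 = (X[:, 0] + X[:, 1] > 0).astype(float)   # synthetic, related task 2
    _, _, _, losses = train_mtl(X, y1, y2)
    print(f"joint loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Whether such joint training actually reduces bias in network data is the open question the slide defers to future work. The sketch only shows the mechanics of sharing a representation across tasks.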