Uploaded On 2018-03-17

Big Data, Bigger Audience: A Meta-algorithm for Making Machine Learning Actionable for Analysts

Dylan Cashman, Remco Chang

Visual Analytics Lab at Tufts (VALT)

Tufts University

Medford, MA

Stephen Kelley, Diane Staheli, Cody Fulcher, Marianne Procopio

MIT Lincoln Laboratory

Lexington, MA

OVERVIEW

Analysts lack trust in machine learning algorithms because of high false positive rates, uninterpretable output, and the difficulty of tuning. This is at odds with the increasing velocity of data.

RESEARCH QUESTIONS

How can we constrain models to have interpretable output?

Can we compensate for those constraints and still achieve comparable results?

What design decisions are important for analysts?

IDEA

Use multiple small models, and present the most confident model to the user.

META-ALGORITHM

We generate a large group of simple models that cover orthogonal sections of the data, on the assumption that this model mesh approximates a more complex model. To generate the model mesh, we partition the data into 1- and 2-dimensional subspaces, and for each subspace we train several types of machine learning models, including forecasting and clustering. These models are chosen with two priorities: they should be orthogonal, in that they cover different qualities of the data, and they should have interpretable output. We then use techniques from ensemble learning to integrate these models into a single model for the data task at hand.
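As a rough illustration of the model mesh, the sketch below enumerates every 1- and 2-dimensional subspace of a small table, fits one simple model per subspace, and averages the scores. Everything named here is an assumption made for the example: `GaussianSubspaceModel` (a Mahalanobis-distance scorer) stands in for the forecasting and clustering models the poster mentions, plain averaging stands in for the ensemble step, and the toy rows are invented.

```python
import math
from itertools import combinations
from statistics import mean


def enumerate_subspaces(n_features):
    """Every 1- and 2-dimensional subspace, as tuples of column indices."""
    singles = [(i,) for i in range(n_features)]
    return singles + list(combinations(range(n_features), 2))


class GaussianSubspaceModel:
    """Hypothetical small model: Mahalanobis distance within one subspace."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, rows):
        pts = [[row[c] for c in self.columns] for row in rows]
        dims = range(len(self.columns))
        self.mu = [mean(p[i] for p in pts) for i in dims]
        self.cov = [[mean((p[i] - self.mu[i]) * (p[j] - self.mu[j]) for p in pts)
                     for j in dims] for i in dims]
        return self

    def score(self, row):
        d = [row[c] - m for c, m in zip(self.columns, self.mu)]
        if len(d) == 1:
            return abs(d[0]) / math.sqrt(max(self.cov[0][0], 1e-12))
        (a, b), (_, c) = self.cov
        det = max(a * c - b * b, 1e-12)  # guard against degenerate covariance
        d2 = (c * d[0] ** 2 - 2 * b * d[0] * d[1] + a * d[1] ** 2) / det
        return math.sqrt(max(d2, 0.0))


def build_mesh(rows):
    """Fit one small model per subspace; the collection is the model mesh."""
    n_features = len(rows[0])
    return [GaussianSubspaceModel(cols).fit(rows)
            for cols in enumerate_subspaces(n_features)]


def mesh_score(mesh, row):
    """Assumed ensemble step: unweighted average of per-subspace scores."""
    return mean(model.score(row) for model in mesh)


# Invented toy data: column 1 is roughly twice column 0, column 2 is noise.
rows = [(1.0, 2.1, 5.0), (2.0, 3.9, 6.0), (3.0, 6.1, 5.0),
        (4.0, 7.9, 6.0), (5.0, 10.1, 5.0), (6.0, 11.9, 6.0)]
mesh = build_mesh(rows)

# Each value is ordinary on its own, but the (0, 1) combination is not.
suspect = (2.0, 10.0, 5.5)
top = max(mesh, key=lambda m: m.score(suspect))
print(top.columns, round(mesh_score(mesh, suspect), 2))
```

In this toy data the point looks normal in every 1-dimensional subspace and only stands out in the joint subspace of columns 0 and 1, which is the kind of behavior that motivates including 2-dimensional subspaces in the mesh.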

In practice, although we give up some capability by limiting ourselves to small subspaces, many machine learning tasks achieve high accuracy when small subspaces are used in concert. To illustrate this, we applied our meta-algorithm to the VAST 2013 network traffic data and found that different events showed up in different subspaces of our anomaly detector.
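One way to picture "different events showing up in different subspaces" is to rank the per-subspace scores for each event and surface only the strongest ones. The sketch below is an assumption-laden illustration: `most_significant`, the threshold of 3.0, the subspace names, and the scores are all made up for the example and are not results from the VAST 2013 data.

```python
def most_significant(scores, k=2, threshold=3.0):
    """Return up to k (subspace, score) pairs at or above threshold, highest first."""
    flagged = [(name, s) for name, s in scores.items() if s >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)[:k]


# Invented scores for two hypothetical events (not measured values).
event_a = {"bytes_out": 6.2, "dst_port": 1.1, "bytes_out x duration": 3.4}
event_b = {"bytes_out": 0.8, "dst_port": 7.5, "bytes_out x duration": 1.0}

for label, scores in (("event_a", event_a), ("event_b", event_b)):
    print(label, most_significant(scores))
```

Because each small model covers a named subspace, the ranked list doubles as an explanation: the analyst sees which one or two quantities made the event look unusual.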

CASE STUDY

An intrusion detection system built to assist analysts in identifying intrusions in network traffic after the fact. Our meta-algorithm identifies potentially anomalous 30-second windows and suggests the most significant subspaces to the user. User feedback implicitly tunes and calibrates the combination of the multiple small models.
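The 30-second windowing and the feedback loop described above could be sketched as follows. The poster does not say how feedback changes the combination, so the multiplicative weight update here is a swapped-in technique chosen for illustration; `bucket_by_window`, `combine`, `update_weights`, the `useful` flag, and the parameters `eta` and `threshold` are all hypothetical names.

```python
import math
from collections import defaultdict


def bucket_by_window(packets, width=30.0):
    """Group (timestamp_seconds, payload) pairs into fixed-width windows."""
    windows = defaultdict(list)
    for ts, payload in packets:
        windows[int(ts // width)].append(payload)
    return dict(windows)


def combine(scores, weights):
    """Weighted average of per-subspace anomaly scores for one window."""
    total = sum(weights[name] for name in scores)
    return sum(weights[name] * s for name, s in scores.items()) / total


def update_weights(weights, scores, useful, eta=0.5, threshold=3.0):
    """Assumed rule: scale the weight of every subspace that fired.

    `useful` is an implicit signal, for example whether the analyst
    followed the suggestion into the raw data or dismissed it.
    """
    factor = math.exp(eta if useful else -eta)
    new = {name: w * (factor if scores.get(name, 0.0) >= threshold else 1.0)
           for name, w in weights.items()}
    norm = sum(new.values())
    # Rescale so the average weight stays at 1.
    return {name: w * len(new) / norm for name, w in new.items()}


packets = [(0.5, "a"), (29.9, "b"), (30.0, "c"), (61.2, "d")]
print(bucket_by_window(packets))

weights = {"bytes_out": 1.0, "dst_port": 1.0, "duration": 1.0}
scores = {"bytes_out": 5.0, "dst_port": 1.0, "duration": 4.0}
before = combine(scores, weights)
weights = update_weights(weights, scores, useful=False)
after = combine(scores, weights)
print(round(before, 3), round(after, 3))
```

After a dismissed suggestion, the subspaces that raised it carry less weight, so the same pattern produces a lower combined score the next time it appears.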

To demonstrate the effectiveness of this meta-algorithm, we built a system to aid cybersecurity analysts in a data analysis task. This tool analyzed packet data from approximately 450 hosts in a private corporate network. In this system, we were able to successfully integrate the following design guidelines.

Assist the analyst in an unobtrusive, clear manner

Have enough statistical power to cue the analyst to new insights

Have clear, interpretable output to increase trust in the statistical models used

Allow analysts easy access to underlying raw data to confirm any suggested insights

Use implicit feedback from the user to correct the system’s combination of small models