

Multi-Layer Network Representation of the NTC Environment

Lili Sun, Proof School

Arijit Das, Computer Science

Results

From preliminary analysis, we obtained the results below by appending different layers together. In total we have around 40 layers. For graph visualization we are using Gephi.

Lili Sun, lili.sun@gmail.com

Approach

1. Extract data from the large data set.
2. Use the data to create a complex multi-layer network.
3. Analyze the different layers using centrality measures, modularity, etc.

The data set was given in approximately 40 Excel files. The columns held attributes such as date of birth, name, town, primary occupation, and secondary occupation; the last column of each file was a very long biography of the person.

First, all the Excel files were converted to CSV files for easier processing. The files were then merged and duplicates were deleted, all with Python programs. Every column except the last was simple, containing only a word or phrase or two. The last column, the biography, was a large block of text, and the hardest part of the data mining was extracting useful information from it. With approximately 3,000 people and very long biographies, manual scanning was impractical, so a Python program extracted the data using key words and Python's regular expression module. For each person, the program returns a dictionary whose keys are attributes found in the biography and whose values are typically True or False; a sketch of this step appears below.

After extraction, a new CSV file was created, along with a separate CSV file that is essentially an edge list of direct connections between people (rather than shared interests). From these CSV files, the different layers of the network were generated and then appended. The attribute layers are also generated with a dictionary: looping through the rows and columns of the CSV, an edge is added between two people whenever they share an attribute. Since not all the people in the data have ID numbers, nodes are simply numbered from 1 to N, where N is the number of people.
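The poster describes the extraction step only at a high level, so the following is a minimal sketch of it, assuming a merged CSV whose last column is the biography. The file names (merged.csv, extracted.csv) and the attribute key words are hypothetical placeholders, not the project's actual vocabulary; only the general technique (key words plus Python's re module, yielding a True/False dictionary per person) comes from the poster.

```python
import csv
import re

# Hypothetical key words; the actual vocabulary used in the project
# is not listed on the poster.
ATTRIBUTE_PATTERNS = {
    "farmer": re.compile(r"\bfarm(?:er|ing)\b", re.IGNORECASE),
    "veteran": re.compile(r"\bveteran\b", re.IGNORECASE),
    "teacher": re.compile(r"\bteach(?:er|ing)\b", re.IGNORECASE),
}

def extract_attributes(biography):
    """Map each attribute to True/False depending on whether its key
    words appear anywhere in the biography text."""
    return {attr: bool(pattern.search(biography))
            for attr, pattern in ATTRIBUTE_PATTERNS.items()}

# Read the merged CSV (biography assumed to be the last column) and
# write a new CSV with one True/False column per extracted attribute.
with open("merged.csv", newline="") as src, \
        open("extracted.csv", "w", newline="") as dst:
    reader, writer = csv.reader(src), csv.writer(dst)
    header = next(reader)
    attrs = list(ATTRIBUTE_PATTERNS)
    writer.writerow(header[:-1] + attrs)
    for row in reader:
        found = extract_attributes(row[-1])
        writer.writerow(row[:-1] + [found[a] for a in attrs])
```

Writing the booleans out as new columns mirrors the poster's step of creating a fresh CSV after extraction, so the layer-generation programs never have to touch the raw biographies again.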

Introduction

The United States Army's National Training Center (NTC), based at Fort Irwin, is a training facility that simulates realistic battlefield environments. These simulations generate a large amount of data, and this project analyzes that data through network science. A multi-layer network is created from the database and analyzed using different centrality measures and other techniques to find features such as influential nodes and communities. As the data grows, the analysis must scale beyond the computing power of a single laptop: after the data is cleaned and initially processed, it will be analyzed more in depth through R programs running on Hadoop clusters, allowing larger data sets to be analyzed and processed more quickly.

SEAP

Science and Engineering Apprenticeship Program at the Naval Postgraduate School

Future Research Plans

Analysis of the graphs is still ongoing and will continue by examining different sets of layers together, as well as the layer of direct connections.

Acknowledgements

Thank you to Mr. Das, Dr. Gera, and LTC Roginski for their help and guidance.

The function update_dictionary(row, i) is part of the program that deletes duplicates from the data set. Deduplication uses a dictionary whose key is essentially the last 15 columns concatenated and whose value is a tuple of the row and its number. When the program encounters a duplicate key, it compares how much data each duplicate row holds and keeps the row with more data. A hypothetical reconstruction is sketched below.
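The function itself appears in the poster only as an image, so this is a hedged reconstruction consistent with the caption above; the module-level data_dict and the use of a non-empty-cell count to decide which duplicate "has more data" are assumptions.

```python
# Module-level dictionary: key = the last 15 columns concatenated,
# value = (row, row_number). The structure follows the caption above;
# the original code is shown only as an image.
data_dict = {}

def update_dictionary(row, i):
    """Store row i, keeping whichever duplicate has more data.

    "More data" is measured here as the number of non-empty cells,
    an assumption about how the original program compared rows."""
    key = "".join(row[-15:])
    if key in data_dict:
        kept_row, _ = data_dict[key]
        if sum(bool(c.strip()) for c in row) <= sum(bool(c.strip()) for c in kept_row):
            return  # the stored duplicate already has at least as much data
    data_dict[key] = (row, i)
```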

Above is part of one of the layers generated from the final CSV file. It is a union of complete graphs: the layer encodes a single attribute, so everyone who shares a value for that attribute is connected to everyone else who shares it. Each possible value therefore produces a complete graph; a sketch of how such a layer can be generated follows.
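To make the construction concrete, here is a minimal sketch of how one attribute layer could be generated, following the dictionary-based loop described in the Approach. The networkx library, the file name, and the column name are illustrative assumptions; the poster confirms only that the layers are built in Python and visualized in Gephi.

```python
import csv
import itertools

import networkx as nx  # assumed library; the poster does not name one

def build_attribute_layer(csv_path, attribute_column):
    """Connect every pair of people who share a value in one attribute
    column. Nodes are numbered 1..N because not everyone in the data
    set has an ID number."""
    layer = nx.Graph()
    groups = {}  # attribute value -> node numbers that share it
    with open(csv_path, newline="") as f:
        for node_id, row in enumerate(csv.DictReader(f), start=1):
            layer.add_node(node_id)
            value = (row.get(attribute_column) or "").strip()
            if value:
                groups.setdefault(value, []).append(node_id)
    # Everyone sharing a value is pairwise connected, so each group of
    # k people contributes a complete graph on k nodes to the layer.
    for members in groups.values():
        layer.add_edges_from(itertools.combinations(members, 2))
    return layer

# Hypothetical usage:
# layer = build_attribute_layer("extracted.csv", "primary_occupation")
```

Because each person has one value per column, the groups are disjoint, which is exactly why a layer built this way appears as a union of complete graphs.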

Above is a graph of the size distribution of modularity classes of one of the layers generated from the final CSV file.