Basketball Position Classification

Presentation Transcript

Basketball Position Classification
Brandon Hardesty, Matt Saldaña, Audrey Bunn

Informal Problem Statement
Utilize classification algorithms to predict a basketball player’s most effective position, either forward or guard. NBA players’ statistics are used for comparison in the classification and for algorithmic learning data.

Formal Problem Statement
Let P be a set of basketball players of length 90. Set P has four subsets: G, F, X, and Y. Subset G is a list of the top 10 NBA guards’ statistics for the 2017-2018 season, and subset F is a list of the top 10 NBA forwards’ statistics for the 2017-2018 season. Subset X is a list of NCAA statistics for 25 guards, and subset Y is a list of NCAA statistics for 25 forwards. Each player p_i in X and Y is mapped to forward if their statistics s_i are most similar to set F’s average statistics; similarly, p_i is mapped to guard if s_i is most similar to set G’s average statistics.
P = all players
G = NBA guards
F = NBA forwards
X = NCAA guards
Y = NCAA forwards
p_i = a position-classified player
s_i = that player’s position-specific statistics
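Assuming a distance-based notion of "most similar" (the slides do not specify the measure; Euclidean distance is a common choice), the mapping can be written as:

    \mathrm{position}(p_i) =
    \begin{cases}
      \text{forward}, & d(s_i, \bar{s}_F) \le d(s_i, \bar{s}_G) \\
      \text{guard},   & \text{otherwise}
    \end{cases}

where \bar{s}_F and \bar{s}_G are the component-wise averages of the statistics in F and G, and d is the chosen distance function.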

Program Use
AAU & collegiate programs
NBA front offices
Companies investing in big data in sports
Personal interest
Algorithm analysis

Context
Defining Modern NBA Player Positions - Applying Machine Learning to Uncover Functional Roles in Basketball, by Han Man
Using Machine Learning to Find the 8 Types of Players in the NBA, by Alex Cheng
These works utilize K-means clustering, DBSCAN, and hierarchical clustering.
Similarities: data source, normalization, classifying by position
Differences: they classify players within the same league, and they use different statistics for classification

Statistical Evaluation
Average points per game*
Average rebounds per game*
Free throw percentage
Three-point percentage
*Per 36 minutes of game time
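The footnote indicates that the starred statistics are normalized to a per-36-minute basis. A small sketch of the standard per-36 scaling (the function name and example numbers below are hypothetical, not taken from the slides):

    def per_36(total, minutes_played):
        """Scale a season total (e.g., points or rebounds) to a per-36-minute rate.

        Uses the standard convention: rate = total / minutes_played * 36.
        """
        if minutes_played <= 0:
            raise ValueError("minutes_played must be positive")
        return total / minutes_played * 36.0

    # Hypothetical example: 1,200 points in 2,000 minutes -> 21.6 points per 36 minutes
    print(per_36(1200, 2000))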

Forwards vs Guards Statistically

Implemented Algorithms
Learning Vector Quantization
K-Nearest Neighbors
Brute Force Comparison

Brute Force Method
Compares inputted data to the average stats of the learning data.
The position with the most “winning” comparisons is the classification.
Ties are broken with point differentials.
Pros: easy to comprehend; low RAM usage
Cons: very naïve; inaccurate; variable/slow run times
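A minimal sketch of this brute force comparison, under the assumption that each player is a dict of the four statistics above, that a category is “won” by whichever position’s average is closer to the player’s value, and that ties are broken by the smaller total absolute differential (the slides do not show the actual implementation):

    def brute_force_classify(player, guard_avg, forward_avg):
        """Classify a player as 'guard' or 'forward' by category-by-category comparison.

        player, guard_avg, forward_avg: dicts mapping stat name -> value.
        """
        guard_wins = forward_wins = 0
        guard_diff = forward_diff = 0.0
        for stat, value in player.items():
            d_guard = abs(value - guard_avg[stat])
            d_forward = abs(value - forward_avg[stat])
            guard_diff += d_guard
            forward_diff += d_forward
            if d_guard < d_forward:
                guard_wins += 1       # guard average is closer: guards "win" this category
            elif d_forward < d_guard:
                forward_wins += 1     # forward average is closer: forwards "win" this category
        if guard_wins != forward_wins:
            return "guard" if guard_wins > forward_wins else "forward"
        # Tie breaker round: the position with the smaller total differential wins
        return "guard" if guard_diff <= forward_diff else "forward"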

Learning Vector Quantization Method
LVQ takes training vectors and codebook vectors as inputs. It then iterates through the training vectors, finding the closest codebook vector to each one. That closest codebook vector is moved closer to the training instance if their classes match, or further away if they differ, by a learning rate times the difference between the vectors. After training has finished, test data is classified by finding the closest codebook vector.
Pros: accurate; fastest run times
Cons: not the most accurate of the three; memory intensive
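A minimal LVQ1-style sketch of the update rule described above; the Euclidean distance, the decaying learning rate, and the epoch count are assumptions rather than details given in the slides:

    import math

    def euclidean(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    def train_lvq(codebooks, training_set, learning_rate=0.3, epochs=10):
        """codebooks and training_set are lists of (feature_list, label) pairs.

        For each training vector, the closest codebook vector is pulled toward the
        instance if the labels match and pushed away if they differ (LVQ1 rule).
        """
        for epoch in range(epochs):
            rate = learning_rate * (1.0 - epoch / float(epochs))  # decay the learning rate
            for features, label in training_set:
                vec, code_label = min(codebooks, key=lambda c: euclidean(c[0], features))
                sign = 1.0 if code_label == label else -1.0
                for i in range(len(vec)):
                    vec[i] += sign * rate * (features[i] - vec[i])
        return codebooks

    def classify_lvq(codebooks, features):
        """Classify by returning the label of the closest codebook vector."""
        return min(codebooks, key=lambda c: euclidean(c[0], features))[1]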

K-Nearest Neighbors Method
Given: a training set, a test vector, and a value of K.
Select the K entries from the training set that are closest to the test vector.
Make the classification prediction by evaluating the training instances closest to the test data (e.g., by majority vote among them).
Pros: most accurate; low RAM usage
Cons: not the fastest
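A minimal K-nearest-neighbors sketch matching this description, assuming Euclidean distance and a majority vote among the K closest training instances (K = 3 is the value mentioned on the Five Questions slide):

    from collections import Counter
    import math

    def knn_classify(training_set, test_vector, k=3):
        """training_set: list of (feature_list, label) pairs; test_vector: list of floats.

        Selects the k training instances closest to the test vector and returns
        the majority label among them.
        """
        by_distance = sorted(
            training_set,
            key=lambda item: math.sqrt(
                sum((a - b) ** 2 for a, b in zip(item[0], test_vector))
            ),
        )
        nearest_labels = [label for _, label in by_distance[:k]]
        return Counter(nearest_labels).most_common(1)[0][0]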

Experimental Procedure
Run college data tests (25 guards, 25 forwards), compared against the NBA learning data, to classify each player.
Multiple runs with different n values (6, 12, 18, 24, 30, 36, 42, 48).
Track total run time.
Track accuracy percentage.
Track memory usage.
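One possible way to collect these three measurements for a single run (a sketch only; the slides do not say how timing or memory usage was instrumented, and classify, test_players, and true_labels are hypothetical names):

    import time
    import tracemalloc

    def measure(classify, test_players, true_labels):
        """Return (elapsed_ms, accuracy_pct, peak_kb) for one classifier on one test set.

        Note: tracemalloc adds overhead, so timing and memory would normally be
        measured in separate runs.
        """
        tracemalloc.start()
        start = time.perf_counter()
        predictions = [classify(player) for player in test_players]
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        correct = sum(p == t for p, t in zip(predictions, true_labels))
        accuracy_pct = 100.0 * correct / len(true_labels)
        return elapsed_ms, accuracy_pct, peak / 1024.0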

Run Time Comparison: In Milliseconds

Accuracy Comparison

RAM Usage

Conclusion
Best algorithm for this project: K-Nearest Neighbors
Most accurate
Moderate total run times
Lowest RAM usage
Learning Vector Quantization comes in second:
Moderately accurate
Fastest run times
Highest RAM usage
Brute Force = worthless

Future Work
Predict wins and losses of games
Predict tournament winners
Different sports
Different normalization techniques
Utilizing more in-game statistics
Expanding the number of positions

Five Questions

Q. Why are the brute force run times so variable? What explains the spike in total run time between 24 test players and 42 test players?
A. The spike in run times comes from the “tie breaker rounds.” Some of the players tested during those runs were borderline players, meaning their stats fall almost exactly between those of a forward and a guard. The additional calculations needed to compute point differentials between the inputted data and these borderline players increased the run time for brute force.

Q. Why does K-Nearest Neighbors take more time than Learning Vector Quantization?
A. K-Nearest Neighbors has a longer run time because the algorithm has to run through the inputted data four times. It must pass over the data that many times to find the three closest points (our K value) to the feature vector.

Q. Why does LVQ use more RAM to classify?
A. The Learning Vector Quantization algorithm has to use more RAM to store both the codebook vectors and the data set.

Q. What would be a more accurate brute force method?
A. Running the point differential “tie breaker round” from the beginning, instead of starting with the “majority rules” method. Point differentials would be more direct, fine-tuned, and accurate.

Q. Is there a better classification algorithm out there to solve this problem?
A. LVQ requires input data with the same number of attributes as the training data; this applies to the data being classified by the LVQ algorithm.