Deep Code Search Xiaodong Gu Hong Kong University

Author: trish-goza | Published Date: 2025-05-17

Transcript:
Deep Code Search
Xiaodong Gu, Hong Kong University of Science and Technology
Hongyu Zhang, The University of Newcastle
Sunghun Kim, Hong Kong University of Science and Technology

Programming is hard
Lack of experience
Unfamiliar libraries

Why Not Search for It?

Not designed for source code
Code search engines: keyword matching! Hard to represent complicated tasks

Information Retrieval – Related Work
Consider source code as plain text and apply IR techniques (e.g., Lucene)
Augment IR approaches by considering properties of source code and NL queries
Typical techniques:
Sourcerer [Linstead et al., DMKD'09]: augments Lucene by considering method names and code popularity
Portfolio [McMillan et al., ICSE'11]: considers relationships between functions
[Lu et al., SANER'15]: query expansion with WordNet
CodeHow [Lv et al., ASE'15]: API matching

A fundamental problem of IR-based code search
Query: “how to read an object from an xml”
Mismatch between the high-level intent reflected in the queries and the low-level implementation details in the source code
Source code and natural language have heterogeneous representations

    public static <S> S deserialize(Class c, File xml) {
        try {
            JAXBContext context = JAXBContext.newInstance(c);
            Unmarshaller unmarshaller = context.createUnmarshaller();
            S deserialized = (S) unmarshaller.unmarshal(xml);
            return deserialized;
        } catch (JAXBException ex) {
            log.error("Error-deserializing-object-from-XML", ex);
            return null;
        }
    }

Proposed Approach
Joint embedding of both code and natural language into a unified vector representation
Query/description embedding: “read a text file line by line”, “read an object from an xml file”
Code embedding:

    public void readText(String textFile) throws IOException {
        BufferedReader br = new BufferedReader(new FileReader(textFile));
        String line = null;
        while ((line = br.readLine()) != null) {
            System.out.println(line);
        }
        // ...
        br.close();
    }

    public static <S> S deserialize(Class c, File xml) {
        try {
            JAXBContext context = JAXBContext.newInstance(c);
            Unmarshaller unmarshaller = context.createUnmarshaller();
            S deserialized = (S) unmarshaller.unmarshal(xml);
            return deserialized;
        } catch (JAXBException ex) {
            log.error("Error-deserializing-object-from-XML", ex);
            return null;
        }
    }

CODEnn (Code-Description Embedding Neural Network)
Code Embedding Network (CoNN)
Description Embedding Network (DeNN)
Similarity Module

[Figure: CODEnn architecture. The Code Embedding Network (CoNN) encodes the method name [M] (e.g., text reader), the API sequence [A] (e.g., Scanner.new, Scanner.next, Scanner.close), and the tokens [Γ] (e.g., str, buff, close), max-pools each, and fuses them with an MLP into a code vector. The Description Embedding Network (DeNN) encodes the description [D] (e.g., read a text file) and max-pools it into a description vector. The two vectors are compared by cosine similarity.]
Training with a ranking loss

DeepCS – Deep Learning based Code Search
[Figure: DeepCS workflow. Code in the search codebase is embedded into code vectors. A query is embedded into a query vector, a similarity lookup is run against the code vectors, and the most similar code is returned as recommended code.]
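The CoNN architecture slide shows each code component being max-pooled and then merged by a fusion MLP. The Java sketch below illustrates only that pooling-and-fusion step. It assumes each component (method name, API sequence, tokens) has already been encoded into one hidden vector per element. The encoders the paper uses are not implemented here. The single tanh layer standing in for the fusion MLP, the class name `FusionSketch`, and all vectors and weights are hypothetical.

```java
import java.util.Arrays;

// Sketch of CoNN-style pooling and fusion. Assumes each component has
// already been encoded into one hidden vector per element (word, API call,
// token); the encoders themselves are NOT implemented here.
class FusionSketch {

    // Element-wise max over a non-empty sequence of equal-length vectors.
    static double[] maxPool(double[][] steps) {
        double[] pooled = Arrays.copyOf(steps[0], steps[0].length);
        for (int t = 1; t < steps.length; t++) {
            for (int i = 0; i < pooled.length; i++) {
                pooled[i] = Math.max(pooled[i], steps[t][i]);
            }
        }
        return pooled;
    }

    // Concatenate several vectors into one.
    static double[] concat(double[]... parts) {
        int n = 0;
        for (double[] p : parts) {
            n += p.length;
        }
        double[] out = new double[n];
        int offset = 0;
        for (double[] p : parts) {
            System.arraycopy(p, 0, out, offset, p.length);
            offset += p.length;
        }
        return out;
    }

    // One dense layer with tanh, out = tanh(W x); a stand-in (our assumption)
    // for the fusion MLP shown on the slide.
    static double[] dense(double[][] w, double[] x) {
        double[] out = new double[w.length];
        for (int r = 0; r < w.length; r++) {
            double sum = 0.0;
            for (int c = 0; c < x.length; c++) {
                sum += w[r][c] * x[c];
            }
            out[r] = Math.tanh(sum);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical 2-d hidden vectors for "text reader" (2 words),
        // Scanner.new/next/close (3 API calls), and str/buff/close (3 tokens).
        double[][] name = {{0.1, 0.7}, {0.4, 0.2}};
        double[][] apis = {{0.3, 0.0}, {0.9, 0.5}, {0.2, 0.6}};
        double[][] tokens = {{0.5, 0.1}, {0.0, 0.8}, {0.3, 0.3}};
        double[] x = concat(maxPool(name), maxPool(apis), maxPool(tokens));
        // Hypothetical 2x6 fusion weights producing a 2-d code vector.
        double[][] w = {
            {0.5, 0.0, 0.5, 0.0, 0.5, 0.0},
            {0.0, 0.5, 0.0, 0.5, 0.0, 0.5}};
        System.out.println(Arrays.toString(dense(w, x)));
    }
}
```

Element-wise max pooling turns a variable-length sequence into a fixed-size vector. That is what lets method names, API sequences, and token sets of different lengths be fused into one code vector of fixed dimension.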
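The ranking-loss formula on the architecture slide did not survive extraction. A common margin ranking loss for joint embeddings sums max(0, ε − cos(c, d+) + cos(c, d−)) over triples of a code vector c, a correct description d+, and an incorrect description d−. As we understand the DeepCS paper, it describes a loss of this form. The Java sketch below computes that quantity. The class name `RankingLossSketch` and the margin value 0.05 are illustrative assumptions.

```java
// Sketch of a margin ranking loss over (code, correct description,
// incorrect description) triples:
//   loss = max(0, margin - cos(c, dPos) + cos(c, dNeg))
// The margin value and all vectors below are hypothetical.
class RankingLossSketch {

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Hinge loss for one triple: zero once the correct description is more
    // similar to the code than the incorrect one by at least `margin`.
    static double tripleLoss(double[] code, double[] descPos, double[] descNeg, double margin) {
        return Math.max(0.0, margin - cosine(code, descPos) + cosine(code, descNeg));
    }

    // Total loss summed over a batch of triples (arrays are aligned by index).
    static double batchLoss(double[][] codes, double[][] descPos, double[][] descNeg, double margin) {
        double total = 0.0;
        for (int i = 0; i < codes.length; i++) {
            total += tripleLoss(codes[i], descPos[i], descNeg[i], margin);
        }
        return total;
    }

    public static void main(String[] args) {
        double margin = 0.05;            // hypothetical margin
        double[] code = {1.0, 0.0};
        double[] goodDesc = {1.0, 0.0};  // cos(code, goodDesc) = 1
        double[] badDesc = {0.0, 1.0};   // cos(code, badDesc) = 0
        // 0.05 - 1 + 0 is negative, so the loss is clamped to zero.
        System.out.println(tripleLoss(code, goodDesc, badDesc, margin)); // prints 0.0
        // With the descriptions swapped, the loss is positive (about 1.05).
        System.out.println(tripleLoss(code, badDesc, goodDesc, margin));
    }
}
```

The loss reaches zero once every correct description beats its incorrect counterpart by at least the margin. Training therefore pulls matching code and descriptions together in the shared vector space and pushes non-matching ones apart.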
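The DeepCS slide describes search as a similarity lookup. Code in the codebase is embedded ahead of time, the query is embedded at search time, and the closest code vectors are returned. The Java sketch below shows that lookup using cosine similarity and a top-k sort. The embeddings are hypothetical stand-ins for vectors a trained CODEnn model would produce. `CodeSearchSketch`, `Entry`, and `topK` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch of a DeepCS-style search step: rank precomputed code vectors by
// cosine similarity to a query vector. All vectors are hypothetical.
class CodeSearchSketch {

    // A code snippet paired with its precomputed embedding vector.
    static final class Entry {
        final String snippet;
        final double[] vector;

        Entry(String snippet, double[] vector) {
            this.snippet = snippet;
            this.vector = vector;
        }
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Return the snippets of the k entries most similar to the query.
    static List<String> topK(double[] query, List<Entry> codebase, int k) {
        List<Entry> ranked = new ArrayList<>(codebase);
        // Sort by descending similarity to the query vector.
        ranked.sort(Comparator.comparingDouble((Entry e) -> cosine(query, e.vector)).reversed());
        List<String> result = new ArrayList<>();
        for (int i = 0; i < Math.min(k, ranked.size()); i++) {
            result.add(ranked.get(i).snippet);
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical 3-d embeddings for the two methods from the slides.
        List<Entry> codebase = List.of(
            new Entry("readText", new double[] {0.9, 0.1, 0.0}),
            new Entry("deserialize", new double[] {0.1, 0.8, 0.3}));
        // Hypothetical embedding of "read an object from an xml file".
        double[] query = {0.2, 0.9, 0.2};
        System.out.println(topK(query, codebase, 1)); // prints [deserialize]
    }
}
```

A full sort keeps the sketch short. For a large codebase, a bounded priority queue of size k would avoid sorting every code vector on each query.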