Natural language is a programming language

Uploaded On 2023-06-22


Michael D. Ernst, UW CSE. Joint work with Arianna Blasi, Juan Caballero, Sergio Delgado Castellanos, Alberto Goffi, Alessandra Gorla, Xi Victoria Lin, Deric Pang, Mauro Pezzè


Presentation Transcript

1. Natural language is a programming language
Michael D. Ernst, UW CSE
Joint work with Arianna Blasi, Juan Caballero, Sergio Delgado Castellanos, Alberto Goffi, Alessandra Gorla, Xi Victoria Lin, Deric Pang, Mauro Pezzè, Irfan Ul Haq, Kevin Vu, Chenglong Wang, Luke Zettlemoyer, and Sai Zhang

2. Questions about software
How many of you have used software?
How many of you have written software?

3. What is software?

4. What is software?
A sequence of instructions that perform some task

5. What is software?
An engineered object amenable to formal analysis
A sequence of instructions that perform some task

8. What is software?
A sequence of instructions that perform some task
Test cases
Version control history
Issue tracker
Documentation
…
How should it be analyzed?

9. Programming
User stories, Requirements, Specifications, Tests, Version control, Discussions, Architecture, Process, Models, Documentation, Programs, Issue tracker

10. Programming
Programs, User stories, Requirements, Specifications, Tests, Version control, Documentation, Variable names, Discussions, Architecture, Process, Models, Documentation, Structure, PL, Output strings, Issue tracker

13. Programming
Programs, User stories, Requirements, Specifications, Tests, Version control, Documentation, Output strings, Variable names, Discussions, Architecture, Process, Models, Documentation, Structure, PL, Issue tracker

14. Analysis of a natural object
Machine learning over executions
Version control history analysis
Bug prediction
Upgrade safety
Prioritizing warnings
Program repair

15. Specifications are needed; tests are available but ignored
Specs are needed. Many papers start: “Given a program and its specification…”
Tests are ignored. Formal verification process:
  Write the program
  Test the program
  Verify the program, ignoring testing artifacts
Observation: Programmers embed semantic info in tests
Goal: translate tests into specifications
Approach: machine learning over executions

16. Dynamic detection of likely invariants
Observe values that the program computes
Generalize over them via machine learning
Result: invariants (as in asserts or specifications)
  x > abs(y)
  x = 16*y + 4*z + 3
  array a contains no duplicates
  for each node n, n = n.child.parent
  graph g is acyclic
Unsound, incomplete, and useful
https://plse.cs.washington.edu/daikon/ [ICSE 1999]
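The "observe, then generalize" idea on slide 16 can be sketched as follows. This is a toy illustration, not Daikon's implementation: the candidate templates in `CANDIDATES` and the function name `likely_invariants` are hypothetical, and the "learning" is simply discarding every candidate that some observed execution falsifies.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical candidate invariant templates over two variables x and y.
CANDIDATES: Dict[str, Callable[[int, int], bool]] = {
    "x > abs(y)": lambda x, y: x > abs(y),
    "x == y": lambda x, y: x == y,
    "x >= 0": lambda x, y: x >= 0,
    "y >= 0": lambda x, y: y >= 0,
}

def likely_invariants(observations: List[Tuple[int, int]]) -> List[str]:
    """Return the candidates that hold on every observed (x, y) pair."""
    surviving = dict(CANDIDATES)
    for x, y in observations:
        for name in list(surviving):
            if not surviving[name](x, y):
                del surviving[name]  # falsified by this execution
    return sorted(surviving)

# Values observed at some program point during test runs (made-up data).
observed = [(5, -3), (10, 2), (7, 0)]
result = likely_invariants(observed)
```

The output is only "likely": a candidate that survives the observed runs may still be false on an unobserved input, which is why the slide calls the approach unsound and incomplete.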

20. Applying NLP to software engineering
Problems: inadequate diagnostics, incorrect operations, missing tests, unimplemented functionality
NL sources: error messages, variable names, code comments, user questions
NLP techniques: document similarity, word semantics, parse trees, translation
Analyze existing code; generate new code

21. Applying NLP to software engineering [ISSTA 2015]
Problems: inadequate diagnostics, incorrect operations, missing tests, unimplemented functionality
NL sources: error messages, variable names, code comments, user questions
NLP techniques: document similarity, word semantics, parse trees, translation

22. Inadequate diagnostic messages
Scenario: user supplies a wrong configuration option
  --port_num=100.0
Problem: software issues an unhelpful error message
  “unexpected system failure”
  “unable to establish connection”
  Hard for end users to diagnose
Goal: detect such problems before shipping the code
Better message: “--port_num should be an integer”

23. Challenges for proactive detection of inadequate diagnostic messages
How to trigger a configuration error?
How to determine the inadequacy of a diagnostic message?

24. ConfDiagDetector’s solutions
How to trigger a configuration error?
  Configuration mutation + run system tests
  System tests + configuration: failed tests ≈ triggered errors
  (We know the root cause.)
How to determine the inadequacy of a diagnostic message?
  Use an NLP technique to check its semantic meaning
  Diagnostic messages output by failed tests vs. user manual: similar semantic meanings?
  (Assumption: a manual, webpage, or man page exists.)

25. When is a message adequate?
Contains the mutated option name or value [Keller’08, Yin’11]
  Mutated option: --percentage-split
  Diagnostic message: “the value of percentage-split should be > 0”
Has a similar semantic meaning to the manual description
  Mutated option: --fnum
  Diagnostic message: “Number of folds must be greater than 1”
  User manual description of --fnum: “Sets number of folds for cross-validation”

26. Classical document similarity: TF-IDF + cosine similarity
Convert document into a real-valued vector
Document similarity = vector cosine similarity
Vector length = dictionary size, values = term frequency (TF)
  Example: [2 classical, 8 document, 3 problem, 3 values, …]
Problem: frequent words swamp important words
Solution: values = TF × IDF (inverse document frequency)
  IDF = log(total documents / documents with the term)
Problem: does not work well on very short documents
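A minimal sketch of the TF-IDF and cosine-similarity computation described on slide 26, using only the Python standard library. The function names and the whitespace tokenization are assumptions made for illustration; the three sample texts are taken from slides 22 and 25.

```python
import math
from collections import Counter
from typing import Dict, List

def tf_idf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Map each tokenized document to a sparse {term: TF * IDF} vector."""
    total = len(docs)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: tf[t] * math.log(total / doc_freq[t]) for t in tf})
    return vectors

def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

message = "number of folds must be greater than 1".split()
manual = "sets number of folds for cross-validation".split()
unrelated = "unable to establish connection".split()
vec_message, vec_manual, vec_unrelated = tf_idf_vectors([message, manual, unrelated])
```

Here the message and the manual entry share three terms and get a positive similarity, while the message and the unrelated text share none and score zero. With documents this short a single shared or missing word dominates the score, which is the weakness the slide's last line points out.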

27. Text similarity technique [Mihalcea’06]
Compares a message with the manual description
The documents have similar semantic meanings if many words in them have similar meanings
Example: “The program goes wrong” and “The software fails”
  Remove all stop words.
  For each word in the diagnostic message, try to find similar words in the manual.
  Two sentences are similar if “many” words are similar between them.
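The three steps on slide 27 can be sketched as below. This is an illustration under stated assumptions, not the published technique: the stop-word list, the `TOY_SIMILARITY` table (standing in for a real word-similarity resource), and both thresholds are invented for the example.

```python
from typing import List

STOP_WORDS = {"the", "a", "an", "of", "is", "to"}

# Hypothetical word-to-word similarity scores; a real system would
# query a lexical resource instead of a hand-written table.
TOY_SIMILARITY = {
    frozenset({"program", "software"}): 0.9,
    frozenset({"wrong", "fails"}): 0.7,
    frozenset({"goes", "fails"}): 0.5,
}

def word_sim(w1: str, w2: str) -> float:
    if w1 == w2:
        return 1.0
    return TOY_SIMILARITY.get(frozenset({w1, w2}), 0.0)

def content_words(text: str) -> List[str]:
    """Lowercase, split on whitespace, and drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def is_similar(message: str, manual: str,
               word_threshold: float = 0.5, fraction: float = 0.5) -> bool:
    """True if "many" message words have a similar word in the manual."""
    msg_words = content_words(message)
    man_words = content_words(manual)
    if not msg_words or not man_words:
        return False
    matched = sum(
        1 for w in msg_words
        if max(word_sim(w, m) for m in man_words) >= word_threshold
    )
    return matched / len(msg_words) >= fraction
```

For example, `is_similar("The program goes wrong", "The software fails")` holds with this table even though the two sentences share no content word, which is the case TF-IDF cannot handle.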

28. Results
Reported 25 missing and 18 inadequate messages in Weka, JMeter, Jetty, Derby
Validation by 3 programmers:
  0% false negative rate
    Tool says message is adequate, humans say it is inadequate
  2% false positive rate
    Tool says message is inadequate, humans say it is adequate
    Previous best: 16%

29. Related work
Configuration error diagnosis techniques
  Dynamic tainting [Attariyan’08], static tainting [Rabkin’11], Chronus [Whitaker’04]
  Troubleshooting an exhibited error rather than detecting inadequate diagnostic messages
Software diagnosability improvement techniques
  PeerPressure [Wang’04], RangeFixer [Xiong’12], ConfErr [Keller’08] and Spex-INJ [Yin’11], EnCore [Zhang’14]
  Requires source code, usage history, or OS-level support

30. Applying NLP to software engineering [WODA 2015]
Problems: inadequate diagnostics, incorrect operations, missing tests, unimplemented functionality
NL sources: error messages, variable names, code comments, user questions
NLP techniques: document similarity, word semantics, parse trees, translation

31. Undesired variable interactions
  int totalPrice;
  int itemPrice;
  int shippingDistance;
  totalPrice = itemPrice + shippingDistance;
The compiler issues no warning
A human can tell the abstract types are different
Idea:
  Cluster variables based on usage in program operations
  Cluster variables based on words in variable names
  Differences indicate bugs or poor variable names

32. Undesired variable interactions
  int totalPrice;
  int itemPrice;
  int shippingDistance;
  totalPrice = itemPrice + shippingDistance;
The compiler issues no warning
A human can tell the abstract types are different
Idea:
  Cluster variables based on words in variable names
  Cluster variables based on usage in program operations
  Differences indicate bugs or poor variable names

33. Undesired interactions
Variables: distance, itemPrice, tax_rate, miles, shippingFee, percent_complete

34. Undesired interactions
Variables: distance, itemPrice, tax_rate, miles, shippingFee, percent_complete
Interaction: itemPrice + distance

35. Undesired interactions
Variables: distance, itemPrice, tax_rate, miles, shippingFee, percent_complete
Types: int, float
Program types don’t help

36. Undesired interactions
Variables: distance, itemPrice, tax_rate, miles, shippingFee, percent_complete
Language indicates the problem

37. Variables

38. Variable clustering
Cluster based on interactions: operations

39. Variable clustering
Cluster based on language: variable names

40. Variable clustering
Cluster based on language: variable names
Cluster based on interactions: operations
[Diagram label: Problem]
Actual algorithm:
  Cluster based on operations
  Sub-cluster based on names
  Rank an operation cluster as suspicious if it contains well-defined name sub-clusters

41. Clustering based on operations
Abstract type inference [ISSTA 2006]
  int totalCost(int miles, int price, int tax) {
    int year = 2016;
    if ((miles > 1000) && (year > 2000)) {
      int shippingFee = 10;
      return price + tax + shippingFee;
    } else {
      return price + tax;
    }
  }
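One way to picture clustering by operations is a union-find pass that merges any two variables appearing together in an operation. The sketch below is an assumption-laden simplification, not the algorithm of the cited paper: the interaction pairs are written out by hand from the `totalCost` example instead of being extracted from code, and `abstract_type_clusters` is a hypothetical name.

```python
from typing import Dict, Iterable, List, Tuple

def abstract_type_clusters(variables: Iterable[str],
                           interactions: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Group variables that interact, directly or transitively."""
    parent: Dict[str, str] = {v: v for v in variables}

    def find(v: str) -> str:
        # Follow parent links to the representative, halving paths as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in interactions:
        parent[find(a)] = find(b)  # the two operands share an abstract type

    groups: Dict[str, List[str]] = {}
    for v in parent:
        groups.setdefault(find(v), []).append(v)
    return sorted(sorted(g) for g in groups.values())

variables = ["miles", "price", "tax", "year", "shippingFee"]
# price + tax + shippingFee: adjacent operands of the additions interact.
# miles and year are only compared with constants, so they interact with nothing.
interactions = [("price", "tax"), ("tax", "shippingFee")]
clusters = abstract_type_clusters(variables, interactions)
```

On this input the result is three clusters: `miles` alone, `year` alone, and `price`, `shippingFee`, `tax` together, even though all five variables have the declared type `int`.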

43. Clustering based on variable names
Compute variable name similarity for var1 and var2
  Tokenize each variable into dictionary words
    in_authskey15 ⇒ {“in”, “authentications”, “key”}
    Expand abbreviations, best-effort tokenization
  Compute word similarity
    For all w1 ∈ var1 and w2 ∈ var2, use WordNet (or edit distance)
  Combine word similarity into variable name similarity
    maxwordsim(w1, var2) = max over w2 ∈ var2 of wordsim(w1, w2)
    varsim(var1, var2) = average over w1 ∈ var1 of maxwordsim(w1, var2)
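The two formulas on slide 43 can be sketched directly in Python. Several pieces here are assumptions: `difflib.SequenceMatcher` stands in for WordNet or edit distance as the word-similarity measure, and `tokenize` only splits on underscores, digits, and camelCase boundaries. It does not do the dictionary-based splitting or abbreviation expansion that the slide's `in_authskey15` example requires.

```python
import re
from difflib import SequenceMatcher
from typing import List

def tokenize(name: str) -> List[str]:
    """Split an identifier on underscores, digits, and camelCase boundaries."""
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])", name)
    return [p.lower() for p in parts]

def wordsim(w1: str, w2: str) -> float:
    # Stand-in similarity in [0, 1]; the slide uses WordNet (or edit distance).
    return SequenceMatcher(None, w1, w2).ratio()

def maxwordsim(w1: str, var2_words: List[str]) -> float:
    """Best match for w1 among the words of the other variable name."""
    return max(wordsim(w1, w2) for w2 in var2_words)

def varsim(var1: str, var2: str) -> float:
    """Average, over the words of var1, of their best match in var2."""
    words1, words2 = tokenize(var1), tokenize(var2)
    if not words1 or not words2:
        return 0.0
    return sum(maxwordsim(w1, words2) for w1 in words1) / len(words1)
```

With these definitions `varsim("itemPrice", "totalPrice")` is higher than `varsim("itemPrice", "shippingDistance")`, because the shared word `price` contributes a perfect match. Note that `varsim` as defined is not symmetric: it averages over the words of its first argument only.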

44. Results
Ran on grep and Exim mail server
Top-ranked mismatch indicates an undesired variable interaction in grep
  if (depth < delta[tree->label])
    delta[tree->label] = depth;
Loses top 3 bytes of depth
Not exploitable because of guards elsewhere in program, but not obvious here

45. Related work
Reusing identifier names is error-prone [Lawrie 2007, Deissenboeck 2010, Arnaoudova 2010]
Identifier naming conventions [Simonyi]
Units of measure [Ada, F#, etc.]
Tokenization of variable names [Lawrie 2010, Guerrouj 2012]

46. Applying NLP to software engineering [ISSTA 2016]
Problems: inadequate diagnostics, incorrect operations, missing tests, unimplemented functionality
NL sources: error messages, variable names, code comments, user questions
NLP techniques: document similarity, word semantics, parse trees, translation

47. Test oracles (assert statements)
A test consists of
  an input (for a unit test, a sequence of calls)
  an oracle (an assert statement)
Programmer-written tests
  often trivial oracles, or too few tests
Automatic generation of tests:
  inputs are easy to generate
  oracles remain an open challenge
Goal: create test oracles from what programmers already write

48. Automatic test generation
Code under test:
  public class FilterIterator implements Iterator {
    public FilterIterator(Iterator i, Predicate p) {…}
    public Object next() {…}
    …
  }
Automatically generated test:
  public void test() {
    FilterIterator i = new FilterIterator(null, null);
    i.next();
  }
Throws NullPointerException!
Did the tool discover a bug?
It could be:
  1. Expected behavior
  2. Illegal input
  3. Implementation bug
  /** @throws NullPointerException if either
   *  the iterator or predicate are null */

49. Automatically generated tests
A test generation tool outputs:
  Failing tests: indicate a program bug
  Passing tests: useful for regression testing
Without a specification, the tool guesses whether a given behavior is correct
  False positives: report a failing test that was due to illegal inputs
  False negatives: fail to report a failing test because it might have been due to illegal inputs

50. Programmers write code comments
Javadoc is standard procedure documentation
  /**
   * Checks whether the comparator is now
   * locked against further changes.
   *
   * @throws UnsupportedOperationException
   *         if the comparator is locked
   */
  protected void checkLocked() {...}

51. Javadoc comment and assertion
  class MyClass {
    ArrayList allFoundSoFar = …;
    boolean canConvert(Object arg) { … }
    /** @throws IllegalArgumentException if the
     *  element is not in the list and is not
     *  convertible. */
    void myMethod(Object element) { … }
  }
Condition for exception: myMethod should throw iff …
  ( !allFoundSoFar.contains(element) && !canConvert(element) )

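52. Nouns = objects, verbs = operations
Sentence: “The element is greater than the current maximum.”
[Parse tree figure: constituents include S, NP, VP, ADJP, PP]
  noun: “the element” → elt
  verb: “is greater than” → compareTo() > 0
  noun: “the current maximum” → currentMax
Result: elt.compareTo(currentMax) > 0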

53. Text to code: Toradocu algorithm
Parse @param, @return, and @throws expressions using the Stanford Parser
  Parse tree, grammatical relations, cross-references
  Challenges:
    Often not a well-formed sentence; code snippets as nouns/verbs
    Referents are implicit, assumes coding knowledge
Match each subject to a Java element
  Pattern matching
  Lexical similarity to identifiers, types, documentation
Match each predicate to a Java element
Create assert statement from expressions and methods
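Before any parsing can happen, the tagged expressions have to be pulled out of the comment text. The sketch below covers only that preliminary extraction for `@throws` tags, using a regular expression; it is not Toradocu's code, and `extract_throws` is a hypothetical helper. The natural-language parsing and the matching of subjects and predicates to Java elements are not reproduced.

```python
import re
from typing import List, Tuple

# An @throws tag runs until the next tag at the start of a line, or the end.
THROWS_TAG = re.compile(
    r"@throws\s+(\w+)\s+(.*?)(?=^[ \t]*@\w+|\Z)",
    re.DOTALL | re.MULTILINE,
)

def extract_throws(javadoc: str) -> List[Tuple[str, str]]:
    """Return (exception name, condition text) pairs from a Javadoc comment."""
    body = re.sub(r"/\*\*|\*/", "", javadoc)  # drop comment delimiters
    body = re.sub(r"^[ \t]*\*[ \t]?", "", body, flags=re.MULTILINE)  # drop leading '*'
    return [(exc, " ".join(cond.split())) for exc, cond in THROWS_TAG.findall(body)]

comment = """/**
 * Checks whether the comparator is now
 * locked against further changes.
 *
 * @throws UnsupportedOperationException
 *         if the comparator is locked
 */"""
tags = extract_throws(comment)
```

Each extracted condition, such as "if the comparator is locked", is the free-text fragment that the later steps on the slide would have to parse and map to identifiers in the class under test.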

54. Results
Accuracy on 857 Javadoc tags:
  97% precision
  72% recall
  Can tune parameters to favor either metric
  Pre-processing and pattern-matching are important
Discovered specification errors
Improving test generation tools:
  Reduced false positive test failures in EvoSuite by ≥ 1/3
  Also improved Randoop, but by less

55. Related work
Heuristics
  JCrasher, Crash’n’Check [Csallner’04, Csallner’05]
  Randoop [Pacheco’07]
Specifications
  ASTOOT [Doong’94]
  Models, contracts, …
Properties
  Cross-checking oracles [Carzaniga’14]
  Metamorphic testing [Chen’13]
  Symmetric testing [Gotlieb’03]
Natural language documentation
  iComment, aComment, @tComment [Tan’07, Tan’11, Tan’12]

56. Applying NLP to software engineering
Problems: inadequate diagnostics, incorrect operations, missing tests, unimplemented functionality
NL sources: error messages, variable names, code comments, user questions
NLP techniques: document similarity, word semantics, parse trees, translation

57. Machine translation
English: “My hovercraft is full of eels.”
Spanish: “Mi aerodeslizador está lleno de anguilas.”
English: “Don’t worry.”
Spanish: “No te preocupes.”

58. Sequence-to-sequence recurrent neural network translators
[Diagram: input “My hovercraft is full of eels.”; output begins <START>, “Mi”, “aerodeslizador”, …; input layer, hidden layer, output layer, attention mechanism]
Input, hidden, and output functions are inferred from training data using probability maximization.

59. Tellina: text to commands
Training data: ~5000 ⟨text, command⟩ pairs
  Collected manually from webpages, plus cleaning
17 file system utilities, > 200 flags, 9 types of constants
  Compound commands: (), &&, ||
  Nesting: |, $(), <()
  Strings are opaque; no command interpreters (awk, sed)
  No bash compound statements (for)

60. Results
Accuracy for Tellina’s first output:
  Structure of command (without constants): 69%
  Full command (with constants): 30%
User experiment:
  Tellina makes users 22% more efficient
  Even though it rarely gives a perfect command
Qualitative feedback
  Most participants wanted to continue using Tellina (5.8/7 Likert scale)
  Partially-correct answers were helpful, not too hard to correct
  Output bash commands are sometimes non-syntactic or subtly wrong
  Needs explanation of meaning of output bash commands

61. Related work
Neural machine translation
  Sequence-to-sequence learning with neural nets [Sutskever 2014]
  Attention mechanism [Luong 2015]
Semantic parsing
  Translating natural language to a formal representation [Zettlemoyer 2007, Pasupat 2016]
Translating natural language to DSLs
  If-this-then-that recipes [Quirk 2015]
  Regular expressions [Locascio 2016]
  Text editing, flight queries [Desai 2016]

62. Other software engineering projects
Analyzing programs before they are written
Gamification (crowd-sourcing) of verification
Evaluating and improving fault localization
Pluggable type-checking for error prevention
… many more: systems, synthesis, verification, etc.
UW is hiring! Faculty, postdocs, grad students

65. Analyzing text
iComment [Tan 2007]: pattern matching for null
N-gram models: code completion [Hindle 2011], predict variable names, whitespace [Allamanis 2014]
Mining variable names [Pollock et al.]
Code → comments [Sridhara 2010]
DARPA Big Mechanism (read cancer papers)
JSNice [Raychev 2015]: learn rules for identifiers and types

66. Analyzing other artifacts by machine learning over the program
Tests (dynamic invariant detection)
Mining software repositories
Defect prediction
Code completion
Clone detection
… many, many more

67. Machine learning + software engineering
Software is more than source code
Formal program analysis is useful, but insufficient
Analyze and generate all software artifacts
A rich space for further exploration
