1.0 INTRODUCTION

Computer scientists have long struggled with two questions: can computers truly learn to perform a task from examples or previously solved tasks, and can they improve themselves significantly on the basis of past mistakes? “Machine learning” research began in order to answer these questions, and it is now working towards making this a possibility in computers.
For a computer or computer-controlled robot to perform a task, traditional programming demands that a programmer write a correct algorithm for the task and then implement that algorithm on the computer using a programming language. This process is usually tedious and time-consuming, and is best done by trained personnel [7]. Machine learning promises to reduce this burden of hand programming. According to Tom Mitchell, machine learning “is concerned with the question of how to construct computer programs that automatically improve with experience” [3].

This paper therefore looks briefly at what machine learning is and how it can improve software testing, particularly the testing method known as “fuzzing”, which consists of repeatedly testing an application with modified, or fuzzed, inputs with the goal of finding security vulnerabilities in input-parsing code [5].

2.0 RELATED LITERATURE/WORKS

• Thomas J. Cheatham wrote a paper on the use of machine learning techniques to identify attributes that are important in predicting software testing costs and software testing time in a particular company.
• Du Zhang and Jeffrey J.P. Tsai worked on the possibility of applying machine learning in software engineering. Their paper describes the characteristics and applicability of some frequently used machine learning algorithms and offers guidelines on applying machine learning methods to software engineering tasks.
• In 2017, William Blum, Rishabh Singh, and Mohit Rajpal, all Microsoft researchers, began a research project looking at ways to improve fuzzing techniques using machine learning and deep neural networks. They wanted to see what a machine learning model could learn if a deep neural network were inserted into the feedback loop of a greybox fuzzer.
• Patrice Godefroid, Hila Peleg, and Rishabh Singh, in their paper “Learn&Fuzz: Machine Learning for Input Fuzzing”, show how to automate the generation of an input grammar suitable for input fuzzing using sample inputs and neural-network-based statistical machine-learning techniques. They then present a detailed case study with a complex input format, namely PDF, and a large, complex, security-critical parser for this format, namely the PDF parser embedded in Microsoft’s new Edge browser.
They also present a new algorithm for this learn&fuzz challenge which uses a learnt input probability distribution to intelligently guide where to fuzz inputs.

3.0 SUMMARY OF FINDINGS FROM LITERATURE

Tom Mitchell states in his book “Machine Learning” [3] that:

A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.

For example, a computer program that learns to play chess might improve its performance, as measured by its ability to win, at the class of tasks involving playing chess, through experience obtained by playing games against itself. In general, a well-defined learning problem involves these three features: the class of tasks, the measure of performance to be improved, and the source of experience.

The emergence of machine learning was the result of two significant developments. The first was the realization by Arthur Samuel in 1959 that, rather than teaching computers everything they need to know about a task and how to carry it out, it might be possible to teach them to learn for themselves. The second was the emergence of the internet and the explosive increase in the amount of digital information made available for analysis.

Short biography by John McCarthy and Ed Feigenbaum [8]:
Arthur Samuel (1901-1990) was a pioneer of artificial intelligence research.
From 1949 through the late 1960s, he did the best work in making computers learn from their experience. His vehicle for this was the game of checkers. Samuel’s learning program used Lee’s Guide to Checkers to adjust its criteria for choosing moves so that the program would choose those thought good by checker experts as often as possible.

To better understand machine learning, it helps to consider its role within the following three niches in the software world, as stated by Tom Mitchell [2] and by Du Zhang and Jeffrey J.P. Tsai [4]:

a. Data mining: Domains where there are large databases containing valuable implicit regularities to be discovered [4].
b. Difficult-to-program applications: Poorly understood problem domains where little knowledge exists for humans to develop effective algorithms [4].
c. Customized software applications: Domains where programs must adapt to changing conditions [4].

3.1 Artificial Intelligence, Machine Learning and Deep Learning

Artificial intelligence, machine learning and deep learning are three terms often used interchangeably, which makes the differences between them somewhat unclear. The simplest way to understand their relationship is to imagine three concentric circles. AI is the outermost circle; it deals with machines that can perform tasks characteristic of human intelligence, such as understanding language and recognizing objects and sounds. Machine learning, a subset of AI, sits inside it, and deep learning, an approach within machine learning, fits inside both [10].

3.2 Real Life Applications

Some real-life examples of the use of machine learning [3]:

i. Learning to recognize spoken words: The SPHINX system (e.g., Lee 1989) learns speaker-specific strategies for recognizing the primitive sounds (phonemes) and words from the observed speech signal.
ii. Learning to drive an autonomous vehicle: The ALVINN system (Pomerleau 1989) has used its learned strategies to drive unassisted at 70 miles per hour for 90 miles on public highways among other cars.
iii. Learning to classify new astronomical structures: Decision tree learning algorithms have been used by NASA to learn how to classify celestial objects from the second Palomar Observatory Sky Survey (Fayyad et al. 1995).
iv. Learning to play world-class backgammon: The world’s top computer program for backgammon, TD-GAMMON (Tesauro 1992, 1995), learned its strategy by playing over one million practice games against itself. It now plays at a level competitive with the human world champion.
v. In software testing, Microsoft has released a tool called Microsoft Security Risk Detection, which makes use of fuzz testing, or fuzzing. It significantly simplifies security testing and does not require the user to be a security expert in order to root out software bugs [9].

3.3 Classification of machine learning systems

Machine learning systems may be classified, as stated by Jaime G. Carbonell, Ryszard S. Michalski and Tom M. Mitchell, in terms of:

(a.) The underlying learning strategy: learning strategies are distinguished by the amount of inference the learner performs on the information provided.
(b.) The representation of knowledge or skill acquired by the learner: the learner may acquire knowledge such as descriptions of physical objects, rules of behavior and so on.
(c.) The application domain of the performance system for which knowledge is acquired: this depends on the area of application, such as natural language processing, robotics or image recognition [7].

3.4 Fuzzing it with Machine Learning

Software testing has always been a tedious yet important part of the software development cycle, and fuzz testing is one of the most widely used automated software testing techniques. Fuzzing is done by presenting a target program with crafted malicious input designed to discover unexpected behaviors such as crashes, buffer overflows, memory errors, and exceptions. According to William Blum [6], fuzzing techniques fall into three main categories:

i) Blackbox fuzzing, which relies solely on the sample input files to generate new inputs.
ii) Whitebox fuzzing, which analyzes the target program either statically or dynamically to guide the search for new inputs aimed at exploring as many code paths as possible.
iii) Greybox fuzzing, which makes use of a feedback loop to guide the search based on observed behavior from previous executions of the program [6].

Neural networks can then be made to learn patterns in the input files from previous fuzzing explorations to guide future fuzzing explorations.

In 2017, the Microsoft researchers inserted a deep neural network into the feedback loop of a greybox fuzzer called American fuzzy lop (AFL). The encouraging results suggest that machine learning can truly improve fuzzing. The neural fuzzing method offers a way to perform greybox fuzzing that is:

(a.) Simple: The system learns a strategy from an existing fuzzer.
(b.) Efficient: In the AFL experiment, the neural fuzzers explored significantly more unique code paths in the first 24 hours than traditional AFL.
(c.) Generic: Although tested only on AFL, the approach could be applied to any fuzzer, including blackbox and random fuzzers [9].

4.0 FUTURE RESEARCH/DEVELOPMENT PROPOSITIONS

The neural fuzzing research project done by Microsoft is just scratching the surface of what can be achieved using deep neural networks for fuzzing. For now, the model only learns fuzzing locations, but it could also be used to learn other fuzzing parameters such as the type of mutation or strategy to apply.

The possibility of developing computer programs that are capable of improving with experience can lead to software that is developed with greater ease yet is able to optimize itself over time.

5.0 CONCLUDING REMARKS

The emergence of the internet and the explosion in available data that followed have greatly helped the development of machine learning. With new data being generated daily, machine learning still has a long way to go in its development, and as it matures it can be better incorporated into the field of software development. Because machine learning is a subset of artificial intelligence, its growth will contribute to solving the problem AI aims at: making computer programs that are considered smart, able to perform tasks that are characteristic of human intelligence.

The creation of the tool “Microsoft Security Risk Detection” also shows promise for the use of machine learning in further means of software testing.
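
As an illustration of the greybox feedback loop described in section 3.4, the following is a minimal Python sketch. It is not AFL and not the Microsoft neural fuzzing implementation. The toy `target` function, its branch labels, and the `pick_position` hook are hypothetical names introduced here for illustration only. Byte offsets are chosen uniformly at random; the `pick_position` parameter only marks the place where, under the approach summarized above, a learned model could supply the fuzzing location instead.

```python
import random
from typing import Callable, List, Optional, Set, Tuple


def target(data: bytes) -> Set[str]:
    """Toy program under test (hypothetical): returns the branches it executed."""
    covered = {"start"}
    if data[:1] == b"F":
        covered.add("saw_F")
        if data[1:2] == b"U":
            covered.add("saw_FU")
            if data[2:3] == b"Z":
                covered.add("saw_FUZ")
                if data[3:4] == b"Z":
                    # Stand-in for a real bug such as a memory error.
                    raise RuntimeError("simulated parser crash")
    return covered


def uniform_position(data: bytes, rng: random.Random) -> int:
    """Baseline location choice: every byte offset is equally likely."""
    return rng.randrange(len(data))


def greybox_fuzz(
    seed_input: bytes,
    iterations: int,
    rng: random.Random,
    pick_position: Callable[[bytes, random.Random], int] = uniform_position,
) -> Tuple[Optional[bytes], List[bytes]]:
    """Mutate corpus entries and keep only those that reach new coverage.

    Returns (crashing_input_or_None, corpus). Assumes the seed itself
    does not crash the target.
    """
    corpus = [seed_input]
    seen = set(target(seed_input))
    for _ in range(iterations):
        child = bytearray(rng.choice(corpus))
        # Overwrite one byte; a learned model could choose this offset.
        child[pick_position(bytes(child), rng)] = rng.randrange(256)
        candidate = bytes(child)
        try:
            covered = target(candidate)
        except RuntimeError:
            return candidate, corpus
        # Feedback loop: inputs that reveal new branches join the corpus.
        if not covered <= seen:
            seen |= covered
            corpus.append(candidate)
    return None, corpus


if __name__ == "__main__":
    crash, corpus = greybox_fuzz(b"AAAA", 500_000, random.Random(0))
    print("crashing input:", crash, "| corpus size:", len(corpus))
```

The corpus grows only when an input reaches a branch not seen before, which is what separates this loop from blackbox fuzzing: each retained input becomes a stepping stone towards deeper code paths.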