GENETIC ALGORITHM BASED PATH-ORIENTED AUTOMATIC TEST DATA GENERATION

PREMAL B. NIRPAL1, KALE K.V.2*
1Department of CS & IT, Dr. B. A. Marathwada University, Aurangabad, India - 431004
2Department of CS & IT, Dr. B. A. Marathwada University, Aurangabad, India - 431004
* Corresponding Author : premal.nirpal@gmail.com

Received : 20-01-2011     Accepted : 12-04-2011     Published : 15-12-2011
Volume : 2     Issue : 1       Pages : 6 - 9
Software Eng 2.1 (2011):6-9

Conflict of Interest : None declared

Cite - MLA : PREMAL B. NIRPAL and KALE K.V. "GENETIC ALGORITHM BASED PATH-ORIENTED AUTOMATIC TEST DATA GENERATION." Software Engineering 2.1 (2011):6-9.

Cite - APA : PREMAL B. NIRPAL , KALE K.V. (2011). GENETIC ALGORITHM BASED PATH-ORIENTED AUTOMATIC TEST DATA GENERATION. Software Engineering, 2 (1), 6-9.

Cite - Chicago : PREMAL B. NIRPAL and KALE K.V. "GENETIC ALGORITHM BASED PATH-ORIENTED AUTOMATIC TEST DATA GENERATION." Software Engineering 2, no. 1 (2011):6-9.

Copyright : © 2011, PREMAL B. NIRPAL and KALE K.V., Published by Bioinfo Publications. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution and reproduction in any medium, provided the original author and source are credited.

Abstract

This paper discusses genetic algorithms that can automatically generate test cases to test selected path. This algorithm takes a selected path as a target and executes sequences of operators iteratively for test cases to evolve. The evolved test case can lead the program execution to achieve the target path. An automatic path-oriented test data generation is not only a crucial problem but also a hot issue in the research area of software testing today. As a robust metaheuritstic search method in complex spaces, genetic algorithm (GA) has been used to path-oriented test data generation since 1992 and outperforms other approaches.

Keywords

Software Testing, Test case generation, Path testing, Genetic Algorithm

Introduction

Software being utilized in various situations and software quality becomes more important than ever. Being main means of software quality assurance, software testing is very laborious and costly due to the act that it is accounts for approximately 50 percent of the elapsed time and more than 50 percent of the total cost in software development [4,5] .
Automatic test data generation is a key problem in software testing and its implementation can not only significantly improve the effectiveness and efficiency but also reduce the high cost of software testing [3,4] . In particular, it is notable that various structural test data generation problem can be transformed into a path oriented test data generation problem. Moreover, path testing strategy can detect almost 65 percent of errors in program under test [8] .
Although path-oriented test data generation is an undesirable problem [6] , researchers still attempt to develop various methods and have made some progress. These means can be classified into two types: static methods and dynamic methods. Static methods include domain reduction [10,11] and symbolic execution [12] etc. These means suffer from a number of problems when they handle indefinite array, loops, pointer references and procedure calls [13] .
Dynamic methods include random testing, local search approach [14] , goal-oriented approach [15] , chaining approach [16] and evolutionary approach [13-16] . As values of input variables are determined when programs execute, dynamic test data generation can avoid those problems with that static methods are confronted. Being a robust search method in complex spaces, genetic algorithm was applied to test data generation in 1992 [14] and evolutionary approach has been a burgeoning interest since then. Related works [17,16] and [18] indicate that GA-based test data generation outperforms other dynamic approaches e.g. random testing and local search.
The structure of this paper is organized as follows. Section 2 gives a brief introduction to Genetic Algorithms. Section 3 Basic process flow of path-oriented test data generation using GA. Section 4 describes experimental settings and gives experimental results based on a triangle classification program. Finally, section 5 summarizes the paper with conclusions and directions for future work.

Genetic Algorithms

Genetic Algorithms begins with a set of initial individuals as the first generation, which are sampled at random from the problem domain. The algorithms are developed to perform a series of operations that transform the present generation into a new, fitter generation [22] .
Each individual in each generation is evaluated with a fitness function. Based on the evaluation, the evolution of the individuals may approach the optimal solution.
The most common operations of genetic algorithms are designed to produce efficient solution for the target problem [15] . These primary operations include:

a) Reproduction: This operation assigns the reproduction probability to each individual based on the output of the fitness function. The individual with a higher ranking is given a greater probability for reproduction. As a result, the fitter individuals are allowed a better survival chance from one generation to the next.

b) Crossover: This operation is used to produce the descendants that make up the next generation. This operation involves the following crossbreeding procedures:
i) Randomly select two individuals as a couple from the parent generation.
ii) Randomly select a position of the genes, corresponding to this couple, as the crossover point. Thus, each gene is divided into two parts.
iii) Exchange the first parts of both genes corresponding to the couple.
iv) Add the two resulted individuals to the next generation.

c) Mutation: This operation picks a gene at random and changing its state according to the mutation probability. The purpose of the mutation operation is to maintain the diversity in a generation to prevent premature convergence to a local optimal solution. The mutation probability is given intuitively since there is no definite way to determine the mutation probability [22] .
Upon completion of crossover processing and mutation operations, there will be an original parent population and a new offspring population. A fitness function should be devised to determine which of these parents and offspring’s can be survived into the next generation. After performing the fitness function, these parents, and offspring’s are filtered and a new generation is formed. These operations are iterated until the expected goal is achieved. Genetic algorithms guarantee high probability of improving the quality of the individuals over several generations according to the Schema Theorem [5] .

Basic process flow of path-oriented test data generation using genetic algorithm

A selected target path is the goal for GA to achieve, and an input vector X (a test data) is regarded as an individual. To generate path-oriented test data for the program under test using GA, there are five steps and Figure 1 depicts the basic process flow [6,7] .

1. Control flow graph construction- Control flow graph of the program under test may be constructed manually or automatically with related tools. It helps testers to select representative target paths.

2. Target path selection- In general, a program under test has too many paths to test completely. Thus, testers have to select meaningful paths as target paths.

3. Fitness function construction- In order to evaluate distance between the executed path and the target path, fitness function has to be constructed.

4. Program instrumentation- This means inserting probes at the beginning of every block of source code to monitor program execution and collect related information (e.g. fitness values of individuals).

5. Test data generation and the instrumented program execution- Original test data are chosen from their domain at random and GA generates new test data in order to achieve the target path. Finally, suitable test data that executes along the target paths may be generated or no suitable test data may be found because of exceeding max generation [22] .

Experimental studies

Triangle classification program
Triangle classification program has been widely used in the research area of software testing [22,24] . It aims to determine if three input edges can form a triangle and so what type of triangle can be formed by them. Figure 2 gives source code of the program.

1. Control flow graph construction The tested program ( [Fig-2] Triangle classification program) determines what kind of triangle can be formed by any three input lengths. The programs control flow diagram, which contains four paths, is shown in [Fig-3] .

Target path selection:
[Fig-3] is control flow graph of the triangle classification program, which consists of four paths:
• Path l: //Not-a-triangle
• Path 2: //Scalene
• Path 3: //Isosceles
• Path 4: //Equilateral
According to probability theory, the path is the most difficult path to be covered in path testing. Therefore, the path is selected as the target path.

Test case generation and execution:
Experimental settings
Settings of standard genetic algorithm (SGA) are as following:
1. Coding: binary string
2. Length of chromosome: 3Nbits (N=8, 10……..,15), and each edge are range from 1 to 2N
3. Population size: from 1 to 1000
4. Stochastic universal sampling
5. Two-point crossover probability = 0.9
6. Mutation probability = 0.01
7. Generation gap = 0.96
8. Max generation = 1000

In this experiment we have used Genetic Algorithm for 10 generations with n=15, initial population with 1000 test cases. The size of the chromosome is 3. Mutaion rate is 0.01. Selection rate 0.5

Conclusion

In this paper, the genetic algorithms are used to automatically generate test cases for path testing. The greatest merit of genetic algorithm in program testing is its simplicity. Each iteration of the genetic algorithms generates a generation of individuals. In practice, the computation time cannot be infinite, so that the iterations in the algorithm should be limited. Within the limited generations, solution derived by genetic algorithms may be trapped around a local optimum and as a result, fail to locate required may be trapped around unwanted paths and fail to locate the required global optimum. Although the tested cases generated by such algorithms may be trapped around unwanted paths and fail to locate the required paths, since the test cases of the first generation are normally distributed over the domain of the tested program, the probability of being trapped is very low.
The quality of test cases produces by genetic algorithms is higher than the quality of test cases produced by random way because the algorithm can direct the generation of test cases to the desirable range fast.
This paper shows that genetic algorithms are useful in reducing the time required for lengthy testing meaningfully by generating test cases for path testing. Furthermore, we build our Genetic Algorithm for structural testing for reduce execution time & generate more suitable test cases.

Acknowledgment

The authors wish to acknowledge UGC for the award of Research Fellowship under Fellowship in Sciences to Meritorious Students (RFSMS) scheme for carrying out this research.

References

[1] Roger S. Pressman (1997) Software Engineering, A Practitioner’s Approach 5th Edition, McGraw Hill.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[2] Beizer B. (1990) Software Testing Techniques 2nd Edition, International Thomson Computer Press.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[3] Srinivasan Desikan, Gopalaswamy Ramesh (2006) Software Testing Principles & Practices, PEARSON Education.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[4] Myers G. J. (2004) The Art of Software Testing.2nd ed.: John Wiley & Sons Inc.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[5] Antonia B. (2007) Software Testing Research: Achievements, Challenges, Dreams, in 2007 Future of Software Engineering: IEEE Computer Society.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[6] Chen Yong and Zhong Yong (2008) Proceedings of Fourth International Conference on Natural Computation (ICNC '08), Jinan, China.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[7] Chen Yong, Zhong Yong, Bao Shengli and He Famei (2008) The International Conference 2007 on Information Computing and Automation, Chengdu, China.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[8] Kernighan B.W. and Plauger P.J. (1982) The Elements of Programming Style: McGraw-Hill, Inc. New York, NY, USA.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[9] Weyuker E. J. (1979) International Journal of Parallel Programming, vol. 8, 1979,pp. 387-403.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[10] Chen T. Y., Tse T. H. and Zhiquan Z. (2002) Proceedings of the 2002 ACM SIGSOFT international symposium on Software testing and analysis Roma, Italy: ACM.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[11] Nguyen Tran S. and Yves D. (2003) ACM SIGSOFT Software Engineering Notes, vol. 28, 108-117.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[12] James C. K. (1975) Proceedings of the international conference on Reliable software Los Angeles, California: ACM.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[13] Michael G.M.C.C., Schatz M. (2001) IEEE Transactions on Software Engineering, vol. 27, 1085-1110.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[14] Korel B. (1990) IEEE Transactions on Software Engineering, vol. 16, 870- 879.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[15] B. Korel, "Dynamic method for software test data generation," Software Testing, Verification & Reliability, vol. 2, 1992, pp. 203-213.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[16] Wegener J., Kerstin B. and Hartmut P. (2002) Proceedings of the Genetic and Evolutionary Computation Conference: Morgan Kaufmann Publishers Inc.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[17] Joachim W., Andr Baresel and Harmen S. (2002) Proceedings of the 26th International Computer Software and Applications Conference on Prolonging Software Life: Development and Redevelopment: IEEE Computer Society.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[18] Christoph C. Michael, Gary McGraw and Michael A. Schatz (2001) IEEE Transactions On Software Engineering, 27, 12.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[19] Roy P Pargas, Mary Jean Harrold, Robert R Peck (1999) Journal of Software Testing, Verification and Reliability.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[20] Alan C. Schultz, John J. Grefenstette, aid Kenneth A. De Jong (1993) “Test and Evaluation by Genetic Algorithms”, IEEE, 8(5), 9-14, DOI: 10.1109/64.236476.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[21] Joachim Wegener, Kerstin Buhr, Hartmut Pohlheim (2002) Proceeding GECCO '02 Proceedings of the Genetic and Evolutionary Computation Conference.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[22] Yong Chen, Yong Zhong, Tingting Shi and Jingyong Liu (2009) 2009 Fifth International Conference on Natural Computation, IEEE.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[23] Richard A. DeMillo and A. Jefferson Offutt (1991) IEEE Transactions On Software Engineering, 17, 9.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[24] Jin-Cherng Lin and Pu-Lin Yeh (2000) Proceeding ATS '00 Proceedings of the 9th Asian Test Symposium, ISBN:0-7695-0887-1. IEEE Computer Society Washington, DC, USA  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[25] Debasis Mohapatra, Prachet Bhuyan and Durga P. Mohapatra (2009) WASE International Conference on Information Engineering.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[26] Donald J. Berndt and Alison Watkins (2004) Proceedings of the Eighth IEEE International Symposium on High Assurance Systems Engineering (HASE’04).  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[27] Abdelhamid Bouchachia (2007) Seventh International Conference on Hybrid Intelligent Systems, IEEE, 2007.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[28] Xiajiong Shen, Qian Wang, Peipei Wang and Bo Zhou (2007) Automatic Generation of Test Case based on GATS Algorithm, AA04Z148.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[29] Uwadia C. O., Odejobi O. A. (2005) SACJ, 34.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[30] Tan X.B., Cheng Longxin and Xu Xiumei, “(2009) Fifth International Joint Conference on INC, IMS and IDC, IEEE.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

[31] Konstantinos Liaskos and Marc Roper, (2007) Testing: Academic and Industrial Conference – Practice and Research Techniques, IEEE.  
» CrossRef   » Google Scholar   » PubMed   » DOAJ   » CAS   » Scopus  

Images
Fig. 1- Basic process flow
Fig. 2- An example program
Fig. 3- Control flow graph of the example Program
Fig. 4- Average number of test cases on the path of [Fig-3] of each generation
Table 1- Korel’s distance function
Table 2- Fig Average number of test cases on the path of [Fig-3] of each generation