An important topic in GP is the generalization ability of the evolved solutions, that is, whether the solutions perform as well on unseen data as on the training data. Part of the work in this area analyzes variance in order to improve generalization. Some researchers have studied the factors that affect overfitting and generalization in GP [10, 11, 12]. Others have introduced modifications that enhance the generalization capacity of GP [2, 13, 14, 15, 16].
Castelli et al. [10] proposed a measure of functional complexity called Graph-Based Complexity (GBC). The measure rests on the idea that the complexity of a function is proportional to its degree of curvature. It was embedded in a novel fitness function to boost the generalization ability of GP.
Azad and Ryan [13] introduced a novel variance-based selection method that improved the generalization ability of GP. The smoothness of a solution was estimated from the variance of its outputs on the training data, so that both the training error and the variance of the outputs were minimized.
Gonçalves et al. [17] proposed a method for controlling overfitting named the Random Sampling Technique (RST). Instead of using the whole training set in every generation, RST computes the fitness on a randomly chosen subset of it. Thus, programs that perform well across different subsamples have a higher chance of surviving into the next generation. Note that RST is beneficial when a large training set is available. Although these efforts have improved the generalization ability of GP, according to [18, 19] better solutions to this problem are still needed.
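
The random-subset evaluation behind RST can be illustrated with a minimal Python sketch. It is not the implementation of [17]: the function names, the subset size, the RMSE fitness, and the toy population of fixed candidate functions are assumptions made only for illustration, and selection and variation are omitted.

```python
import random

def rst_subsample(n_cases, subset_size, rng):
    """Draw the indices of a random training subset for one generation."""
    return rng.sample(range(n_cases), subset_size)

def rmse(program, X, y, indices):
    """Root mean squared error of `program` on the selected cases only."""
    squared = [(program(X[i]) - y[i]) ** 2 for i in indices]
    return (sum(squared) / len(squared)) ** 0.5

# Toy regression data (assumed): y = 2x + 1
X = [float(i) for i in range(100)]
y = [2.0 * x + 1.0 for x in X]

# Stand-in "programs"; a real GP run would evolve expression trees.
population = [
    lambda x: 2.0 * x + 1.0,  # exact model
    lambda x: 2.0 * x,        # constant offset error of 1
    lambda x: x * x,          # wrong functional form
]

rng = random.Random(0)
for generation in range(3):
    # A fresh subset is drawn in every generation, so a program must do
    # well on many different subsamples to keep a good fitness over time.
    idx = rst_subsample(len(X), 20, rng)
    fitness = [rmse(p, X, y, idx) for p in population]
    # Selection and variation operators would follow here.
```

In this sketch each generation costs 20 evaluations per program instead of 100, and the ranking of the programs is re-tested on new cases every generation.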
Recently, some researchers have studied multi-population genetic programming for problems in which the diversity of the population must be maintained during evolution.
Kommenda et al. [20] proposed a new method based on GP with data migration. Their study focused on developing an algorithm with suitable generalization ability for symbolic regression. GP with data migration uses several subpopulations both to maintain the diversity of the population during evolution and to select training subsets efficiently. Each subpopulation is evaluated on its own fixed training subset (FTS), while a special variable training subset (VTS) is exchanged between the subpopulations at predefined data migration intervals. This yields better generalization ability, which is attributed to the regular changes introduced by the VTS.
Mauša and Grbac [21] used multi-population GP for software defect prediction. In their study, they used two operators, colonization and migration, together with three ensemble selection methods to analyze the performance of evolving different ensembles. This analysis was carried out using GP on unbalanced data for multi-objective purposes.
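
The FTS/VTS bookkeeping of GP with data migration described above can be sketched as follows. This is only an illustration under stated assumptions, not the algorithm of [20]: the disjoint partition of the data, the ring-style rotation of the VTS blocks, and all names and sizes are hypothetical, and the evolutionary loop itself is left out.

```python
import random

def make_subsets(n_cases, n_subpops, fts_size, vts_size, rng):
    """Split case indices into one FTS and one VTS block per subpopulation.

    Assumption: all blocks are disjoint; the source does not specify this.
    """
    if n_subpops * (fts_size + vts_size) > n_cases:
        raise ValueError("not enough training cases for the requested subsets")
    indices = list(range(n_cases))
    rng.shuffle(indices)
    fts, vts, pos = [], [], 0
    for _ in range(n_subpops):
        fts.append(indices[pos:pos + fts_size])
        pos += fts_size
    for _ in range(n_subpops):
        vts.append(indices[pos:pos + vts_size])
        pos += vts_size
    return fts, vts

def training_indices(fts, vts, subpop, generation, interval):
    """Cases used to evaluate `subpop` in the given generation.

    The FTS never changes. The VTS blocks move one step around a ring
    every `interval` generations (assumed migration topology).
    """
    shift = generation // interval
    return fts[subpop] + vts[(subpop + shift) % len(vts)]

fts, vts = make_subsets(100, 4, 15, 10, random.Random(0))
before = training_indices(fts, vts, 0, generation=4, interval=5)
after = training_indices(fts, vts, 0, generation=5, interval=5)
# `before` and `after` share the 15 FTS cases but differ in the 10 VTS cases.
```

With these settings, subpopulation 0 keeps its fixed cases throughout the run while its variable cases change at generations 5, 10, 15, and so on, which is the kind of regular change in the training data that the VTS is meant to provide.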