1. Student name: Siddharth Ganesh Sheshadri
2. Your email address: [email protected]
3. Research Area: Data mining and analysis, direct marketing, revenue management
4. Research Question: An analysis of the shopping data of customers at a retailer, to understand the effect of various direct marketing campaigns and to study the demographic factors that affect customer spending.
5. Proposed supervisor: Dr. Mimi Zhang
6. A brief summary of your research: The dissertation aims to find patterns in the shopping behavior of customers who have been subjected to three different types of marketing campaign, to identify the factors that affect shopping trends, and to study how customers responded to each campaign. The dataset contains details of customer spending, whether customers used the coupons given to them, and how they responded to the campaigns.
The project will use supervised learning algorithms for data mining and analysis. Decision trees will be used to mine the data and find relationships between customer spending and the campaign and demographic factors. Decision trees produce interpretable if-then rules, which makes them well suited to explaining which factors are associated with an outcome, although on observational data they reveal associations rather than proven cause and effect. CHAID (Chi-square Automatic Interaction Detector) analysis will be used to build a predictive tree that shows how the factors interact with each other to explain the target variable. Ensembles will then be used to improve the model's predictive performance.
7. Research Motivation: Under pressure to reduce costs, companies have realized that fewer but more targeted marketing communications are a more effective use of their marketing effort.
This analysis will give insight into customers' shopping patterns and whether the marketing campaigns they received were effective. It will provide valuable information about the factors that affect customers, which can support retail management and lead to more effective marketing techniques.
8. Research aims: To measure the effect of direct marketing on customers, to find trends in their shopping patterns, and to analyze the results.
9. Proposed methodology/implementation: We use a tree-based approach to mine the data and find relationships between direct marketing and customer shopping patterns. Decision trees are powerful algorithms for both classification and regression. A tree represents the data as a hierarchy: each internal node tests a particular factor and specifies the rule for the split, each branch corresponds to an outcome of that test, and each leaf holds the resulting decision (a class label, or a predicted value in regression). At each node, the split is made on the factor that best separates the sample into homogeneous subsets. In entropy-based trees such as ID3, this is the factor with the largest information gain, that is, the largest reduction in entropy from the parent node to the size-weighted child nodes; CHAID instead selects splits using chi-square tests of independence between each factor and the target.
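As an illustration of the entropy criterion described above and the chi-square test behind CHAID, the sketch below scores two candidate factors on a small, made-up dataset. The columns `campaign`, `income` and `responded` are hypothetical and are not fields of the dunnhumby data. `information_gain` implements the entropy-based criterion; `chi_square` computes only the Pearson statistic that CHAID tests at each node, and omits CHAID's category merging and Bonferroni-adjusted p-values.

```python
from collections import Counter
from math import log2

# Hypothetical toy data (NOT from the dunnhumby dataset): one row per customer.
rows = [
    {"campaign": "A", "income": "high", "responded": "yes"},
    {"campaign": "A", "income": "low", "responded": "yes"},
    {"campaign": "A", "income": "high", "responded": "yes"},
    {"campaign": "B", "income": "low", "responded": "no"},
    {"campaign": "B", "income": "high", "responded": "yes"},
    {"campaign": "B", "income": "low", "responded": "no"},
    {"campaign": "C", "income": "low", "responded": "no"},
    {"campaign": "C", "income": "high", "responded": "no"},
]


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(rows, factor, target):
    """Entropy of the parent node minus the size-weighted entropy of its children."""
    children = {}
    for r in rows:
        children.setdefault(r[factor], []).append(r[target])
    weighted = sum(len(ys) / len(rows) * entropy(ys) for ys in children.values())
    return entropy([r[target] for r in rows]) - weighted


def chi_square(rows, factor, target):
    """Pearson chi-square statistic and degrees of freedom for factor vs target.

    Only the core test: full CHAID also merges similar categories and compares
    Bonferroni-adjusted p-values, which this sketch leaves out.
    """
    counts = Counter((r[factor], r[target]) for r in rows)
    levels = sorted({r[factor] for r in rows})
    classes = sorted({r[target] for r in rows})
    n = len(rows)
    row_total = {a: sum(counts[(a, c)] for c in classes) for a in levels}
    col_total = {c: sum(counts[(a, c)] for a in levels) for c in classes}
    stat = 0.0
    for a in levels:
        for c in classes:
            expected = row_total[a] * col_total[c] / n  # count expected under independence
            stat += (counts[(a, c)] - expected) ** 2 / expected
    return stat, (len(levels) - 1) * (len(classes) - 1)


factors = ["campaign", "income"]
best = max(factors, key=lambda f: information_gain(rows, f, "responded"))
print(best, {f: round(information_gain(rows, f, "responded"), 3) for f in factors})
print({f: chi_square(rows, f, "responded") for f in factors})
```

On this toy data both criteria favor splitting on `campaign` first. Note that raw chi-square statistics are not directly comparable across factors with different degrees of freedom, which is why CHAID compares p-values instead.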
The tree methodology will be enhanced using ensembles such as Random Forest, which is useful for large datasets with many factors, including categorical ones. A random forest builds many decision trees at training time, each on a bootstrap sample of the data and considering a random subset of the factors at each split, and predicts the mode of the individual trees' classes (or their mean, for regression).
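To make the bootstrap-and-vote idea concrete, here is a deliberately small, pure-Python sketch on a hypothetical toy dataset. As a simplification, depth-1 trees (stumps) stand in for full decision trees, and the function names and parameters (`fit_forest`, `n_trees`, `max_features`) are illustrative only; in practice a library implementation such as scikit-learn's `RandomForestClassifier` would be used.

```python
import random
from collections import Counter

# Hypothetical toy data (NOT from the dunnhumby dataset).
rows = [
    {"campaign": "A", "income": "high", "responded": "yes"},
    {"campaign": "A", "income": "low", "responded": "yes"},
    {"campaign": "A", "income": "high", "responded": "yes"},
    {"campaign": "B", "income": "low", "responded": "no"},
    {"campaign": "B", "income": "high", "responded": "yes"},
    {"campaign": "B", "income": "low", "responded": "no"},
    {"campaign": "C", "income": "low", "responded": "no"},
    {"campaign": "C", "income": "high", "responded": "no"},
]


def majority(labels):
    """Most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(rows, features, target):
    """Depth-1 tree: split on the feature (from `features`) with the fewest
    training errors when each level predicts its majority class."""
    best = None
    for f in features:
        groups = {}
        for r in rows:
            groups.setdefault(r[f], []).append(r[target])
        rule = {level: majority(ys) for level, ys in groups.items()}
        errors = sum(rule[r[f]] != r[target] for r in rows)
        if best is None or errors < best[0]:
            best = (errors, f, rule)
    _, f, rule = best
    return f, rule, majority([r[target] for r in rows])


def predict_stump(stump, row):
    f, rule, default = stump
    return rule.get(row[f], default)  # unseen level -> majority class of the tree's sample


def fit_forest(rows, features, target, n_trees=25, max_features=1, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]  # bootstrap: n rows drawn with replacement
        subset = rng.sample(features, max_features)  # random feature subset for the split
        forest.append(fit_stump(sample, subset, target))
    return forest


def predict_forest(forest, row):
    """Classification: the mode of the individual trees' votes."""
    return majority([predict_stump(t, row) for t in forest])


forest = fit_forest(rows, ["campaign", "income"], "responded")
print(predict_forest(forest, {"campaign": "A", "income": "high"}))
```

Each tree sees a different bootstrap sample and a randomly chosen factor, so the trees make different mistakes; taking the mode of their votes averages those mistakes out, which is the variance reduction that motivates the ensemble.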
Random Forest reduces the overfitting of a single decision tree: averaging many decorrelated trees lowers the variance of the predictions, usually at the cost of only a slight increase in bias. Ensembles combined with decision trees therefore make a powerful model for data analysis.
10. Background of your research area which puts your research into context: I am currently studying data analytics and have a background in statistics and mathematics.
This allows me to apply the tools and techniques I have learnt to analyze large volumes of data. The machine learning course taught at college has given me valuable insight into classification algorithms and data mining, which will allow me to apply supervised learning methods to the analysis.
11. Ethical Issues: The data used is freely available to the public on the website https://www.dunnhumby.com/, so ethics permission will not be needed.
12. Sources of literature:
Scopus:
· Customer base analysis: Partial defection of behaviorally loyal clients in a non-contractual FMCG retail setting. (2005) European Journal of Operational Research
· Sensory analysis in the food industry as a tool for marketing decisions. (2012) Advances in Data Analysis and Classification
· Predicting direct marketing response in banking: comparison of class imbalance methods. (2017) Service Business
· Comparison of target selection methods in direct marketing: http://www.inesc-id.pt/ficheiros/publicacoes/2041.pdf
· https://www.sciencedirect.com/science/article/pii/S0167923612001881
· https://www.scirp.org/Journal/PaperInformation.aspx?PaperID=30463
13. Your own expertise and how well you are positioned to carry out the work: Specialization in data science offers a wide range of tools for data mining and analysis.
The decision tree model and ensembles were covered in depth in the course, which will help in drawing insight from the data. Previous experience with statistics will help me choose the correct tests and interpret the results correctly. Various machine learning algorithms will be used to derive useful insights from the data.
19. References:
· Decision Tree Pruning Using Expert Knowledge, Jingfeng Cai
· A complete fuzzy decision tree technique, http://www.sciencedirect.com/science/article/pii/S0165011403000897
· On the optimization of fuzzy decision trees, http://www.sciencedirect.com/science/article/pii/S0165011497003862
· Data Mining for Direct Marketing: Problems and Solutions, Charles X. Ling and Chenghui Li
· Decision Tree and Naïve Bayes Algorithm for Classification and Generation of Actionable Knowledge for Direct Marketing, Masud Karim, Rashedur M. Rahman
20. Table/Chart of Research Milestones:
21. Proposed Table of contents for your dissertation: