

1. Student name: Siddharth Ganesh Sheshadri


2. Your email address: [email protected]



3. Research Area: Data mining and analysis, direct marketing,
revenue management


4. Research Question: An analysis of customer shopping data from a retailer,
to understand the effect of various direct marketing campaigns and of the
demographic factors that affect customer spending.


5. Proposed supervisor: Dr. Mimi Zhang


6. A brief summary of your
research: The dissertation aims to find patterns in the shopping behaviour of
customers who have been subjected to three different types of marketing
campaign, to identify the factors that affect shopping trends, and to assess
how customers responded to each campaign. The dataset contains details of
customer spending, whether customers used the coupons given to them, and how
they responded to the campaigns. The project will use supervised learning
algorithms for data mining and analysis. Decision trees will be used to mine
the data and find relationships between customer spending and the factors that
influence it; they suit this kind of cause-and-effect analysis because the
rules they produce show clearly which factors are associated with which
outcomes. CHAID (Chi-square Automatic Interaction Detector) analysis will be
used to build a predictive tree showing how the factors interact with each
other to explain the outcome in the target variable. Ensemble methods will be
used to improve the model's performance and accuracy.
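The core step of CHAID can be sketched in plain Python. At each node, CHAID runs a chi-square test of independence between every candidate factor and the target, then splits on the factor with the smallest p-value. This is a minimal sketch of that split-selection step only, assuming categorical factors; full CHAID also merges similar categories of each factor and applies a Bonferroni adjustment to the p-values, both omitted here. The functions `chi2_sf`, `chi_square_test` and `best_chaid_split` and the fields `campaign`, `age_band` and `redeemed` are hypothetical illustrations, not part of the dunnhumby dataset or any library. The chi-square tail probability uses the closed form for integer degrees of freedom so that no third-party package is needed.

```python
import math
from collections import Counter


def chi2_sf(x, dof):
    """Upper-tail probability P(X >= x) of a chi-square variable with integer dof >= 1."""
    if x <= 0:
        return 1.0
    if dof % 2 == 0:
        # Even dof: exp(-x/2) * sum_{r=0}^{dof/2 - 1} (x/2)^r / r!
        term = total = 1.0
        for r in range(1, dof // 2):
            term *= (x / 2) / r
            total += term
        return math.exp(-x / 2) * total
    # Odd dof: erfc(sqrt(x/2)) plus a finite series in sqrt(x).
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (dof - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def chi_square_test(xs, ys):
    """Pearson chi-square test of independence between two categorical sequences."""
    n = len(xs)
    x_counts, y_counts = Counter(xs), Counter(ys)
    observed = Counter(zip(xs, ys))
    stat = 0.0
    for a in x_counts:
        for b in y_counts:
            expected = x_counts[a] * y_counts[b] / n
            stat += (observed[(a, b)] - expected) ** 2 / expected
    dof = (len(x_counts) - 1) * (len(y_counts) - 1)
    return stat, dof, chi2_sf(stat, dof)


def best_chaid_split(rows, predictors, target):
    """Return the predictor with the smallest chi-square p-value, plus all p-values."""
    ys = [r[target] for r in rows]
    p_values = {p: chi_square_test([r[p] for r in rows], ys)[2] for p in predictors}
    return min(p_values, key=p_values.get), p_values


if __name__ == "__main__":
    # Hypothetical counts: (campaign, age band, redeemed coupon?, number of customers).
    spec = [("A", "young", "yes", 8), ("A", "old", "yes", 7), ("A", "young", "no", 2),
            ("A", "old", "no", 3), ("B", "young", "yes", 2), ("B", "old", "yes", 3),
            ("B", "young", "no", 8), ("B", "old", "no", 7)]
    rows = [{"campaign": c, "age_band": a, "redeemed": y}
            for c, a, y, k in spec for _ in range(k)]
    best, p_values = best_chaid_split(rows, ["campaign", "age_band"], "redeemed")
    print(best, p_values)
```

With these hypothetical counts, `campaign` is strongly associated with coupon redemption (a chi-square statistic of 10 on 1 degree of freedom) while `age_band` is independent of it, so `campaign` is selected as the first split. A real CHAID implementation would then recurse into each branch.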


7. Research Motivation: Under pressure to reduce costs, companies have realized
that fewer but better-targeted campaigns are a more effective strategy. This
analysis will give insight into customers' shopping patterns and into whether
the marketing campaigns they were subjected to were effective. It will provide
valuable information about the factors that affect customers, which can
support retail management and lead to more effective marketing techniques.


8. Research aims: To find the effect of direct marketing on customers, to
identify trends in their shopping patterns, and to analyze the results.


9. Proposed
methodology/implementation: A tree-based approach will be used to mine the
data and find relations between direct marketing and customer shopping
patterns. Decision trees are powerful algorithms for classification and
regression. A decision tree represents the data as a hierarchy: each internal
node tests a particular factor and specifies the rule for the split, each
branch represents an outcome of that test, and each leaf holds the decision
taken (the predicted class). At each node, the split is made on the factor
that best separates the sample set into subsets, measured by the reduction in
entropy (information gain). The tree methodology will be enhanced with
ensemble methods such as Random Forest, which is useful for large amounts of
categorical data with multiple levels. A Random Forest trains many decision
trees, each on a bootstrap sample of the data with a random subset of factors
considered at each split, and predicts the mode of the classes output by the
individual trees. This limits the overfitting of single decision trees by
reducing variance, usually at the cost of only a small increase in bias.
Ensembles combined with decision trees therefore create a more accurate and
robust model for data analysis.
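The Random Forest procedure described above can be illustrated with a minimal pure-Python sketch, assuming categorical factors and a categorical target. Each tree is grown on a bootstrap sample, each split is chosen by information gain (entropy reduction) over a random subset of factors, and the forest predicts the mode of the trees' votes. The function names, the `max_depth`/`max_features` parameters and the toy fields `campaign`, `channel` and `responded` are hypothetical illustrations, not the dunnhumby schema; in the dissertation itself a library implementation such as scikit-learn's `RandomForestClassifier` would more likely be used.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def build_tree(rows, features, target, rng, max_features, depth=0, max_depth=3):
    """Grow one tree; internal nodes are dicts, leaves are class labels."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority  # leaf: the decision taken
    parent = entropy(labels)
    best, best_gain = None, 1e-12
    # Visit factors in random order; stop after max_features candidates,
    # but keep looking if none of them has produced a useful split yet.
    for i, f in enumerate(rng.sample(features, len(features))):
        if i >= max_features and best is not None:
            break
        groups = {}
        for r in rows:
            groups.setdefault(r[f], []).append(r[target])
        remainder = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
        if parent - remainder > best_gain:  # information gain of splitting on f
            best, best_gain = f, parent - remainder
    if best is None:
        return majority
    partitions = {}
    for r in rows:
        partitions.setdefault(r[best], []).append(r)
    children = {
        value: build_tree(part, features, target, rng, max_features, depth + 1, max_depth)
        for value, part in partitions.items()
    }
    return {"feature": best, "children": children, "default": majority}


def predict_tree(node, row):
    """Follow the branches matching the row's values until a leaf is reached."""
    while isinstance(node, dict):
        node = node["children"].get(row[node["feature"]], node["default"])
    return node


def random_forest(rows, features, target, n_trees=25, max_features=1, seed=0):
    """Train n_trees trees, each on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]
        forest.append(build_tree(sample, features, target, rng, max_features))
    return forest


def predict_forest(forest, row):
    """Predict the mode (majority vote) of the individual trees' predictions."""
    return Counter(predict_tree(tree, row) for tree in forest).most_common(1)[0][0]


if __name__ == "__main__":
    # Hypothetical toy data: customers respond to campaign "A" but not "B";
    # the "channel" field carries no signal.
    rows = [
        {"campaign": c, "channel": ch, "responded": "yes" if c == "A" else "no"}
        for c in ("A", "B")
        for ch in ("email", "post")
        for _ in range(5)
    ]
    forest = random_forest(rows, ["campaign", "channel"], "responded")
    print(predict_forest(forest, {"campaign": "A", "channel": "post"}))
```

Setting `max_features` below the number of factors is what distinguishes a random forest from plain bagging: it decorrelates the trees, so the majority vote reduces variance more than averaging near-identical trees would.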


10. Background of your research area which puts your research into context:
I am currently studying data analytics and have a background in statistics and
mathematics, which allows me to use the tools and techniques I have learnt to
analyze large volumes of data. The machine learning course taught at college
has provided valuable insight into classification algorithms and data mining.
This will allow me to apply various supervised learning methods to perform
data mining and analysis.


11. Ethical Issues: The data used is freely available to the public on the
website https://www.dunnhumby.com/, so ethics permission will not be needed.



12. Sources of literature: Scopus:

·       Customer base
analysis: Partial defection of behaviorally loyal clients in a non-contractual
FMCG retail setting. (2005) European Journal of Operational Research

·       Sensory analysis
in the food industry as a tool for marketing decisions. (2012) Advances in Data
Analysis and Classification

·       Predicting direct
marketing response in banking: comparison of class imbalance methods. (2017)
Service Business

·       Comparison of
target selection methods in direct marketing: http://www.inesc-id.pt/ficheiros/publicacoes/2041.pdf

·       https://www.sciencedirect.com/science/article/pii/S0167923612001881

·       https://www.scirp.org/Journal/PaperInformation.aspx?PaperID=30463



13. Your own expertise and how
well you are positioned to carry out the work: My specialization in data
science provides a wide range of tools for data mining and analysis. Decision
tree models and ensembles have been covered in depth in my course, which will
help in extracting insight from the data. Previous experience with statistics
will help me choose the correct tests and interpret their results correctly.
Various machine learning algorithms will be used to derive useful insights
from the data.


19. References:

Decision Tree Pruning Using Expert Knowledge, Jingfeng

A complete fuzzy decision tree technique, http://www.sciencedirect.com/science/article/pii/S0165011403000897

On the optimization of fuzzy decision trees, http://www.sciencedirect.com/science/article/pii/S0165011497003862

Data Mining for Direct Marketing: Problems and Solutions, Charles X. Ling and
Chenghui Li

Decision Tree and Naïve Bayes Algorithm for Classification and Generation of
Actionable Knowledge for Direct Marketing, Masud Karim, Rashedur M. Rahman


20. Table/Chart of Research


21. Proposed Table of contents for your dissertation: