How to use AI for Customer Behavior Analysis?

Dec 4, 2024

Analysis objective

Analyze the target customers based on user data and consumption behavior data.

Use Bayeslab to build a logistic regression classification model that predicts which customer groups have a higher probability of using coupons.

Data overview analysis

Data Preview

Indicator explanation

ID: record code

age: age

job: occupation

marital: marital status

default: whether the customer has ever defaulted on Huabei (Ant Credit Pay)

returned: whether the customer has ever returned goods

loan: whether Huabei has been used for payment

coupon_used_in_last6_month: number of coupons used in the past six months

coupon_used_in_last_month: number of coupons used in the past month

coupon_ind: whether a coupon was used in this event

prompt:

Check and display the basic data situation of the data table

prompt:

Check if the data has any missing values

The general information of the data was checked, and it confirms that there are no missing values.
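Outside Bayeslab, the two prompts above roughly correspond to the pandas calls below. This is a minimal sketch, not the code Bayeslab generates: the four sample rows are made up for illustration, and only the column names come from the indicator list.

```python
import pandas as pd

# Hypothetical sample rows; only the column names come from the article.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "age": [34, 52, 27, 45],
    "job": ["management", "technician", "blue-collar", "services"],
    "marital": ["married", "single", "divorced", "married"],
    "default": ["no", "no", "yes", "no"],
    "returned": ["yes", "no", "no", "yes"],
    "loan": ["no", "yes", "no", "no"],
    "coupon_used_in_last6_month": [2, 0, 1, 3],
    "coupon_used_in_last_month": [1, 0, 0, 1],
    "coupon_ind": [1, 0, 0, 1],
})

# Basic situation of the table: column types and non-null counts.
df.info()

# Missing values per column, and the total across the table.
missing_per_column = df.isnull().sum()
total_missing = int(missing_per_column.sum())
print(missing_per_column)
print("total missing values:", total_missing)
```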

Data cleaning

Categorical variable

Categorical variables are transformed into numerical variables so that they are easier to analyze later.

In this case, only the default, returned, and loan variables are processed; job and marital are kept as they are.

The default, returned, and loan variables are extracted separately and one-hot encoded with get_dummies().

prompt:

Extract the variables default, returned, and loan separately for dummy variable processing using get_dummies(), and concatenate the result with the original table, then save it in the table.

prompt:

Delete the columns 'ID', 'default', 'default_no', 'returned', 'returned_no', 'loan', 'loan_no', rename coupon_ind to flag, and save the final result in the table.

prompt:

display the data information
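The three cleaning prompts above can be sketched in pandas as follows. The sample rows are invented, and the yes/no values are an assumption inferred from the column names default_no, returned_no, and loan_no mentioned in the prompt.

```python
import pandas as pd

# Hypothetical rows; "yes"/"no" values are assumed from the dummy column names.
df = pd.DataFrame({
    "ID": [1, 2, 3, 4],
    "age": [34, 52, 27, 45],
    "default": ["no", "no", "yes", "no"],
    "returned": ["yes", "no", "no", "yes"],
    "loan": ["no", "yes", "no", "no"],
    "coupon_ind": [1, 0, 0, 1],
})

# One-hot encode the three variables and append the result to the table.
encoded_cols = ["default", "returned", "loan"]
dummies = pd.get_dummies(df[encoded_cols], columns=encoded_cols, dtype=int)
df = pd.concat([df, dummies], axis=1)

# Drop the ID, the original text columns, and the redundant *_no dummies,
# then rename the target column to flag.
df = df.drop(columns=["ID", "default", "default_no",
                      "returned", "returned_no", "loan", "loan_no"])
df = df.rename(columns={"coupon_ind": "flag"})

print(df.columns.tolist())
```

Keeping only the *_yes column of each pair avoids carrying two perfectly redundant indicators into the regression.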

Univariate analysis

Observe the balance of sample 0 and 1

In binary classification problems, we typically have two classes, represented here by 0 and 1. The ideal class distribution is balanced, meaning the number of samples in both classes is roughly equal. If one class has significantly more samples than the other, this leads to class imbalance, which can affect the model's generalization ability and prediction accuracy.

In binary classification problems, even for the minority class, its proportion in the total sample should not be less than 5%. This is an empirical rule to ensure that there are enough samples for the minority class to train the model, allowing it to learn features about the minority class.

prompt:

For the sample flag, show the proportion of two categories.

As a rule of thumb, neither class should account for less than 0.05 of the samples in practice; otherwise the model's predictions are affected.

The proportions of 0 and 1 in this dataset are both higher than 0.05, so the class distribution is acceptable.
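A minimal sketch of the class balance check, assuming the target column is already named flag. The ten values are made up; the 0.05 threshold is the empirical rule quoted above.

```python
import pandas as pd

# Hypothetical flag values: seven non-users and three coupon users.
flag = pd.Series([0, 0, 0, 0, 0, 0, 0, 1, 1, 1], name="flag")

# Share of each class in the sample.
proportions = flag.value_counts(normalize=True).sort_index()

# The empirical rule asks the minority class to hold at least 5% of the rows.
minority_share = proportions.min()
is_balanced_enough = bool(minority_share >= 0.05)

print(proportions)
print("minority share:", minority_share, "passes 5% rule:", is_balanced_enough)
```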

Observe the magnitude of the mean value

prompt:

Group by flag and aggregate, calculate the mean of each other field. Ensure that all other indicators are numerical.

For variables that take only the values 0 and 1, comparing the group means shows how the variable is distributed across flag:

The mean of coupon_used_in_last_month is 0.26 for flag 0 and 0.53 for flag 1, indicating that customers who used coupons last month are more likely to use them again.

The means of default_yes and loan_yes are both higher for flag 0 than for flag 1, suggesting that customers who defaulted on Huabei or paid with Huabei are less likely to use coupons in the following period.

The mean age is 40.8 for flag 0 and 41.8 for flag 1. The difference is small, indicating that age does not clearly distinguish the two groups.
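The group comparison above amounts to a groupby on flag followed by a mean. The sketch below uses a few invented rows, so its numbers are not the article's results. For a 0/1 column, the group mean is simply the share of rows equal to 1 in that group.

```python
import pandas as pd

# Hypothetical rows; the values are illustrative only.
df = pd.DataFrame({
    "flag": [0, 0, 0, 1, 1],
    "age": [40, 42, 38, 41, 43],
    "coupon_used_in_last_month": [0, 1, 0, 1, 1],
    "loan_yes": [1, 0, 1, 0, 0],
})

# Mean of every numeric field within each flag group.
group_means = df.groupby("flag").mean(numeric_only=True)
print(group_means)
```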

Visualization

prompt:

Draw a chart to observe the distribution of returned_yes on the flag

The chart shows that customers who have returned items are less likely to use coupons than those who have not. One possible reason is that they forgot to use the coupon.

prompt:

Draw a chart to observe the distribution of marital on flag, distinguishing the flag values.

Married customers are slightly more likely to use coupons than unmarried and divorced customers.

Married non-users also outnumber unmarried non-users, presumably because married customers make up a larger share of the sample.

In all three groups, however, customers are much more likely not to use a coupon than to use one.
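When reading a chart like this, it helps to separate raw counts from within-group rates, because a large group can lead in both the user and the non-user counts. The sketch below shows both views with pandas crosstab; the rows are invented for illustration.

```python
import pandas as pd

# Hypothetical rows; married customers are deliberately the largest group.
df = pd.DataFrame({
    "marital": ["married", "married", "married", "married",
                "single", "single", "divorced", "divorced"],
    "flag": [1, 0, 0, 0, 0, 0, 0, 1],
})

# Raw counts: how many customers of each marital status fall under each flag.
counts = pd.crosstab(df["marital"], df["flag"])

# Row-normalized rates: share of each marital group that did or did not use a coupon.
rates = pd.crosstab(df["marital"], df["flag"], normalize="index")

print(counts)
print(rates)
```

The counts table is what a stacked or grouped bar chart of raw numbers displays; the rates table is the one to use when comparing the probability of coupon use between groups.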

prompt:

Draw a graph to observe the job distribution on the flag

The chart shows that customers whose job is management, technician, or blue-collar are more likely to use coupons.

prompt:

Draw a graph to observe the age distribution on the flag

A small number of extreme values were found for ages above 60, and they distort the overall distribution. These records are suspected to be erroneous, so they are excluded from the analysis.

prompt:

Excluding age > 60, quickly group age into bins (<20, <40, <60) to explore the influence of each age group on flag.

Rows with age > 60 are dropped.

The data shows that customers younger than 20 are more likely to use coupons.
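The age grouping can be sketched with pandas cut. The rows below are made up, and the handling of customers aged exactly 60 is an assumption (the prompt only says to exclude ages above 60 and to bin below 60), so this sketch keeps ages strictly below 60.

```python
import pandas as pd

# Hypothetical rows; the last two ages stand in for the extreme values.
df = pd.DataFrame({
    "age": [18, 19, 25, 33, 39, 45, 58, 72, 88],
    "flag": [1, 0, 0, 1, 0, 0, 1, 0, 1],
})

# Assumption: keep only ages below 60 so that every row falls into a bin.
df = df[df["age"] < 60].copy()

# Left-closed bins: [0, 20), [20, 40), [40, 60).
df["age_group"] = pd.cut(
    df["age"],
    bins=[0, 20, 40, 60],
    right=False,
    labels=["<20", "20-39", "40-59"],
)

# Share of coupon users (flag == 1) within each age group.
usage_rate = df.groupby("age_group", observed=True)["flag"].mean()
print(usage_rate)
```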

Correlation and visualization

prompt:

Draw the correlation heat map of all fields except job and marital (excluding rowid). Use blue for the image color.

flag is strongly positively correlated with coupon_used_in_last_month and age.

flag is strongly negatively correlated with coupon_used_in_last6_month and returned_yes.

The correlations between the other variables and flag are not obvious, so they are not interpreted further, to avoid over-interpretation.
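The numbers behind such a heat map are a plain correlation matrix. The sketch below computes it with pandas on invented rows and ranks the fields by their correlation with flag. Plotting is left out to keep the example dependency-free; a blue palette such as the "Blues" colormap in common plotting libraries would match the prompt.

```python
import pandas as pd

# Hypothetical rows; job and marital are present only to show they are excluded.
df = pd.DataFrame({
    "flag": [0, 0, 0, 1, 1, 1],
    "coupon_used_in_last_month": [0, 0, 1, 1, 1, 0],
    "returned_yes": [1, 1, 0, 0, 0, 1],
    "age": [30, 45, 50, 28, 39, 61],
    "job": ["management", "technician", "services",
            "blue-collar", "management", "technician"],
    "marital": ["married", "single", "married",
                "divorced", "single", "married"],
})

# Pearson correlation of all numeric fields, leaving out job and marital.
corr = df.drop(columns=["job", "marital"]).corr()

# Correlation of each field with flag, strongest positive first.
flag_corr = corr["flag"].drop("flag").sort_values(ascending=False)

print(corr.round(2))
print(flag_corr.round(2))
```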

Establishment and evaluation of logistic regression model

Model establishment

prompt:

Set the independent variables as ['coupon_used_in_last_month', 'returned_yes', 'loan_yes'], and the dependent variable as ['flag']. Call the sklearn module to randomly split the training set and test set (in a 7/3 ratio). Then fit using logistic regression and display the model coefficient results.

prompt:

use auc to evaluate the model and give me the model score roc_auc
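For readers who want to reproduce the two prompts above by hand, the following sketch uses scikit-learn. The data is synthetic, generated from made-up coefficients purely so the example runs, and random_state=42 is an arbitrary choice; the resulting coefficients and AUC are therefore not the article's results.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data: three 0/1 predictors named as in the article.
rng = np.random.default_rng(0)
n = 1000
X = pd.DataFrame({
    "coupon_used_in_last_month": rng.integers(0, 2, n),
    "returned_yes": rng.integers(0, 2, n),
    "loan_yes": rng.integers(0, 2, n),
})

# Made-up "true" coefficients used only to generate a plausible flag column.
logit = (-1.0
         + 1.0 * X["coupon_used_in_last_month"].to_numpy()
         - 0.8 * X["returned_yes"].to_numpy()
         - 0.5 * X["loan_yes"].to_numpy())
y = pd.Series((rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int), name="flag")

# Random 7/3 split into training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Fit the logistic regression and collect one coefficient per variable.
model = LogisticRegression()
model.fit(X_train, y_train)
coefficients = pd.Series(model.coef_[0], index=X.columns)

# AUC is computed from predicted probabilities of class 1 on the test set.
test_scores = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, test_scores)

print(coefficients)
print("roc_auc:", round(auc, 3))
```

Swapping in the longer variable list from the model optimization step only changes the columns of X; the split, fit, and evaluation calls stay the same.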

When coupon_used_in_last_month changes from 0 to 1, the odds of using a coupon (the ratio of the probability of using one to the probability of not using one) are multiplied by e^0.41, which is about 1.5 times those of other customers.

When returned_yes changes from 0 to 1, the odds of using a coupon are multiplied by 0.41 relative to other customers, so they decrease.

When loan_yes changes from 0 to 1, the odds of using a coupon are multiplied by 0.63 relative to other customers, so they decrease.

Therefore, from a probabilistic perspective, customers who used coupons last month, customers who have not returned goods, and customers who did not pay with Huabei are more likely to use coupons again.
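A logistic regression coefficient acts on the odds rather than directly on the probability, so the factor e^0.41 is an odds ratio. The short sketch below works through the arithmetic for the 0.41 coefficient quoted above; the 20% baseline probability and the helper apply_odds_ratio are hypothetical, chosen only to show how an odds ratio translates into a probability.

```python
import math

# Coefficient of coupon_used_in_last_month quoted in the article.
coef = 0.41

# Exponentiating a logistic regression coefficient gives an odds ratio.
odds_ratio = math.exp(coef)  # about 1.51


def apply_odds_ratio(probability, ratio):
    """Convert a probability to odds, scale the odds, and convert back."""
    odds = probability / (1 - probability)
    new_odds = odds * ratio
    return new_odds / (1 + new_odds)


# Hypothetical baseline: a customer with a 20% chance of using a coupon.
baseline = 0.20
shifted = apply_odds_ratio(baseline, odds_ratio)

print("odds ratio:", round(odds_ratio, 2))
print("probability after the change:", round(shifted, 3))
```

The resulting probability rises by less than a factor of 1.5, which is why the factor is best described as a change in odds.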

However, the model's AUC score is 0.67. A good model generally scores between 0.7 and 0.8, so this model should be adjusted.

Model optimization

prompt:

Set the independent variables as ['coupon_used_in_last_month', 'returned_yes', 'loan_yes', 'coupon_used_in_last6_month', 'default_yes', 'age'], and the dependent variable as flag. Call the sklearn module, randomly split the training set and test set (7/3), then fit using logistic regression, and display the model coefficient results.

prompt:

use auc to evaluate the model and give me the model score roc_auc

Only coupon_used_in_last_month and age are positively related to flag; the other variables are negatively related to flag.

The AUC score changed little after the model iteration, which suggests that the added variables carry little extra predictive power and that the low coupon usage rate limits how well the model can separate the two groups.

Business Suggestions

User Analysis

The probability of using a coupon is highest among customers aged 20-40.

Within their respective age groups, customers who are more likely to use coupons have average ages of 18, 32, and 48.

Analysis of improving coupon usage rate - high-value users

The mean of coupon_used_in_last_month is 0.26 for flag 0 and 0.53 for flag 1, indicating that customers who used coupons last month are more likely to use coupons again.

The means of default_yes and loan_yes are both higher for flag 0 than for flag 1, suggesting that customers who defaulted on Huabei or paid with Huabei are less likely to use coupons in the following period.

Compared to customers who did not return goods, those who returned goods have a lower probability of using coupons.

Married customers have a slightly higher probability of using coupons compared to unmarried and divorced customers.

Customers with job titles of management, technician, and blue-collar are more likely to use coupons.

The flag has a strong positive correlation with coupon_used_in_last_month and age, and a strong negative correlation with coupon_used_in_last6_month and returned_yes. The correlations between other variables and the flag are not significant.

Conclusion

The usage rate of coupons is low.

Pay special attention to retaining customers aged 20-60. For customers with purchasing potential, or those who buy only a narrow range of products, develop an upselling or cross-selling model to increase the value of existing customers.

Encourage customers who used coupons last month to use them again, and develop corresponding product response models or event response models to maximize benefits.

For customers who have not returned products, married customers, those without Huabei (Ant Credit Pay) defaults, and those who have not paid via Huabei, develop customer churn warning models or customer win-back models. Focus on managers, technicians, and blue-collar workers to try to retain these customers as much as possible.

Increase the promotion of coupons within the app and strengthen marketing measures such as banners and push advertising. Also push coupons outside the app, including through third parties, so that customers learn about the coupons and become more likely to use them.


Bayeslab makes data analysis as easy as note-taking!
