Exploring Customer Behavior: A Data Analysis of a Credit Card Dataset
Data Visualization and Statistical Analysis of a Kaggle Credit Card Dataset Using Tableau
Introduction
This project analyzes a credit card dataset sourced from Kaggle, using the data visualization and statistical analysis capabilities of Tableau. The aim is to uncover insights into customer behavior and to explore the relationships among the demographic, financial, and product-related variables in the dataset.
Objective
- To conduct visual analysis of the distribution and relationships of key variables in the credit card dataset.
- To gain insights into customer behavior and demographic characteristics, such as age, gender, education level, and marital status.
- To identify significant correlations between financial indicators, such as credit limits, transaction counts, and utilization ratios.
- To perform statistical analysis to validate assumptions, identify trends, and draw meaningful conclusions from the dataset.
- To create informative, visually clear Tableau dashboards that communicate the findings effectively and support data-driven decision-making.
Methodology
- Data Preprocessing: The dataset will be cleansed and prepared, addressing missing values, outliers, and data inconsistencies.
- Exploratory Data Analysis (EDA): A comprehensive EDA will be conducted to understand the distribution, summary statistics, and relationships among variables.
- Data Visualization: Tableau’s robust visualization capabilities will be leveraged to create insightful graphs, charts, and interactive dashboards.
- Statistical Analysis: Statistical techniques, including correlation analysis, hypothesis testing, and regression analysis, will be applied to gain deeper insights into the dataset.
- Interpretation and Conclusion: The visualizations and statistical analysis results will be interpreted to derive meaningful conclusions and actionable insights.
Dataset Description
The dataset contains a diverse range of variables, including customer age, gender, dependent count, education level, marital status, income category, card category, months on book, attrition status, and several financial indicators such as credit limit, revolving balance, open-to-buy amount, transaction counts, and utilization ratios. With roughly 10,000 customer records, it offers ample room to reveal hidden patterns and trends.
Part 1 — Demographics Overview
The histogram shows how credit card holders are distributed by age. The largest segment falls within the 40 to 45 age range, accounting for 19.26% of the dataset, which means that individuals in their early 40s form the largest concentration of customers. The 45 to 50 range follows closely with 18.76% of cardholders. Together these two ranges account for 38.02% of customers, suggesting that middle-aged individuals form the core customer base for these credit card services. At the other end, the 25 to 30 age group represents the smallest portion of cardholders at 1.09%, which may indicate that younger adults are less likely to hold a card with this bank, although the histogram alone cannot show why. Understanding the age distribution of cardholders helps in tailoring marketing strategies and designing financial products that match the preferences and needs of specific age segments.
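Each percentage in a histogram like this is the count in one bin divided by the total number of customers. The minimal Python sketch below shows that calculation using made-up ages, not the Kaggle data. It assumes half-open 5-year bins, so an age of exactly 45 falls in the 45-50 bin and not in 40-45. Tableau's own bin boundaries may be defined differently.

```python
from collections import Counter


def age_bin(age: int, width: int = 5) -> str:
    """Label an age with its half-open bin, e.g. 42 -> '40-45' (covers 40 <= age < 45)."""
    lower = (age // width) * width
    return f"{lower}-{lower + width}"


def bin_shares(ages: list[int], width: int = 5) -> dict[str, float]:
    """Return each bin's share of all customers as a percentage, rounded to 2 decimals."""
    counts = Counter(age_bin(a, width) for a in ages)
    total = len(ages)
    return {b: round(100 * n / total, 2) for b, n in sorted(counts.items())}


# Invented ages for illustration only (NOT taken from the Kaggle dataset)
sample_ages = [26, 34, 41, 42, 44, 45, 47, 49, 52, 58]
print(bin_shares(sample_ages))
```

Under this convention the ten sample ages put 30% of customers in the 40-45 bin and 30% in the 45-50 bin. The age of 45 counts toward the second bin.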
The bar chart shows the distribution of credit card holders by marital status. Married customers form the largest segment at 46.28% of cardholders. Single customers come next at 38.94%, a substantial portion of the customer base. Divorced customers represent a much smaller 7.39%. Customers whose marital status is unknown or unspecified account for a nearly identical 7.40%. These percentages describe the make-up of this bank’s customers. They show where the customer base is concentrated, not how likely each group is to hold a credit card. Understanding this distribution is useful for tailoring marketing strategies, developing targeted financial products, and offering benefits suited to the needs and preferences of each marital status segment.
The stacked bar chart breaks down the attrition flag by gender, with each segment shown as a share of all customers. Existing female customers account for 43% of all customers. Attrited female customers, who have closed their accounts, account for 9%. Existing male customers account for 40% and attrited male customers for 8%. Women make up a larger share of the customer base (52% versus 48%), so these shares need to be converted to rates within each gender before they can be compared. Roughly 17.3% of female customers (9 of 52) and 16.7% of male customers (8 of 48) have attrited. Retention is therefore similar for both genders, with attrition slightly higher among women. Understanding attrition patterns by gender can help financial institutions see where retention strategies could improve and implement targeted measures to reduce attrition and strengthen customer loyalty.
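The stacked bar percentages are shares of all customers. A gender's attrition rate is therefore its attrited share divided by that gender's total share. This small Python sketch reproduces that arithmetic from the rounded figures in the chart, so its results are approximate.

```python
# Shares of ALL customers from the stacked bar chart (rounded to whole percent)
shares = {
    "Female": {"Existing": 43, "Attrited": 9},
    "Male": {"Existing": 40, "Attrited": 8},
}


def attrition_rate(group: dict[str, float]) -> float:
    """Attrited customers as a percentage of that gender's own total."""
    return 100 * group["Attrited"] / (group["Existing"] + group["Attrited"])


for gender, group in shares.items():
    print(f"{gender}: {attrition_rate(group):.1f}% of {gender.lower()} customers attrited")
```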
The bar chart shows the distribution of credit card holders by income category. The largest segment falls into the “Less than $40K” category at 35.16% of cardholders, so a substantial proportion of customers have lower incomes. The next largest category is “$40K-$60K” at 17.68%, followed by “$80K-$120K” at 15.16% and “$60K-$80K” at 13.84%. Taken together, these three middle-income categories account for 46.68% of cardholders. The highest income category, “$120K+,” accounts for 7.18%, a much smaller share. Finally, 10.98% of cardholders fall into the “Unknown” category, meaning their income was not disclosed or recorded. Understanding the distribution of cardholders by income helps financial institutions tailor their services, benefits, and credit limit offerings to the financial profiles and needs of each income segment.
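The income categories can also be rolled up into broader brackets. The grouping below (lower, middle, upper, unknown) is a hypothetical choice made for illustration; the dashboard does not use it. The shares are the income-category figures reported in the chart. The closing assertion checks that the categories cover every cardholder, allowing for rounding.

```python
# Income-category shares (%) as reported in the income bar chart
income_shares = {
    "Less than $40K": 35.16,
    "$40K-$60K": 17.68,
    "$60K-$80K": 13.84,
    "$80K-$120K": 15.16,
    "$120K+": 7.18,
    "Unknown": 10.98,
}

# Hypothetical roll-up into broader brackets (an illustrative choice, not the dashboard's)
brackets = {
    "Lower (< $40K)": ["Less than $40K"],
    "Middle ($40K-$120K)": ["$40K-$60K", "$60K-$80K", "$80K-$120K"],
    "Upper ($120K+)": ["$120K+"],
    "Unknown": ["Unknown"],
}

bracket_shares = {
    name: round(sum(income_shares[c] for c in cats), 2)
    for name, cats in brackets.items()
}
for name, share in bracket_shares.items():
    print(f"{name}: {share:.2f}%")

# The categories should cover every cardholder, up to rounding in the source figures
assert abs(sum(income_shares.values()) - 100) < 0.05
```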
Part 2 — Card Holder Profile
The packed bubble chart shows the educational composition of credit card holders. The largest segment consists of customers with a Graduate level of education, at 36.34% of cardholders. Graduates may be the largest group because of higher income levels, better financial literacy, or both. High School educated customers form the second-largest group at 23.39%, followed by the Uneducated group at 17.27%. The smaller Uneducated share may reflect factors such as lower income levels or limited access to, or awareness of, credit facilities. College educated customers represent 11.77% of cardholders, Post-Graduates 5.99%, and Doctorate holders, the smallest group, 5.24%. The sharp drop from the Graduate level to the Post-Graduate and Doctorate levels is notable. It may indicate that people with qualifications beyond a graduate degree see less need for credit cards or prefer other ways of managing their finances. It may also simply reflect how few people hold these advanced degrees. These insights could help credit card companies shape their marketing efforts by education level.
The Highlight Table shows how credit card holders are distributed across income categories, using both color and numerical values to make the distribution easy to read.
The table shows that the largest group of cardholders earns less than $40K a year. Despite their lower incomes, these customers make up a sizable portion of the cardholder base, possibly because they rely on credit for day-to-day expenses or use credit cards for short-term borrowing.
The next largest category is the $40K-$60K range, followed by the $80K-$120K and $60K-$80K ranges. Customers earning $120K or more represent a smaller portion of cardholders. This may imply that they have alternative ways to manage their finances or rely less heavily on credit cards.
Knowing these income distributions helps target marketing and customer engagement efforts. It shows credit card companies where their primary customer base lies on the income spectrum, so they can plan product offerings and communication strategies accordingly.
The Highlight Table shows how cardholders are distributed across card categories. The ‘Blue’ card category clearly predominates, with 9,436 cardholders.
Blue cardholders make up an overwhelming 93.18% of all credit card holders. This suggests that the ‘Blue’ card may be the most accessible or may offer the best value for most customers. ‘Silver’ cardholders come next with a much smaller 5.48%, which suggests that the ‘Silver’ card may be a more premium or specialized offering. ‘Gold’ and ‘Platinum’ cardholders represent only 1.15% and 0.20% of the total, respectively. These are likely high-end cards aimed at a specific, possibly affluent, customer segment. Understanding this distribution is essential for card companies as they design their product offerings and market segmentation strategies.
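Together with the 9,436 Blue cardholders, the card-category percentages imply the approximate size of the dataset. This Python sketch finds every whole-number total consistent with 9,436 being 93.18% after rounding to two decimals. It then converts the other shares back into counts. These counts are implied by the rounded percentages; they were not read from the dashboard.

```python
import math

blue_count = 9436
# Card-category shares (%) from the highlight table, rounded to two decimals
card_shares = {"Blue": 93.18, "Silver": 5.48, "Gold": 1.15, "Platinum": 0.20}


def implied_totals(count: int, pct: float, half_step: float = 0.005) -> range:
    """All whole-number totals for which count/total rounds to `pct` percent."""
    low = count / ((pct + half_step) / 100)   # largest possible share -> smallest total
    high = count / ((pct - half_step) / 100)  # smallest possible share -> largest total
    return range(math.ceil(low), math.floor(high) + 1)


totals = implied_totals(blue_count, card_shares["Blue"])
print(list(totals))

# Convert each rounded share back into an approximate count using the implied total
total = totals[0]
implied_counts = {card: round(total * pct / 100) for card, pct in card_shares.items()}
print(implied_counts, sum(implied_counts.values()))
```

Only one total, 10,127, is consistent with the Blue figures. The back-converted counts also add up to that total.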
Part 3 — Financial Analysis
The Stacked Bar Chart provides an insightful overview of the income distribution of credit card holders separated by gender. Each bar represents an income category, and the size of the segments in each bar indicates the proportion of males and females in each category.
If, for example, the female segment is larger than the male segment in the ‘Less than $40K’ income category, then more women than men in the dataset fall into that category. Similarly, a larger female segment in the ‘$40K-$60K’ category would indicate that women make up the majority of cardholders in that income category.
Such a gender-based analysis of income categories can help credit card companies design gender-specific marketing strategies and better understand income patterns among their male and female customers.
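When bars have very different totals, the gender split within each income category is easier to compare once every bar is converted to percentages of its own total, as a 100% stacked bar does. The counts in this Python sketch are invented purely to show the calculation.

```python
# Invented counts of cardholders by income category and gender (NOT dataset values)
counts = {
    "Less than $40K": {"F": 300, "M": 100},
    "$40K-$60K": {"F": 120, "M": 80},
    "$120K+": {"F": 5, "M": 95},
}


def row_percentages(table: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Express each row's counts as percentages of that row's total."""
    out = {}
    for category, by_gender in table.items():
        row_total = sum(by_gender.values())
        out[category] = {g: round(100 * n / row_total, 1) for g, n in by_gender.items()}
    return out


for category, pct in row_percentages(counts).items():
    print(category, pct)
```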
The Bar Chart shows how transaction frequency varies across education levels. Each bar represents an education level, and its length shows the total number of transactions made by cardholders in that category.
When the ‘Graduate’ bar is the longest, cardholders with a graduate degree account for the most transactions in total. A shorter ‘Doctorate’ bar means that customers with a doctorate account for fewer. These are totals, however, and they scale with the size of each group. Graduates are the largest education group (36.34% of cardholders) and Doctorate holders the smallest (5.24%). The average number of transactions per customer is therefore the better measure of whether the groups differ in how actively they use their cards.
Understanding these patterns allows credit card companies to tailor their strategies and services to cater to various education level segments. They may consider offering different incentives or card benefits to stimulate increased transaction activity where it is lower.
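Totals and per-customer averages can rank groups differently. A larger group can have the longer total bar even when its customers transact less often on average. The following Python sketch illustrates this with invented records, not the dataset.

```python
from collections import defaultdict

# Invented (education_level, transaction_count) pairs, one per customer; NOT dataset values
records = [
    ("Graduate", 60), ("Graduate", 70), ("Graduate", 50), ("Graduate", 60),
    ("Doctorate", 80), ("Doctorate", 70),
]

trans_totals = defaultdict(int)  # total transactions per education level
group_sizes = defaultdict(int)   # number of customers per education level
for level, n_trans in records:
    trans_totals[level] += n_trans
    group_sizes[level] += 1

for level in trans_totals:
    mean = trans_totals[level] / group_sizes[level]
    print(f"{level}: total={trans_totals[level]}, customers={group_sizes[level]}, mean={mean:.1f}")
```

Here Graduate has the larger total (240 versus 150) but the lower mean per customer (60.0 versus 75.0).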
The box-and-whisker plot reveals interesting patterns in the distribution of credit limits across education levels. ‘Uneducated’ customers have the widest range of credit limits, wider even than those of customers with more education. This hints that factors other than education influence credit limits. ‘Post-Graduate’ customers have a slightly lower median credit limit than the other education levels, which runs counter to the expectation that more education goes with higher credit limits. Meanwhile, ‘Graduate’ and ‘High School’ customers have similar median credit limits. The credit card company may therefore view these two groups as having comparable financial behavior or creditworthiness. These insights highlight the importance of multifaceted customer analysis and careful strategy design in the credit card industry.
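For reference, the quantities a box plot displays can be computed as shown below. This is a minimal sketch with invented credit limits. It assumes the common convention that whiskers extend to the most extreme points within 1.5 times the interquartile range (IQR) of the box. It uses Python's `statistics.quantiles` with the `inclusive` method, which may not match the quartile method Tableau uses.

```python
import statistics


def box_plot_stats(values: list[float]) -> dict:
    """Quartiles, median and whiskers as drawn in a Tukey-style box plot."""
    # Quartiles via Python's "inclusive" method (Tableau may use a different one)
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers end at the most extreme observations still inside the fences
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": min(inside),
        "upper_whisker": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Invented credit limits for one education level (NOT dataset values)
limits = [1500, 2000, 2500, 3000, 3500, 4000, 4500, 5000, 30000]
print(box_plot_stats(limits))
```

In this example the median is 3,500 and the upper whisker stops at 5,000. The value 30,000 is flagged as an outlier, so a whisker's end is not necessarily the largest value in a group.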
The box-and-whisker plot shows how credit utilization ratios differ across card categories.
‘Blue’ card holders exhibit the broadest range of utilization ratios, with the upper whisker reaching 0.9990. Some ‘Blue’ card holders have therefore used nearly their entire credit limit. The median, however, is relatively low at 0.2070, meaning that half of ‘Blue’ card holders have a utilization ratio below this value.
In comparison, the ‘Gold,’ ‘Platinum,’ and ‘Silver’ card categories have much lower upper whiskers (0.1290, 0.0930, and 0.1660, respectively). Their medians are also much lower than that of the ‘Blue’ category, indicating that these cardholders generally use a smaller proportion of their credit limits.
It is notable that the ‘Blue’ card category, typically associated with a lower-income segment, shows higher utilization ratios than the premium ‘Gold,’ ‘Platinum,’ and ‘Silver’ categories. This could reflect different spending behaviors among cardholders in different income brackets, or different strategies by the credit card company for each card category. If premium cards carry higher credit limits, it could also reflect the simple fact that the same balance produces a lower utilization ratio on a larger limit. Understanding these differences is crucial for tailoring appropriate services and offers to each card category.
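Assume the dataset's utilization ratio follows the usual definition: revolving balance divided by credit limit. Under that assumption, the same balance produces very different ratios on different limits. The Python sketch below shows this with invented figures. `utilization_ratio` is a hypothetical helper, not a field in the dataset.

```python
def utilization_ratio(revolving_balance: float, credit_limit: float) -> float:
    """Hypothetical helper: share of the credit limit in use (assumed balance / limit)."""
    if credit_limit <= 0:
        raise ValueError("credit_limit must be positive")
    return revolving_balance / credit_limit


# Invented example: the same $1,200 balance on a low-limit and a high-limit card
for limit in (1_500, 20_000):
    print(f"limit ${limit:,}: utilization ratio {utilization_ratio(1_200, limit):.3f}")
```

The same $1,200 balance gives a ratio of 0.800 on the $1,500 limit but only 0.060 on the $20,000 limit.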
Part 4 — Relationship and Credit Metrics
The scatter plot shows an inverse relationship between credit limit and average utilization ratio. It uses a power regression model with the equation ‘Credit Limit = 1761.89 * Avg Utilization Ratio^-0.846276’. Because the exponent is negative, the credit limit tends to be higher where the average utilization ratio is lower, and vice versa.
The R-squared value of approximately 0.816 suggests that the power model explains about 81.6% of the variability in credit limit. The p-value of less than 0.0001 means the model is highly statistically significant. If there were no relationship between credit limit and average utilization ratio, a fit this strong would be extremely unlikely to arise by chance alone.
This analysis suggests that customers who use a smaller proportion of their credit limit tend to have higher credit limits. Part of this pattern may be mechanical. A utilization ratio is conventionally the revolving balance divided by the credit limit, and revolving balance barely varies with credit limit (as the next scatter plot shows), so a larger limit by itself produces a lower ratio. Beyond that, a low utilization ratio could indicate good financial management. This may increase perceived creditworthiness and, in turn, lead to higher credit limits. The credit card company might also reward such behavior with higher limits to encourage low utilization. These insights can be used to develop strategies that encourage responsible credit usage among cardholders.
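A power model like this one is commonly fitted by ordinary least squares on logarithms, because ln(y) = ln(a) + b ln(x) is linear in ln(x). We assume Tableau's power trend line is fitted in a similar way. The Python sketch below checks that this procedure recovers the reported coefficients from points generated on the reported curve. The logarithm is undefined for a utilization ratio of zero, so such customers cannot enter a fit of this form.

```python
import math


def fit_power_law(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Fit y = a * x**b by ordinary least squares on ln(y) = ln(a) + b * ln(x)."""
    if any(x <= 0 for x in xs) or any(y <= 0 for y in ys):
        raise ValueError("a power-law fit needs strictly positive values")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) / sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)  # back-transform the intercept ln(a)
    return a, b


# Synthetic points generated from the reported curve, to check that the fit recovers it
ratios = [0.05, 0.1, 0.2, 0.4, 0.8]
limits = [1761.89 * r ** -0.846276 for r in ratios]
a, b = fit_power_law(ratios, limits)
print(f"Credit Limit = {a:.2f} * Avg Utilization Ratio^{b:.6f}")
```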
The scatter plot shows a very weak positive relationship between a customer’s credit limit and total revolving balance. The line of best fit (Total Revolving Bal = 0.0038103*Credit Limit + 1129.92) implies that revolving balance rises only slightly as the credit limit increases: by about $3.81 for every additional $1,000 of credit limit.
However, the R-squared value of 0.0018056 shows that credit limit explains only about 0.18% of the variance in total revolving balance, so on its own it has almost no predictive power.
The p-value is less than 0.0001, meaning the relationship is statistically significant. With roughly 10,000 records, however, even a very small effect can reach statistical significance. Given the low R-squared value, the practical significance is negligible.
Customers with higher credit limits do tend to carry slightly higher revolving balances. However, other factors not included in this analysis likely play a far larger role in determining a customer’s revolving balance. These could include income level, spending habits, and the propensity to carry a balance from month to month.
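A tiny R-squared can coexist with a very small p-value because of sample size. For a simple linear regression, the slope's t statistic is r * sqrt(n - 2) / sqrt(1 - r^2), where r^2 is the R-squared. The Python sketch below applies this formula to the reported R-squared. It assumes n = 10,127, the total implied by 9,436 Blue cardholders making up 93.18% of customers. It also uses a normal approximation to the t distribution, which is very close at this sample size.

```python
import math


def slope_significance(r_squared: float, n: int) -> tuple[float, float]:
    """t statistic and approximate two-sided p-value for a simple regression slope."""
    r = math.sqrt(r_squared)  # |correlation| recovered from R-squared
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r_squared)
    # Normal approximation to the t distribution (accurate with thousands of records)
    p = math.erfc(t / math.sqrt(2))  # equals 2 * P(Z > t) for a standard normal Z
    return t, p


# Reported R-squared for Total Revolving Bal vs Credit Limit; n = 10,127 is an assumption
t, p = slope_significance(0.0018056, 10_127)
print(f"t = {t:.2f}, p = {p:.1e}")
```

The result is a t statistic of about 4.3 and a p-value of roughly 0.00002. This is consistent with the reported p < 0.0001, even though credit limit explains only 0.18% of the variance.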
The scatter plot shows a strong positive relationship between ‘Average Open to Buy’ and ‘Credit Limit’. The linear regression model, ‘Avg Open To Buy = 0.99619 * Credit Limit - 1129.92’, shows that Average Open to Buy rises almost dollar for dollar with the credit limit.
The R-squared value of approximately 0.992 indicates that this linear model accounts for about 99.2% of the variance in ‘Average Open to Buy,’ which makes the credit limit a very strong predictor of it. The p-value of less than 0.0001 denotes a high level of statistical significance. If the two variables were unrelated, a relationship this strong would be extremely unlikely to arise by chance.
This near-perfect fit is largely an accounting relationship, not a behavioral one. Open to buy is generally defined as the portion of the credit limit that has not yet been used, and this model mirrors the revolving balance model above. The slopes add up to almost exactly 1 (0.99619 + 0.0038103 ≈ 1), and the intercepts are equal and opposite (-1129.92 versus +1129.92). In practical terms, when credit card companies raise a customer’s credit limit, nearly all of the increase becomes additional spending capacity, because revolving balances barely change with the limit. These insights can help credit card companies understand how adjustments to credit limits directly influence their customers’ purchasing power.
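Ordinary least squares is linear in the response. So if regressing revolving balance on credit limit gives slope b and intercept a, regressing (credit limit minus revolving balance) on credit limit must give slope 1 - b and intercept -a. The Python sketch below uses invented customers and assumes Open To Buy equals credit limit minus revolving balance. It confirms that the two fitted lines mirror each other in this way.

```python
def ols(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


# Invented customers (NOT dataset values): credit limits and revolving balances
credit_limit = [2_000, 3_500, 5_000, 8_000, 12_000, 20_000, 34_000]
revolving_bal = [1_500, 0, 2_100, 900, 1_300, 2_500, 700]
# Assumption: open to buy is the part of the limit not already used
open_to_buy = [cl - rb for cl, rb in zip(credit_limit, revolving_bal)]

b_bal, a_bal = ols(credit_limit, revolving_bal)
b_otb, a_otb = ols(credit_limit, open_to_buy)
print(f"Revolving Bal = {b_bal:.5f} * Credit Limit + {a_bal:.2f}")
print(f"Open To Buy   = {b_otb:.5f} * Credit Limit + {a_otb:.2f}")
```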
The scatter plot shows an inverted U-shaped relationship between the length of a customer’s relationship with the bank (Months on Book) and their credit limit.
At first, the credit limit rises as the relationship lengthens. This may reflect trust and reliability built up over time, which encourages the bank to offer higher credit limits to longstanding customers. Beyond a certain point, however, longer relationships are associated with lower credit limits.
This might be explained by changes in customers’ financial situations, reduced spending, or other factors that led the bank to lower their credit limits. It could also reflect a more cautious approach by the bank toward very long-term customers, who might be seen as more likely to accumulate debt.
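The text does not state which trend model produced this curve. One simple way to quantify an inverted U is a quadratic least-squares fit, y = c0 + c1*x + c2*x^2. A negative c2 confirms the downward curvature, and the peak sits at x = -c1 / (2*c2). The Python sketch below illustrates this on invented months-on-book data. `fit_quadratic` is a hypothetical helper, not a Tableau function.

```python
def fit_quadratic(xs: list[float], ys: list[float]) -> list[float]:
    """Least-squares fit of y = c0 + c1*x + c2*x**2; returns [c0, c1, c2].

    x is centred on its mean before solving the 3x3 normal equations, which keeps
    the system well conditioned; the coefficients are converted back at the end.
    """
    mu = sum(xs) / len(xs)
    zs = [x - mu for x in xs]
    s = [sum(z ** k for z in zs) for k in range(5)]                  # sums of z**0 .. z**4
    t = [sum(z ** k * y for z, y in zip(zs, ys)) for k in range(3)]  # sums of z**k * y
    m = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]    # augmented [A | b]
    for col in range(3):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    d = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):  # back substitution
        d[i] = (m[i][3] - sum(m[i][j] * d[j] for j in range(i + 1, 3))) / m[i][i]
    # Undo the centring: y = d0 + d1*(x - mu) + d2*(x - mu)**2
    return [d[0] - d[1] * mu + d[2] * mu ** 2, d[1] - 2 * d[2] * mu, d[2]]


# Invented months-on-book / credit-limit points shaped like an inverted U (NOT dataset values)
months = [14, 20, 26, 32, 38, 44, 50, 56]
limits = [-2 * (m - 36) ** 2 + 9_000 for m in months]

c0, c1, c2 = fit_quadratic(months, limits)
peak = -c1 / (2 * c2)  # vertex of the parabola; a maximum because c2 < 0
print(f"c2 = {c2:.3f} (negative means an inverted U), peak at about {peak:.1f} months")
```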
Reflection
This Tableau data visualization project marked a significant step in the analyst’s professional development. It involved an in-depth analysis of bank data focused on customer behavior around credit card usage. The work sharpened the analyst’s ability to interpret complex datasets and underscored the pivotal role of data-driven decision-making in the financial sector. The resulting insights point to strategies banks could use to strengthen customer engagement and make fuller use of their services. The experience showed how strongly data can shape strategic decisions, and the analyst is now better equipped to use data effectively to inform decision-making in future projects.
Links:
Kaggle Dataset: https://www.kaggle.com/datasets/sakshigoyal7/credit-card-customers
Tableau: https://public.tableau.com/app/profile/gordon.kwok/viz/Dashboardofcreditcardcustomers/Dashboard1