Bank Marketing — Analysis & Explanation

This document explains the operations performed on the bank marketing dataset and shows representative example outputs (tables and charts).

1) Load dataset — what this does

Reads the dataset into a tabular structure and presents a preview so you can verify columns, sample values, and basic shape.

Example preview (first 5 rows)

age  job          marital  education          balance  y
30   admin.       married  university.degree  1789     no
34   technician   single   high.school        0        no
47   blue-collar  married  basic.9y           1506     yes
22   services     single   high.school        0        no
58   retired      married  illiterate         214      no
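
The loading step can be sketched with Python's standard library. This is a minimal illustration, not the code used to produce the preview: the source file name and delimiter are not given in this document, so the sketch reads the five preview rows from an in-memory string (SAMPLE_CSV and load_rows are hypothetical names).

```python
import csv
import io

# Illustrative sample only: the five preview rows written as CSV text.
# The real file path and delimiter are not stated in this document.
SAMPLE_CSV = """age,job,marital,education,balance,y
30,admin.,married,university.degree,1789,no
34,technician,single,high.school,0,no
47,blue-collar,married,basic.9y,1506,yes
22,services,single,high.school,0,no
58,retired,married,illiterate,214,no
"""


def load_rows(handle):
    """Read CSV text from a file-like object into a list of dicts, one per row."""
    return list(csv.DictReader(handle))


rows = load_rows(io.StringIO(SAMPLE_CSV))
columns = list(rows[0].keys())
shape = (len(rows), len(columns))  # (row count, column count)

print("shape:", shape)
print("columns:", columns)
for row in rows[:5]:  # preview the first five rows
    print(row)
```

To read a real file, the same function could be given an open file handle instead of the in-memory string.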

2) Check for missing values & data types — what this does

Counts missing entries per column and reports each column's data type so you can plan cleaning and type conversions.

Example missing-values summary

column     missing
age        0
job        0
education  2
balance    0
y          0

Example data types

column   type
age      integer
job      categorical
balance  float
y        categorical (target)
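
A rough sketch of both checks, using only the standard library. It assumes that a missing value appears as an empty cell, which may not match how the real dataset encodes missing data. The sample rows, count_missing, and infer_type are hypothetical and exist only to illustrate the idea.

```python
import csv
import io

# Hypothetical sample with two blank education cells, for illustration only.
SAMPLE_CSV = """age,job,education,balance,y
30,admin.,university.degree,1789.0,no
34,technician,,0.0,no
47,blue-collar,basic.9y,1506.0,yes
22,services,,0.0,no
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))


def count_missing(rows):
    """Count empty cells per column (assumption: empty string means missing)."""
    return {col: sum(1 for r in rows if r[col].strip() == "") for col in rows[0]}


def infer_type(values):
    """Label a column integer, float, or categorical from its non-missing values."""
    present = [v for v in values if v.strip() != ""]
    for label, cast in (("integer", int), ("float", float)):
        try:
            for v in present:
                cast(v)
            return label
        except ValueError:
            continue  # at least one value did not parse; try the next type
    return "categorical"


missing = count_missing(rows)
types = {col: infer_type([r[col] for r in rows]) for col in rows[0]}

for col in rows[0]:
    print(col, missing[col], types[col])
```

Trying integer before float keeps whole-number columns such as age from being reported as float.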

3) Summary statistics — what this does

Computes descriptive statistics for numeric and categorical fields (count, mean, std, top categories, unique counts).

Example numeric summary (selected columns)

metric  age    balance
count   4521   4521
mean    41.7   1362.3
std     10.2   3045.1
min     18     -6847
25%     33     71
50%     39     448
75%     50     1428
max     95     102127
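
The same set of statistics can be computed with the standard library, as sketched below. The input values are the five preview rows only, so the results will not match the full-dataset figures above. The quartiles use linear interpolation (the "inclusive" method), and std is the sample standard deviation; both are assumptions about how the original summary was produced.

```python
import statistics
from collections import Counter

# Illustrative values taken from the five preview rows, not the full dataset.
ages = [30, 34, 47, 22, 58]
balances = [1789, 0, 1506, 0, 214]
education = ["university.degree", "high.school", "basic.9y", "high.school", "illiterate"]


def describe_numeric(values):
    """Return count, mean, std, min, quartiles, and max for a numeric column."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),  # sample std (n - 1 denominator)
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }


def describe_categorical(values):
    """Return count, number of unique categories, and the most frequent category."""
    counts = Counter(values)
    top, freq = counts.most_common(1)[0]
    return {"count": len(values), "unique": len(counts), "top": top, "freq": freq}


age_summary = describe_numeric(ages)
balance_summary = describe_numeric(balances)
education_summary = describe_categorical(education)

print(age_summary)
print(balance_summary)
print(education_summary)
```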

4) Target distribution — what this does

Shows the count and share of each target class (e.g., how many clients subscribed versus did not). Useful for detecting class imbalance and planning sampling strategies.

class  count  percent
no     3985   88%
yes    536    12%
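
As a sketch, the distribution can be tallied with collections.Counter. The label list below is rebuilt from the counts in the table rather than read from the dataset, and target_distribution is a hypothetical helper name.

```python
from collections import Counter

# Labels reconstructed from the counts in the table above (3985 "no", 536 "yes").
labels = ["no"] * 3985 + ["yes"] * 536


def target_distribution(labels):
    """Return {class: (count, percent of total)} for a list of class labels."""
    counts = Counter(labels)
    total = len(labels)
    return {cls: (n, 100.0 * n / total) for cls, n in counts.items()}


distribution = target_distribution(labels)
for cls, (count, percent) in distribution.items():
    print(f"{cls}: {count} ({percent:.1f}%)")

# Ratio of majority to minority class, a quick indicator of imbalance.
sizes = [count for count, _ in distribution.values()]
imbalance_ratio = max(sizes) / min(sizes)
print(f"imbalance ratio: {imbalance_ratio:.2f}")
```

With roughly seven "no" rows for every "yes" row, plain accuracy can look good for a model that rarely predicts "yes", which is why the imbalance matters when choosing metrics and sampling.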

5) Correlation matrix (numerical features) — what this does

Computes Pearson correlations between numeric fields to reveal linear relationships and potential multicollinearity.

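Interpretation notes: values near +1 (red) indicate a strong positive linear relationship, values near -1 (blue) indicate a strong inverse relationship, and values near 0 indicate little linear association. Use this to guide feature selection (for example, dropping one of two highly correlated features) or to decide whether regularization is needed.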
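
For reference, a Pearson correlation matrix can be computed directly from the definition, as in the sketch below. It uses the age and balance values from the five preview rows only, so the coefficients are illustrative and say nothing about the full dataset. The function names are hypothetical, and the code assumes every column has non-zero variance.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Assumes neither column is constant; a zero variance would divide by zero.
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return a nested dict with the pairwise Pearson correlation of each column pair."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Illustrative values from the five preview rows, not the full dataset.
numeric = {"age": [30, 34, 47, 22, 58], "balance": [1789, 0, 1506, 0, 214]}
matrix = correlation_matrix(numeric)

for name, row in matrix.items():
    print(name, {other: round(value, 3) for other, value in row.items()})
```

The matrix is symmetric with ones on the diagonal, so only the off-diagonal entries carry information.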

6) Conclusion & recommended next steps