Hello everyone,
This is my take on a binary classification problem: predicting whether or not an employee is at risk of termination.
The tools used are:
- pandas for data ingestion and manipulation
- NumPy for multidimensional array computing
- Matplotlib and seaborn for data visualization
- WordCloud for finding the most frequent strings
- imbalanced-learn (imblearn) for oversampling the training data
- scikit-learn for data preprocessing
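Because the "at risk" class is usually much smaller than the "not at risk" class, the minority rows are oversampled before training. The README does not say which imblearn sampler was used, so the sketch below hand-rolls plain random oversampling with NumPy to show the idea. imblearn's `RandomOverSampler(random_state=0).fit_resample(X_train, y_train)` performs the same kind of resampling (SMOTE instead synthesizes new points). The toy arrays and the `random_oversample` helper are hypothetical, and resampling is applied to the training split only so that duplicated rows cannot leak into the test set.

```python
import numpy as np


def random_oversample(X, y, random_state=0):
    """Hypothetical helper: duplicate minority-class rows at random until
    every class has as many rows as the largest class."""
    rng = np.random.default_rng(random_state)
    X = np.asarray(X)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [X], [y]
    for cls, count in zip(classes, counts):
        if count < target:
            # Sample (with replacement) indices of this class to duplicate.
            idx = np.flatnonzero(y == cls)
            extra = rng.choice(idx, size=target - count, replace=True)
            X_parts.append(X[extra])
            y_parts.append(y[extra])
    # Original rows come first, followed by the duplicated minority rows.
    return np.concatenate(X_parts), np.concatenate(y_parts)


# Hypothetical imbalanced training split: 90 "not at risk" (0) vs 10 "at risk" (1).
rng = np.random.default_rng(42)
X_train = rng.normal(size=(100, 3))
y_train = np.array([0] * 90 + [1] * 10)

X_res, y_res = random_oversample(X_train, y_train)
print(np.bincount(y_res))  # [90 90]
```

After resampling both classes have 90 rows, so the classifier no longer sees the "at risk" class only one time in ten.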
Project Details:
Dataset Used:
https://www.kaggle.com/manasdalakoti/univai-hack-data
Models and accuracy reached:
- Random Forest Classifier: 95.74%
- XGBoost Classifier: 93.17%
- LightGBM (Light Gradient Boosting Machine) Classifier: 91.10%
- CatBoost Classifier: 95.74%
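The accuracies above come from fitting each model and scoring its predictions on held-out data. The sketch below shows that loop with scikit-learn's `RandomForestClassifier`. It uses synthetic data from `make_classification` as a stand-in, since the README does not describe the dataset's features, target column or train/test split; the split size and hyperparameters are assumptions. XGBoost's `XGBClassifier`, LightGBM's `LGBMClassifier` and CatBoost's `CatBoostClassifier` expose the same `fit`/`predict` interface, so they can be added to the `models` dictionary if those packages are installed.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the hackathon data (assumption: roughly 80/20 class balance).
X, y = make_classification(
    n_samples=1000, n_features=10, weights=[0.8, 0.2], random_state=0
)

# Stratified split keeps the class ratio the same in train and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "Random Forest Classifier": RandomForestClassifier(
        n_estimators=200, random_state=0
    ),
    # XGBClassifier, LGBMClassifier or CatBoostClassifier could be added here.
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: {scores[name]:.2%}")
```

The `:.2%` format prints each score in the same style as the list above (for example `95.74%`); the exact value depends on the data and the random seed.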