Kaggle Titanic Challenge

NIRAJ KUMAR
Oct 7, 2022

Updated: Oct 7, 2022

Problem Statement: Use the Titanic passenger data (name, age, ticket price, etc.) to predict which passengers survived and which did not.

Data: The data is divided into three files as listed below:

Train.csv : Contains information about the passengers on board the Titanic (such as passenger ID, age, fare, and name). It also records whether each passenger survived.
Test.csv : The test dataset; the model is applied to it to predict whether each passenger survived.
gender_submission.csv : Sample file for submission

View Code : Github

Mounting Google Drive and reading the files:
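A minimal sketch of this step, assuming the notebook runs in Google Colab and the three CSV files sit in one Drive folder. The folder path, the lowercase file names, and the helper names `mount_drive` and `load_titanic` are illustrative assumptions, not taken from the original notebook.

```python
from pathlib import Path

import pandas as pd

# Hypothetical Drive folder; adjust to wherever the CSV files were uploaded.
DATA_DIR = Path("/content/drive/MyDrive/titanic")


def mount_drive(mount_point="/content/drive"):
    """Mount Google Drive when running in Colab; do nothing elsewhere."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        return False
    drive.mount(mount_point)
    return True


def load_titanic(data_dir=DATA_DIR):
    """Read the training set, the test set, and the sample submission."""
    data_dir = Path(data_dir)
    train = pd.read_csv(data_dir / "train.csv")
    test = pd.read_csv(data_dir / "test.csv")
    sample = pd.read_csv(data_dir / "gender_submission.csv")
    return train, test, sample
```

In a Colab cell this would be used as `mount_drive()` followed by `train, test, sample = load_titanic()`.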

Data Preprocessing and Heat Map for data correlation:

Data preprocessing step: check for NA and null values in the training dataset, then generate a correlation heat map to find the features with a positive correlation value.
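The check and the heat map could be sketched roughly as follows. The helper names and the tiny sample frame are made up for illustration, and the plot uses plain matplotlib, which may differ from the plotting library used in the original notebook.

```python
import pandas as pd


def missing_value_counts(df):
    """Number of NA/null entries per column, highest first."""
    return df.isna().sum().sort_values(ascending=False)


def numeric_correlation(df):
    """Pearson correlation between the numeric columns only."""
    return df.select_dtypes(include="number").corr()


def plot_heatmap(corr):
    """Draw a correlation matrix as a heat map and return the figure."""
    # Imported lazily so the numeric helpers work without a plotting backend.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(6, 5))
    image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    fig.colorbar(image, ax=ax, label="correlation")
    fig.tight_layout()
    return fig


# Made-up rows standing in for the training data (not real passenger records).
sample = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Pclass": [3, 1, 3, 3, 1],
    "Age": [22.0, 38.0, None, 35.0, None],
    "Fare": [7.25, 71.28, 7.92, 8.05, 53.10],
    "Sex": ["male", "female", "female", "male", "female"],
})
na_counts = missing_value_counts(sample)
corr = numeric_correlation(sample)
```

Calling `plot_heatmap(corr)` on the real training frame would give a figure comparable to the one described here.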

Create a new data frame from the existing one and replace the NA and null values in the test dataset with the mean value of the respective column. The Fare attribute has a positive correlation value.
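One possible way to do the mean imputation is shown below. The function name `fill_numeric_na_with_mean` and the sample values are assumptions for illustration; the sketch works on a copy so the original frame stays unchanged.

```python
import pandas as pd


def fill_numeric_na_with_mean(df, columns=None):
    """Return a copy of df with NA values replaced by each column's mean."""
    filled = df.copy()
    if columns is None:
        # Default to every numeric column; text columns are left untouched.
        columns = filled.select_dtypes(include="number").columns
    for col in columns:
        filled[col] = filled[col].fillna(filled[col].mean())
    return filled


# Made-up rows standing in for the test data.
test_sample = pd.DataFrame({
    "Pclass": [3, 1, 2],
    "Fare": [7.0, None, 13.0],
    "Sex": ["male", "female", "male"],
})
test_clean = fill_numeric_na_with_mean(test_sample)
```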

Pattern Analysis:

Based on gender_submission.csv, the majority of female passengers survived and only a small fraction of male passengers survived, as depicted below.
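A breakdown like this can be computed with a group-by. Since gender_submission.csv only holds the PassengerId and Survived columns, the sketch below assumes it is first joined to the test set to bring in Sex. The helper name and all values are made up for illustration.

```python
import pandas as pd


def survival_rate_by(df, column="Sex", target="Survived"):
    """Share of passengers marked as survived within each group."""
    return df.groupby(column)[target].mean()


# Made-up rows standing in for Test.csv and gender_submission.csv.
test_part = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6, 7],
    "Sex": ["female", "female", "female", "male", "male", "male", "male"],
})
submission = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6, 7],
    "Survived": [1, 1, 0, 0, 0, 1, 0],
})

# Join on PassengerId so each row has both Sex and Survived.
labelled = test_part.merge(submission, on="PassengerId")
rates = survival_rate_by(labelled)
```

The same function applies directly to the training data, which already contains both columns.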

Models Used:

The models listed below were used to predict the survival of passengers on the Titanic. Different features and hyperparameters were tried to increase the accuracy of the predictions.

Random Forest model
K-Neighbors
SVM
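As a rough sketch, the three models could be trained with scikit-learn along these lines. The Random Forest settings are taken from the first row of the results table; the one-hot encoding of Sex, the helper names, and the sample rows are assumptions rather than the original notebook's code.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

FEATURES = ["Pclass", "Sex", "SibSp", "Parch"]


def encode(df, features=FEATURES):
    """One-hot encode text columns such as Sex; numeric columns pass through."""
    return pd.get_dummies(df[features], dtype=int)


def build_models():
    """The three classifiers, with default settings for K-Neighbors and SVM."""
    return {
        "RandomForest": RandomForestClassifier(
            n_estimators=100, max_depth=5, random_state=1
        ),
        "KNeighbors": KNeighborsClassifier(),
        "SVM": SVC(),
    }


def fit_and_predict(train, test, features=FEATURES, target="Survived"):
    """Fit every model on the training frame and predict for the test frame."""
    X_train = encode(train, features)
    # Align the test columns with the training columns in case a category is absent.
    X_test = encode(test, features).reindex(columns=X_train.columns, fill_value=0)
    y_train = train[target]
    predictions = {}
    for name, model in build_models().items():
        model.fit(X_train, y_train)
        predictions[name] = model.predict(X_test)
    return predictions


# Made-up rows standing in for the real training and test data.
train_sample = pd.DataFrame({
    "Pclass": [1, 3, 3, 1, 2, 3, 2, 1],
    "Sex": ["female", "male", "male", "female", "female", "male", "male", "female"],
    "SibSp": [0, 1, 0, 1, 0, 0, 1, 0],
    "Parch": [0, 0, 0, 1, 0, 2, 0, 1],
    "Survived": [1, 0, 0, 1, 1, 0, 0, 1],
})
test_sample = pd.DataFrame({
    "Pclass": [3, 1, 2],
    "Sex": ["male", "female", "female"],
    "SibSp": [0, 1, 0],
    "Parch": [0, 0, 1],
})
predictions = fit_and_predict(train_sample, test_sample)
```

Adding "Fare" to the feature list or changing the hyperparameters only requires editing `FEATURES` and `build_models`.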

Models and their accuracy: Rows highlighted in light green (scores of 0.78229 and 0.77751) have higher accuracy than the baseline accuracy of 0.77511.

Model	Features	Hyperparameters	Score
RandomForest	["Pclass", "Sex", "SibSp", "Parch"]	n_estimators=100, max_depth=5, random_state=1	0.77511
RandomForest	["Pclass", "Sex", "SibSp", "Parch"]	n_estimators=1, max_depth=1, random_state=1	0.76555
RandomForest	["Pclass", "Sex", "SibSp", "Parch", "Fare"]	n_estimators=20, max_depth=5, random_state=1	0.66746
RandomForest	["Pclass", "Sex", "SibSp", "Parch", "Fare"]	n_estimators=100, max_depth=16, random_state=2	0.77272
RandomForest	["Pclass", "Sex", "SibSp", "Parch"]	n_estimators=100, max_depth=16, random_state=2	0.78229
SVM	["Pclass", "Sex", "SibSp", "Parch"]	default	0.77751
SVM	["Pclass", "Sex", "SibSp", "Parch", "Fare"]	default	0.66746
K-Neighbors	["Pclass", "Sex", "SibSp", "Parch"]	default	0.77751
K-Neighbors	["Pclass", "Sex", "SibSp", "Parch", "Fare"]	n_neighbors=10, algorithm='kd_tree'	0.71052

Kaggle Submission:
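A submission file in the same layout as gender_submission.csv (a PassengerId column and a Survived column) could be written as sketched below. The function name and the default output path are assumptions for illustration.

```python
import pandas as pd


def make_submission(passenger_ids, predictions, path="submission.csv"):
    """Write predictions in the PassengerId/Survived layout and return the frame."""
    passenger_ids = list(passenger_ids)
    predictions = [int(p) for p in predictions]
    if len(passenger_ids) != len(predictions):
        raise ValueError("passenger_ids and predictions must have the same length")
    submission = pd.DataFrame({
        "PassengerId": passenger_ids,
        "Survived": predictions,
    })
    # index=False keeps the file to exactly the two expected columns.
    submission.to_csv(path, index=False)
    return submission
```

With a fitted model, this would be called as `make_submission(test["PassengerId"], model.predict(X_test))`, where `test`, `model`, and `X_test` are whatever names the notebook uses.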

References:

Kaggle Titanic Challenge
