A Technical Report on the Kaggle Titanic Dataset | by Bernard Worthy | Jun, 2024

INTRODUCTION

This report is based on the Titanic dataset from Kaggle (https://www.kaggle.com/c/titanic/data). The primary aim of this technical report is to explore the dataset and develop a predictive model of passenger survival on the Titanic.

For this report, I used two Python libraries. I used Pandas to read, understand and get insights from the data, and the Seaborn library to visualize it.
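A minimal sketch of the loading step with Pandas. It assumes the data is Kaggle's training file (commonly saved as `train.csv`); because that file is not bundled here, the example reads a tiny in-memory CSV with the same header. The three rows are made up for illustration and are not real passenger records.

```python
import io

import pandas as pd

# Hypothetical stand-in for Kaggle's train.csv: same header, illustrative rows only.
SAMPLE_CSV = """PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
1,0,3,"Doe, Mr. John",male,22,1,0,A/5 1,7.25,,S
2,1,1,"Roe, Mrs. Jane",female,38,1,0,PC 2,71.28,C85,C
3,1,3,"Poe, Miss. Ann",female,,0,0,STON 3,7.93,,S
"""

# pd.read_csv accepts a path such as "train.csv" or any file-like object.
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

print(df.shape)  # (3, 12)
print(df.head())
df.info()  # column dtypes and non-null counts
```

With the real file, `pd.read_csv("train.csv")` replaces the `StringIO` call; empty fields such as the missing Age and Cabin values are read as `NaN`.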

From the Extended Data Dictionary (EDD), I saw that the dataset has 12 columns, 7 numerical and 5 categorical:

Numerical Data:

· PassengerId

· Survived

· Pclass

· Age

· SibSp

· Parch

· Fare

Categorical Data:

· Name

· Sex

· Ticket

· Cabin

· Embarked
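The numerical/categorical split can be reproduced from the column dtypes that Pandas infers. This is a sketch under the assumption that "numerical" means any numeric dtype; the helper name `split_columns` is hypothetical and the two rows are illustrative values, not real records.

```python
import pandas as pd


def split_columns(df: pd.DataFrame) -> tuple[list[str], list[str]]:
    """Return (numerical, categorical) column names based on pandas dtypes."""
    numerical = df.select_dtypes(include="number").columns.tolist()
    categorical = df.select_dtypes(exclude="number").columns.tolist()
    return numerical, categorical


# Toy frame with the Titanic column names; values are illustrative only.
df = pd.DataFrame({
    "PassengerId": [1, 2], "Survived": [0, 1], "Pclass": [3, 1],
    "Name": ["Doe, Mr. John", "Roe, Mrs. Jane"], "Sex": ["male", "female"],
    "Age": [22.0, 38.0], "SibSp": [1, 1], "Parch": [0, 0],
    "Ticket": ["A/5 1", "PC 2"], "Fare": [7.25, 71.28],
    "Cabin": [None, "C85"], "Embarked": ["S", "C"],
})

numerical, categorical = split_columns(df)
print(numerical)    # ['PassengerId', 'Survived', 'Pclass', 'Age', 'SibSp', 'Parch', 'Fare']
print(categorical)  # ['Name', 'Sex', 'Ticket', 'Cabin', 'Embarked']
```

A dtype-based split is only a starting point: Survived and Pclass are stored as integers but behave like categories, so they may deserve categorical treatment when modelling.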

OBSERVATION

Just by looking at the data, I observed that the training set contains 891 passengers and that the Sex column is strongly associated with the Survived column, as most of the survivors are women.
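One way to quantify the link between Sex and Survived is to group by Sex and take the mean of the 0/1 Survived column, which gives the survival rate per group. The six rows below are synthetic and only demonstrate the mechanics; on the real data the same two lines apply unchanged. For a plot, Seaborn's `sns.barplot(data=df, x="Sex", y="Survived")` draws these group means as bars.

```python
import pandas as pd

# Synthetic rows (not real records): only the two columns of interest.
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "female", "male"],
    "Survived": [0, 1, 1, 0, 0, 1],
})

# The mean of a 0/1 column is the fraction of each group that survived.
survival_rate = df.groupby("Sex")["Survived"].mean()
print(survival_rate)

# Raw counts of survivors and non-survivors per sex.
counts = pd.crosstab(df["Sex"], df["Survived"])
print(counts)
```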

From the Extended Data Dictionary (EDD), I made the following observations:

Missing Values:

The EDD reports a count of non-missing values for each column. Comparing each count with the total number of rows showed which columns have missing values:

· Age

· Cabin

· Embarked
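A sketch of the count-based check, assuming the EDD is essentially the output of `df.describe(include="all")`, whose first row is the non-null count per column. The toy values are illustrative; `df.isna().sum()` is shown as the direct way to get the same numbers.

```python
import numpy as np
import pandas as pd

# Toy frame (illustrative values) with gaps in Age, Cabin and Embarked.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Embarked": ["S", "C", np.nan, "S"],
    "Fare": [7.25, 71.28, 7.93, 53.10],
})

# describe(include="all") has a "count" row of non-null values per column;
# any count below len(df) means that column has missing values.
summary = df.describe(include="all")
counts = summary.loc["count"].astype(int)
missing_from_count = len(df) - counts

# The same numbers computed directly.
missing = df.isna().sum()
print(missing[missing > 0])  # expected counts: Age 2, Cabin 3, Embarked 1
```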

Possible Outliers:

I also noticed possible outliers in some columns, because of the large jump in values between the 75th and the 100th percentile (the maximum). This was observed in the following columns:

· Age

· SibSp

· Parch

· Fare
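The check described above (75th percentile versus maximum) can be computed with `DataFrame.quantile`. As a complementary rule, the sketch also counts values above Tukey's upper fence, Q3 + 1.5 × IQR; that rule is a common convention I am adding for comparison, not the method used in this report. The values are synthetic.

```python
import pandas as pd

# Synthetic numeric columns (illustrative values, not real records).
df = pd.DataFrame({
    "Age": [22, 25, 28, 30, 35, 38, 80],
    "SibSp": [0, 0, 0, 1, 1, 1, 8],
    "Fare": [7.25, 7.9, 8.05, 13.0, 26.0, 31.0, 512.33],
})

# The report's check: compare the 75th percentile with the maximum.
jump = df.quantile([0.75, 1.0]).T
jump.columns = ["p75", "max"]
print(jump)

# Tukey's fences (added for comparison): values above Q3 + 1.5 * IQR.
q1, q3 = df.quantile(0.25), df.quantile(0.75)
upper_fence = q3 + 1.5 * (q3 - q1)
n_above = (df > upper_fence).sum()
print(n_above)
```

A big gap between `p75` and `max` only hints at outliers; the fence gives a reproducible threshold, but whether a point is a genuine error (as opposed to, say, a legitimately expensive fare) still needs a judgment call.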

CONCLUSION

In the dataset, I found missing values in a few columns; these can be handled by replacing them with the median or the mode of the column. The mean can also be used, but it is sensitive to outliers, which some of these columns contain, so it can be pulled away from typical values.
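A minimal sketch of median and mode imputation with `fillna`. The helper name `impute` is hypothetical, and it only covers Age (median) and Embarked (mode); the toy data includes one extreme age so the difference from the mean is visible: the median of 22, 26 and 80 is 26, while the mean is about 42.7.

```python
import numpy as np
import pandas as pd


def impute(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with Age filled by its median and Embarked by its mode."""
    out = df.copy()
    # The median ignores how extreme the largest ages are; the mean would not.
    out["Age"] = out["Age"].fillna(out["Age"].median())
    # mode() returns a Series (there can be ties); take the first value.
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode().iloc[0])
    return out


# Illustrative values, not real records.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, 80.0],
    "Embarked": ["S", "C", np.nan, "S"],
})
clean = impute(df)
print(clean)
```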

I also noticed outliers in certain columns; they can be handled by capping them at the 1st and 99th percentiles.
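A minimal sketch of percentile capping (often called winsorizing) using `Series.quantile` and `Series.clip`. The helper name `cap_outliers` and its default 1st/99th bounds are assumptions; passing `lower=0.0` makes the lower bound the column minimum, so nothing is clipped from below. The Fare-like values are synthetic.

```python
import pandas as pd


def cap_outliers(s: pd.Series, lower: float = 0.01, upper: float = 0.99) -> pd.Series:
    """Clip values to the given lower/upper quantiles of the series."""
    lo, hi = s.quantile(lower), s.quantile(upper)
    return s.clip(lower=lo, upper=hi)


# Synthetic fares: 1.0 to 99.0 plus one extreme value.
fare = pd.Series([float(x) for x in range(1, 100)] + [512.33])
capped = cap_outliers(fare)
print(fare.max(), capped.max())
```

Capping keeps every row, unlike dropping outliers, but it changes the tails of the distribution, so the bounds should be computed on training data only and reused for the test set.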

https://hng.tech/internship, https://hng.tech/hire