Introduction
This analysis covers the dataset of passengers who were on board the British luxury passenger ship Titanic when it sank on 15 April 1912.
The purpose of this surface-level analysis is to get a statistical view of the age range of the passengers on the ship.
The Titanic dataset has the following columns:
- PassengerId: This is the identifier for each passenger
- Survived: This informs us if a particular passenger survived (with value 1) or not (with value 0)
- Pclass: This is the ticket class, either upper (1st), middle (2nd) or lower (3rd)
- Age: The age of the passenger
- SibSp: The number of siblings or spouses the passenger boarded the ship with
- Parch: The number of parents or children the passenger boarded the ship with
- Fare: The amount the passenger paid for the trip
- Cabin: The cabin identifier for the passenger
- Embarked: The port where the passenger boarded (Cherbourg, Queenstown or Southampton)
Observation
Missing Data in the dataset
Three columns have missing data: Cabin, Age and Embarked. Cabin has the most missing values (687), Age follows with 177, and Embarked has the fewest (2).
All other columns have no missing values.
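As a minimal sketch of how such counts can be obtained, assuming the data is loaded into a pandas DataFrame: the helper name `missing_counts` and the tiny sample below are hypothetical and only illustrate the technique. With the real data you would load the file first, for example with `pd.read_csv("train.csv")`, where the filename is an assumption.

```python
import pandas as pd


def missing_counts(df: pd.DataFrame) -> pd.Series:
    """Count missing values per column, largest first, skipping complete columns."""
    counts = df.isna().sum()
    return counts[counts > 0].sort_values(ascending=False)


# Tiny hypothetical sample, NOT the real Titanic data.
sample = pd.DataFrame(
    {
        "PassengerId": [1, 2, 3, 4],
        "Age": [22.0, None, 26.0, 35.0],
        "Cabin": [None, "C85", None, None],
        "Embarked": ["S", "C", None, "S"],
    }
)

result = missing_counts(sample)
print(result)
```

Columns without any missing value, such as PassengerId in the sample, are left out of the result.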
The image below gives the description of each column
Since our focus is the age of the passengers, it can be seen that:
- the minimum age is 0.42
- the mean (average) age is 29.70
- the maximum age is 80
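The three figures above can be computed along these lines. This is a sketch assuming a pandas DataFrame with an `Age` column; the function name `age_summary` and the sample values are hypothetical, not the real data.

```python
import pandas as pd


def age_summary(df: pd.DataFrame, column: str = "Age") -> dict:
    """Return the minimum, mean and maximum of a column, ignoring missing values."""
    ages = df[column].dropna()
    return {
        "min": float(ages.min()),
        "mean": round(float(ages.mean()), 2),
        "max": float(ages.max()),
    }


# Hypothetical sample with one missing age, NOT the real Titanic data.
sample = pd.DataFrame({"Age": [2.0, 30.0, None, 58.0]})

summary = age_summary(sample)
print(summary)
```

Missing ages are dropped before the statistics are computed, so they do not affect the mean.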
Below is the box plot of Age against Survived.
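A plot of this kind could be drawn roughly as follows. This is a sketch that assumes pandas and matplotlib; it is not necessarily how the original figure was made, and the function name `plot_age_by_survival` and the sample rows are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import pandas as pd


def plot_age_by_survival(df: pd.DataFrame):
    """Draw one box of Age per value of Survived and return the axes."""
    data = df.dropna(subset=["Age"])  # rows without an age cannot be plotted
    fig, ax = plt.subplots(figsize=(6, 4))
    data.boxplot(column="Age", by="Survived", ax=ax)
    ax.set_xlabel("Survived (0 = no, 1 = yes)")
    ax.set_ylabel("Age")
    ax.set_title("Age by survival")
    fig.suptitle("")  # remove the automatic "Boxplot grouped by ..." title
    return ax


# Hypothetical sample, NOT the real Titanic data.
sample = pd.DataFrame(
    {
        "Survived": [0, 0, 0, 1, 1, 1],
        "Age": [22.0, 35.0, None, 4.0, 28.0, 54.0],
    }
)

ax = plot_age_by_survival(sample)
```

In a notebook the figure is shown inline; in a script, `ax.figure.savefig(...)` writes it to a file.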
About HNG
HNG Internship is a remote internship that gives interns all over the world real-life experience on projects. This is the link to the official website.
It also serves as a pool of skilled professionals, because its interns gain experience on projects with real-life impact. Here is the hire link.