Data Challenge #4 - Air Crashes

alice - April 16, 2021

This challenge will test your ability to clean data. Minimal preparation has been done to this data set, so as to give you a chance to deal with a more realistic set of formatting issues.

The following are data sets of total flights each year and a data set of air crashes.

Flight data


air crash data from 1908-2009

You will need to do a lot of cleaning with this data. 

Some hints:

The date is in american format, so you will need to separate out day month and year.

This will allow you to use year as a common variable between the two datasets.

You will need to add an index column, if you wish to count the number of air crashes.

To compare the total flights data with the crash data you will need to know how to unpivot columns in Power BI.

This video will tell you how.

(if you are using excel, do not worry about this part. Skip Q1, but answer all the others)

Make a line graph showing both the number of flights, and the number of crashes.

Q1 Does the trend in total number of flights explain the trend in the number of crashes?

Add two more columns to your crash data set that calculate the number of survivors and the survival rate.

Q2 Suppose a genie tells you that you will be in an air crash. You are allowed to choose what year you will be in the crash. What year should you choose to maximize your chance of survival?

Q3 Which month has the most crashes?

Q4 Where was the worst air disaster in aviation History?

Q5 How many people died?

Q6 How many people died in the air disaster with the most Survivors?

Q7 Which air operator has had the most fatalities, which has had second most?

Q8 What year was the most deadly for air travel?

Q9 What year produced the most survivors of air crashes?

Send your answers to lachlan.vaughantaylor@withyouwithme.com by May 14th.

Join our community

We have a Discord server where you’ll be able to chat with your instructors and cohort. Stay active in your learning!
Join discord