Welcome to our first competition! In this competition, you will predict air pollution measurements over time from basic weather information (temperature and humidity) and the readings of five sensors. The three labels to predict are target_carbon_monoxide, target_benzene, and target_nitrogen_oxides.
Files (can be found in club Google Drive folder):
Leaderboard
Position | Name(s) |
---|---|
1 | Benjamin Kleyner |
2 | Sameer Khan |
3 | Abhishek Shah |
Evaluation: The Root Mean Squared Logarithmic Error (RMSLE) will be calculated for each of the three target columns, and the three values will be averaged. Your goal is to minimize this error: the lower your averaged error, the higher your ranking.
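To estimate your score locally, the metric can be sketched as below. This assumes the common definition of RMSLE, the square root of the mean of (log(1 + predicted) - log(1 + actual))², and the function names `rmsle` and `mean_rmsle` are illustrative only. The official scoring script is not shown here, so it may differ in details.

```python
import math


def rmsle(y_true, y_pred):
    """RMSLE for one target column (assumes every value is greater than -1)."""
    squared_log_errors = [
        (math.log1p(pred) - math.log1p(true)) ** 2
        for true, pred in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared_log_errors) / len(squared_log_errors))


def mean_rmsle(true_columns, pred_columns):
    """Average the per-column RMSLE over all target columns."""
    scores = [rmsle(true, pred) for true, pred in zip(true_columns, pred_columns)]
    return sum(scores) / len(scores)
```

Both functions accept any iterable of numbers, so pandas columns such as `df["target_benzene"]` can be passed directly. Hold out part of train.csv and score your predictions on it to get a rough idea of where you stand.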
Submission:
Your program must generate predictions for the testing data and write them to a CSV file. Use this form to submit your program. We will run your code on the testing data and use the generated CSV file to determine your score.
Sample csv file with predictions
Starter Code
Loading/Reading the Dataset

    import pandas as pd

    df = pd.read_csv("train.csv")
    test_df = pd.read_csv("test.csv")

    # Print the first few rows
    print(df.head())

    # Create the features list
    features = ["feature1", "feature2", "..."]

    # Create dataframes with the selected features
    features_df = df[features]
    X_test = test_df[features]

    # Next steps are to create and train your machine learning model
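One possible way to take the next step is to fit a separate model for each of the three targets. The sketch below assumes scikit-learn is installed and uses `LinearRegression` purely as a placeholder. The feature names and the synthetic data are hypothetical stand-ins for train.csv and test.csv so that the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

features = ["feature1", "feature2"]  # hypothetical names; use the real columns
targets = ["target_carbon_monoxide", "target_benzene", "target_nitrogen_oxides"]

# Synthetic stand-in for train.csv and test.csv (assumption for this sketch).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((100, 2)), columns=features)
for i, target in enumerate(targets, start=1):
    df[target] = i * df["feature1"] + df["feature2"] + 1.0
test_df = pd.DataFrame(rng.random((10, 2)), columns=features)

# Fit one independent model per target column.
models = {}
for target in targets:
    model = LinearRegression()
    model.fit(df[features], df[target])
    models[target] = model

# The logarithm used by RMSLE is undefined for predictions of -1 or below,
# so clip predictions at zero before scoring or submitting.
predictions = {
    target: np.clip(model.predict(test_df[features]), 0, None)
    for target, model in models.items()
}
```

With the real data, replace the synthetic frames with the `df` and `test_df` loaded from the CSV files. Any regressor that exposes `fit` and `predict` can be swapped in for `LinearRegression`.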
Generating CSV File with Predictions

    # Run this code after creating your models.
    # Imports should already have been done.
    carbon_monoxide_predictions = carbon_monoxide_model.predict(X_test)
    # Create the rest of the prediction lists with the other models...

    output_df = pd.DataFrame({
        "date_time": test_df.date_time,
        "target_carbon_monoxide": carbon_monoxide_predictions,
        "target_benzene": benzene_predictions,
        "target_nitrogen_oxides": nitrogen_oxides_predictions,
    })

    # index=False is required: with index=True, an extra column of indices
    # is added, which is an incorrect format for this competition.
    output_df.to_csv("submission.csv", index=False)
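Before submitting, it may help to check the generated file for formatting mistakes such as a stray index column. The sketch below assumes the expected header matches the columns of `output_df` above, in that order. The helper `check_submission` is hypothetical and not part of the official tooling.

```python
import csv
import io

EXPECTED_COLUMNS = [
    "date_time",
    "target_carbon_monoxide",
    "target_benzene",
    "target_nitrogen_oxides",
]


def check_submission(csv_file, expected_rows=None):
    """Return a list of problems found in a submission CSV (empty if none)."""
    reader = csv.reader(csv_file)
    header = next(reader, None)
    problems = []
    if header != EXPECTED_COLUMNS:
        problems.append(f"unexpected header: {header}")
    rows = list(reader)
    if expected_rows is not None and len(rows) != expected_rows:
        problems.append(f"expected {expected_rows} rows, got {len(rows)}")
    # Data rows start on line 2 because line 1 is the header.
    for line_number, row in enumerate(rows, start=2):
        if len(row) != len(EXPECTED_COLUMNS):
            problems.append(
                f"line {line_number}: expected {len(EXPECTED_COLUMNS)} "
                f"fields, got {len(row)}"
            )
    return problems
```

To check a real file, open it with `open("submission.csv", newline="")` and pass the file object, optionally with `expected_rows=len(test_df)`. An empty list means no problems were detected.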