Competition 1 (Feb 4 - Mar 4)





Welcome to our first competition! In this competition you will predict air pollution measurements over time from basic weather information (temperature and humidity) and the readings of 5 sensors. The three labels to predict are target_carbon_monoxide, target_benzene, and target_nitrogen_oxides.

Files (can be found in club Google Drive folder):

  • train.csv - training data, including the weather data, sensor data, and values for the 3 targets
  • test.csv - same format as train.csv



Leaderboard

Position Name(s)
1 Benjamin Kleyner
2 Sameer Khan
3 Abhishek Shah


Evaluation: The Root Mean Squared Logarithmic Error (RMSLE) will be calculated for each of the three target columns and then averaged. Your goal is to minimize this error: submissions with lower error rank higher.
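To estimate your score locally before submitting, the metric can be sketched as below. This is a minimal sketch, not the official scoring script: it assumes the common RMSLE definition based on log(1 + x), and the helper names rmsle and mean_rmsle are hypothetical. The small data frames are made-up numbers used only to show the call.

```python
import numpy as np
import pandas as pd

TARGETS = ["target_carbon_monoxide", "target_benzene", "target_nitrogen_oxides"]

def rmsle(y_true, y_pred):
    """RMSLE with the usual log(1 + x) transform (assumed definition)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # log1p(x) computes log(1 + x); it is undefined for x <= -1,
    # so predictions should be non-negative.
    return float(np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2)))

def mean_rmsle(true_df, pred_df, targets=TARGETS):
    """Average the per-column RMSLE over the three target columns."""
    return float(np.mean([rmsle(true_df[t], pred_df[t]) for t in targets]))

# Made-up values, only to demonstrate usage.
true_df = pd.DataFrame({t: [1.0, 2.0, 3.0] for t in TARGETS})
pred_df = pd.DataFrame({t: [1.0, 2.0, 3.0] for t in TARGETS})
perfect_score = mean_rmsle(true_df, pred_df)  # identical frames give zero error
```

Because the error is measured on a log scale, it penalizes relative differences rather than absolute ones, so being off by a factor of two costs about the same for small and large values.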



Submission:
Your program must predict on the testing data and generate a CSV file with the predictions. Use this form to submit your program. We will run your code on the testing data and use the generated CSV file to determine your score.



Sample csv file with predictions

Sample Submission


Starter Code

Loading/Reading the Dataset

                    import pandas as pd

                    df = pd.read_csv("train.csv")
                    test_df = pd.read_csv("test.csv")

                    #print the first few rows
                    print(df.head())

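                    # List the columns to use as model inputs
                    # ("feature1", "feature2", "..." are placeholders; replace them with real column names from train.csv)
                    features = ["feature1", "feature2", "..."]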

                    #Create dataframe with the selected features
                    features_df = df[features]
                    X_test = test_df[features]

                    #Next steps are to create and train your machine learning model 
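One possible next step is sketched below: fit one regressor per target and check each on a held-out slice of the training data. This is only an illustration under stated assumptions. It uses scikit-learn's LinearRegression as a stand-in for whatever model you choose, the helper name train_models is hypothetical, and the data frame is synthetic because the real column names in train.csv are not listed here.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

TARGETS = ["target_carbon_monoxide", "target_benzene", "target_nitrogen_oxides"]

def train_models(df, features, targets=TARGETS, seed=0):
    """Fit one regressor per target; return models and hold-out RMSLE per target."""
    train_part, valid_part = train_test_split(df, test_size=0.2, random_state=seed)
    models, scores = {}, {}
    for target in targets:
        model = LinearRegression()
        model.fit(train_part[features], train_part[target])
        # Clip at 0 so log1p is defined for every prediction.
        preds = np.clip(model.predict(valid_part[features]), 0, None)
        err = np.log1p(preds) - np.log1p(valid_part[target].to_numpy())
        scores[target] = float(np.sqrt(np.mean(err ** 2)))
        models[target] = model
    return models, scores

# Synthetic stand-in for train.csv (placeholder feature names, made-up values).
rng = np.random.default_rng(0)
demo = pd.DataFrame({"feature1": rng.uniform(0, 1, 200),
                     "feature2": rng.uniform(0, 1, 200)})
for i, target in enumerate(TARGETS):
    demo[target] = 1.0 + (i + 1) * demo["feature1"] + 0.5 * demo["feature2"]

models, scores = train_models(demo, ["feature1", "feature2"])
carbon_monoxide_model = models["target_carbon_monoxide"]
```

With the real data you would pass df and your own features list instead of demo. The hold-out scores give a rough idea of leaderboard performance before you submit.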
                  

Generating CSV File with Predictions

                    # Run this code after creating your model

                    #Imports should have already been done

                    carbon_monoxide_predictions = carbon_monoxide_model.predict(X_test)
                    # Create the rest of the prediction lists with the other models...

                    output_df = pd.DataFrame({
                      "date_time" : test_df.date_time,
                      "target_carbon_monoxide" : carbon_monoxide_predictions,
                      "target_benzene" : benzene_predictions,
                      "target_nitrogen_oxides" : nitrogen_oxides_predictions
                      })

                    output_df.to_csv("submission.csv", index=False) 
                    # Keep index=False: with index=True, pandas writes an extra column of row indices, which is the wrong format for this competition
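Before submitting, it may help to sanity-check the output frame. The sketch below is an assumption about what a reasonable check looks like, not the organizers' validation: the helper name check_submission is hypothetical, the expected column order is taken from the snippet above, and the demo frames use placeholder timestamps and made-up values.

```python
import pandas as pd

EXPECTED_COLUMNS = ["date_time", "target_carbon_monoxide",
                    "target_benzene", "target_nitrogen_oxides"]

def check_submission(output_df, test_df):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if list(output_df.columns) != EXPECTED_COLUMNS:
        problems.append(f"unexpected columns: {list(output_df.columns)}")
    if len(output_df) != len(test_df):
        problems.append(f"expected {len(test_df)} rows, found {len(output_df)}")
    present = [c for c in EXPECTED_COLUMNS[1:] if c in output_df.columns]
    if present and output_df[present].isna().any().any():
        problems.append("missing (NaN) predictions")
    # Negative values would break a log-based error metric.
    if present and (output_df[present] < 0).any().any():
        problems.append("negative predictions")
    return problems

# Placeholder data, only to demonstrate usage.
demo_test = pd.DataFrame({"date_time": ["t0", "t1"]})
demo_output = pd.DataFrame({
    "date_time": demo_test.date_time,
    "target_carbon_monoxide": [1.2, 0.8],
    "target_benzene": [5.0, 4.1],
    "target_nitrogen_oxides": [120.0, 95.5],
})
issues = check_submission(demo_output, demo_test)
```

An empty issues list means the frame has the four expected columns, one row per test row, and no missing or negative predictions.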