Welcome to our first competition of the 2022-2023 school year! The data given to you contains 79 variables describing almost every aspect of homes in Ames, Iowa. In this competition, your task is to use machine learning to predict the final sale price of each home. This is a regression task.
The dataset can be found in Google Classroom. Please contact the AI & Machine Learning club board if you need the class code to join.
Winners
Position | Name(s) | RMSE |
---|---|---|
1 | Sameer Khan | 0.13239 |
2 | Alexander Tanton | 0.13295 |
3 | Benjamin Kleyner | 0.20802 |
4 | Samuel Fishburn, Nate Case, Aivan Byrnes | 0.24017 |
5 | Akhil Kodadhala, Justin Han, Anoop Kodadhala, Ben Cao, Anuj Pannala | 0.24383 |
Honorable Mention | Nirav Eati, Oliver Alexander | 0.47837 |
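Evaluation: Submissions are scored by the Root Mean Squared Error (RMSE) between the logarithm of the predicted sale price and the logarithm of the actual sale price. Because the error is measured on logarithms, a prediction that is 10% too high costs the same on a cheap house as on an expensive one. Your goal is to minimize this error; submissions with lower error rank higher.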
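The metric can be written as RMSE = sqrt((1/n) * sum((log(predicted_i) - log(actual_i))^2)). Below is a minimal Python sketch for scoring your own validation predictions locally. It assumes the natural logarithm (the description above does not state the base), and `rmse_log` is a hypothetical helper name, not the official scoring script.

```python
import numpy as np


def rmse_log(y_true, y_pred):
    """Hypothetical helper: RMSE between log(predicted) and log(actual) prices.

    Uses the natural log (an assumption). Prices must be strictly positive,
    because the logarithm of zero or a negative number is undefined.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if np.any(y_true <= 0) or np.any(y_pred <= 0):
        raise ValueError("log-based RMSE needs strictly positive prices")
    return float(np.sqrt(np.mean((np.log(y_pred) - np.log(y_true)) ** 2)))


# One prediction 10% too high, one 10% too low.
score = rmse_log([200000, 100000], [220000, 90000])
print(round(score, 4))  # 0.1005
```

Scoring a held-out part of train.csv this way gives a rough local estimate of your error before you submit.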
Submission:
Your program must predict on the testing data and generate a CSV (comma-separated values) file with the predictions. Use Google Classroom to submit your code and submission file. We will run your code on the testing data and will use the generated CSV file to determine the score.
Sample CSV file with predictions
Starter Code
Loading/Reading the Dataset
```python
import pandas as pd

df = pd.read_csv("train.csv")
test_df = pd.read_csv("test.csv")

# print the first few rows
print(df.head())

# create features list
features = ["LotArea"]

# create dataframe with the selected features
X_train = df[features]
y_train = df["SalePrice"]
X_test = test_df[features]

# The next steps are to create and train your machine learning model.
```
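One possible next step, sketched under assumptions rather than prescribed by the competition: use every numeric column shared by the training and testing files, fill missing values with the training median, and fit scikit-learn's `LinearRegression` on log(SalePrice) so the training loss lines up with the log-based evaluation metric. The helpers `numeric_features`, `train_model`, and `predict_prices` are hypothetical names, and the demo uses a few toy rows in which `YearBuilt` and `Neighborhood` are placeholder columns.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline


def numeric_features(train, test):
    """Hypothetical helper: numeric columns present in both files, minus Id and the target."""
    numeric = train.select_dtypes(include="number").columns
    return [c for c in numeric if c not in ("Id", "SalePrice") and c in test.columns]


def train_model(X, y):
    """Hypothetical helper: median imputation + linear regression, fit on log(SalePrice)."""
    model = make_pipeline(SimpleImputer(strategy="median"), LinearRegression())
    # Fitting log prices makes the squared-error loss line up with the log-based RMSE.
    model.fit(X, np.log(y))
    return model


def predict_prices(model, X):
    """Convert the model's log-price predictions back to dollars."""
    return np.exp(model.predict(X))


# Toy rows for illustration only; YearBuilt and Neighborhood are placeholder columns.
train = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "LotArea": [8450, 9600, np.nan, 11250, 14260],
    "YearBuilt": [2003, 1976, 2001, 1915, 2000],
    "Neighborhood": ["CollgCr", "Veenker", "CollgCr", "Crawfor", "NoRidge"],
    "SalePrice": [208500, 181500, 223500, 140000, 250000],
})
test = pd.DataFrame({
    "Id": [6, 7],
    "LotArea": [10000, np.nan],
    "YearBuilt": [1990, 2005],
    "Neighborhood": ["Veenker", "CollgCr"],
})

features = numeric_features(train, test)  # ['LotArea', 'YearBuilt']
model = train_model(train[features], train["SalePrice"])
predictions = predict_prices(model, test[features])
print(features, predictions)
```

With the real files you would call `numeric_features(df, test_df)`, rebuild `X_train` and `X_test` from the result, and train with `train_model(X_train, y_train)`. Because this model predicts log prices, use `predict_prices(model, X_test)` in place of `model.predict(X_test)` when building the submission.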
Generating CSV File with Predictions
```python
# Run this code after creating and training your model.
# Imports (pandas as pd), test_df, and X_test should already be defined by the starter code.
predictions = model.predict(X_test)
# If you trained several models, generate each model's predictions the same way.

output_df = pd.DataFrame({
    "Id": test_df["Id"],
    "SalePrice": predictions
})

# index=False is required: with index=True, pandas adds an extra column of row
# indices, which is an incorrect format for this competition.
output_df.to_csv("submission.csv", index=False)
```
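Before uploading, a quick self-check can catch formatting mistakes. This is a minimal sketch that assumes the expected file has exactly the two columns `Id` and `SalePrice`, as produced by the code above; `check_submission` is a hypothetical helper, and the Ids and prices in the demo are made up. It also flags non-positive prices, since the log-based metric is undefined for them.

```python
import os
import tempfile

import pandas as pd


def check_submission(path, test_df):
    """Hypothetical helper: return a list of format problems (empty list = looks OK)."""
    sub = pd.read_csv(path)
    expected = ["Id", "SalePrice"]
    if list(sub.columns) != expected:
        # An index column written with index=True shows up here as "Unnamed: 0".
        return [f"expected columns {expected}, got {list(sub.columns)}"]
    problems = []
    if len(sub) != len(test_df):
        problems.append(f"expected {len(test_df)} rows, got {len(sub)}")
    if set(sub["Id"]) != set(test_df["Id"]):
        problems.append("Id values do not match the test set")
    if sub["SalePrice"].isna().any():
        problems.append("SalePrice has missing values")
    if (sub["SalePrice"] <= 0).any():
        problems.append("SalePrice must be positive for the log-based metric")
    return problems


# Made-up test Ids and predictions for the demo.
test_df = pd.DataFrame({"Id": [1461, 1462, 1463]})
output_df = pd.DataFrame({"Id": test_df["Id"], "SalePrice": [120000.0, 155000.0, 180000.0]})

with tempfile.TemporaryDirectory() as tmp:
    good_path = os.path.join(tmp, "good.csv")
    bad_path = os.path.join(tmp, "bad.csv")
    output_df.to_csv(good_path, index=False)
    output_df.to_csv(bad_path, index=True)  # writes the unwanted index column
    good_problems = check_submission(good_path, test_df)
    bad_problems = check_submission(bad_path, test_df)

print(good_problems)  # []
print(bad_problems)
```

With the real files, call `check_submission("submission.csv", test_df)` after running the submission code; an empty list means no problems were detected by these checks.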