Green Level AI & Machine Learning Club

2022-2023 Competition 1 (Nov 2 - Dec 7)

Welcome to our first competition of the 2022-2023 school year! The data given to you contains 79 variables describing almost every aspect of homes in Ames, Iowa. Your task is to use machine learning to predict the final sale price of each home. This is a regression task.

The dataset can be found in Google Classroom. Please contact the AI & Machine Learning club board if you need the code to join.

train.csv contains the training data. There are 79 columns that you can use as features. Some columns contain missing data, but you are not required to use every variable to train your final model. The column “SalePrice” is what you are trying to predict.
test.csv contains the testing data. You are required to test your model on this data and generate a submission file.
sample_submission.csv shows a sample submission file in the correct format. Your submission file should look the same except for the predicted prices.
description.txt contains a list of columns and what they represent.
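
Since some columns contain missing data, it can help to measure how much is missing before picking features. The following is a minimal sketch, not a required approach. It uses a few made-up rows in place of train.csv, and it assumes a numeric column named LotFrontage with gaps; check description.txt for the actual column names. Dropping incomplete columns and filling with the median are two common options among many.

```python
import pandas as pd

# Made-up rows standing in for train.csv; the values are not real data.
df = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "LotArea": [8450, 9600, 11250, 9550],
    "LotFrontage": [65.0, float("nan"), 68.0, float("nan")],  # assumed column name
    "SalePrice": [208500, 181500, 223500, 140000],
})

# Fraction of missing values per column, largest first
missing_share = df.isna().mean().sort_values(ascending=False)
print(missing_share)

# Option 1: keep only feature columns that have no missing values
complete_features = [
    col for col in df.columns
    if col not in ("Id", "SalePrice") and df[col].notna().all()
]

# Option 2: fill gaps in a numeric column with that column's median
filled = df["LotFrontage"].fillna(df["LotFrontage"].median())
```

Whichever option you choose, apply the same treatment to test.csv so the model sees the same kind of input when predicting.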

Winners
Due to high levels of participation, we decided to recognize 5 winning teams as well as an honorable mention.

Position	Name(s)	RMSE
1	Sameer Khan	0.13239
2	Alexander Tanton	0.13295
3	Benjamin Kleyner	0.20802
4	Samuel Fishburn, Nate Case, Aivan Byrnes	0.24017
5	Akhil Kodadhala, Justin Han, Anoop Kodadhala, Ben Cao, Anuj Pannala	0.24383
Honorable Mention	Nirav Eati, Oliver Alexander	0.47837

Evaluation: Submissions are scored by the Root Mean Squared Error (RMSE) between the logarithm of the predicted price and the logarithm of the actual price. Your goal is to minimize this error: the lower the error, the higher the ranking.
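
To estimate your score locally, the metric can be sketched as below. This assumes the natural logarithm and plain log(price) with no offset; the exact scoring code is not shown here, so treat it as an approximation. The function name rmse_log and the prices are made up for illustration.

```python
import math

def rmse_log(predicted, actual):
    """RMSE between log(predicted) and log(actual), assuming natural log."""
    squared_errors = [
        (math.log(p) - math.log(a)) ** 2
        for p, a in zip(predicted, actual)
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Made-up prices for illustration only
actual = [200000, 150000, 300000]
print(rmse_log(actual, actual))  # 0.0 for a perfect prediction

# Overshooting by 10% costs the same on a cheap home as on an expensive one
cheap = rmse_log([110000], [100000])
expensive = rmse_log([550000], [500000])
```

Because the error is taken on logarithms, it reflects relative rather than absolute mistakes, so expensive homes do not dominate the score. Predictions must be positive for the logarithm to be defined.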

Submission:
Your program must predict on the testing data and generate a CSV (comma-separated values) file with the predictions. Use Google Classroom to submit your code and submission file. We will run your code on the testing data and use the generated CSV file to determine your score.

Sample CSV file with predictions

Starter Code

Loading/Reading the Dataset

                    import pandas as pd

                    df = pd.read_csv("train.csv")
                    test_df = pd.read_csv("test.csv")

                    # print the first few rows
                    print(df.head())

                    # create features list
                    features = ["LotArea"]

                    # create dataframe with the selected features
                    X_train = df[features]
                    y_train = df["SalePrice"]

                    X_test = test_df[features]

                    # The next steps are to create and train your machine learning model.
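
As one possible next step, the sketch below fits a linear regression on the single LotArea feature. It assumes scikit-learn is installed, which the starter code does not require; any model exposing fit and predict would slot in the same way. The small frames are made-up stand-ins for train.csv and test.csv so the example runs on its own.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Made-up rows standing in for train.csv and test.csv; not real data.
df = pd.DataFrame({
    "LotArea": [5000, 8000, 10000, 12000],
    "SalePrice": [100000, 160000, 200000, 240000],
})
test_df = pd.DataFrame({"Id": [101, 102], "LotArea": [6000, 9000]})

features = ["LotArea"]
X_train = df[features]
y_train = df["SalePrice"]
X_test = test_df[features]

# Fit a plain linear regression on the training rows
model = LinearRegression()
model.fit(X_train, y_train)

# Predict a price for every row of the test data
predictions = model.predict(X_test)
print(predictions)
```

A single feature is only a starting point; adding more columns to the features list is the usual way to bring the error down.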

Generating CSV File with Predictions

                    # Run this code after creating your model

                    # Imports should have already been done

                    predictions = model.predict(X_test)

                    # Build the submission table: one row per test home with its Id and predicted price.
                    output_df = pd.DataFrame({
                        "Id": test_df["Id"],
                        "SalePrice": predictions,
                    })

                    output_df.to_csv("submission.csv", index=False)
                    # With index=True, pandas writes an extra column of row indices, which is an incorrect format for this competition.
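
Before submitting, a quick self-check of the file can catch format mistakes such as the extra index column. The helper below is a hypothetical sketch: check_submission is not part of the starter code, and it assumes the required columns are exactly Id and SalePrice as in the snippet above. sample_submission.csv remains the authoritative reference for the format.

```python
import io
import pandas as pd

def check_submission(submission, test_ids):
    """Return a list of format problems; an empty list means none were found."""
    problems = []
    if list(submission.columns) != ["Id", "SalePrice"]:
        problems.append("columns should be exactly Id, SalePrice")
        return problems
    if len(submission) != len(test_ids):
        problems.append("row count differs from test.csv")
    elif list(submission["Id"]) != list(test_ids):
        problems.append("Id values differ from test.csv")
    if submission["SalePrice"].isna().any():
        problems.append("missing predictions")
    elif (submission["SalePrice"] <= 0).any():
        problems.append("non-positive prices cannot be log-scored")
    return problems

# Made-up file contents for illustration
good = pd.read_csv(io.StringIO("Id,SalePrice\n101,120000.0\n102,180000.0\n"))
# What the file looks like when it was written with index=True
bad = pd.read_csv(io.StringIO(",Id,SalePrice\n0,101,120000.0\n1,102,180000.0\n"))

print(check_submission(good, [101, 102]))  # []
print(check_submission(bad, [101, 102]))
```

In practice you would pass pd.read_csv("submission.csv") and test_df["Id"] instead of the in-memory examples.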