Kaggle Competition Using Regression Models in R Worksheet

Description

What makes some songs become popular? The dataset describes popular songs based on auditory features such as loudness and tempo.

Goal

Construct a model from a dataset of popular songs to predict the ratings of the songs in scoringData.csv from their auditory features. (You may use linear regression, logistic regression, feature selection (e.g., Lasso), decision trees, or advanced tree models.)

Metric

Submissions will be evaluated on RMSE (root mean squared error). The lower the RMSE, the better the model.
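
As a quick reference, RMSE is the square root of the mean of the squared differences between actual and predicted ratings. The sketch below is a minimal illustration; the helper name `rmse` and the toy values are made up for the example and are not part of the competition files.

```r
# Root mean squared error between actual and predicted values
rmse <- function(actual, predicted) {
  sqrt(mean((actual - predicted)^2))
}

# Toy values, invented purely for illustration
actual    <- c(30, 40, 50)
predicted <- c(32, 38, 50)

# Errors are -2, 2 and 0, so RMSE = sqrt((4 + 4 + 0) / 3)
rmse(actual, predicted)
```

Because the scoring ratings are not available, a helper like this is mainly useful for checking a model on rows of analysisData.csv that were held out from fitting.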

Submission File

The submission file should be a text file in .csv format with exactly two columns, id and rating. The rating column must contain the predicted rating. The number of decimal places to use is up to you. The file should contain a header and have the following format:

"id","rating"
50400,37.3065
96747,37.1732
1824,36.9784
67597,36.9780
86944,36.8176
85423,37.0173

An example submission file (example_submission.csv) is shared with the set of files under Data.
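
Before uploading, it can help to confirm that a file matches the format described above. The following is a minimal sketch, not an official validator: the function name `check_submission` is hypothetical, and it checks only what the description states (two columns named id and rating, with numeric, non-missing ratings).

```r
# Hypothetical helper: check that a submission file has the described format
check_submission <- function(path) {
  sub <- read.csv(path)
  stopifnot(
    identical(names(sub), c("id", "rating")),  # exactly two columns, in order
    is.numeric(sub$rating),                    # ratings parsed as numbers
    !anyNA(sub$rating)                         # no missing predictions
  )
  invisible(TRUE)
}

# Try it on a temporary file built from the first rows of the example above
tmp <- tempfile(fileext = ".csv")
writeLines(c('"id","rating"', "50400,37.3065", "96747,37.1732"), tmp)
check_submission(tmp)
```

The function stops with an error if any check fails and returns invisibly otherwise.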

Sample Code

Here is an illustration in R of how you can create a model and apply it to scoringData.csv to prepare a submission file.

# ensure analysisData.csv and scoringData.csv are in your working directory

# read the analysis data and fit a simple linear regression
songs <- read.csv('analysisData.csv')
model <- lm(rating ~ tempo + time_signature, data = songs)

# read in scoring data and apply model to generate predictions
scoringData <- read.csv('scoringData.csv')
pred <- predict(model, newdata = scoringData)

# construct submission from predictions and write it without row names
submissionFile <- data.frame(id = scoringData$id, rating = pred)
write.csv(submissionFile, 'sample_submission.csv', row.names = FALSE)

* Disclaimer: This data is to be used solely for the purpose of the Project for this course. It is not recommended for any use outside of this competition.

Submission Count:

By the end of the competition, you must have at least 3 submissions. At least one must use a random forest model built with the ranger package.
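
The forest requirement above presumably refers to a random forest fit with the ranger package. The sketch below shows one way such a submission could be produced, reusing the two predictors from the sample code. The helper name `fit_ranger_predict`, the tree count, the seed, and the output file name are all assumptions chosen for illustration, and the ranger package must be installed.

```r
# Hypothetical helper: fit a ranger random forest and predict ratings
fit_ranger_predict <- function(train, score, num_trees = 500, seed = 1) {
  forest <- ranger::ranger(
    rating ~ tempo + time_signature,  # same predictors as the sample lm model
    data = train,
    num.trees = num_trees,
    seed = seed                       # fixed seed so results are reproducible
  )
  predict(forest, data = score)$predictions
}

# Build a submission only if the competition files are in the working directory
if (file.exists('analysisData.csv') && file.exists('scoringData.csv')) {
  songs <- read.csv('analysisData.csv')
  scoringData <- read.csv('scoringData.csv')
  pred <- fit_ranger_predict(songs, scoringData)
  submissionFile <- data.frame(id = scoringData$id, rating = pred)
  write.csv(submissionFile, 'ranger_submission.csv', row.names = FALSE)
}
```

If the predictors contain missing values, they may need to be imputed or dropped before fitting, since ranger may reject them depending on the installed version.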

Attached File Descriptions:

  • analysisData.csv: Data for building a model
  • scoringData.csv: Data to which the model is applied to generate predictions (scoring)
  • example_submission.csv: Sample submission file in the desired format
