MNIST Data Set and The Pima Data Set Machine Learning Questions


Description

For the MNIST data set and the Pima data set, you must create the following ML models and perform parts 3, 4, and 5 below (and part 2 if needed) separately for each model. Part 1 can be common to all models. The models are: 1. a K-NN classifier, written from scratch, that uses the training data to build a classifier, with an evaluation and report of the classifier's performance; 2. an SVM model; 3. a Decision Tree classifier; 4. a Random Forest classifier. Do NOT use machine learning packages for the KNN portion of this task. You are only permitted to use existing tools for simple linear algebra. For the other models you can use scikit-learn or any other ML package.
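
As a rough illustration of what "from scratch" could look like for the K-NN model, here is a minimal sketch that uses NumPy only for array arithmetic. The function name `knn_predict`, the choice of Euclidean distance, and the tie-breaking rule are assumptions made for this example; the assignment does not prescribe them.

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=3):
    """Predict a label for each test row by majority vote among its k nearest training rows."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    y_train = np.asarray(y_train)

    # Squared Euclidean distances, using ||a - b||^2 = ||a||^2 - 2 a.b + ||b||^2
    d2 = (
        (X_test ** 2).sum(axis=1)[:, None]
        - 2.0 * X_test @ X_train.T
        + (X_train ** 2).sum(axis=1)[None, :]
    )

    # Indices of the k closest training rows for every test row
    nearest = np.argsort(d2, axis=1)[:, :k]

    preds = []
    for row in nearest:
        labels, counts = np.unique(y_train[row], return_counts=True)
        # np.unique sorts labels, so a tied vote goes to the smallest label (assumed rule)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)
```

The full distance matrix has one entry per (test row, training row) pair, so for a data set the size of MNIST it may be necessary to process the test rows in batches.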

1. (10 points) Analyze the data and create some visualizations (for images, a few examples from each category; for other data, perhaps some scatter plots or histograms that give a big-picture view of the data).

2. (10 points kNN, 5 points SVM, 5 points Decision Tree, 5 points Random Forest Classifier) Describe any data pre-processing or feature engineering that you did. Also discuss the train/test split ratio and the hyper-parameters that you tweaked.

3. (5 points kNN, 5 points SVM, 5 points Decision Tree, 5 points Random Forest Classifier) Show the accuracy of your algorithm using the classification metrics discussed in class, and justify your reason for using each metric. Your metrics should remain the same for all of your models, but you can use different metrics for different data sets. Sample metrics: for the Pima data set, show accuracy with tables of false positives, false negatives, true positives, and true negatives. For the MNIST digits, show the complete confusion matrix; then choose a single digit to measure accuracy and show how that number varies as a function of K.

4. (5 points) Describe the run-time of your algorithms and also share the actual "wall-clock" time that it took to compute your results.

5. (10 points) Describe the impact of an imbalanced data set, the presence of outliers, and missing values on each of the ML algorithms you used, and discuss whether these factors played any role in your analysis.
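
The metric and timing parts above (3 and 4) can be sketched with the standard library alone. The helper names `confusion_matrix`, `per_class_recall`, and `timed` are hypothetical, and the example assumes class labels are integers starting at 0. Per-class recall is used here as one possible reading of "accuracy for a single digit"; the assignment does not fix that definition.

```python
import time

def confusion_matrix(y_true, y_pred, n_classes):
    """Count matrix with true classes as rows and predicted classes as columns."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def per_class_recall(cm, cls):
    """Fraction of examples whose true class is `cls` that were predicted as `cls`."""
    total = sum(cm[cls])
    return cm[cls][cls] / total if total else float("nan")

def timed(fn, *args, **kwargs):
    """Run fn and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Tiny illustrative labels (made up for this example, not real results)
y_true = [0, 0, 1, 1, 1]
y_pred = [0, 1, 1, 1, 0]
cm, seconds = timed(confusion_matrix, y_true, y_pred, 2)
```

For a binary problem such as Pima with labels 0 and 1, `cm[0][0]` holds the true negatives, `cm[0][1]` the false positives, `cm[1][0]` the false negatives, and `cm[1][1]` the true positives. Calling `per_class_recall` on matrices produced with different values of K gives the "single digit as a function of K" curve.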

Please use the Portuguese data set (student-por.csv) in the provided link for this assignment. This data set contains 649 instances and 30 features. Create the following ML models: 1. a linear regression model from scratch; 2. an SVM model; 3. a Random Forest model; 4. a Decision Tree model. Use each of them on this data set to predict the value of the final variable G3, the final grade for each student. Part 1 can be the same for all the models, but parts 2 and 3 need to be separate for each model. Do NOT use machine learning packages for the Linear Regression portion of the assignment. You are only permitted to use existing tools for simple linear algebra. For the other three models (SVM, Decision Tree, and Random Forest) you can use ML packages and libraries.
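
One way to approach the from-scratch linear regression is ordinary least squares with an explicit intercept column, sketched below. The function names are hypothetical, and the example assumes that NumPy's least-squares solver counts as "simple linear algebra"; if it does not, the same weights can be obtained by solving the normal equations or by gradient descent.

```python
import numpy as np

def fit_linear_regression(X, y):
    """Return weights [intercept, w1, ..., wd] that minimize squared error."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first weight acts as the intercept
    Xb = np.column_stack([np.ones(len(X)), X])
    # Least-squares solve; stays stable even when columns are nearly collinear
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def predict(X, w):
    """Apply fitted weights to a feature matrix."""
    X = np.asarray(X, dtype=float)
    return w[0] + X @ w[1:]

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean((y_true - y_pred) ** 2))
```

Comparing feature groups then amounts to calling `fit_linear_regression` on different column subsets of the encoded data and comparing the `mse` of each on held-out rows.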

1. (10 points) Some of the variables in this data set are categorical and some of them are numeric. How can we encode the categorical variables for the linear regression process? Please describe your approach to encoding categorical values and apply it to the data set in your code.

2. (10 points Linear Reg., 5 points SVM, 5 points Decision Tree, 5 points Random Forest) Experiment by using different groups of features during training. What features work well in predicting a student's final score? What features work poorly? Why might you use or not use certain features? Calculate mean squared error scores for your ML models using at least two different groups of features, and compare the performance of the feature groups with each other.

3. (5 points Linear Reg., 5 points SVM, 5 points Decision Tree, 5 points Random Forest) Perform linear regression using all available features. Use mean squared error or any other metric to report the ability of your model to fit the data, and justify your choice. How does this approach compare to the groups of features you selected?
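
For the encoding question in part 1, one common option is one-hot (dummy) encoding, sketched here in plain Python. The function name `one_hot_encode` and the two sample rows are made up for illustration, and dropping the first category of each column is an assumed design choice that avoids perfect collinearity with an intercept term.

```python
def one_hot_encode(rows, categorical_cols, drop_first=True):
    """Turn a list of dict rows into (feature_names, numeric matrix).

    Numeric columns are copied as floats; each categorical column becomes
    one 0/1 indicator column per retained category.
    """
    # Sorted categories give a deterministic column order
    categories = {c: sorted({r[c] for r in rows}) for c in categorical_cols}
    kept = {
        c: (vals[1:] if drop_first else vals) for c, vals in categories.items()
    }
    numeric_cols = [c for c in rows[0] if c not in categorical_cols]

    names = list(numeric_cols)
    for c in categorical_cols:
        names += [f"{c}={v}" for v in kept[c]]

    matrix = []
    for r in rows:
        vec = [float(r[c]) for c in numeric_cols]
        for c in categorical_cols:
            vec += [1.0 if r[c] == v else 0.0 for v in kept[c]]
        matrix.append(vec)
    return names, matrix

# Two illustrative rows; the values are invented for this example
rows = [{"school": "GP", "age": 17}, {"school": "MS", "age": 16}]
names, matrix = one_hot_encode(rows, ["school"])
```

With `drop_first=True`, a column with two categories becomes a single indicator, so the dropped category is represented by all zeros and absorbed into the intercept.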

Data sets: The project will explore three data sets: the famous MNIST data set of pictures of handwritten numbers, a data set that explores the prevalence of diabetes in a Native American tribe named the Pima, and a data set that examines student achievement in secondary education in two Portuguese schools. You can access the data sets here:
1. https://www.kaggle.com/c/digit-recognizer/data
2. https://www.kaggle.com/uciml/pima-indians-diabetes…
3. https://archive.ics.uci.edu/ml/machine-learning-da…
