GCCCD Rapidminer Prediction Modeling Process Predictive Analytics

Description

ASSIGNMENT: Modifying a RapidMiner Prediction Modeling Process / Exploring & Interpreting Predictive Analytics

Answer the following questions based on your learning and understanding from this module’s videos and supporting Excel documents.

IMPORTANT:  
This assignment assumes you are STARTING from a completed RapidMiner process for the Meal Kit case.  That is, you’ve already completed the tutorial videos successfully, and you now have a RapidMiner process file that you can further modify to answer the following question prompts.  The completed process file you should have made is also available here (download it):

THE SOLUTION YOU ARE STARTING FROM WILL BE ATTACHED

PART 1:

Let’s imagine it has been 6 months since our MealKit company first built a prediction model for its “eco-friendly” packaging. The company wants to re-run the model with NEWER data, since it is concerned that the data from 6 months ago has become a little stale and no longer reflects its current customers.

The MealKit company has provided two new data files to use:

  • New400_TrainMe_Data.csv: 400 NEW customers to whom we marketed the “eco-friendly” packaging, and for whom we observed whether they did or didn’t subscribe.
  • New1000_PredictMe_Data.csv: 1,000 NEW customers for whom we’d like to PREDICT whether or not they’re likely to subscribe to our “eco-friendly” packaging option.

TASK: Using the same completed process file as was built in the Meal Kit tutorial videos, update both the Training & Test *.csv files with the new ones provided here.

Q1.A. – Run the process file. Look at the “Lift Chart (Simple)” results. According to this result, if we select the first 40% of customers, ranked by how confident we are that they will NOT subscribe, what % of ALL the non-subscribers will we have identified?
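To see what the lift chart is measuring, the same kind of figure can be computed by hand. The Python sketch below is a minimal illustration, assuming you have exported each customer's actual outcome ("yes"/"no") and the model's confidence that they will NOT subscribe; the function name and the toy data are hypothetical. It ranks customers from most to least confident, keeps the first 40%, and divides the non-subscribers found there by all non-subscribers. RapidMiner's chart may group customers into bins, so its reported percentage can differ slightly from this unbinned calculation.

```python
def share_captured(labels, conf_no, top_fraction=0.40, target="no"):
    """Fraction of all `target` customers found in the top `top_fraction`
    of customers, ranked by confidence that they belong to `target`."""
    if len(labels) != len(conf_no):
        raise ValueError("labels and confidences must have the same length")
    total_target = sum(1 for y in labels if y == target)
    if total_target == 0:
        return 0.0
    # Rank customers from most to least confident that they will NOT subscribe.
    ranked = sorted(zip(conf_no, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = round(top_fraction * len(ranked))
    captured = sum(1 for _, y in ranked[:cutoff] if y == target)
    return captured / total_target


# Hypothetical toy data: 10 customers, actual outcome and confidence(no).
labels = ["no", "no", "yes", "no", "yes", "no", "yes", "yes", "no", "yes"]
conf_no = [0.95, 0.90, 0.85, 0.80, 0.40, 0.60, 0.30, 0.20, 0.70, 0.10]
print(share_captured(labels, conf_no))  # → 0.6
```

With the toy data, the top 4 of 10 customers contain 3 of the 5 non-subscribers, so the function returns 0.6 (60%).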

Q1.B. – Now look at the ModelSimulator results. For a “Warm” lead in Segment “2” with a Platform usage of 50 and an NPS_Score of 8, what % confidence do we have that they WILL SUBSCRIBE, and what % confidence do we have that they WON’T SUBSCRIBE?
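For a two-class logistic regression, the model turns a weighted sum of the inputs into confidence(yes) with the logistic function, and confidence(no) is simply 1 − confidence(yes), so the two percentages the ModelSimulator reports should add up to 100%. The Python sketch below illustrates that calculation using HYPOTHETICAL coefficients and hypothetical 0/1 indicator names (Lead=Warm, Segment=2); these are not the values from the Meal Kit model, so read the real coefficients from your own results.

```python
import math


def logistic_confidences(intercept, coefficients, inputs):
    """Return (confidence_yes, confidence_no) for a two-class logistic regression."""
    # Linear predictor: intercept plus coefficient * value for every input.
    z = intercept + sum(coefficients[name] * value for name, value in inputs.items())
    conf_yes = 1.0 / (1.0 + math.exp(-z))  # logistic (sigmoid) function
    return conf_yes, 1.0 - conf_yes


# HYPOTHETICAL coefficients for illustration only; nominal inputs shown as 0/1 indicators.
coefficients = {"Lead=Warm": 0.8, "Segment=2": -0.3, "Platform": 0.02, "NPS_Score": 0.15}
inputs = {"Lead=Warm": 1, "Segment=2": 1, "Platform": 50, "NPS_Score": 8}
yes, no = logistic_confidences(-3.0, coefficients, inputs)
print(f"yes={yes:.1%}, no={no:.1%}")
```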

Q1.C. – Now look at the predictions made for Customers #3020 and #3030. What is the Confidence % for each that they WON’T SUBSCRIBE?

PART 2:

RapidMiner can build many kinds of prediction models, not just Logistic Regression. Let’s swap out our Logistic Regression operator and try a new prediction model type: “Random Forest”. A random forest is built from many decision trees, which we haven’t covered yet, so this type of prediction model is unfamiliar to us. Nonetheless, we can still use it to make predictions!

To do this easily, simply:

  1. RIGHT-CLICK on the “Logistic Regression” Operator.
  2. Select “Replace Operator” from the menu.
  3. Then, move through the folders to find:
    Modeling → Predictive → Trees → Random Forest
  4. The default settings should mostly be fine (check the screenshot below to verify), but you will need to set the local random seed to “2005” to guarantee that your results match mine.

  [Screenshot: Parameter Settings for the Random Forest Operator]

  Parameter settings:
    number of trees = 100
    criterion = gain_ratio
    maximal depth = 10
    DO NOT check “apply pruning,” “apply prepruning,” or “random splits”
    CHECK “guess subset ratio”
    voting strategy = confidence vote
    CHECK “use local random seed” and set local random seed = 2005

  Aside from the local random seed, all the other Random Forest parameters should already match these values by default.
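A random forest typically trains each tree on a random resample of the rows (and considers random subsets of attributes), so two runs can produce different models unless the random number generator starts from the same state. The Python sketch below is not RapidMiner's code; it only illustrates the idea behind “use local random seed”, using Python's random.Random as a local generator: the same seed reproduces the same bootstrap sample, while a different seed draws a different one. The function name is hypothetical.

```python
import random


def bootstrap_indices(n_rows, seed):
    """Draw one bootstrap sample (n_rows row indices, with replacement) from a local seed."""
    rng = random.Random(seed)  # local generator, independent of the global random state
    return [rng.randrange(n_rows) for _ in range(n_rows)]


first_run = bootstrap_indices(400, seed=2005)
second_run = bootstrap_indices(400, seed=2005)
other_seed = bootstrap_indices(400, seed=1999)

print(first_run == second_run)  # True: same seed, same sample
print(first_run == other_seed)  # different seed: a different sample
```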

TASK: With the Random Forest operator swapped in and its parameters set, re-run your process file. Let’s inspect the results.

Q2.A. – Run the process file. Look at the “Lift Chart (Simple)” results. According to this result, if we select the first 40% of customers, ranked by how confident we are that they will NOT subscribe, what % of ALL the non-subscribers will we have identified?

Q2.B. – Now look at the ModelSimulator results. For a “Warm” lead in Segment “2” with a Platform usage of 50 and an NPS_Score of 8, what % confidence do we have that they WILL SUBSCRIBE, and what % confidence do we have that they WON’T SUBSCRIBE?

Q2.C. – Now look at the predictions made for Customers #3020 and #3030. What is the Confidence % for each that they WON’T SUBSCRIBE?
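With “voting strategy = confidence vote”, the forest combines the trees' confidences instead of only counting which class each tree predicts. The Python sketch below is a simplified illustration, not RapidMiner's implementation: it assumes the per-tree confidences are summed per class and then normalized so they add up to 1, and the three trees' confidences are hypothetical.

```python
def confidence_vote(tree_confidences):
    """Accumulate per-tree class confidences, normalize them, and pick the top class.

    tree_confidences: one dict per tree, e.g. {"yes": 0.7, "no": 0.3}.
    Returns (predicted_class, combined_confidences).
    """
    totals = {}
    for conf in tree_confidences:
        for label, value in conf.items():
            totals[label] = totals.get(label, 0.0) + value
    grand_total = sum(totals.values())
    combined = {label: value / grand_total for label, value in totals.items()}
    predicted = max(combined, key=combined.get)
    return predicted, combined


# Hypothetical confidences from three trees for a single customer.
trees = [{"yes": 0.2, "no": 0.8}, {"yes": 0.6, "no": 0.4}, {"yes": 0.1, "no": 0.9}]
label, conf = confidence_vote(trees)
print(label, round(conf["no"], 3))  # → no 0.7
```

Here the summed confidences are no = 2.1 and yes = 0.9, so the combined confidence that this customer WON’T SUBSCRIBE is 70%.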
QUESTION Q2.D. [critical thinking] Notice that the prompts in questions A., B., and C. were the same for Q1 and Q2. Were your ANSWERS the same? If you used the same DATA to make your predictions, what could possibly account for any differences? Answer in no more than 2-3 sentences.

PART 3: [“guardrails” off!]
The MealKit company has given us two NEW data files. These files contain all the same information we have used previously, plus some additional columns:

  • New2000_EvenMoreInfo_TrainMe.csv: 2,000 NEW customers to whom we marketed the “eco-friendly” packaging, and for whom we observed whether they did or didn’t subscribe, plus new information about these customers that we haven’t had access to previously.
  • New500_EvenMoreInfo_PredictMe.csv: 500 NEW customers for whom we’d like to PREDICT whether or not they’re likely to subscribe to our “eco-friendly” packaging option, plus the same new information about these customers.

You should inspect these new files (you can open them in Excel, or read them into RapidMiner and check the summary statistics). The new columns are explained below:
  • Twitter: Whether or not the customer is known to have an active Twitter account.
  • Instagram: Whether or not the customer is known to have an active Instagram account.
  • Acquisition: How the customer reported first hearing about our company (PodcastPromo, FriendReferral, DirectMail, None, or Other).
  • Refunds: Whether the customer has ever requested a refund for a meal kit they were displeased with (None, OneRefund, MoreThanOneRefund).

Tip: Since there are more columns of information, you’ll have to update the “data set meta data information” for both of the Read CSV operators! You’ll need to do this yourself; there are no screenshots or step-by-step instructions this time!
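Besides Excel or RapidMiner's statistics view, a quick way to check which values each new column takes is to count them. The Python sketch below is a minimal, hypothetical helper using the standard csv module; the three-row sample is made up, and the Twitter/Instagram values in it ("Yes"/"No") are assumed rather than taken from the real files.

```python
import csv
import io
from collections import Counter


def summarize_columns(csv_file, columns):
    """Count how often each value appears in the given columns of an open CSV file."""
    reader = csv.DictReader(csv_file)
    header = reader.fieldnames or []
    missing = [c for c in columns if c not in header]
    if missing:
        raise KeyError(f"columns not found in file header: {missing}")
    counts = {c: Counter() for c in columns}
    for row in reader:
        for c in columns:
            counts[c][row[c]] += 1
    return counts


# Hypothetical sample rows; real values come from the assignment's CSV files.
sample = io.StringIO(
    "Twitter,Instagram,Acquisition,Refunds\n"
    "Yes,No,PodcastPromo,None\n"
    "No,No,FriendReferral,OneRefund\n"
    "Yes,Yes,PodcastPromo,None\n"
)
for column, counter in summarize_columns(sample, ["Acquisition", "Refunds"]).items():
    print(column, dict(counter))
```

To inspect a real file, pass a file object opened with open("New2000_EvenMoreInfo_TrainMe.csv", newline="") instead of the in-memory sample, and confirm that the header actually uses these column names.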
TASK: Using any prediction model of your choice (Logistic Regression, Random Forest, or any other model RapidMiner will allow you to use, if you’re feeling it!), build a prediction model. However, management wants to keep the model (relatively) simple, so you may use no more than FOUR of the available predictors in the dataset. To make sure your model uses no more than four predictors, you’ll need to figure out how to use the “Select Attributes” operator in RapidMiner (we didn’t cover this in the videos!) to reduce the number of predictors the model uses. Part of the challenge of this task is practicing how to learn operators you are unfamiliar with. Fortunately, the RapidMiner Help files are very good, and some very modest Googling will provide you with tips and examples of how to use this operator. Once you have set up your process file, you can explore and test as many candidate models as you wish, but you must settle on one final model. Once you do so, report the following:

  • Describe the final prediction model and the predictors you decided to use. Briefly explain your rationale for choosing this model and these particular predictors.
  • Using available information, describe how well your model performs at predicting the outcome of interest.
  • Using available information, describe a few types of customers you think the company should target based on your results.
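In RapidMiner, the “Select Attributes” operator handles this step inside the process. As a tool-independent illustration of the same idea, the Python sketch below keeps only a chosen list of predictors plus the label column and refuses more than four predictors. The column names (Lead, Segment, Platform, NPS_Score, Twitter) and the label name Subscribe are hypothetical; use the names from your own file's header.

```python
MAX_PREDICTORS = 4  # management's limit on model complexity


def select_attributes(rows, predictors, label="Subscribe"):
    """Keep only the chosen predictor columns plus the label column in each row."""
    if len(predictors) > MAX_PREDICTORS:
        raise ValueError(
            f"at most {MAX_PREDICTORS} predictors allowed, got {len(predictors)}"
        )
    keep = list(predictors) + [label]
    return [{name: row[name] for name in keep} for row in rows]


# Hypothetical row with more columns than the model is allowed to use.
rows = [{"Lead": "Warm", "Segment": "2", "Platform": 50, "NPS_Score": 8,
         "Twitter": "Yes", "Subscribe": "yes"}]
subset = select_attributes(rows, ["Lead", "Platform", "NPS_Score", "Twitter"])
print(subset)
```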
