Designer: Recommenders

Azure ML Studio can generate not only traditional supervised ML models but also recommender models, which are useful for a variety of tasks. Recommenders are used for recommending videos, upselling online shoppers on additional products, ordering news feeds and social media feeds, recommending job candidates to companies and companies to job candidates, recommending drugs and treatments for patients and financial products to investors, and many other scenarios.

Assuming you have already covered the prior chapters teaching the theory behind recommendation algorithms, we will forgo the background behind recommenders and simply cover the two primary categories of recommendation available in Azure ML Studio below.

Collaborative Filtering: SVD Recommender

Let's learn how to implement collaborative filtering through AMLS Designer's SVD Recommender pills. The first thing to learn is that the data used for collaborative filtering needs to be in a different format. Rather than using the consumer as the unit of analysis, we use the instance of a movie being consumed, an instance of a restaurant being visited, an instance of an item appearing in a shopping cart, etc. Think, for example, of taking all of a store's receipts and taping them together so that you have a list of items appearing in a cart with a unique ID for the consumer and the product in each row. Therefore, the same consumer and the same product will appear many times in this list. In summary, the bare minimum needed for this analysis is three data fields: a customerID-productID-rating triple.

Table 23.1
Recommender Data Fields

customer  item        rating
Homer     Football    5
Homer     Baseball    2
Homer     Basketball  3
Marge     Football    1
Marge     Baseball    3
Marge     Basketball  5
Bart      Football    3
Lisa      Football    5
Lisa      Baseball    2
Lisa      Basketball  1
Lisa      Soccer      5
Maggie    Football    4
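
To make the long format concrete, here is a minimal Python sketch that loads the rows of Table 23.1 as customer-item-rating triples and pivots them into a customer-by-item grid. The helper name to_user_item_matrix is hypothetical and is not part of AMLS Designer; it only illustrates that the blank cells in the grid are the ratings a recommender tries to predict.

```python
from collections import defaultdict

# Rows from Table 23.1 as (customer, item, rating) triples.
ratings = [
    ("Homer", "Football", 5), ("Homer", "Baseball", 2), ("Homer", "Basketball", 3),
    ("Marge", "Football", 1), ("Marge", "Baseball", 3), ("Marge", "Basketball", 5),
    ("Bart", "Football", 3),
    ("Lisa", "Football", 5), ("Lisa", "Baseball", 2), ("Lisa", "Basketball", 1),
    ("Lisa", "Soccer", 5),
    ("Maggie", "Football", 4),
]


def to_user_item_matrix(triples):
    """Pivot long-format triples into {customer: {item: rating}}."""
    matrix = defaultdict(dict)
    for customer, item, rating in triples:
        matrix[customer][item] = rating
    return dict(matrix)


matrix = to_user_item_matrix(ratings)
items = sorted({item for _, item, _ in ratings})

print("customer", items)
for customer, row in matrix.items():
    # None marks an item the customer has not rated yet.
    print(customer, [row.get(item) for item in items])
```

Notice that Bart and Maggie each have only one rating, so most of their rows are empty. The recommender's job is to fill in those gaps.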

Follow along with AMLS Designer: SVD Recommender, Movies example to re-create the movies recommender experiment in AMLS Designer. It will cover several new AMLS Designer features.

AMLS Designer: SVD Recommender, Movies example

Let's summarize what we learned. At a minimum, a recommender algorithm like AMLS Designer's SVD Recommender requires three data fields on each row. First, there must be an identifier for the group. This is typically a customer ID of some sort so that all ratings or purchases can be grouped by the customer that made them. However, this could also be an order ID if a customer ID isn't available. An order ID also represents a grouping because many items can appear in a single order. Second, there must be some sort of product ID. This could represent a physical product, but also a movie, restaurant, or something else. Finally, there must be an indicator of the level of interest in that product, such as a rating. The same SVD Recommender technique can also be used to predict purchase volume by including a quantity ordered rather than a rating.

Train SVD Recommender pill

The SVD recommender algorithm is based on matrix factorization as described in the prior section. Singular value decomposition (SVD) is a well-established technique for identifying latent (i.e., not directly measured) factors; in text analysis, it is used to find latent semantic factors (semantics being the study of word and phrase meaning). Algorithms like stochastic gradient descent and alternating least squares (ALS) are commonly used to learn the user and item factor vectors by minimizing the difference between predicted and actual ratings. Koren and Volinsky (2009) outline the mathematical details of how AMLS's Train SVD Recommender pill works. We will not get into any more detail on the SVD recommender in this book. All we need to understand are the relevant parameters below.

  1. Number of factors: Each factor captures one latent dimension along which users relate to items. The number of factors is also the dimensionality of the latent factor space. As the number of users and items increases, it's better to set a larger number of factors. But if the number is too large, performance might drop.

  2. Number of recommendation algorithm iterations: Indicates how many times the algorithm should process the input data. More iterations generally produce more accurate predictions. However, more iterations also mean slower training. The default value is 30.

  3. Learning rate: A value between 0.0 and 2.0 that determines the size of the step at each iteration. If the step size is too large, you might overshoot the optimal solution. If the step size is too small, training takes longer to find the best solution.
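
To see how these three parameters interact, here is a minimal sketch of matrix factorization trained with stochastic gradient descent on the Table 23.1 data. It is not the implementation behind the Train SVD Recommender pill. The function names (train_mf, predict), the global-mean baseline, and the regularization value reg are assumptions made for illustration only.

```python
import random


def train_mf(triples, n_factors=2, n_iters=30, learning_rate=0.05, reg=0.02, seed=0):
    """Learn user and item factor vectors with plain SGD (illustrative only)."""
    rng = random.Random(seed)
    users = sorted({u for u, _, _ in triples})
    items = sorted({i for _, i, _ in triples})
    # One small random vector of length n_factors per user and per item.
    P = {u: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for u in users}
    Q = {i: [rng.uniform(-0.1, 0.1) for _ in range(n_factors)] for i in items}
    mean = sum(r for _, _, r in triples) / len(triples)
    history = []  # RMSE measured during each pass over the data
    for _ in range(n_iters):
        squared_error = 0.0
        for u, i, r in triples:
            pred = mean + sum(p * q for p, q in zip(P[u], Q[i]))
            err = r - pred
            squared_error += err * err
            for f in range(n_factors):
                p, q = P[u][f], Q[i][f]
                # The learning rate scales the size of every update step.
                P[u][f] += learning_rate * (err * q - reg * p)
                Q[i][f] += learning_rate * (err * p - reg * q)
        history.append((squared_error / len(triples)) ** 0.5)
    return mean, P, Q, history


def predict(mean, P, Q, user, item):
    """Predicted rating = global mean + dot product of the two factor vectors."""
    return mean + sum(p * q for p, q in zip(P[user], Q[item]))


triples = [
    ("Homer", "Football", 5), ("Homer", "Baseball", 2), ("Homer", "Basketball", 3),
    ("Marge", "Football", 1), ("Marge", "Baseball", 3), ("Marge", "Basketball", 5),
    ("Bart", "Football", 3),
    ("Lisa", "Football", 5), ("Lisa", "Baseball", 2), ("Lisa", "Basketball", 1),
    ("Lisa", "Soccer", 5),
    ("Maggie", "Football", 4),
]

mean, P, Q, history = train_mf(triples, n_factors=2, n_iters=200)
print("RMSE first pass:", round(history[0], 3), "last pass:", round(history[-1], 3))
print("Bart on Soccer:", round(predict(mean, P, Q, "Bart", "Soccer"), 2))
```

Try changing n_factors, n_iters, and learning_rate to see the trade-offs described above: more iterations lower the training error but take longer, and a learning rate that is too large can make the error bounce around instead of settling.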

Score SVD Recommender pill

  • Recommender prediction kind

    • Rating Prediction: This is used only to generate model fit statistics for the recommender model. It generates a scored rating prediction for every item that was rated by each user and compares those ratings to the actual ratings given. Mean absolute error (MAE) and root mean squared error (RMSE) can then be calculated to provide a practical assessment of how well the model performs. As in other contexts, high numbers for both MAE and RMSE indicate lower quality models. Importantly, this setting would never be used in production to recommend movies. It is only for model evaluation.

    • Item Recommendation: This is used both for model evaluation and in production to make recommendations for future data, depending on the Recommended item selection chosen in conjunction with this option. See details below.

    • Related Items (may be in a future release)

    • Related Users (may be in a future release)

  • Recommended item selection

    • From All Items: Select this option to generate recommendations based on all items, including those that the user has already rated in our dataset. This is common for restaurant ratings where customers frequently enjoy dining at the same location repeatedly. The idea is that a customer would have to try every single menu item at a restaurant before they can determine with certainty exactly how much they prefer a restaurant.

    • From Rated Items (for model evaluation): Select this option ONLY to generate the NDCG score for model evaluation purposes. This is similar to the Rating Prediction option above, which is only used to generate MAE and RMSE fit metrics. You would NOT use this option in production. Even if there were a use case where you would only recommend items that have already been rated, there would be no need to generate a rating prediction since you already have their preference recorded.

    • From Unrated Items (to suggest new items to users): Select this option to generate recommendations based on unrated items only. This is common for movie ratings where customers only need recommendations for items they haven't purchased before. The idea is that customers who have watched a given movie have already experienced the product in its entirety (unlike visiting a restaurant once) and know with greater certainty whether they want to experience the product again.

  • Maximum number of items to recommend: enter the number of recommendations you want returned

  • Whether to return the predicted ratings: Select True/False depending on whether you want the predicted rating (in the range of the training data's ratings) returned along with the rank-ordered recommendations
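
The three fit metrics mentioned in the options above can be sketched in a few lines of Python. This is only an illustration with made-up predictions. In particular, NDCG has more than one common formulation, and the linear-gain version below is an assumption rather than the exact formula AMLS Designer uses.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average size of the rating errors."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: like MAE, but penalizes large errors more."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def ndcg(actual_ratings_in_recommended_order):
    """Normalized discounted cumulative gain for one user's ranked list."""
    def dcg(rels):
        # Items further down the list contribute less to the score.
        return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(rels))

    ideal = dcg(sorted(actual_ratings_in_recommended_order, reverse=True))
    return dcg(actual_ratings_in_recommended_order) / ideal if ideal > 0 else 0.0


# Hypothetical example: Homer's actual ratings versus made-up predictions.
actual = [5, 2, 3]
predicted = [4.5, 2.5, 3.0]
print("MAE:", round(mae(actual, predicted), 3))    # 0.333
print("RMSE:", round(rmse(actual, predicted), 3))  # 0.408

# A list ranked in perfect order of the user's actual ratings scores 1.0;
# the worst possible ordering of the same items scores lower.
print("NDCG best order:", ndcg([5, 3, 2]))
print("NDCG worst order:", round(ndcg([2, 3, 5]), 3))
```

Lower MAE and RMSE are better, while NDCG is better the closer it is to 1.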

Content Filtering: Wide and Deep Recommender

The AMLS Designer doesn't provide an option for strictly content-based filtering alone (yet). Rather, it provides what it calls a "wide and deep" recommender option which is a hybrid of collaborative- and content-based filtering. The video below will walk you through the technique using Microsoft's restaurant data provided with AMLS Designer.

AMLS Designer: Wide and Deep Recommender Restaurant Example

Train Wide and Deep Recommender

The Train Wide and Deep Recommender has several parameters that we don't get in the SVD Recommender that are worth modifying to optimize the model. Here is a review of the options:

  • Epochs: the number of times the algorithm should process the whole training data. A higher number trains the model more thoroughly; however, training takes more time and may cause overfitting.

  • Batch size: the number of training examples utilized in one training step. This hyperparameter can influence the training speed. A larger batch size makes each epoch take less time, but may increase the time needed to converge. If the batch is too large to fit in GPU/CPU memory, a memory error may be raised.

  • Wide part optimizer: the optimizer to apply gradients to the wide part of the model.

  • Wide optimizer learning rate: a number between 0.0 and 2.0 that defines the learning rate of the wide part optimizer. This hyperparameter determines the step size at each training step while moving toward a minimum of the loss function. A learning rate that is too large may cause training to jump over the minimum, while one that is too small may cause convergence problems.

  • Crossed feature dimension: enter the dimension of the crossed feature. The Wide and Deep recommender performs a cross-product transformation over the user ID and item ID features by default. The crossed result is hashed into this number of dimensions.

  • Deep part optimizer: select one optimizer to apply gradients to the deep part of the model.

  • Deep optimizer learning rate: enter a number between 0.0 and 2.0 that defines the learning rate of deep part optimizer.

  • User embedding dimension: type an integer to specify the dimension of user ID embedding. The Wide and Deep recommender creates the shared user ID embeddings and item ID embeddings for both wide part and deep part.

  • Item embedding dimension: type an integer to specify the dimension of item ID embedding.

  • Categorical features embedding dimension: enter an integer to specify the dimension of the categorical feature embeddings. In the deep component of the Wide and Deep recommender, an embedding vector is learned for each categorical feature, and these embedding vectors share the same dimension.

  • Hidden units: type the number of hidden nodes in the deep component. The number of nodes in each layer is separated by commas. For example, by typing "1000,500,100", you specify that the deep component has three layers, with 1000, 500, and 100 nodes from the first layer to the last.

  • Activation function: select one activation function to apply to each layer. The default is ReLU.

  • Dropout: enter a number between 0.0 and 1.0 to determine the probability the outputs will be dropped in each layer during training. Dropout is a regularization method to prevent neural networks from overfitting. One common decision for this value is to start with 0.5, which seems to be close to optimal for a wide range of networks and tasks.

  • Batch Normalization: select this option to use batch normalization after each hidden layer in the deep component. Batch normalization is a technique to counter the internal covariate shift problem during network training. In general, it can help to improve the speed, performance, and stability of the network.
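
A few of these settings are easier to understand with a small sketch. The Python below shows (1) how a Hidden units string becomes a list of layer widths, (2) how a crossed user-item feature can be hashed into a fixed number of buckets, and (3) how the embedding dimensions drive the size of the deep component. All function names are hypothetical. CRC32 is only a stand-in hash, and treating the deep input as a concatenation of the embeddings is an assumption, since the actual internals of the Train Wide and Deep Recommender pill are not exposed.

```python
import zlib


def parse_hidden_units(spec):
    """Turn a string such as "1000,500,100" into a list of layer widths."""
    return [int(part) for part in spec.split(",") if part.strip()]


def crossed_feature_bucket(user_id, item_id, crossed_dim):
    """Hash the (user, item) cross-product into one of crossed_dim buckets."""
    key = f"{user_id}_x_{item_id}".encode("utf-8")
    # CRC32 is a stand-in; the hash used by AMLS Designer is not documented here.
    return zlib.crc32(key) % crossed_dim


def count_deep_weights(input_dim, hidden_units):
    """Count weights and biases in a fully connected stack with one output node."""
    sizes = [input_dim] + hidden_units + [1]
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


layers = parse_hidden_units("1000,500,100")
print("Layer widths:", layers)

# Every (user, item) pair lands in a bucket between 0 and crossed_dim - 1.
bucket = crossed_feature_bucket("U1001", "R135", crossed_dim=1000)
print("Crossed feature bucket:", bucket)

# Hypothetical sizes: 16-dim user and item embeddings plus three
# categorical features embedded in 8 dimensions each.
deep_input_dim = 16 + 16 + 3 * 8
print("Deep parameters:", count_deep_weights(deep_input_dim, layers))
```

The parameter count grows quickly with the hidden units and embedding dimensions, which is one reason larger settings train more slowly and are more prone to overfitting without dropout.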

Score Wide and Deep Recommender pill

The Score Wide and Deep Recommender pill has all of the same parameters as the Score SVD Recommender pill, but we will list them again for convenience:

  • Recommender prediction kind

    • Rating Prediction: This is used only to generate model fit statistics for the recommender model. It generates a scored rating prediction for every item that was rated by each user and compares those ratings to the actual ratings given. Mean absolute error (MAE) and root mean squared error (RMSE) can then be calculated to provide a practical assessment of how well the model performs. As in other contexts, high numbers for both MAE and RMSE indicate lower quality models. Importantly, this setting would never be used in production to recommend items. It is only for model evaluation.

    • Item Recommendation: This is used both for model evaluation and in production to make recommendations for future data, depending on the Recommended item selection chosen in conjunction with this option. See details below.

    • Related Items (may be in a future release)

    • Related Users (may be in a future release)

  • Recommended item selection

    • From All Items: Select this option to generate recommendations based on all items, including those that the user has already rated in our dataset. This is common for restaurant ratings where customers frequently enjoy dining at the same location repeatedly. The idea is that a customer would have to try every single menu item at a restaurant before they can determine with certainty exactly how much they prefer a restaurant.

    • From Rated Items (for model evaluation): Select this option ONLY to generate the NDCG score for model evaluation purposes. This is similar to the Rating Prediction option above, which is only used to generate MAE and RMSE fit metrics. You would NOT use this option in production. Even if there were a use case where you would only recommend items that have already been rated, there would be no need to generate a rating prediction since you already have their preference recorded.

    • From Unrated Items (to suggest new items to users): Select this option to generate recommendations based on unrated items only. This is common for movie ratings where customers only need recommendations for items they haven't purchased before. The idea is that customers who have watched a given movie have already experienced the product in its entirety (unlike visiting a restaurant once) and know with greater certainty whether they want to experience the product again.

  • Maximum number of items to recommend: enter the number of recommendations you want returned

  • Whether to return the predicted ratings: Select True/False depending on whether you want the predicted rating (in the range of the training data's ratings) returned along with the rank-ordered recommendations
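
The effect of the Recommended item selection, Maximum number of items to recommend, and Whether to return the predicted ratings options can be sketched as a simple filter-and-sort. The recommend function and the predicted scores below are hypothetical and only mimic the behavior described above; they are not the scoring pill's actual code.

```python
def recommend(predicted, already_rated, selection="unrated",
              max_items=5, return_ratings=False):
    """Rank one user's predicted ratings according to the chosen selection mode."""
    if selection == "all":
        candidates = dict(predicted)
    elif selection == "rated":
        # Evaluation only: rank the items we already have true ratings for.
        candidates = {i: s for i, s in predicted.items() if i in already_rated}
    elif selection == "unrated":
        # Production use: suggest only items that are new to the user.
        candidates = {i: s for i, s in predicted.items() if i not in already_rated}
    else:
        raise ValueError(f"unknown selection: {selection}")
    ranked = sorted(candidates.items(), key=lambda pair: pair[1], reverse=True)
    ranked = ranked[:max_items]
    return ranked if return_ratings else [item for item, _ in ranked]


# Made-up predicted ratings for a user who has only rated Football.
predicted = {"Football": 3.1, "Baseball": 2.4, "Basketball": 2.9, "Soccer": 4.2}
already_rated = {"Football"}

print(recommend(predicted, already_rated, "unrated", max_items=2))
# ['Soccer', 'Basketball']
print(recommend(predicted, already_rated, "all"))
# ['Soccer', 'Football', 'Basketball', 'Baseball']
print(recommend(predicted, already_rated, "rated", return_ratings=True))
# [('Football', 3.1)]
```

The same predictions produce three different lists, which is why the selection mode matters as much as the model itself when you move from evaluation to production.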

Deploying Recommenders

Deploying recommender models as endpoints is very similar to deploying regression and classification models. However, there are a couple of key differences that need to be pointed out in the video below to help you succeed:

Azure ML Recommender Assessment
