Alternating Least Squares with implicit feedback – The search for alphaPosted on 2016-12-04 by Mendel van 't Riet
So you want to build a recommendation engine with ALS… You search the internet for some example code in your language of choice… You copy paste the code and start tweaking… But then you realize that your data is different from all the examples you found online. You don’t have explicit ratings in some range from 1 to 10; instead, you have click events where 1 means ‘clicked’. Will you still be able to use ALS? And if so, how?
A brief recap on Collaborative Filtering and Alternating Least Squares
Collaborative Filtering is the technique of predicting the preference of a user by collecting the preferences of many users. The underlying assumption of the collaborative filtering approach is that if a person A has the same opinion as a person B on an issue, A is more likely to have B’s opinion on a different issue than to have the opinion on of a person chosen randomly (Wikipedia).
A preference takes the form of a (user, item, rating) triple. Collecting them yields a sparse matrix of known ratings of users for items . The task is to fill in the missing values. In the Latent Model approach to Collaborative Filtering, we do this by decomposing into a user matrix and an items matrix so that we can find the ‘hidden’ features of users and items. (In the case of movies one could think of these hidden features as genres.) By taking the product of the user matrix and item matrix, we can reconstruct the (complete) ratings matrix . Or for individual ratings:
To compute these factors, we will first randomly initialize and and iteratively recompute them by minimizing the loss function :
The first term in is the sum of squared errors and the second term is used for regularization. In order to minimize the loss function, we will take the derivatives with respect to and and solve for 0.
And for y:
Recomputing and can be done with Stochastic Gradient Descent, but this is a non-convex optimization problem. We can convert it into a set of quadratic problems, by keeping either or fixed while optimizing the other. In that case, we can iteratively solve and by alternating between them until the algorithm converges. This is Alternating Least Squares.
Unfortunately, you don’t have ratings, you have click events. And a ‘click’ event does not necessarily mean a user really likes the product; the user could be curious about the product for some other reason. Even when you are using ‘buy’ events you are not in the clear, because people buy gifts for other people all the time. Furthermore, the absence of a click event does not imply a dislike for a product . So can you still use ALS? Yes, you can still use ALS, but you have to take into account the fact that you have implicit ratings/feedback. Luckily, your preferred machine learning library shows there is an ‘implicit’ switch on the ALS interface and that there is an ‘alpha’ parameter involved as well.
So what is this character?
To understand alpha, we should go to the source, which is Hu et al. 2008 (1). He suggests to split each rating into a preference and a confidence level. The preference is calculated by capping the rating to a binary value.
The confidence level is defined as:
For a rating of 0 we would have a minimum confidence of 1 and if the rating increases, the confidence increases accordingly. The rate of increase is controlled by alpha. So alpha reflects how much we value observed events versus unobserved events. Factors are now computed by minimizing the following loss function L:
Now suppose that for a given rating that is very large, so that the squared residual is very large, then the rating has a big impact on our loss function. And it should! will be drawn towards the 0-1 range, which is a good thing, because we want to predict whether the event will occur or not (0 or 1).
Alternatively, suppose that for a given rating we have observed many events, and suppose also that our initial value is close to 0, so that the squared residual approximates 1, then the rating will still have quite some impact on our loss function, because our confidence is large. Again, this is a good thing, because in this case, we want to go towards 1.
If either the confidence level is low or the residual is low, there is not much impact on the loss function, so the update of and will be small.
Now that we have some background on this alpha, we can safely copy-paste the recommender engine code we found online and expand it so that it includes the alpha parameter. All that is left now is some extra parameter tuning and we are done! After that, we can run our final model on the test set and we can calculate the root mean squared error… Wait.. What..? Somehow, that metric just doesn’t feel right. Oh, well… Enough for today 🙂