Implicit Alternating Least Squares

The library provides the Implicit Alternating Least Squares (implicit ALS) algorithm [Fleischer2008], based on collaborative filtering.

Details

Given the input dataset \(R=\left\{{r}_{ui}\right\}\) of size \(m \times n\), where \(m\) is the number of users and \(n\) is the number of items, the problem is to train the Alternating Least Squares (ALS) model, which is represented as two matrices: \(X\) of size \(m \times f\) and \(Y\) of size \(f \times n\), where \(f\) is the number of factors. The matrices \(X\) and \(Y\) are the factors of a low-rank factorization of the matrix \(R\):

\[R\approx X\cdot Y\]

Initialization Stage

The matrix \(Y\) can be initialized as follows: for each \(i = 1, \ldots, n\), set \({y}_{1i}=\frac{1}{m}\sum _{u=1}^{m}{r}_{ui}\), and set \(y_{ki}\), \(k = 2, \ldots, f\), to independent random numbers uniformly distributed on the interval \((0,1)\).
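The initialization rule above can be sketched in plain Python. This is an illustration only, not the library's implementation: the function name `init_item_factors`, the `seed` parameter, and the dense list-of-lists layout are assumptions made for the example. Note that `random.random()` draws from \([0,1)\) rather than the open interval \((0,1)\).

```python
import random

def init_item_factors(R, f, seed=0):
    """Return Y (f rows, n columns) built from dense R (m rows, n columns).

    Hypothetical helper for illustration; not part of the library API.
    """
    m = len(R)
    n = len(R[0])
    rng = random.Random(seed)
    # Row 1 of Y: the mean of each column of R (average value per item).
    Y = [[sum(R[u][i] for u in range(m)) / m for i in range(n)]]
    # Rows 2..f of Y: independent uniform random numbers in [0, 1).
    for _ in range(1, f):
        Y.append([rng.random() for _ in range(n)])
    return Y

R = [[1.0, 0.0, 3.0],
     [0.0, 2.0, 1.0]]
Y = init_item_factors(R, f=3)
```

Here `Y[0]` holds the per-item means, and the remaining `f - 1` rows hold the random values.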

Training Stage

The ALS model is trained using the implicit ALS algorithm [Hu2008] by minimizing the following cost function:

\[\underset{{x}_{*},{y}_{*}}{\mathrm{min}}\underset{u,i}{\mathrm{\Sigma }}{c}_{ui}{\left({p}_{ui}-{x}_{u}^{T}{y}_{i}\right)}^{2}+\lambda \left(\underset{u}{\mathrm{\Sigma }}{n}_{{x}_{u}}{\|{x}_{u}\|}^{2}+\underset{i}{\mathrm{\Sigma }}{m}_{{y}_{i}}{\|{y}_{i}\|}^{2}\right),\]

where:

  • \({p}_{ui}\) indicates the preference of user \(u\) for item \(i\):

    \[\begin{split}p_{ui} = \begin{cases} 1, & {r}_{ui}> \epsilon \\ 0, & {r}_{ui}\le \epsilon \end{cases}\end{split}\]
  • \(\epsilon\) is the threshold used to define the preference values. \(\epsilon = 0\) is the only threshold value currently supported.

  • \({c}_{ui}=1+\alpha r_{ui}\) measures the confidence in observing \(p_{ui}\)

  • \(\alpha\) is the rate of confidence

  • \(r_{ui}\) is the element of the matrix \(R\)

  • \(\lambda\) is the parameter of the regularization

  • \({n}_{{x}_{u}}\) and \({m}_{{y}_{i}}\) denote the number of ratings of user \(u\) and the number of ratings of item \(i\), respectively
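To make the cost function concrete, the following sketch evaluates it for small dense matrices in plain Python. It is not the library's code. The function name `implicit_als_cost` is hypothetical, and counting a "rating" as any entry with \(r_{ui} > \epsilon\) is an assumption made for this example.

```python
def implicit_als_cost(R, X, Y, alpha, lam, eps=0.0):
    """Evaluate the implicit ALS cost for dense R (m x n), X (m x f), Y (f x n).

    Hypothetical helper for illustration; not part of the library API.
    """
    m, n, f = len(R), len(R[0]), len(Y)
    # Assumption: a "rating" is any entry with r_ui > eps.
    n_x = [sum(1 for i in range(n) if R[u][i] > eps) for u in range(m)]
    m_y = [sum(1 for u in range(m) if R[u][i] > eps) for i in range(n)]

    fit = 0.0
    for u in range(m):
        for i in range(n):
            p = 1.0 if R[u][i] > eps else 0.0      # preference p_ui
            c = 1.0 + alpha * R[u][i]              # confidence c_ui
            pred = sum(X[u][k] * Y[k][i] for k in range(f))  # x_u^T y_i
            fit += c * (p - pred) ** 2

    # Regularization weighted by the per-user and per-item rating counts.
    reg_x = sum(n_x[u] * sum(v * v for v in X[u]) for u in range(m))
    reg_y = sum(m_y[i] * sum(Y[k][i] ** 2 for k in range(f)) for i in range(n))
    return fit + lam * (reg_x + reg_y)

R = [[2.0, 0.0],
     [0.0, 1.0]]
X = [[1.0],
     [0.0]]
Y = [[1.0, 0.0]]
cost = implicit_als_cost(R, X, Y, alpha=1.0, lam=0.5)
```

In this example only the pair \((u, i) = (2, 2)\) is mispredicted, contributing \(c_{22}(1 - 0)^2 = 2\) to the fit term. The regularization term contributes \(0.5 \cdot (1 + 1) = 1\), so `cost` equals 3.0.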

Prediction Stage

Prediction of Ratings

Given the trained ALS model and the matrix \(D\) that specifies the user-item pairs for which ratings should be computed, the system calculates the matrix of recommended ratings \(Res\): \({res}_{ui}=\sum _{j=1}^{f}{x}_{uj}{y}_{ji}\), if \({d}_{ui}\ne 0\), \(u=1,\ldots,m\); \(i=1,\ldots,n\).
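The prediction formula can be illustrated with a short plain-Python sketch. The function name `predict_ratings` and the use of `None` for pairs that were not requested are assumptions made for this example; they do not describe how the library stores its results.

```python
def predict_ratings(X, Y, D):
    """Compute res[u][i] = sum_j X[u][j] * Y[j][i] wherever D[u][i] != 0.

    Hypothetical helper for illustration; not part of the library API.
    """
    f = len(Y)
    res = []
    for u, d_row in enumerate(D):
        row = []
        for i, d in enumerate(d_row):
            if d != 0:
                row.append(sum(X[u][j] * Y[j][i] for j in range(f)))
            else:
                # Pair not requested; None is a placeholder chosen for this sketch.
                row.append(None)
        res.append(row)
    return res

X = [[1.0, 2.0],
     [0.5, 0.0]]
Y = [[1.0, 0.0, 2.0],
     [0.0, 1.0, 1.0]]
D = [[1, 0, 1],
     [0, 1, 0]]
res = predict_ratings(X, Y, D)
# res == [[1.0, None, 4.0], [None, 0.0, None]]
```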

Initialization

For initialization, the following computation modes are available:

Computation

The following computation modes are available:

Examples

Batch Processing:

Distributed Processing:

Note

There is no support for Java on GPU.


Performance Considerations

To get the best overall performance of the implicit ALS recommender:

  • If input data is homogeneous, provide the input data and store results in homogeneous numeric tables of the same type as specified in the algorithmFPType class template parameter.

  • If input data is sparse, use CSR numeric tables.
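For readers unfamiliar with the compressed sparse row (CSR) layout, the sketch below converts a small dense matrix into the three CSR arrays using plain Python. It illustrates the generic format only: the function name `dense_to_csr` is hypothetical, the indices are zero-based, and the library's own CSR numeric table may use a different index base and construction API.

```python
def dense_to_csr(R):
    """Convert a dense row-major matrix to zero-based CSR arrays.

    Hypothetical helper for illustration; not part of the library API.
    """
    values = []        # nonzero entries, stored row by row
    col_indices = []   # column index of each stored entry
    row_offsets = [0]  # row k occupies values[row_offsets[k]:row_offsets[k + 1]]
    for row in R:
        for i, r in enumerate(row):
            if r != 0:
                values.append(r)
                col_indices.append(i)
        row_offsets.append(len(values))
    return values, col_indices, row_offsets

R = [[1.0, 0.0, 3.0],
     [0.0, 0.0, 0.0],
     [0.0, 2.0, 0.0]]
values, col_indices, row_offsets = dense_to_csr(R)
# values      == [1.0, 3.0, 2.0]
# col_indices == [0, 2, 1]
# row_offsets == [0, 2, 2, 3]
```

Only the three nonzero entries are stored, which is why this layout suits rating matrices in which most user-item pairs are empty.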
