Singular Value Decomposition

Singular Value Decomposition (SVD) is a matrix factorization technique with a broad range of applications, including dimensionality reduction, solving linear inverse problems, and data fitting.


Given a matrix \(X\) of size \(n \times p\), the problem is to compute its SVD \(X = U \Sigma V^t\), where:

  • \(U\) is an orthogonal matrix of size \(n \times n\)

  • \(\Sigma\) is a rectangular diagonal matrix of size \(n \times p\) with non-negative values on the diagonal, called singular values

  • \(V\) is an orthogonal matrix of size \(p \times p\), and \(V^t\) is its transpose

Columns of the matrices \(U\) and \(V\) are called left and right singular vectors, respectively.
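To make the definition concrete, here is a minimal pure-Python sketch of the one-sided Jacobi (Hestenes) method. This is one standard way to compute an SVD, not necessarily what this library uses. It returns the thin form, with \(U\) of size \(n \times p\) (assuming \(n \ge p\)) instead of the full \(n \times n\) matrix. The extra \(n - p\) columns of the full \(U\) only multiply zero rows of \(\Sigma\), so they do not affect \(X\). The function names `jacobi_svd` and `reconstruct` are hypothetical, and the sketch assumes \(X\) has full column rank.

```python
import math

def jacobi_svd(x, tol=1e-12, max_sweeps=60):
    """Thin SVD of an n x p matrix (n >= p) by one-sided Jacobi rotations.

    Returns (u, s, v): u is n x p, s holds the p singular values in
    descending order, v is p x p, and x == u * diag(s) * v^t.
    """
    n, p = len(x), len(x[0])
    a = [row[:] for row in x]                       # working copy; its columns get rotated
    v = [[1.0 if i == j else 0.0 for j in range(p)] for i in range(p)]
    for _ in range(max_sweeps):
        rotated = False
        for i in range(p - 1):
            for j in range(i + 1, p):
                alpha = sum(a[k][i] ** 2 for k in range(n))
                beta = sum(a[k][j] ** 2 for k in range(n))
                gamma = sum(a[k][i] * a[k][j] for k in range(n))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue                        # columns i and j are already orthogonal
                rotated = True
                # Rotation angle that makes columns i and j orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = (1.0 if zeta >= 0 else -1.0) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for m in (a, v):                    # apply the same rotation to A and V
                    for row in m:
                        ri, rj = row[i], row[j]
                        row[i] = c * ri - s * rj
                        row[j] = s * ri + c * rj
        if not rotated:
            break
    # Now a == x * v has orthogonal columns: column norms are the singular values.
    sigma = [math.sqrt(sum(a[k][j] ** 2 for k in range(n))) for j in range(p)]
    order = sorted(range(p), key=lambda j: -sigma[j])
    u = [[a[k][j] / sigma[j] if sigma[j] > 0 else 0.0 for j in order] for k in range(n)]
    v = [[row[j] for j in order] for row in v]
    return u, [sigma[j] for j in order], v

def reconstruct(u, s, v):
    """Return u * diag(s) * v^t as a list of rows."""
    n, p = len(u), len(s)
    return [[sum(u[r][k] * s[k] * v[c][k] for k in range(p)) for c in range(p)] for r in range(n)]

X = [[3.0, 2.0, 2.0], [2.0, 3.0, -2.0], [1.0, 0.0, 4.0], [0.0, 1.0, 1.0]]
U, S, V = jacobi_svd(X)
print([round(value, 6) for value in S])
```

Multiplying the factors back together with `reconstruct(U, S, V)` should return \(X\) up to rounding error.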


The following computation modes are available:

  • Batch processing

  • Online processing

  • Distributed processing

There is no support for Java on GPU.

Performance Considerations

To get the best overall performance of SVD, use homogeneous numeric tables for input, output, and auxiliary data, with the same data type as specified in the algorithmFPType class template parameter.

Online Processing

SVD in the online processing mode is at least as computationally complex as in the batch processing mode, and it has high memory requirements for storing auxiliary data between calls to the compute() method. On the other hand, the online version of SVD may enable you to hide the latency of reading data from a slow data source. To do this, prefetch the next data block in parallel with the compute() call for the current block.
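The prefetching pattern can be sketched in Python with a single background thread. This is a minimal sketch under stated assumptions, not the library API: `read_block` is a hypothetical slow data source, and `fake_compute` is a hypothetical stand-in for one compute() call. While block \(i\) is processed, the read of block \(i + 1\) is already in flight.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def read_block(index):
    """Hypothetical slow data source: returns data block `index` (3 x 2) after a delay."""
    time.sleep(0.05)                                   # stands in for disk or network latency
    return [[float(index * 10 + r * 2 + c) for c in range(2)] for r in range(3)]

def run_online(n_blocks, process_block):
    """Call process_block on each block in order, reading block i + 1 while block i is processed."""
    if n_blocks <= 0:
        return
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(read_block, 0)
        for i in range(n_blocks):
            block = pending.result()                   # wait for the prefetched block
            if i + 1 < n_blocks:
                pending = pool.submit(read_block, i + 1)  # start the next read right away
            process_block(block)                       # runs while the next read is in flight

processed = []

def fake_compute(block):
    """Hypothetical stand-in for one compute() call on a data block."""
    time.sleep(0.05)
    processed.append(block[0][0])

start = time.perf_counter()
run_online(4, fake_compute)
print(processed, f"{time.perf_counter() - start:.2f} s")
```

With four blocks, the elapsed time should be roughly five delays rather than eight, because every read after the first overlaps with processing of the previous block. The actual saving depends on the relative cost of reading and computing.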

Online processing benefits SVD most when the matrix of left singular vectors is not required. In this case, the memory required for storing auxiliary data goes down from \(O(p \cdot n)\) to \(O(p \cdot p \cdot \text{nblocks})\), where nblocks is the number of data blocks.
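One common way to reach the \(O(p \cdot p \cdot \text{nblocks})\) bound is to keep only the \(p \times p\) triangular factor \(R_i\) of a QR decomposition of each block \(X_i\) and to discard \(Q_i\). This sketch assumes that scheme, and the library's internals may differ. Keeping every \(Q_i\) (size \(n_i \times p\)) for \(U\) would cost \(O(p \cdot n)\) instead. Because \(X^t X = \sum_i X_i^t X_i = \sum_i R_i^t R_i\), the stacked \((\text{nblocks} \cdot p) \times p\) matrix of \(R_i\) factors has the same singular values and, up to the usual sign choices, the same right singular vectors as \(X\). The data and helper names below are hypothetical.

```python
import math

def qr_r_factor(block):
    """Upper-triangular R (p x p) of a thin QR of an n_i x p block, n_i >= p.

    Uses modified Gram-Schmidt and assumes the block has full column rank.
    Q is built column by column but then discarded, mirroring the case
    where left singular vectors are not needed.
    """
    n, p = len(block), len(block[0])
    q = [row[:] for row in block]
    r = [[0.0] * p for _ in range(p)]
    for j in range(p):
        norm = math.sqrt(sum(q[k][j] ** 2 for k in range(n)))
        r[j][j] = norm
        for k in range(n):
            q[k][j] /= norm                        # column j of Q
        for l in range(j + 1, p):
            dot = sum(q[k][j] * q[k][l] for k in range(n))
            r[j][l] = dot
            for k in range(n):
                q[k][l] -= dot * q[k][j]           # remove the q_j component from column l
    return r

def gram(m):
    """Return m^t m for a matrix stored as a list of rows."""
    p = len(m[0])
    return [[sum(row[i] * row[j] for row in m) for j in range(p)] for i in range(p)]

# Hypothetical input: nblocks = 3 blocks of 4 rows each, p = 3 columns.
blocks = [
    [[1.0, 2.0, 0.5], [0.0, 1.0, 3.0], [2.0, -1.0, 1.0], [1.0, 1.0, 1.0]],
    [[3.0, 0.0, 1.0], [1.0, 4.0, 0.0], [0.0, 1.0, 2.0], [2.0, 2.0, 2.0]],
    [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [-1.0, 2.0, 3.0]],
]

r_factors = [qr_r_factor(b) for b in blocks]       # state kept per block: one p x p table
r_stacked = [row for r in r_factors for row in r]  # (nblocks * p) x p
x_full = [row for b in blocks for row in b]        # n x p, built here only to check the result

# X^t X equals R_stacked^t R_stacked, so both share singular values and V.
diff = max(abs(a - b) for ra, rb in zip(gram(x_full), gram(r_stacked)) for a, b in zip(ra, rb))
print(f"max |X^t X - R^t R| = {diff:.2e}")
```

Any dense SVD routine applied to `r_stacked`, or to the \(R\) factor of one more QR of it, then yields \(\Sigma\) and \(V\) without revisiting the original rows.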

Distributed Processing

Using SVD in the distributed processing mode requires gathering the local-node \(p \times p\) numeric tables on the master node. When the amount of local-node work is small, that is, when the local-node data set is small, network data transfer may become a bottleneck. To avoid this, ensure that local nodes have a sufficient amount of work, for example, by distributing the input data set across a smaller number of nodes.
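A back-of-the-envelope sketch makes this trade-off concrete. It assumes that each node's local step is a Householder QR of its \(n_i \times p\) block in double precision. The library does not state this here, so treat it as an assumption. The sketch also ignores latency and the master-node computation. The bytes each node sends do not depend on \(n_i\), while the local work grows linearly with it. Fewer, larger partitions therefore raise the ratio of computation to communication and reduce the total volume gathered on the master node. The function `local_step_cost` and the workload figures are hypothetical.

```python
def local_step_cost(n_local, p, bytes_per_value=8):
    """Rough per-node cost, assuming the local step is a Householder QR.

    Householder QR of an n x p matrix (n >= p) takes about 2*n*p^2 - 2*p^3/3 flops,
    and the node then sends one p x p table to the master node.
    Returns (flops, bytes_sent, flops_per_byte).
    """
    if n_local < p:
        raise ValueError("this estimate assumes at least p rows per node")
    flops = 2.0 * n_local * p * p - 2.0 * p ** 3 / 3.0
    bytes_sent = p * p * bytes_per_value
    return flops, bytes_sent, flops / bytes_sent

# Hypothetical workload: 1,000,000 rows and 100 columns of doubles.
total_rows, p = 1_000_000, 100
for nodes in (4, 64, 1024):
    flops, sent, ratio = local_step_cost(total_rows // nodes, p)
    print(f"{nodes:5d} nodes: {sent} bytes per node, "
          f"{nodes * sent} bytes gathered on master, {ratio:7.1f} flops per byte sent")
```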
