axpy_batch

Computes a batch of axpy operations, each a scalar-vector product added to a vector.

Syntax

Group API

event axpy_batch(queue &exec_queue, int64_t *n_array, T *alpha_array, const T **x_array, int64_t *incx_array, T **y_array, int64_t *incy_array, int64_t group_count, int64_t *group_size_array, const vector_class<event> &dependencies = {})

Strided API

event axpy_batch(queue &exec_queue, int64_t n, T alpha, const T *x, int64_t incx, int64_t stridex, T *y, int64_t incy, int64_t stridey, int64_t batch_size, const vector_class<event> &dependencies = {})
void axpy_batch(queue &exec_queue, int64_t n, T alpha, buffer<T, 1> &x, int64_t incx, int64_t stridex, buffer<T, 1> &y, int64_t incy, int64_t stridey, int64_t batch_size)

axpy_batch supports the following precisions and devices.

T                       Devices Supported
float                   Host, CPU, and GPU
double                  Host, CPU, and GPU
std::complex<float>     Host, CPU, and GPU
std::complex<double>    Host, CPU, and GPU

Description

The axpy_batch routines perform a series of scalar-vector products, each added to a vector. They are similar to their axpy routine counterparts, but the axpy_batch routines perform vector operations with groups of vectors.

For the group API, each group contains vectors with the same parameters (size and increment). The operation for the group API is defined as

idx = 0
for i = 0 … group_count – 1
    n, alpha, incx, incy and group_size are the entries at position i in n_array, alpha_array, incx_array, incy_array and group_size_array
    for j = 0 … group_size – 1
        x and y are vectors of size n at position idx in x_array and y_array
        y := alpha * x + y
        idx := idx + 1
    end for
end for

The number of entries in x_array and y_array is total_batch_count, the sum of all the group_size entries.
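The group loop above can be modeled on the host in plain C++. The following is a minimal reference sketch of the semantics only, not the library implementation: the name axpy_batch_group_ref is hypothetical, it takes no queue or dependencies, and it assumes all increments are positive.

```cpp
#include <cassert>
#include <cstdint>

// Host-only reference model of the group API semantics (hypothetical helper).
// Assumption: every entry of incx_array and incy_array is positive.
template <typename T>
void axpy_batch_group_ref(const std::int64_t *n_array, const T *alpha_array,
                          const T **x_array, const std::int64_t *incx_array,
                          T **y_array, const std::int64_t *incy_array,
                          std::int64_t group_count,
                          const std::int64_t *group_size_array) {
    std::int64_t idx = 0;  // running index into x_array and y_array
    for (std::int64_t i = 0; i < group_count; ++i) {
        // Parameters shared by every vector pair in group i.
        const std::int64_t n = n_array[i];
        const T alpha = alpha_array[i];
        const std::int64_t incx = incx_array[i];
        const std::int64_t incy = incy_array[i];
        assert(incx > 0 && incy > 0);
        for (std::int64_t j = 0; j < group_size_array[i]; ++j, ++idx) {
            const T *x = x_array[idx];
            T *y = y_array[idx];
            // y := alpha * x + y, element by element.
            for (std::int64_t k = 0; k < n; ++k) {
                y[k * incy] += alpha * x[k * incx];
            }
        }
    }
}
```

Note that idx advances across groups, so the pointers for group 1 start right after the group_size_array[0] pointers of group 0.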

For the strided API, all vectors x (respectively, y) have the same parameters (size, increment) and are stored at a constant stride stridex (respectively, stridey) from each other. The operation for the strided API is defined as

for i = 0 … batch_size – 1
    X and Y are the vectors at offsets i * stridex and i * stridey in x and y
    Y := alpha * X + Y
end for
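As an illustration of the strided layout, here is a host-only C++ sketch of the same loop. It is a model of the semantics under stated assumptions, not the library implementation: axpy_batch_strided_ref is a hypothetical name, there is no queue argument, and incx and incy are assumed positive.

```cpp
#include <cassert>
#include <cstdint>

// Host-only reference model of the strided API semantics (hypothetical helper).
// Assumption: incx and incy are positive.
template <typename T>
void axpy_batch_strided_ref(std::int64_t n, T alpha,
                            const T *x, std::int64_t incx, std::int64_t stridex,
                            T *y, std::int64_t incy, std::int64_t stridey,
                            std::int64_t batch_size) {
    assert(incx > 0 && incy > 0);
    for (std::int64_t i = 0; i < batch_size; ++i) {
        // Vector i starts at a fixed offset from the base pointer.
        const T *X = x + i * stridex;
        T *Y = y + i * stridey;
        for (std::int64_t k = 0; k < n; ++k) {
            Y[k * incy] += alpha * X[k * incx];
        }
    }
}
```

Elements of y that fall between the strided positions (for example, when incy is 2) are left untouched by this model.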

Input Parameters

Group API

exec_queue

The queue where the routine should be executed.

n_array

Array of size group_count. For the group i, ni = n_array[i] is the number of elements in vectors x and y.

alpha_array

Array of size group_count. For the group i, alphai = alpha_array[i] is the scalar alpha.

x_array

Array of size total_batch_count holding pointers to the x vectors. Each x vector in group i must be allocated with at least (1 + (ni – 1)*abs(incxi)) elements. See Matrix and Vector Storage for more details.

incx_array

Array of size group_count. For the group i, incxi = incx_array[i] is the stride of vector x.

y_array

Array of size total_batch_count holding pointers to the y vectors. Each y vector in group i must be allocated with at least (1 + (ni – 1)*abs(incyi)) elements. See Matrix and Vector Storage for more details.

incy_array

Array of size group_count. For the group i, incyi = incy_array[i] is the stride of vector y.

group_count

Number of groups. Must be at least 0.

group_size_array

Array of size group_count. The element group_size_array[i] is the number of vectors in group i. Each element in group_size_array must be at least 0.

dependencies

List of events to wait for before starting computation, if any. If omitted, defaults to no dependencies.
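When preparing arguments for the group API, two quantities from the parameter descriptions above are needed: total_batch_count and the minimum number of elements for each vector. The helpers below are a small sketch of that arithmetic. Both function names are hypothetical, and returning 0 for n less than 1 is an assumption made here, since the text does not cover that case.

```cpp
#include <cstdint>
#include <numeric>

// Sum of all group sizes: the number of pointers expected in x_array and y_array.
std::int64_t total_batch_count(const std::int64_t *group_size_array,
                               std::int64_t group_count) {
    return std::accumulate(group_size_array, group_size_array + group_count,
                           std::int64_t{0});
}

// Minimum number of elements for one vector: 1 + (n - 1) * abs(inc).
// Assumption: an empty vector (n < 1) needs no storage.
std::int64_t min_vector_elements(std::int64_t n, std::int64_t inc) {
    if (n < 1) {
        return 0;
    }
    const std::int64_t abs_inc = inc < 0 ? -inc : inc;
    return 1 + (n - 1) * abs_inc;
}
```

For a group with n = 5 and an increment of 2, each vector needs at least 9 elements, because only the positions 0, 2, 4, 6 and 8 are accessed.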

Strided API

exec_queue

The queue where the routine should be executed.

n

Number of elements in vectors x and y.

alpha

Specifies the scalar alpha.

x

Buffer or USM pointer, accessible by the queue’s device, holding all the input vectors x. The buffer or allocated memory must be of size at least batch_size * stridex.

incx

Stride between two consecutive elements of the x vectors.

stridex

Stride between two consecutive x vectors, must be at least (1 + (n-1)*abs(incx)). See Matrix and Vector Storage for more details.

y

Buffer or USM pointer, accessible by the queue’s device, holding all the input vectors y. The buffer or allocated memory must be of size at least batch_size * stridey.

incy

Stride between two consecutive elements of the y vectors.

stridey

Stride between two consecutive y vectors, must be at least (1 + (n-1)*abs(incy)). See Matrix and Vector Storage for more details.

batch_size

Number of axpy computations to perform, which is also the number of x vectors and of y vectors. Must be at least 0.
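The size constraints on the strided parameters can be checked on the host before a call. The following is a hedged sketch of such a check for one operand (x or y). The function strided_layout_ok is hypothetical and not part of the library; it only encodes the three conditions stated in the parameter descriptions above.

```cpp
#include <cstdint>

// Returns true when one strided operand satisfies the documented constraints:
//   batch_size >= 0,
//   stride >= 1 + (n - 1) * abs(inc),
//   allocated_elements >= batch_size * stride.
// Hypothetical helper; allocated_elements is the element count of the
// buffer or USM allocation supplied by the caller.
bool strided_layout_ok(std::int64_t n, std::int64_t inc, std::int64_t stride,
                       std::int64_t batch_size,
                       std::int64_t allocated_elements) {
    const std::int64_t abs_inc = inc < 0 ? -inc : inc;
    const std::int64_t min_stride = 1 + (n - 1) * abs_inc;
    return batch_size >= 0 && stride >= min_stride &&
           allocated_elements >= batch_size * stride;
}
```

A caller would run the check once with (incx, stridex) and the size of x, and once with (incy, stridey) and the size of y.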

Output Parameters

Group API

y_array

Array of pointers to the total_batch_count updated vectors y.

Strided API

y

Array or buffer holding the batch_size updated vectors y.