# Extension API¶

The Extension API currently includes algorithms, iterators, and function object classes. The algorithms include segmented reduce, segmented scan and vectorized search algorithms. The iterators provided implement zip, transform, and permutation operations on other iterators, and also include a counting iterator and a discard iterator. The function object classes provide minimum, maximum and identity operations that may be passed to algorithms such as reduce or transform. The Extension API also includes an experimental implementation of range-based algorithms and the ranges required to use them.

## Parallel Algorithms¶

The definitions of the algorithms provided in the Extension API are available through the `oneapi/dpl/algorithm`

header. All algorithms are implemented in the `oneapi::dpl`

namespace.

reduce_by_segment

The `reduce_by_segment`

algorithm performs partial reductions on a sequence’s values and keys. Each
reduction is computed with a given reduction operation for a contiguous subsequence of values, which are
determined by keys being equal according to a predicate. A return value is a pair of iterators holding
the end of the output sequences for keys and values.

For correct computation, the reduction operation should be associative. If no operation is specified,
the default operation for the reduction is `std::plus`

, and the default predicate is `std::equal_to`

.
The algorithm requires that the type of the elements used for values be default constructible.

Example:

keys: [0,0,0,1,1,1] values: [1,2,3,4,5,6] output_keys: [0,1] output_values: [1+2+3=6,4+5+6=15]

inclusive_scan_by_segment

The `inclusive_scan_by_segment`

algorithm performs partial prefix scans on a sequence’s values. Each
scan applies to a contiguous subsequence of values, which are determined by the keys associated with the
values being equal. The return value is an iterator targeting the end of the result sequence.

For correct computation, the prefix scan operation should be associative. If no operation is specified,
the default operation is `std::plus`

, and the default predicate is `std::equal_to`

. The algorithm
requires that the type of the elements used for values be default constructible.

Example:

keys: [0,0,0,1,1,1] values: [1,2,3,4,5,6] result: [1,1+2=3,1+2+3=6,4,4+5=9,4+5+6=15]

exclusive_scan_by_segment

The `exclusive_scan_by_segment`

algorithm performs partial prefix scans on a sequence’s values. Each
scan applies to a contiguous subsequence of values that are determined by the keys associated with the values
being equal, and sets the first element to the initial value provided. The return value is an iterator
targeting the end of the result sequence.

For correct computation, the prefix scan operation should be associative. If no operation is specified,
the default operation is `std::plus`

, and the default predicate is `std::equal_to`

.

Example:

keys: [0,0,0,1,1,1] values: [1,2,3,4,5,6] initial value: [0] result: [0,0+1=1,0+1+2=3,0,0+4=4,0+4+5=9]

binary_search

The `binary_search`

algorithm performs a binary search of the input sequence for each of the values in
the search sequence provided. For each element of the search sequence the algorithm writes a boolean value
to the result sequence that indicates whether the search value was found in the input sequence. An iterator
to one past the last value in the result sequence is returned. The algorithm assumes the input sequence has
been sorted by the comparator provided. If no comparator is provided then a function object that uses
`operator<`

to compare the elements will be used.

Example:

input sequence: [0, 2, 2, 2, 3, 3, 3, 3, 6, 6] search sequence: [0, 2, 4, 7, 6] result sequence: [true, true, false, false, true]

lower_bound

The `lower_bound`

algorithm performs a binary search of the input sequence for each of the values in
the search sequence provided to identify the lowest index in the input sequence where the search value could
be inserted without violating the sorted ordering of the input sequence. The lowest index for each search
value is written to the result sequence, and the algorithm returns an iterator to one past the last value
written to the result sequence. If no comparator is provided then a function object that uses `operator<`

to compare the elements will be used.

Example:

input sequence: [0, 2, 2, 2, 3, 3, 3, 3, 6, 6] search sequence: [0, 2, 4, 7, 6] result sequence: [0, 1, 8, 10, 8]

upper_bound

The `upper_bound`

algorithm performs a binary search of the input sequence for each of the values in
the search sequence provided to identify the highest index in the input sequence where the search value could
be inserted without violating the sorted ordering of the input sequence. The highest index for each search
value is written to the result sequence, and the algorithm returns an iterator to one past the last value
written to the result sequence. If no comparator is provided then a function object that uses `operator<`

to compare the elements will be used.

Example:

input sequence: [0, 2, 2, 2, 3, 3, 3, 3, 6, 6] search sequence: [0, 2, 4, 7, 6] result sequence: [1, 4, 8, 10, 10]

## Iterators¶

The definitions of the iterators provided in the Extension API are available through the `oneapi/dpl/iterator`

header. All iterators are implemented in the `oneapi::dpl`

namespace.

counting_iterator

`counting_iterator`

is a random access iterator-like type whose dereferenced value is an integer
counter. Instances of `counting_iterator`

provide read-only dereference operations. The counter of an
`counting_iterator`

instance changes according to the arithmetics of the random access iterator type.

```
using namespace oneapi;
dpl::counting_iterator<int> count_a(0);
dpl::counting_iterator<int> count_b = count_a + 10;
int init = count_a[0]; // OK: init == 0
*count_b = 7; // ERROR: counting_iterator doesn't provide write operations
auto sum = std::reduce(dpl::execution::dpcpp_default,
count_a, count_b, init); // sum is (0 + 0 + 1 + ... + 9) = 45
```

discard_iterator

`discard_iterator`

is a random access iterator-like type that provides write-only dereference
operations that discard values passed.

The iterator is useful in the implementation of stencil algorithms where the stencil is not part of the
desired output. An example of this would be a `copy_if`

algorithm that receives an an input iterator range
and a stencil iterator range and copies the elements of the input whose corresponding stencil value is 1. We
do not want to declare a temporary allocation to store the copy of the stencil, and thus use `discard_iterator`

.

```
using namespace oneapi;
auto zipped_first = dpl::make_zip_iterator(first, stencil);
std::copy_if(dpl::execution::dpcpp_default,
zipped_first, zipped_first + (last - first),
dpl::make_zip_iterator(result, dpl::discard_iterator()),
[](auto t){return get<1>(t) == 1;}
```

transform_iterator

`transform_iterator`

is an iterator defined over another iterator whose dereferenced value is the result
of a function applied to the corresponding element of the original iterator. Both the type of the original
iterator and the unary function applied during dereference operations are required template parameters of
the `transform_iterator`

class. The constructor of the `transform_iterator`

receives both the original
iterator and an instance of the unary transform operation as well.

To simplify the construction of the iterator `oneapi::dpl::make_transform_iterator`

is provided. The
function receives the original iterator and transform operation instance as arguments, and constructs the
`transform_iterator`

instance.

```
using namespace oneapi;
dpl::counting_iterator<int> first(0);
dpl::counting_iterator<int> last(10);
auto transform_first = dpl::make_transform_iterator(first, std::negate<int>());
auto transform_last = transform_first + (last - first);
auto sum = std::reduce(dpl::execution::dpcpp_default,
transform_first, transform_last); // sum is (0 + -1 + ... + -9) = -45
```

permutation_iterator

`permutation_iterator`

is an iterator whose dereferenced value set is defined by the source iterator
provided, and whose iteration order over the dereferenced value set is defined by either another iterator or
a functor whose index operator defines the mapping from the `permutation_iterator`

index to the index of the
source iterator. The `permutation_iterator`

is useful in implementing applications where noncontiguous
elements of data represented by an iterator need to be processed by an algorithm as though they were contiguous.
An example is copying every other element to an output iterator.

`make_permutation_iterator`

is provided to simplify construction of iterator instances. The function
receives the source iterator and the iterator or function object representing the index map.

```
struct multiply_index_by_two {
template <typename Index>
Index operator()(const Index& i)
{ return i*2; }
};
// first and last are iterators that define a contiguous range of input elements
// compute the number of elements in the range between the first and last that are accessed
// by the permutation iterator
size_t num_elements = std::distance(first, last) / 2 + std::distance(first, last) % 2;
using namespace oneapi;
auto permutation_first = dpl::make_permutation_iterator(first, multiply_index_by_two());
auto permutation_last = permutation_first + num_elements;
std::copy(dpl::execution::dpcpp_default, permutation_first, permutation_last, result);
```

zip_iterator

`zip_iterator`

is an iterator constructed with one or more iterators as input. The value returned by the
iterator when dereferenced is a tuple of the values returned by dereferencing the member iterators on which
the `zip_iterator`

is defined. Arithmetic operations performed on a `zip_iterator`

instance are also
applied to each of the member iterators.

The `make_zip_iterator`

function is provided to simplify the construction of `zip_iterator`

instances.
The function receives each of the iterators to be held as member iterators by the `zip_iterator`

instance
it returns.

The example provided for `discard_iterator`

demonstrates `zip_iterator`

use in defining stencil
algorithms. The `zip_iterator`

is also useful in defining “by key” algorithms where input iterators
representing keys and values are processed as key-value pairs. The example below demonstrates a stable sort
by key where only the keys are compared but both keys and values are swapped.

```
using namespace oneapi;
auto zipped_begin = dpl::make_zip_iterator(keys_begin, vals_begin);
std::stable_sort(dpl::execution::dpcpp_default, zipped_begin, zipped_begin + n,
[](auto lhs, auto rhs) { return get<0>(lhs) < get<0>(rhs); });
```

## Function Object Classes¶

The definitions of the function objects provided in the Extension API are available through the
`oneapi/dpl/functional`

header. All function objects are implemented in the `oneapi::dpl`

namespace.

identity: A C++11 implementation of the C++20

`std::identity`

function object type, where the operator() returns the argument unchanged.minimum: A function object type where the operator() applies

`std::less`

to its arguments, then returns the lesser argument unchanged.maximum: A function object type where the operator() applies

`std::greater`

to its arguments, then returns the greater argument unchanged.

## Range-based API¶

C++20 indroduces the *Ranges* library. С++20 standard splits ranges into two categories: *factories* and *adaptors*.
A range factory doesn’t have underlying data. An element is generated on success by an index or by dereferencing an iterator.
A range adaptor, from the DPC++ library perspective, is an utility that transforms *base range*, or another adapted range into
a view with custom behavior.

The DPC++ library supports `iota_view`

range factory.

`sycl::buffer`

wrapped with `all_view`

can be used as the range.

The DPC++ library considers the supported factories and `all_view`

as base ranges.
The range adaptors may be combined into a pipeline with a `base`

range at the beginning. For example:

```
cl::sycl::buffer<int> buf(data, cl::sycl::range<1>(10));
auto range_1 = iota_view(0, 10) | views::reverse();
auto range_2 = all_view(buf) | views::reverse();
```

For the range, based on the `all_view`

factory, data access is permitted on a device only. `size()`

and `empty()`

methods are allowed
to be called on both host and device.

The following algorithms are available to use with the ranges:

`for_each`

, `copy`

, `transform`

, `find`

, `find_if`

, `find_if_not`

, `find_end`

, `find_first_of`

, `search`

, `is_sorted`

,
`is_sorted_until`

, `reduce`

, `transform_reduce`

, `min_element`

, `max_element`

, `minmax_element`

,
`exclusive_scan`

, `inclusive_scan`

, `transform_exclusive_scan`

, `transform_inclusive_scan`

.

The signature example of the range-based algorithms looks like:

```
template <typename ExecutionPolicy, typename Range1, typename Range2>
void copy(ExecutionPolicy&& exec, Range1&& source, Range2&& destination);
```

where `source`

is used instead of two iterators to represent the input. `destination`

represents the output.

These algorithms are declared in `oneapi::dpl::experimental::ranges`

namespace and implemented only for DPC++ policies.
In order to make these algorithm available the `<oneapi/dpl/ranges>`

header should be included.
Use of the range-based API requires C++17 and the C++ standard libraries coming with GCC 8.1 (or higher) or Clang 7 (or higher).

The following viewable ranges are declared in `oneapi::dpl::experimental::ranges`

namespace. Only those are allowed to use as ranges for range-based algorithms.

`iota_view`

. A range factory - generates a sequence of N elements which starts from an initial value and ends by final N-1.`all_view`

. A custom utility - represents a view of all or a part of`sycl::buffer`

underlying elements.`guard_view`

. A custom utility - represents a view of USM data range defined by a two USM pointers.`zip_view`

. A custom range adapter - produces one`zip_view`

from other several views.`transform_view`

. A range adapter - represents a view of a underlying sequence after applying a transformation to each element.`reverse_view`

. A range adapter - produces a reversed sequence of elements provided by another view.`take_view`

. A range adapter - produces a view of the first N elements from another view.`drop_view`

. A range adapter - produces a view excluding the first N elements from another view.

### Example of Range-based API usage¶

```
using namespace oneapi::dpl::experimental::ranges;
{
cl::sycl::buffer<int> A(data, cl::sycl::range<1>(max_n));
cl::sycl::buffer<int> B(data2, cl::sycl::range<1>(max_n));
auto view = all_view(A) | views::reverse();
auto range_res = all_view<int, cl::sycl::access::mode::write>(B);
copy(oneapi::dpl::execution::dpcpp_default, view, range_res);
}
```