Template Class BlockMergeSort¶
Defined in File block_merge_sort.hpp
Inheritance Relationships¶
Base Type¶
public hipcub::BlockMergeSortStrategy< KeyT, ValueT, BLOCK_DIM_X *BLOCK_DIM_Y *BLOCK_DIM_Z, ITEMS_PER_THREAD, BlockMergeSort< KeyT, BLOCK_DIM_X, ITEMS_PER_THREAD, ValueT, BLOCK_DIM_Y, BLOCK_DIM_Z > >(Template Class BlockMergeSortStrategy)
Class Documentation¶
-
template<typename KeyT, int BLOCK_DIM_X, int ITEMS_PER_THREAD, typename ValueT = NullType, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1>
class hipcub::BlockMergeSort : public hipcub::BlockMergeSortStrategy<KeyT, ValueT, BLOCK_DIM_X * BLOCK_DIM_Y * BLOCK_DIM_Z, ITEMS_PER_THREAD, BlockMergeSort<KeyT, BLOCK_DIM_X, ITEMS_PER_THREAD, ValueT, BLOCK_DIM_Y, BLOCK_DIM_Z>>¶ The BlockMergeSort class provides methods for sorting items partitioned across a CUDA thread block using a merge sorting method.
This example can be easily adapted to the storage required by BlockMergeSort.
- Overview
BlockMergeSort arranges items into ascending order using a comparison functor with less-than semantics. Merge sort can handle arbitrary types and comparison functors, but is slower than BlockRadixSort when sorting arithmetic types into ascending/descending order.
- A Simple Example
The code snippet below illustrates a sort of 512 integer keys that are partitioned across 128 threads * where each thread owns 4 consecutive items.
#include <hipcub/hipcub.hpp> // or equivalently <hipcub/block/block_merge_sort.hpp> struct CustomLess { template <typename DataType> __device__ bool operator()(const DataType &lhs, const DataType &rhs) { return lhs < rhs; } }; __global__ void ExampleKernel(...) { // Specialize BlockMergeSort for a 1D block of 128 threads owning 4 integer items each typedef hipcub::BlockMergeSort<int, 128, 4> BlockMergeSort; // Allocate shared memory for BlockMergeSort __shared__ typename BlockMergeSort::TempStorage temp_storage_shuffle; // Obtain a segment of consecutive items that are blocked across threads int thread_keys[4]; ... BlockMergeSort(temp_storage_shuffle).Sort(thread_keys, CustomLess()); ... }
Suppose the set of input
thread_keysacross the block of threads is{ [0,511,1,510], [2,509,3,508], [4,507,5,506], ..., [254,257,255,256] }. The corresponding outputthread_keysin those threads will be{ [0,1,2,3], [4,5,6,7], [8,9,10,11], ..., [508,509,510,511] }.- Re-using dynamically allocating shared memory
The following example under the examples/block folder illustrates usage of dynamically shared memory with BlockReduce and how to re-purpose the same memory region: example_block_reduce_dyn_smem.cu
- Template Parameters
KeyT – KeyT type
BLOCK_DIM_X – The thread block length in threads along the X dimension
ITEMS_PER_THREAD – The number of items per thread
ValueT – [optional] ValueT type (default:
hipcub::NullType, which indicates a keys-only sort)BLOCK_DIM_Y – [optional] The thread block length in threads along the Y dimension (default: 1)
BLOCK_DIM_Z – [optional] The thread block length in threads along the Z dimension (default: 1)
Public Functions
-
__device__ __forceinline__ inline BlockMergeSort()¶
-
__device__ __forceinline__ inline explicit BlockMergeSort(typename BlockMergeSortStrategyT::TempStorage &temp_storage)¶
-
__device__ __forceinline__ inline unsigned int get_linear_tid() const¶
-
__device__ __forceinline__ inline void Sort(KeyT (&keys)[ITEMS_PER_THREAD], CompareOp compare_op)¶
Sorts items partitioned across a CUDA thread block using a merge sorting method.
Sort is not guaranteed to be stable. That is, suppose that i and j are equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
- Template Parameters
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs).CompareOpis a model of Strict Weak Ordering.- Parameters
keys – [inout] Keys to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
-
__device__ __forceinline__ inline void Sort(KeyT (&keys)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default)¶
Sorts items partitioned across a CUDA thread block using a merge sorting method.
Sort is not guaranteed to be stable. That is, suppose that
iandjare equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.The value of
oob_defaultis assigned to all elements that are out ofvalid_itemsboundaries. It’s expected thatoob_defaultis ordered after any value in thevalid_itemsboundaries. The algorithm always sorts a fixed amount of elements, which is equal toITEMS_PER_THREAD * BLOCK_THREADS. If there is a value that is ordered afteroob_default, it won’t be placed withinvalid_itemsboundaries.
- Template Parameters
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs).CompareOpis a model of Strict Weak Ordering.- Parameters
keys – [inout] Keys to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
valid_items – [in] Number of valid items to sort
oob_default – [in] Default value to assign out-of-bound items
-
__device__ __forceinline__ inline void Sort(KeyT (&keys)[ITEMS_PER_THREAD], ValueT (&items)[ITEMS_PER_THREAD], CompareOp compare_op)¶
Sorts items partitioned across a CUDA thread block using a merge sorting method.
Sort is not guaranteed to be stable. That is, suppose that
iandjare equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.
- Template Parameters
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs).CompareOpis a model of Strict Weak Ordering.- Parameters
keys – [inout] Keys to sort
items – [inout] Values to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
-
__device__ __forceinline__ inline void Sort(KeyT (&keys)[ITEMS_PER_THREAD], ValueT (&items)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default)¶
Sorts items partitioned across a CUDA thread block using a merge sorting method.
Sort is not guaranteed to be stable. That is, suppose that
iandjare equivalent: neither one is less than the other. It is not guaranteed that the relative order of these two elements will be preserved by sort.The value of
oob_defaultis assigned to all elements that are out ofvalid_itemsboundaries. It’s expected thatoob_defaultis ordered after any value in thevalid_itemsboundaries. The algorithm always sorts a fixed amount of elements, which is equal toITEMS_PER_THREAD * BLOCK_THREADS. If there is a value that is ordered afteroob_default, it won’t be placed withinvalid_itemsboundaries.
- Template Parameters
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs)CompareOpis a model of Strict Weak Ordering.IS_LAST_TILE – True if
valid_itemsisn’t equal to theITEMS_PER_TILE
- Parameters
keys – [inout] Keys to sort
items – [inout] Values to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
valid_items – [in] Number of valid items to sort
oob_default – [in] Default value to assign out-of-bound items
-
__device__ __forceinline__ inline void StableSort(KeyT (&keys)[ITEMS_PER_THREAD], CompareOp compare_op)¶
Sorts items partitioned across a CUDA thread block using a merge sorting method.
StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
xandyare elements such thatxprecedesy, and if the two elements are equivalent (neitherx < ynory < x) then a postcondition of StableSort is thatxstill precedesy.
- Template Parameters
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs).CompareOpis a model of Strict Weak Ordering.- Parameters
keys – [inout] Keys to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
-
__device__ __forceinline__ inline void StableSort(KeyT (&keys)[ITEMS_PER_THREAD], ValueT (&items)[ITEMS_PER_THREAD], CompareOp compare_op)¶
Sorts items partitioned across a CUDA thread block using a merge sorting method.
StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
xandyare elements such thatxprecedesy, and if the two elements are equivalent (neitherx < ynory < x) then a postcondition of StableSort is thatxstill precedesy.
- Template Parameters
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs).CompareOpis a model of Strict Weak Ordering.- Parameters
keys – [inout] Keys to sort
items – [inout] Values to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
-
__device__ __forceinline__ inline void StableSort(KeyT (&keys)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default)¶
Sorts items partitioned across a CUDA thread block using a merge sorting method.
StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
xandyare elements such thatxprecedesy, and if the two elements are equivalent (neitherx < ynory < x) then a postcondition of StableSort is thatxstill precedesy.The value of
oob_defaultis assigned to all elements that are out ofvalid_itemsboundaries. It’s expected thatoob_defaultis ordered after any value in thevalid_itemsboundaries. The algorithm always sorts a fixed amount of elements, which is equal toITEMS_PER_THREAD * BLOCK_THREADS. If there is a value that is ordered afteroob_default, it won’t be placed withinvalid_itemsboundaries.
- Template Parameters
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs).CompareOpis a model of Strict Weak Ordering.- Parameters
keys – [inout] Keys to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
valid_items – [in] Number of valid items to sort
oob_default – [in] Default value to assign out-of-bound items
-
__device__ __forceinline__ inline void StableSort(KeyT (&keys)[ITEMS_PER_THREAD], ValueT (&items)[ITEMS_PER_THREAD], CompareOp compare_op, int valid_items, KeyT oob_default)¶
Sorts items partitioned across a CUDA thread block using a merge sorting method.
StableSort is stable: it preserves the relative ordering of equivalent elements. That is, if
xandyare elements such thatxprecedesy, and if the two elements are equivalent (neitherx < ynory < x) then a postcondition of StableSort is thatxstill precedesy.The value of
oob_defaultis assigned to all elements that are out ofvalid_itemsboundaries. It’s expected thatoob_defaultis ordered after any value in thevalid_itemsboundaries. The algorithm always sorts a fixed amount of elements, which is equal toITEMS_PER_THREAD * BLOCK_THREADS. If there is a value that is ordered afteroob_default, it won’t be placed withinvalid_itemsboundaries.
- Template Parameters
CompareOp – functor type having member
bool operator()(KeyT lhs, KeyT rhs).CompareOpis a model of Strict Weak Ordering.IS_LAST_TILE – True if
valid_itemsisn’t equal to theITEMS_PER_TILE
- Parameters
keys – [inout] Keys to sort
items – [inout] Values to sort
compare_op – [in] Comparison function object which returns true if the first argument is ordered before the second
valid_items – [in] Number of valid items to sort
oob_default – [in] Default value to assign out-of-bound items