Template Class BlockRadixRank

Class Documentation

template<int BLOCK_DIM_X, int RADIX_BITS, bool IS_DESCENDING, bool MEMOIZE_OUTER_SCAN = false, BlockScanAlgorithm INNER_SCAN_ALGORITHM = BLOCK_SCAN_WARP_SCANS, hipSharedMemConfig SMEM_CONFIG = hipSharedMemBankSizeFourByte, int BLOCK_DIM_Y = 1, int BLOCK_DIM_Z = 1, int ARCH = 1>
class hipcub::BlockRadixRank

BlockRadixRank provides operations for ranking unsigned integer keys within a thread block.

Overview

  • Keys must be in a form suitable for radix ranking (i.e., unsigned bits).
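The requirement above can be met for signed or floating-point keys by first mapping them to unsigned bit patterns whose unsigned order matches the original order. The following is a minimal host-side sketch in plain C++, not hipCUB code. The helper name `to_radix_bits` is hypothetical, and the float overload assumes 32-bit IEEE-754 floats.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes 32-bit float");

// Hypothetical helper: map a signed 32-bit key to unsigned bits whose
// unsigned order matches the signed order (flip the sign bit).
inline std::uint32_t to_radix_bits(std::int32_t key)
{
    return static_cast<std::uint32_t>(key) ^ 0x80000000u;
}

// Hypothetical helper: map a float to order-preserving unsigned bits.
// Negative values have every bit flipped; non-negative values have only
// the sign bit flipped.
inline std::uint32_t to_radix_bits(float key)
{
    std::uint32_t bits;
    std::memcpy(&bits, &key, sizeof(bits));  // reinterpret without aliasing issues
    const std::uint32_t mask = (bits & 0x80000000u) ? 0xFFFFFFFFu : 0x80000000u;
    return bits ^ mask;
}
```

After this mapping, comparing the returned values as unsigned integers gives the same order as comparing the original keys.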

Performance Considerations

Examples

  • Example 1: Simple radix rank of 32-bit integer keys

    #include <hipcub/hipcub.hpp>
    
    template <int BLOCK_THREADS>
    __global__ void ExampleKernel(...)
    {
        ...
    }

Template Parameters
  • BLOCK_DIM_X – The thread block length in threads along the X dimension

  • RADIX_BITS – The number of radix bits per digit place

  • IS_DESCENDING – Whether or not the sorted-order is high-to-low

  • MEMOIZE_OUTER_SCAN – [optional] Whether or not to buffer outer raking scan partials to incur fewer shared memory reads at the expense of higher register pressure (default: false). See BlockScanAlgorithm::BLOCK_SCAN_RAKING_MEMOIZE for more details.

  • INNER_SCAN_ALGORITHM – [optional] The hipcub::BlockScanAlgorithm algorithm to use (default: hipcub::BLOCK_SCAN_WARP_SCANS)

  • SMEM_CONFIG – [optional] Shared memory bank mode (default: hipSharedMemBankSizeFourByte)

  • BLOCK_DIM_Y – [optional] The thread block length in threads along the Y dimension (default: 1)

  • BLOCK_DIM_Z – [optional] The thread block length in threads along the Z dimension (default: 1)

  • ARCH – [optional] (default: 1)
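RADIX_BITS determines how many distinct digit values one ranking pass distinguishes, and therefore how many digit places are needed to cover a whole key. The arithmetic can be sketched on the host as follows; the function names are hypothetical and are not part of hipCUB.

```cpp
#include <cassert>

// Number of distinct digit values (bins) for a given RADIX_BITS.
constexpr int radix_digits(int radix_bits)
{
    return 1 << radix_bits;
}

// Number of digit places needed to cover a key that is key_bits wide
// (ceiling division, since the last digit place may be partial).
constexpr int digit_places(int key_bits, int radix_bits)
{
    return (key_bits + radix_bits - 1) / radix_bits;
}
```

For example, a 32-bit key with RADIX_BITS = 4 has 16 possible digit values per place and 8 digit places.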

Collective constructors

__device__ inline BlockRadixRank()

Collective constructor using a private static allocation of shared memory as temporary storage.

__device__ inline BlockRadixRank(TempStorage &temp_storage)

Collective constructor using the specified memory allocation as temporary storage.

Parameters

temp_storage – Reference to memory allocation having layout type TempStorage

Ranking

template<typename UnsignedBits, int KEYS_PER_THREAD, typename DigitExtractorT>
__device__ inline void RankKeys(UnsignedBits (&keys)[KEYS_PER_THREAD], int (&ranks)[KEYS_PER_THREAD], DigitExtractorT digit_extractor)

Rank keys.

Parameters
  • keys – Keys for this tile

  • ranks – For each key, the local rank within the tile (out parameter)

  • digit_extractor – The digit extractor
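To make the meaning of the ranks output concrete, the following host-side reference computes, for each key, the position it would take in a stable sort by its extracted digit. This is a plain C++ sketch under stated assumptions, not the device implementation: `ShiftMaskExtractor` and `reference_ranks` are hypothetical names, the vector order stands in for the tile's key order (assumed here to be a blocked arrangement, with thread 0's keys first), and `num_bits` is assumed to be less than 32.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical digit extractor: returns num_bits bits of the key,
// starting at bit position bit_start.
struct ShiftMaskExtractor
{
    unsigned bit_start;
    unsigned num_bits;

    std::uint32_t Digit(std::uint32_t key) const
    {
        return (key >> bit_start) & ((1u << num_bits) - 1u);
    }
};

// Host-side reference: ranks[i] is the position keys[i] would take in a
// stable sort by digit. For descending order the digit is inverted, so
// larger digits receive smaller ranks.
inline std::vector<int> reference_ranks(const std::vector<std::uint32_t>& keys,
                                        const ShiftMaskExtractor& extractor,
                                        bool is_descending = false)
{
    const std::uint32_t num_digits = 1u << extractor.num_bits;
    auto digit_of = [&](std::uint32_t key) {
        const std::uint32_t d = extractor.Digit(key);
        return is_descending ? num_digits - 1u - d : d;
    };

    // Histogram of digits.
    std::vector<int> next(num_digits, 0);
    for (std::uint32_t key : keys) ++next[digit_of(key)];

    // Exclusive prefix sum: next[d] becomes the first rank for digit d.
    int running = 0;
    for (int& n : next) { const int count = n; n = running; running += count; }

    // Hand out ranks in input order so that equal digits keep their order.
    std::vector<int> ranks(keys.size());
    for (std::size_t i = 0; i < keys.size(); ++i) ranks[i] = next[digit_of(keys[i])]++;
    return ranks;
}
```

With keys {16, 10, 9, 11} and a 5-bit digit starting at bit 0, the ascending ranks are {3, 1, 0, 2}: key 9 is first, key 16 is last.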

template<typename UnsignedBits, int KEYS_PER_THREAD, typename DigitExtractorT>
__device__ inline void RankKeys(UnsignedBits (&keys)[KEYS_PER_THREAD], int (&ranks)[KEYS_PER_THREAD], DigitExtractorT digit_extractor, int (&exclusive_digit_prefix)[BINS_TRACKED_PER_THREAD])

Rank keys. For the lower RADIX_DIGITS threads, the exclusive prefix sum of the digit counts is also provided for the digits tracked by the corresponding thread.

Parameters
  • keys – Keys for this tile

  • ranks – For each key, the local rank within the tile (out parameter)

  • digit_extractor – The digit extractor

  • exclusive_digit_prefix – The exclusive prefix sum for the digits [(threadIdx.x * BINS_TRACKED_PER_THREAD) … (threadIdx.x * BINS_TRACKED_PER_THREAD) + BINS_TRACKED_PER_THREAD - 1]
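The exclusive_digit_prefix output can be pictured as a slice of a tile-wide exclusive prefix sum over the digit histogram. The host-side sketch below illustrates this for ascending order only. Both function names are hypothetical, the digits are assumed to be already extracted and smaller than 2^radix_bits, and the code is a reference model rather than the device algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Host-side reference: result[d] is the number of keys in the tile whose
// digit is less than d.
inline std::vector<int> reference_exclusive_digit_prefix(
    const std::vector<std::uint32_t>& digits, int radix_bits)
{
    std::vector<int> prefix(std::size_t{1} << radix_bits, 0);
    for (std::uint32_t d : digits) ++prefix[d];  // histogram of digits

    // Convert the histogram to an exclusive prefix sum in place.
    int running = 0;
    for (int& p : prefix) { const int count = p; p = running; running += count; }
    return prefix;
}

// The slice a given thread would receive: bins
// [thread_id * bins_per_thread, thread_id * bins_per_thread + bins_per_thread - 1],
// clipped to the number of bins that exist.
inline std::vector<int> bins_for_thread(const std::vector<int>& prefix,
                                        int thread_id, int bins_per_thread)
{
    std::vector<int> slice;
    for (int i = 0; i < bins_per_thread; ++i)
    {
        const std::size_t bin = static_cast<std::size_t>(thread_id) * bins_per_thread + i;
        if (bin < prefix.size()) slice.push_back(prefix[bin]);
    }
    return slice;
}
```

For digits {0, 2, 2, 3} with 2 radix bits, the histogram is {1, 0, 2, 1} and the exclusive prefix is {0, 1, 1, 3}. With two bins per thread, thread 1 would hold {1, 3}.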

Public Types

enum [anonymous]

Values:

enumerator BINS_TRACKED_PER_THREAD

Number of bin-starting offsets tracked per thread.
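The value of this enumerator is not stated here. The sketch below assumes it follows the ceiling-division rule used by the upstream CUB implementation (at least one bin per thread, otherwise the digits divided evenly across the block's threads); confirm against the hipCUB header before relying on it. The function name is hypothetical.

```cpp
#include <cassert>

// Assumed rule (mirrors upstream CUB, not confirmed by this page):
// max(1, ceil(RADIX_DIGITS / BLOCK_THREADS)).
constexpr int bins_tracked_per_thread(int radix_bits, int block_threads)
{
    const int radix_digits = 1 << radix_bits;
    const int bins = (radix_digits + block_threads - 1) / block_threads;
    return bins > 1 ? bins : 1;
}
```

Under this assumption, a 128-thread block tracks one bin per thread for 4 radix bits and two bins per thread for 8 radix bits.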

struct TempStorage : public hipcub::Uninitialized<_TempStorage>

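The operations exposed by BlockRadixRank require a temporary memory allocation of this nested type for thread communication. This opaque storage can be allocated directly using the __shared__ keyword, or it can be aliased to externally allocated memory.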

Public Types

enum [anonymous]

Values:

typedef UnitWord<_TempStorage>::DeviceWord DeviceWord

Biggest memory-access word that T is a whole multiple of and is not larger than the alignment of T.

Public Functions

__host__ __device__ __forceinline__ inline _TempStorage &Alias()

Returns a reference to the backing storage, reinterpreted as _TempStorage.

Public Members

DeviceWord storage[WORDS]

Backing storage.