cuda_graph
argsort(iterable, key)
Sort a list of tensors according to the provided key function.

:param iterable: iterable object to sort
:param key: lambda function used to compute the sort key
:return: indices that sort the iterable object
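An index-returning sort of this kind can be sketched in a few lines of plain Python; this implementation is an assumption based on the docstring, not the library's actual code:

```python
def argsort(iterable, key):
    # Return the indices that would sort `iterable` according to `key`,
    # in the spirit of numpy.argsort but for arbitrary Python sequences.
    return sorted(range(len(iterable)), key=lambda i: key(iterable[i]))


# example: sorting sizes in decreasing order yields the index permutation
indices = argsort([3, 1, 2], key=lambda x: x)  # → [1, 2, 0]
```

Returning indices instead of sorted values lets the caller reorder several parallel lists (e.g. tensors and their sizes) with a single permutation.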
cuda_graphs_wrapper(model, inputs)
Wrapper to run the model with CUDA graphs.

:param model: model to save as a CUDA graph
:param inputs: inputs to the model
:return: an inference function that runs the model with CUDA graphs
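A capture-and-replay wrapper of this shape typically follows the pattern from the PyTorch CUDA graphs API (`torch.cuda.CUDAGraph`, `torch.cuda.graph`); the warm-up iteration count and static-buffer handling below are illustrative assumptions, not this library's exact implementation:

```python
import torch

def cuda_graphs_wrapper(model, inputs):
    # Warm up on a side stream so one-time allocations are not captured.
    stream = torch.cuda.Stream()
    stream.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(stream):
        for _ in range(3):  # warm-up count is an assumption
            model(*inputs)
    torch.cuda.current_stream().wait_stream(stream)

    # Capture one forward pass into a CUDA graph using static buffers.
    graph = torch.cuda.CUDAGraph()
    static_inputs = [t.clone() for t in inputs]
    with torch.cuda.graph(graph):
        static_outputs = model(*static_inputs)

    def run(*new_inputs):
        # Copy fresh data into the captured static tensors, then replay
        # the recorded kernel sequence; outputs land in the same buffers.
        for dst, src in zip(static_inputs, new_inputs):
            dst.copy_(src)
        graph.replay()
        return static_outputs

    return run
```

Replaying a captured graph skips per-kernel launch overhead, which is why the returned function requires inputs with the same shapes and dtypes as those used during capture.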
get_pool_size(inputs, existing_pools)
Get the size of the pool to use for the CUDA graphs:

- the pool size should be at least as big as the largest existing pool size;
- if the pool size is below 1 GB, round it up to the next power of 2 to avoid accumulating many unusable small pools.

:param inputs: list of inputs to be copied into the pool
:param existing_pools: list of existing pools
:return: size of the pool in bytes
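The two rules above can be sketched in plain Python operating on byte counts rather than tensors; the exact rounding rule and argument shapes are assumptions for illustration:

```python
def get_pool_size(input_sizes, existing_pool_sizes):
    # input_sizes: sizes in bytes of the tensors to copy into the pool
    # existing_pool_sizes: sizes in bytes of the already-allocated pools
    # Rule 1: at least as big as the inputs and the largest existing pool.
    size = max(sum(input_sizes), max(existing_pool_sizes, default=0))
    one_gb = 1 << 30
    if size < one_gb:
        # Rule 2: round up to the next power of 2 to avoid many
        # unusable small pools.
        size = 1 << (size - 1).bit_length()
    return size


get_pool_size([100], [])        # → 128 (next power of 2 above 100)
get_pool_size([3], [1000])      # → 1024 (largest existing pool, rounded up)
```

Above 1 GB the size is used as-is, since rounding large pools to powers of 2 would waste significant memory.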
prepare_inputs(inputs, pools)
Copy the inputs into the CUDA graphs memory pool and return the tensor copies. Follows a greedy bin-packing algorithm (first-fit decreasing) to minimize the number of pools:

- sort the items in decreasing order of size;
- insert them one by one into the first pool that has room for them.

:param inputs: list of tensors to copy into the pool
:param pools: list of available pools
:return: copies of the input tensors with their underlying storage in the memory pool
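The first-fit-decreasing strategy can be illustrated with plain sizes; the fixed pool capacity and return shape here are assumptions for the sketch, whereas the real function copies tensors into CUDA memory pools:

```python
def first_fit_decreasing(item_sizes, pool_capacity):
    # Sort item indices by decreasing size (the ordering argsort provides).
    order = sorted(range(len(item_sizes)), key=lambda i: item_sizes[i],
                   reverse=True)
    pools = []          # remaining capacity of each open pool
    placement = {}      # item index -> pool index
    for i in order:
        size = item_sizes[i]
        for pool_id, remaining in enumerate(pools):
            if remaining >= size:
                # First fit: reuse the first pool with enough room.
                pools[pool_id] -= size
                placement[i] = pool_id
                break
        else:
            # No existing pool has room: open a new one.
            pools.append(pool_capacity - size)
            placement[i] = len(pools) - 1
    return placement, len(pools)


# Five items of sizes 5, 4, 3, 2, 1 fit into two pools of capacity 8:
placement, num_pools = first_fit_decreasing([5, 4, 3, 2, 1], 8)  # num_pools → 2
```

Sorting in decreasing order first is what makes the greedy pass effective: large items claim pools early, and small items fill the leftover gaps.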