@rudrabeniwal (Collaborator)

Solution

Executors

  • Executors should accept Vulkan Buffers instead of on-RAM ByteBuffers, and copy their contents into their input buffers via the Vulkan API.
  • Executors should also accept output Vulkan Buffers and copy results into them, instead of creating new on-RAM ByteBuffers each time.

Implementation

The execute methods in AbstractExecutor.scala and SequenceExecutor.scala are modified to accept and return Seq[io.computenode.cyfra.vulkan.memory.Buffer].

This involves changing the method signatures:

def execute(input: Seq[Buffer]): Seq[Buffer] = {
def execute(inputs: Seq[Buffer], dataLength: Int): Seq[Buffer] = pushStack { stack =>

and updating the internal logic to use Buffer objects for data transfer, typically involving host-visible buffers for input/output and device-local buffers for computation.
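As a point of reference, here is a minimal sketch (not part of the diff; byteSize and allocator are placeholders) of the two buffer roles, using the Buffer constructor that appears throughout this PR:

// Host-visible staging buffer: CPU-writable source for uploads.
val hostVisibleInput = new Buffer(
  byteSize,
  VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
  VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
  VMA_MEMORY_USAGE_CPU_ONLY,
  allocator
)

// Device-local compute buffer: bound as a storage buffer and usable as copy src/dst.
val deviceLocalCompute = new Buffer(
  byteSize,
  VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT | VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
  0, // no host-visible memory properties requested
  VMA_MEMORY_USAGE_GPU_ONLY,
  allocator
)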

AbstractExecutor

  def execute(input: Seq[Buffer]): Seq[Buffer] = {
    for (i <- bufferActions.indices if bufferActions(i) == BufferAction.LoadTo) do {
      val inputHostBuffer = input(i)
      val gpuDeviceBuffer = buffers(i)
      Buffer.copyBuffer(inputHostBuffer, gpuDeviceBuffer, inputHostBuffer.size, commandPool).block().destroy()
    }

    pushStack { stack =>
      val fence = new Fence(device)
      val pCommandBuffer = stack.callocPointer(1).put(0, commandBuffer)
      val submitInfo = VkSubmitInfo
        .calloc(stack)
        .sType$Default()
        .pCommandBuffers(pCommandBuffer)

      check(VK10.vkQueueSubmit(queue.get, submitInfo, fence.get), "Failed to submit command buffer to queue")
      fence.block().destroy()
    }

    val output = for (i <- bufferActions.indices if bufferActions(i) == BufferAction.LoadFrom) yield {
      val gpuDeviceBuffer = buffers(i)
      val outputHostBuffer = new Buffer(
        gpuDeviceBuffer.size,
        VK_BUFFER_USAGE_TRANSFER_DST_BIT,
        VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
        VMA_MEMORY_USAGE_GPU_TO_CPU,
        allocator
      )
      Buffer.copyBuffer(gpuDeviceBuffer, outputHostBuffer, gpuDeviceBuffer.size, commandPool).block().destroy()
      outputHostBuffer
    }
    output
  }

SequenceExecutor

  def execute(inputs: Seq[Buffer], dataLength: Int): Seq[Buffer] = pushStack { stack =>
    timed("Vulkan full execute"):
      val setToBuffers = createBuffers(dataLength)

      def buffersWithAction(bufferAction: BufferAction): Seq[Buffer] =
        computeSequence.sequence.collect { case x: Compute =>
          pipelineToDescriptorSets(x.pipeline).map(setToBuffers).zip(x.pumpLayoutLocations).flatMap(x => x._1.zip(x._2)).collect {
            case (buffer, action) if (action.action & bufferAction.action) != 0 => buffer
          }
        }.flatten

      buffersWithAction(BufferAction.LoadTo).zipWithIndex.foreach { case (gpuDeviceBuffer, i) =>
        val inputHostBuffer = inputs(i)
        Buffer.copyBuffer(inputHostBuffer, gpuDeviceBuffer, inputHostBuffer.size, commandPool).block().destroy()
      }

      val fence = new Fence(device)
      val commandBuffer = recordCommandBuffer(dataLength)
      val pCommandBuffer = stack.callocPointer(1).put(0, commandBuffer)
      val submitInfo = VkSubmitInfo
        .calloc(stack)
        .sType$Default()
        .pCommandBuffers(pCommandBuffer)

      timed("Vulkan render command"):
        check(vkQueueSubmit(queue.get, submitInfo, fence.get), "Failed to submit command buffer to queue")
        fence.block().destroy()

      val output = buffersWithAction(BufferAction.LoadFrom).map { gpuDeviceBuffer =>
        val outputHostBuffer = new Buffer(
          gpuDeviceBuffer.size,
          VK_BUFFER_USAGE_TRANSFER_DST_BIT,
          VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
          VMA_MEMORY_USAGE_GPU_TO_CPU, 
          allocator
        )
        Buffer.copyBuffer(gpuDeviceBuffer, outputHostBuffer, gpuDeviceBuffer.size, commandPool).block().destroy()
        outputHostBuffer
      }

      commandPool.freeCommandBuffer(commandBuffer)
      setToBuffers.keys.foreach(_.update(Seq.empty))
      setToBuffers.flatMap(_._2).foreach(_.destroy())

      output
  }

  def destroy(): Unit =
    descriptorSets.foreach(_.destroy())

}

The main idea is:

  1. Input: The input Seq[Buffer] is assumed to contain host-visible buffers. Their contents will be copied to the internal device-local compute buffers using vkCmdCopyBuffer (via the Buffer.copyBuffer helper, which utilizes a command pool).
  2. Output: New host-visible Buffer objects will be created. The results from the internal device-local compute buffers will be copied to these new output buffers, again using vkCmdCopyBuffer.

The general-purpose stagingBuffer used for ByteBuffer transfers can be removed, since direct Buffer-to-Buffer copies (handled internally by Buffer.copyBuffer) are used instead.

Input Buffer Copying (BufferAction.LoadTo):
* The inputs: Seq[Buffer] are assumed to be already created Buffer objects, typically with VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT and VK_BUFFER_USAGE_TRANSFER_SRC_BIT.
* computeBuffer refers to the internal buffers created by createBuffers, which are device-local (VMA_MEMORY_USAGE_GPU_ONLY) and have VK_BUFFER_USAGE_STORAGE_BUFFER_BIT and VK_BUFFER_USAGE_TRANSFER_DST_BIT (from action.action).
* Buffer.copyBuffer(inputHostBuffer, computeBuffer, inputHostBuffer.size, commandPool) (sketched below):
* This helper method internally creates a temporary command buffer from the commandPool.
* It records a vkCmdCopyBuffer command to copy data from inputHostBuffer to computeBuffer.
* It submits this command buffer to a queue and returns a Fence which is then waited upon (block()) to ensure the copy completes. The fence is then destroyed.
* This is suitable for copying between host-visible and device-local memory.
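A rough sketch of what this helper does, per the description above (a sketch only: the CommandPool helper names, the device reference, and the .handle field access are assumptions for illustration; VkBufferCopy and vkCmdCopyBuffer are the underlying LWJGL calls, imports from org.lwjgl.vulkan assumed as elsewhere in this PR):

// Conceptual sketch - not the code in this PR.
def copyBuffer(src: Buffer, dst: Buffer, bytes: Long, commandPool: CommandPool): Fence =
  pushStack { stack =>
    val cmd = commandPool.beginSingleTimeCommands()        // temporary one-shot command buffer (assumed helper)
    val region = VkBufferCopy.calloc(1, stack).srcOffset(0).dstOffset(0).size(bytes)
    vkCmdCopyBuffer(cmd, src.handle, dst.handle, region)   // record the GPU-side copy (host-visible <-> device-local)
    val fence = new Fence(device)
    commandPool.endAndSubmit(cmd, fence)                   // submit to a queue; fence is signalled when the copy completes (assumed helper)
    fence                                                  // callers then do .block().destroy(), as in the snippets above
  }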

Output Buffer Copying (BufferAction.LoadFrom):
* computeBuffer is an internal device-local buffer holding computation results. It needs VK_BUFFER_USAGE_TRANSFER_SRC_BIT (from action.action).
* val outputHostBuffer = new Buffer(...): A new Buffer is created for each piece of output data.
* VK_BUFFER_USAGE_TRANSFER_DST_BIT: So it can be the target of a copy.
* VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT: Makes the buffer memory mappable and visible to the host (CPU). HOST_COHERENT ensures writes by the GPU are visible to the CPU without explicit flushing.
* VMA_MEMORY_USAGE_GPU_TO_CPU: VMA hint for memory that is written by GPU and read by CPU.
* Buffer.copyBuffer(computeBuffer, outputHostBuffer, computeBuffer.size, commandPool):
* Similar to the input copy, this uses vkCmdCopyBuffer within a temporary command buffer to transfer data from the device-local computeBuffer to the newly created host-visible outputHostBuffer.
* The operation is synchronized using a fence.
* The Seq of these outputHostBuffer objects is then returned.


Solution

GMem

  • GMem should not hold a reference to ByteBuffer. Instead, it should reference Vulkan Buffer. This buffer should be copied to input buffers in Executor, and then the compute sequence should be executed.
  • GMem should expose a constructor that builds a GMem from Array of correct type, and one that loads data from buffer and returns it as an Array.
  • GMem should also expose a method to clean up the buffer (see the sketch below).
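Taken together, these requirements imply roughly the following shape (a condensed sketch; the concrete trait and classes appear in the implementation below):

trait GMem[H <: Value]:
  def size: Int
  def vulkanBuffer: Buffer                       // device-local Vulkan buffer backing this GMem
  def cleanup(): Unit = vulkanBuffer.destroy()   // explicit release of the underlying buffer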

Implementation

  1. The toReadOnlyBuffer method will be removed from GMem and RamGMem.
  2. A new abstract member vulkanBuffer of type io.computenode.cyfra.vulkan.memory.Buffer will be added to the GMem trait.
  3. FloatMem and Vec4FloatMem will be updated to store and expose this Vulkan Buffer. Their constructors will need to accept an io.computenode.cyfra.vulkan.memory.Buffer.
class FloatMem(val size: Int, val vulkanBuffer: Buffer) extends RamGMem[Float32, Float]:
class Vec4FloatMem(val size: Int, val vulkanBuffer: Buffer) extends RamGMem[Vec4[Float32], fRGBA]:
  4. toArray(using GContext) methods are added to FloatMem and Vec4FloatMem to read data from their Vulkan Buffer back into an Array.

FloatMem

  def toArray(using context: GContext): Array[Float] =
    val allocator = context.vkContext.allocator
    val commandPool = context.vkContext.commandPool
    val bufferSize = size.toLong * FloatMem.FloatSize

    val stagingBuffer = new Buffer(
      bufferSize.toInt,
      VK_BUFFER_USAGE_TRANSFER_DST_BIT,
      VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
      VMA_MEMORY_USAGE_GPU_TO_CPU,
      allocator
    )

    Buffer.copyBuffer(vulkanBuffer, stagingBuffer, bufferSize, commandPool).block().close()
    
    val byteBuffer = stagingBuffer.map()
    val floatBuffer = byteBuffer.asFloatBuffer()
    val result = new Array[Float](size)
    floatBuffer.get(result)
    stagingBuffer.unmap()
    stagingBuffer.destroy()
    result

Vec4FloatMem

  def toArray(using context: GContext): Array[fRGBA] = {
    val allocator = context.vkContext.allocator
    val commandPool = context.vkContext.commandPool
    val bufferSize = size.toLong * Vec4FloatMem.Vec4FloatSize

    val stagingBuffer = new Buffer(
      bufferSize.toInt,
      VK_BUFFER_USAGE_TRANSFER_DST_BIT,
      VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
      VMA_MEMORY_USAGE_GPU_TO_CPU,
      allocator
    )

    Buffer.copyBuffer(vulkanBuffer, stagingBuffer, bufferSize, commandPool).block().close()

    val byteBuffer = stagingBuffer.map()
    val floatBuffer = byteBuffer.asFloatBuffer()
    val result = new Array[fRGBA](size)
    for (i <- 0 until size)
      result(i) = (floatBuffer.get(), floatBuffer.get(), floatBuffer.get(), floatBuffer.get())
    
    stagingBuffer.unmap()
    stagingBuffer.destroy()
    result
  }
  5. apply methods are added: one that takes an Array (creating an initialized GMem) and one that takes only a size (creating an uninitialized GMem), both backed by a Vulkan Buffer and requiring a GContext.

FloatMem

object FloatMem {
  val FloatSize = 4

  def apply(floats: Array[Float])(using context: GContext): FloatMem =
    val size = floats.length
    val bufferSize = size.toLong * FloatSize
    val allocator = context.vkContext.allocator
    val commandPool = context.vkContext.commandPool

    val stagingBuffer = new Buffer(
      bufferSize.toInt,
      VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
      VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
      VMA_MEMORY_USAGE_CPU_ONLY,
      allocator
    )

    val byteBuffer = stagingBuffer.map()
    byteBuffer.asFloatBuffer().put(floats)
    stagingBuffer.unmap()

    val deviceBuffer = new Buffer(
      bufferSize.toInt,
      VK_BUFFER_USAGE_TRANSFER_DST_BIT | VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
      0, 
      VMA_MEMORY_USAGE_GPU_ONLY,
      allocator
    )

    Buffer.copyBuffer(stagingBuffer, deviceBuffer, bufferSize, commandPool).block().close()
    stagingBuffer.destroy()

    new FloatMem(size, deviceBuffer)

  def apply(size: Int)(using context: GContext): FloatMem = 
    val bufferSize = size.toLong * FloatSize
    val allocator = context.vkContext.allocator
    val deviceBuffer = new Buffer(
      bufferSize.toInt,
      VK_BUFFER_USAGE_TRANSFER_DST_BIT | VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
      0,
      VMA_MEMORY_USAGE_GPU_ONLY,
      allocator
    )
    new FloatMem(size, deviceBuffer)
}

Vec4FloatMem

object Vec4FloatMem:
  val Vec4FloatSize = 16

  def apply(vecs: Array[fRGBA])(using context: GContext): Vec4FloatMem = {
    val size = vecs.length
    val bufferSize = size.toLong * Vec4FloatSize
    val allocator = context.vkContext.allocator
    val commandPool = context.vkContext.commandPool

    val stagingBuffer = new Buffer(
      bufferSize.toInt,
      VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
      VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
      VMA_MEMORY_USAGE_CPU_ONLY,
      allocator
    )

    val byteBuffer = stagingBuffer.map()
    val floatBuffer = byteBuffer.asFloatBuffer()
    vecs.foreach { case (x, y, z, a) =>
      floatBuffer.put(x)
      floatBuffer.put(y)
      floatBuffer.put(z)
      floatBuffer.put(a)
    }
    stagingBuffer.unmap()

    val deviceBuffer = new Buffer(
      bufferSize.toInt,
      VK_BUFFER_USAGE_TRANSFER_DST_BIT | VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
      0, 
      VMA_MEMORY_USAGE_GPU_ONLY,
      allocator
    )

    Buffer.copyBuffer(stagingBuffer, deviceBuffer, bufferSize, commandPool).block().close()
    stagingBuffer.destroy()

    new Vec4FloatMem(size, deviceBuffer)
  }

  def apply(size: Int)(using context: GContext): Vec4FloatMem =
    val bufferSize = size.toLong * Vec4FloatSize
    val allocator = context.vkContext.allocator
    val deviceBuffer = new Buffer(
      bufferSize.toInt,
      VK_BUFFER_USAGE_TRANSFER_DST_BIT | VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
      0,
      VMA_MEMORY_USAGE_GPU_ONLY,
      allocator
    )
    new Vec4FloatMem(size, deviceBuffer)

These changes provide the necessary functionality for creating GMem instances from arrays, reading data back into arrays, and cleaning up the underlying Vulkan buffers (def cleanup(): Unit = vulkanBuffer.destroy()).
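For illustration, a hypothetical end-to-end lifecycle under these changes (a sketch, not taken from the diff; it assumes the DSL supports Float32 arithmetic with a float literal, that cleanup() is available on the returned GMem, and that the Float32 result of map can be treated as a FloatMem, since GContext.execute constructs one for Float32 results):

given ctx: GContext = GContext()

val input   = FloatMem(Array.tabulate(1024)(_.toFloat))   // host array staged into a device-local buffer
val doubled = GFunction((x: Float32) => x * 2f)            // assumed: Float32 * float literal is supported by the DSL
val output  = input.map(doubled)                           // GStruct.Empty overload of GMem.map

val result  = output.asInstanceOf[FloatMem].toArray        // staged read-back into an Array[Float]

input.cleanup()                                            // release the underlying Vulkan buffers explicitly
output.cleanup()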


GFunction

The primary role of GFunction has been simplified:

  1. No Longer Holds a ComputePipeline:

    • Previously: GFunction had a val pipeline: ComputePipeline = context.compile(this) field. This meant that upon creation, a GFunction instance would immediately try to compile itself using the provided GContext.
    • Now: This pipeline field has been removed. GFunction is now purely a container for the function logic (fn) and its associated type information (via Tags and GStructSchema).
    • Reasoning:
      • The responsibility for compiling and caching pipelines more naturally belongs to GContext.
      • This decouples GFunction from the compilation process, making it a simpler data structure.
      • It resolves the circular dependency or tight coupling where GFunction needed a GContext to compile, and GContext needed GFunction details to compile.
  2. No Implicit GContext in Primary Constructor:

    • Previously: The primary constructor case class GFunction[...](fn: ...)(implicit context: GContext) took an implicit GContext.
    • Now: The implicit GContext has been removed from the primary constructor.
    • Reasoning: Since GFunction no longer compiles itself upon instantiation, it doesn't need direct access to GContext at that point. The factory methods in the GFunction companion object (apply and from2D) still take (using context: GContext) because the context might be needed by the caller or for future extensions, but the GFunction instance itself doesn't store it.

GFunction is now a more passive data holder representing the intent of a GPU computation, rather than an active object that manages its own compiled state.

case class GFunction[
  G <: GStruct[G] : GStructSchema : Tag,
  H <: Value : Tag : FromExpr,
  R <: Value : Tag : FromExpr
](
  val fn: (G, Int32, GArray[H]) => R 
) {
  def arrayInputs: List[Tag[_]] = List(summon[Tag[H]])
  def arrayOutputs: List[Tag[_]] = List(summon[Tag[R]])
}

object GFunction:
  def apply[
    H <: Value : Tag : FromExpr,
    R <: Value : Tag : FromExpr
  ](userSimpleFn: H => R): GFunction[GStruct.Empty, H, R] =
    new GFunction[GStruct.Empty, H, R](
      (_: GStruct.Empty, workerIdx: Int32, gArray: GArray[H]) => userSimpleFn(gArray.at(workerIdx))
    )

  def from2D[
    G <: GStruct[G] : GStructSchema : Tag,
    H <: Value : Tag : FromExpr,
    R <: Value : Tag : FromExpr
  ](width: Int, userFn2D: (G, (Int32, Int32), GArray2D[H]) => R): GFunction[G, H, R] =
    new GFunction[G, H, R](
      (g: G, index: Int32, garray: GArray[H]) =>
        val x: Int32 = index mod width
        val y: Int32 = index / width
        val arr2d = GArray2D(width, garray)
        userFn2D(g, (x, y), arr2d)
    )

GContext

GContext Compilation and Execution Logic:
* Refined createPipeline in GContext to correctly derive the expression tree from GFunction.fn.

class GContext(debug: Boolean = false):
  val vkContext = VulkanContext(debug)
  private val pipelineCache = mutable.Map[Any, ComputePipeline]()

  private def createPipeline[G <: GStruct[G] : GStructSchema, H <: Value : Tag : FromExpr, R <: Value : Tag : FromExpr](
    function: GFunction[G, H, R]
  ): ComputePipeline = {
    val uniformStructSchemaImpl = summon[GStructSchema[G]]
    val tagGImpl: Tag[G] = uniformStructSchemaImpl.structTag 

    val uniformStruct = uniformStructSchemaImpl.fromTree(
      ExpressionCompiler.UniformStructRef[G](using tagGImpl).asInstanceOf[E[G]]
    )
    val tree = function
      .fn 
      .apply(
        uniformStruct,
        ExpressionCompiler.WorkerIndex, 
        GArray[H](0)
      )
    val shaderCode = DSLCompiler.compile(tree, function.arrayInputs, function.arrayOutputs, uniformStructSchemaImpl)
    dumpSpvToFile(shaderCode, "program.spv") // TODO remove before release

    val inputBinding = Binding(0, InputBufferSize(typeStride(summon[Tag[H]])))
    val outputBinding = Binding(1, InputBufferSize(typeStride(summon[Tag[R]])))
    
    val uniformBindingOpt = Option.when(uniformStructSchemaImpl.fields.nonEmpty)(
      Binding(2, UniformSize(GMem.totalStride(uniformStructSchemaImpl)))
    )
    
    val bindings = Seq(inputBinding, outputBinding) ++ uniformBindingOpt.toSeq
    val layoutInfo = LayoutInfo(Seq(LayoutSet(0, bindings)))
    
    val shader = new Shader(shaderCode, new org.joml.Vector3i(256, 1, 1), layoutInfo, "main", vkContext.device)
    new ComputePipeline(shader, vkContext)
  }
  • In execute:
    • sourceBuffersForExecutor collects buffers like mem.vulkanBuffer and the uniformStagingVkBuffer that the SequenceExecutor will read from.
    • bufferActions map is configured:
      • LayoutLocation(0, 0) (main input) uses BufferAction.LoadTo.
      • LayoutLocation(0, 1) (main output) uses BufferAction.LoadFrom, indicating it should be returned by the executor.
      • LayoutLocation(0, 2) (uniforms) uses BufferAction.LoadTo if uniforms exist.
    • A host-visible uniformStagingVkBuffer is created for uniform data, populated, and added to sourceBuffersForExecutor. This buffer is destroyed after the execution.
    • SequenceExecutor is instantiated and executed.
    • The first buffer from outputVulkanBuffers is taken as the result.
    • The result GMem is created by checking the type tag of R and instantiating the corresponding concrete GMem type (FloatMem, Vec4FloatMem). An UnsupportedOperationException is thrown for unhandled types, and the orphaned buffer is destroyed.
  def execute[
    G <: GStruct[G] : Tag : GStructSchema,
    H <: Value : Tag : FromExpr, 
    R <: Value : FromExpr : Tag 
  ](mem: GMem[H], uniformStruct: G, fn: GFunction[G, H, R]): GMem[R] = {
    val pipeline = pipelineCache.getOrElseUpdate(fn.fn, createPipeline(fn))

    val sourceBuffersForExecutor = ListBuffer[Buffer]()
    val bufferActions = mutable.Map[LayoutLocation, BufferAction]()

    bufferActions.put(LayoutLocation(0, 0), BufferAction.LoadTo)
    sourceBuffersForExecutor.addOne(mem.vulkanBuffer)

    bufferActions.put(LayoutLocation(0, 1), BufferAction.LoadFrom) 

    var uniformStagingBufferOpt: Option[Buffer] = None
    val uniformStructSchema = summon[GStructSchema[G]]
    if (uniformStructSchema.fields.nonEmpty) {
      val uniformCPUByteBuffer = GMem.serializeUniform(uniformStruct)
      val uniformStagingVkBuffer = new Buffer(
        uniformCPUByteBuffer.remaining(), // changed from .toLong: remaining() already returns an Int, matching the Buffer constructor's size parameter
        VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
        VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
        VMA_MEMORY_USAGE_CPU_ONLY,
        vkContext.allocator
      )
      val mappedUniform = uniformStagingVkBuffer.map()
      mappedUniform.put(uniformCPUByteBuffer)
      uniformStagingVkBuffer.unmap()
      
      uniformStagingBufferOpt = Some(uniformStagingVkBuffer)
      bufferActions.put(LayoutLocation(0, 2), BufferAction.LoadTo)
      sourceBuffersForExecutor.addOne(uniformStagingVkBuffer)
    }

    val computeStep = Compute(pipeline, bufferActions.toMap)
    val sequence = ComputationSequence(Seq(computeStep), dependencies = Nil) 
    val sequenceExecutor = new SequenceExecutor(sequence, vkContext) 

    val outputVulkanBuffers = sequenceExecutor.execute(sourceBuffersForExecutor.toSeq, mem.size)
    
    uniformStagingBufferOpt.foreach(_.destroy())

    if (outputVulkanBuffers.isEmpty) {
      throw new IllegalStateException("SequenceExecutor did not return an output buffer.")
    }
    val resultVulkanBuffer = outputVulkanBuffers.head

    val tagR = summon[Tag[R]]
    val resultMem = 
      if (tagR.tag =:= Tag[Float32].tag) { 
        new FloatMem(mem.size, resultVulkanBuffer).asInstanceOf[GMem[R]]
      } else if (tagR.tag =:= Tag[Vec4[Float32]].tag) { 
        new Vec4FloatMem(mem.size, resultVulkanBuffer).asInstanceOf[GMem[R]]
      } else {
        resultVulkanBuffer.destroy()
        throw new UnsupportedOperationException(s"Cannot create GMem for result type ${tagR.tag}. Output buffer has been destroyed.")
      }
    resultMem
  }
  • The dumpSpvToFile method now includes a try-catch for IOException and calls code.rewind() in a finally block. The rewind is needed because shaderCode is passed to new Shader afterwards, so the buffer's position must be reset after the write.
  private def dumpSpvToFile(code: ByteBuffer, path: String): Unit =
    try {
      val fc: FileChannel = new FileOutputStream(path).getChannel
      fc.write(code)
      fc.close()
    } catch {
      case e: IOException => e.printStackTrace()
    } finally {
      code.rewind()
    }

GMem

The GMem.map method takes an fn of type GFunction[G, H, R], where G is a generic type parameter. If G is not GStruct.Empty, GFunction[G, H, R] does not match GFunction[GStruct.Empty, H, R], leading to a type mismatch error.

To fix this, GMem should provide two map overloads:

  1. One for functions that take a custom uniform struct G. This map method will also take the uniformStruct instance and pass it to the corresponding context.execute overload.
  2. One for functions that use GStruct.Empty as their uniform type. This map method will call the context.execute overload that expects a GFunction[GStruct.Empty, H, R].
trait GMem[H <: Value : Tag : FromExpr]: 
  def size: Int
  def vulkanBuffer: Buffer

  def map[
    G <: GStruct[G] : Tag : GStructSchema,
    R <: Value : FromExpr : Tag
  ](uniformStruct: G, fn: GFunction[G, H, R])(using context: GContext): GMem[R] =
    context.execute(this, uniformStruct, fn)

  def map[R <: Value : FromExpr : Tag]
    (fn: GFunction[GStruct.Empty, H, R])(using context: GContext): GMem[R] =
    context.execute(this, fn) 

Renderer and Animated Files (e.g., AnimatedFunctionRenderer, ImageRtRenderer, AnimationRtRenderer, Raytracing.scala)
Before:

  • Called fmem.map(fn) or similar, which only works if fn is a GFunction[GStruct.Empty, ...].
  • When fn used a custom uniform struct, this led to type errors.

After:

  • Now correctly call fmem.map(uniformStruct, fn) when fn uses a custom uniform struct.
  • The uniform struct (e.g., RaytracingIteration(i)) is explicitly created and passed to map.
  • This matches the correct overload and resolves the type mismatches.

Example:

val uniformStruct = RaytracingIteration(i)
fmem.map(uniformStruct, fn)

SequenceExecutorTest:

  • The test now creates a Vulkan Buffer, copies the input data into it, and passes this Buffer to sequenceExecutor.execute (see the sketch below).
  • The output is read by mapping the result Vulkan Buffer and extracting the data as an array.
  • All Vulkan resources (Buffer, pipelines, executor, shader) are explicitly destroyed at the end of the test.
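A condensed sketch of that flow (hypothetical, not the actual test; allocator, sequenceExecutor, and the data length are placeholders, while the Buffer constructor, map()/unmap(), execute, and destroy() calls are the ones shown elsewhere in this PR):

// Hypothetical test body, not the actual SequenceExecutorTest.
val inputData  = Array.tabulate(256)(_.toFloat)
val bufferSize = inputData.length * 4

// Host-visible buffer holding the test input.
val inputBuffer = new Buffer(
  bufferSize,
  VK_BUFFER_USAGE_TRANSFER_SRC_BIT,
  VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
  VMA_MEMORY_USAGE_CPU_ONLY,
  allocator
)
val mapped = inputBuffer.map()
mapped.asFloatBuffer().put(inputData)
inputBuffer.unmap()

// Run the compute sequence; the executor returns host-visible output buffers.
val outputs = sequenceExecutor.execute(Seq(inputBuffer), inputData.length)

// Read the first output back into an array.
val out       = new Array[Float](inputData.length)
val outMapped = outputs.head.map()
outMapped.asFloatBuffer().get(out)
outputs.head.unmap()

// Explicit teardown of the Vulkan resources used here.
(outputs :+ inputBuffer).foreach(_.destroy())
sequenceExecutor.destroy()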

Commit messages (as shown in the PR timeline, titles truncated):

  • …eExecutor to accept a Seq of Buffer for input and expect a Buffer for output. Also removed MapExecutor
  • … buffers. Update GMem and RamGMem traits to include Vulkan buffer handling and cleanup methods. Introduce mapping and unmapping functionality in Buffer class for direct memory access.
  • Now GContext needs to be updated to align with the changes
  • …ecute methods in GContext and GFunction to support uniform structures. Enhance SequenceExecutorTest to utilize new Buffer class for input and output operations.
@rudrabeniwal linked an issue on Jun 5, 2025 that may be closed by this pull request.
@rudrabeniwal requested a review from szymon-rd on June 5, 2025 at 08:46.
@MarconZet (Collaborator) left a comment:

Checked vulkan code

Comment on lines 44 to 59
def map(): ByteBuffer = {
  pushStack { stack =>
    val pData = stack.callocPointer(1)
    check(vmaMapMemory(allocator.get, allocation, pData), s"Failed to map buffer memory for buffer handle $handle allocation $allocation")
    val dataPtr = pData.get(0)
    if (dataPtr == NULL) {
      throw new VulkanAssertionError(s"vmaMapMemory returned NULL for buffer handle $handle, allocation $allocation", -1)
    }
    memByteBuffer(dataPtr, this.size)
  }
}

def unmap(): Unit = {
  vmaUnmapMemory(allocator.get, allocation)
}

Collaborator:

I think mapping should use a more functional approach. Unmapping should not need to be called explicitly.

@rudrabeniwal (Collaborator Author):

this?
Buffer.scala

  def map[R](f: ByteBuffer => R): R = {
    var dataPtr: Long = NULL
    try {
      dataPtr = pushStack { stack =>
        val pData = stack.callocPointer(1)
        check(vmaMapMemory(allocator.get, allocation, pData), s"Failed to map buffer memory for buffer handle $handle allocation $allocation")
        val ptr = pData.get(0)
        if (ptr == NULL) {
          throw new VulkanAssertionError(s"vmaMapMemory returned NULL for buffer handle $handle, allocation $allocation", -1)
        }
        ptr
      }
      val byteBuffer = memByteBuffer(dataPtr, this.size)
      f(byteBuffer)
    } finally {
      if (dataPtr != NULL) {
        vmaUnmapMemory(allocator.get, allocation)
      }
    }
  }

  def get(dst: Array[Byte]): Unit = {
    val len = Math.min(dst.length, size)
    this.map { mappedBuffer =>
      val bufferSlice = mappedBuffer.slice() 
      bufferSlice.limit(len)
      bufferSlice.get(dst, 0, len) 
    }
  }

  protected def close(): Unit =
    vmaDestroyBuffer(allocator.get, handle, allocation)
}

object Buffer {
  def copyBuffer(src: ByteBuffer, dst: Buffer, bytes: Long): Unit = {
    dst.map { dstMappedBuffer =>
      val srcSlice = src.slice()
      srcSlice.limit(bytes.toInt) 
      dstMappedBuffer.put(srcSlice)
      vmaFlushAllocation(dst.allocator.get, dst.allocation, 0, bytes)
    }
  }

  def copyBuffer(src: Buffer, dst: ByteBuffer, bytes: Long): Unit =
    src.map { srcMappedBuffer =>
      val srcSlice = srcMappedBuffer.slice()
      srcSlice.limit(bytes.toInt)
      dst.put(srcSlice)
    }

@rudrabeniwal (Collaborator Author), Jun 6, 2025:

959e7a6 - Let me know if that works
I'm figuring out the type mismatch error in Raytracing.scala


import java.nio.ByteBuffer

private[cyfra] abstract class AbstractExecutor(dataLength: Int, val bufferActions: Seq[BufferAction], context: VulkanContext) {
Collaborator:

Not much point in keeping AbstractExecutor now. All other executors were superseded by SequenceExecutor.

- buffersWithAction(BufferAction.LoadTo).zipWithIndex.foreach { case (buffer, i) =>
-   Buffer.copyBuffer(inputs(i), stagingBuffer, buffer.size)
-   Buffer.copyBuffer(stagingBuffer, buffer, buffer.size, commandPool).block().destroy()
+ buffersWithAction(BufferAction.LoadTo).zipWithIndex.foreach { case (gpuDeviceBuffer, i) =>
Collaborator:

Wasn't the idea to move all the data transfers out of Executor? Now the executor still moves the data RAM <-> GRAM.

Successfully merging this pull request may close this issue: Memory/Buffer abstraction.