-
Notifications
You must be signed in to change notification settings - Fork 11
Refactor vulkan buffer gmem executor #45
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…eExecutor to accept a Seq of Buffer for input and expect a Buffer for output. Also removed MapExecutor
… buffers. Update GMem and RamGMem traits to include Vulkan buffer handling and cleanup methods. Introduce mapping and unmapping functionality in Buffer class for direct memory access. Now GContext needs to be updated to align with the changes
…ecute methods in GContext and GFunction to support uniform structures. Enhance SequenceExecutorTest to utilize new Buffer class for input and output operations.
MarconZet
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checked vulkan code
| def map(): ByteBuffer = { | ||
| pushStack { stack => | ||
| val pData = stack.callocPointer(1) | ||
| check(vmaMapMemory(allocator.get, allocation, pData), s"Failed to map buffer memory for buffer handle $handle allocation $allocation") | ||
| val dataPtr = pData.get(0) | ||
| if (dataPtr == NULL) { | ||
| throw new VulkanAssertionError(s"vmaMapMemory returned NULL for buffer handle $handle, allocation $allocation", -1) | ||
| } | ||
| memByteBuffer(dataPtr, this.size) | ||
| } | ||
| } | ||
|
|
||
| def unmap(): Unit = { | ||
| vmaUnmapMemory(allocator.get, allocation) | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think maping should use more functional approach. Unmapping should not need to be called explicitly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this?
Buffer.scala
def map[R](f: ByteBuffer => R): R = {
var dataPtr: Long = NULL
try {
dataPtr = pushStack { stack =>
val pData = stack.callocPointer(1)
check(vmaMapMemory(allocator.get, allocation, pData), s"Failed to map buffer memory for buffer handle $handle allocation $allocation")
val ptr = pData.get(0)
if (ptr == NULL) {
throw new VulkanAssertionError(s"vmaMapMemory returned NULL for buffer handle $handle, allocation $allocation", -1)
}
ptr
}
val byteBuffer = memByteBuffer(dataPtr, this.size)
f(byteBuffer)
} finally {
if (dataPtr != NULL) {
vmaUnmapMemory(allocator.get, allocation)
}
}
}
def get(dst: Array[Byte]): Unit = {
val len = Math.min(dst.length, size)
this.map { mappedBuffer =>
val bufferSlice = mappedBuffer.slice()
bufferSlice.limit(len)
bufferSlice.get(dst, 0, len)
}
}
protected def close(): Unit =
vmaDestroyBuffer(allocator.get, handle, allocation)
}
object Buffer {
def copyBuffer(src: ByteBuffer, dst: Buffer, bytes: Long): Unit = {
dst.map { dstMappedBuffer =>
val srcSlice = src.slice()
srcSlice.limit(bytes.toInt)
dstMappedBuffer.put(srcSlice)
vmaFlushAllocation(dst.allocator.get, dst.allocation, 0, bytes)
}
}
def copyBuffer(src: Buffer, dst: ByteBuffer, bytes: Long): Unit =
src.map { srcMappedBuffer =>
val srcSlice = srcMappedBuffer.slice()
srcSlice.limit(bytes.toInt)
dst.put(srcSlice)
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
959e7a6 - Let me know if that works
I'm figuring out the type mismatch error in Raytracing.scala
|
|
||
| import java.nio.ByteBuffer | ||
|
|
||
| private[cyfra] abstract class AbstractExecutor(dataLength: Int, val bufferActions: Seq[BufferAction], context: VulkanContext) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not much point in keeping AbstractExecutor now. All other executors were superseded by SequenceExecutor.
| buffersWithAction(BufferAction.LoadTo).zipWithIndex.foreach { case (buffer, i) => | ||
| Buffer.copyBuffer(inputs(i), stagingBuffer, buffer.size) | ||
| Buffer.copyBuffer(stagingBuffer, buffer, buffer.size, commandPool).block().destroy() | ||
| buffersWithAction(BufferAction.LoadTo).zipWithIndex.foreach { case (gpuDeviceBuffer, i) => |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wasn't the idea to move all the data transfers out of Executor? Now the executor still moves the data RAM <-> GRAM.
Solution
Executors
Implementation
The
executemethod inAbstractExecutor.scalaandSequenceExecutor.scalaare modified to accept and returnSeq[io.computenode.cyfra.vulkan.memory.Buffer].This involved changing the method signatures
and updating the internal logic to use
Bufferobjects for data transfer, typically involving host-visible buffers for input/output and device-local buffers for computation.AbstractExecutorSequenceExecutorThe main idea is:
Seq[Buffer]are assumed to be host-visible buffers. Their contents will be copied to the internal device-local compute buffers usingvkCmdCopyBuffer(via theBuffer.copyBufferhelper that utilizes a command pool).Bufferobjects will be created. The results from the internal device-local compute buffers will be copied to these new output buffers, again usingvkCmdCopyBuffer.The general-purpose
stagingBufferused forByteBuffertransfers can be removed, as directBuffertoBuffercopies (staged internally by theBuffer.copyBuffer) will be used.Input Buffer Copying (
BufferAction.LoadTo):* The
inputs: Seq[Buffer]are assumed to be already createdBufferobjects, typically withVK_MEMORY_PROPERTY_HOST_VISIBLE_BITandVK_BUFFER_USAGE_TRANSFER_SRC_BIT.*
computeBufferrefers to the internal buffers created bycreateBuffers, which are device-local (VMA_MEMORY_USAGE_GPU_ONLY) and haveVK_BUFFER_USAGE_STORAGE_BUFFER_BITandVK_BUFFER_USAGE_TRANSFER_DST_BIT(fromaction.action).*
Buffer.copyBuffer(inputHostBuffer, computeBuffer, inputHostBuffer.size, commandPool):* This helper method internally creates a temporary command buffer from the
commandPool.* It records a
vkCmdCopyBuffercommand to copy data frominputHostBuffertocomputeBuffer.* It submits this command buffer to a queue and returns a
Fencewhich is then waited upon (block()) to ensure the copy completes. The fence is then destroyed.* This is suitable for copying between host-visible and device-local memory.
Output Buffer Copying (
BufferAction.LoadFrom):*
computeBufferis an internal device-local buffer holding computation results. It needsVK_BUFFER_USAGE_TRANSFER_SRC_BIT(fromaction.action).*
val outputHostBuffer = new Buffer(...): A newBufferis created for each piece of output data.*
VK_BUFFER_USAGE_TRANSFER_DST_BIT: So it can be the target of a copy.*
VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT: Makes the buffer memory mappable and visible to the host (CPU).HOST_COHERENTensures writes by the GPU are visible to the CPU without explicit flushing.*
VMA_MEMORY_USAGE_GPU_TO_CPU: VMA hint for memory that is written by GPU and read by CPU.*
Buffer.copyBuffer(computeBuffer, outputHostBuffer, computeBuffer.size, commandPool):* Similar to the input copy, this uses
vkCmdCopyBufferwithin a temporary command buffer to transfer data from the device-localcomputeBufferto the newly created host-visibleoutputHostBuffer.* The operation is synchronized using a fence.
* The
Seqof theseoutputHostBufferobjects is then returned.Solution
GMem
Implementation
toReadOnlyBuffermethod will be removed fromGMemandRamGMem.vulkanBufferof typeio.computenode.cyfra.vulkan.memory.Bufferwill be added to theGMemtrait.FloatMemandVec4FloatMemwill be updated to store and expose this VulkanBuffer. Their constructors will need to accept anio.computenode.cyfra.vulkan.memory.Buffer.toArray(using GContext)methods inFloatMemandVec4FloatMemto read data from their VulkanBufferback into anArray.FloatMemVec4FloatMemapplymethods that take asizeandGContextto create an uninitializedGMembacked by a VulkanBuffer.FloatMemVec4FloatMemThese changes should provide the necessary functionality for creating
GMeminstances from arrays, retrieving data back to arrays, and cleaning up the underlying Vulkan buffers(def cleanup(): Unit = vulkanBuffer.destroy()).GFunction
The primary role of
GFunctionhas been simplified:No Longer Holds a
ComputePipeline:GFunctionhad aval pipeline: ComputePipeline = context.compile(this)field. This meant that upon creation, aGFunctioninstance would immediately try to compile itself using the providedGContext.pipelinefield has been removed.GFunctionis now purely a container for the function logic (fn) and its associated type information (viaTags andGStructSchema).GContext.GFunctionfrom the compilation process, making it a simpler data structure.GFunctionneeded aGContextto compile, andGContextneededGFunctiondetails to compile.No Implicit
GContextin Primary Constructor:case class GFunction[...](fn: ...)(implicit context: GContext)took an implicitGContext.GContexthas been removed from the primary constructor.GFunctionno longer compiles itself upon instantiation, it doesn't need direct access toGContextat that point. The factory methods in theGFunctioncompanion object (applyandfrom2D) still take(using context: GContext)because the context might be needed by the caller or for future extensions, but theGFunctioninstance itself doesn't store it.GFunctionis now a more passive data holder representing the intent of a GPU computation, rather than an active object that manages its own compiled state.GContext
GContextCompilation and Execution Logic:* Refined
createPipelineinGContextto correctly derive the expression tree fromGFunction.fn.execute:sourceBuffersForExecutorcollects buffers likemem.vulkanBufferand theuniformStagingVkBufferthat theSequenceExecutorwill read from.bufferActionsmap is configured:LayoutLocation(0, 0)(main input) usesBufferAction.LoadTo.LayoutLocation(0, 1)(main output) usesBufferAction.LoadFrom, indicating it should be returned by the executor.LayoutLocation(0, 2)(uniforms) usesBufferAction.LoadToif uniforms exist.uniformStagingVkBufferis created for uniform data, populated, and added tosourceBuffersForExecutor. This buffer is destroyed after the execution.SequenceExecutoris instantiated and executed.outputVulkanBuffersis taken as the result.GMemis created by checking the type tag ofRand instantiating the corresponding concreteGMemtype (FloatMem,Vec4FloatMem). AnUnsupportedOperationExceptionis thrown for unhandled types, and the orphaned buffer is destroyed.dumpSpvToFilemethod now includes a try-catch forIOExceptionand ensurescode.rewind()is in afinallyblock, thoughrewindafterwritemight not be strictly necessary if the buffer isn't immediately reused for reading by the caller ofdumpSpvToFile. The primary use ofrewindhere is becauseshaderCodeis passed tonew Shaderafterwards.GMem
GMem.mapmethod takes afnof typeGFunction[G, H, R], whereGis a generic type parameter. IfGis notGStruct.Empty, thenGFunction[G, H, R]does not matchGFunction[GStruct.Empty, H, R], leading to the error.To fix this,
GMemshould provide twomapoverloads:G. Thismapmethod will also take theuniformStructinstance and pass it to the correspondingcontext.executeoverload.GStruct.Emptyas their uniform type. Thismapmethod will call thecontext.executeoverload that expects aGFunction[GStruct.Empty, H, R].Renderer and Animated Files (e.g., AnimatedFunctionRenderer, ImageRtRenderer, AnimationRtRenderer, Raytracing.scala)
Before:
Called fmem.map(fn) or similar, which only works if fn is a GFunction[GStruct.Empty, ...].
When fn used a custom uniform struct, this led to type errors.
After:
Now correctly call fmem.map(uniformStruct, fn) when fn uses a custom uniform struct.
The uniform struct (e.g., RaytracingIteration(i)) is explicitly created and passed to map.
This matches the correct overload and resolves type mismatches.
Example:
SequenceExecutorTest:
The test now creates a Vulkan Buffer, copies the input data into it, and passes this Buffer to sequenceExecutor.execute.
The output is read by mapping the result Vulkan Buffer and extracting the data as an array.
All Vulkan resources (Buffer, pipelines, executor, shader) are explicitly destroyed at the end of the test.