Conversation

@rudrabeniwal
Collaborator

Vulkan Compute API for Cyfra

Hey everyone! I wanted to share the compute API component I've been working on. This is just a small piece of the bigger system, but it adds a high-level abstraction layer for GPU compute operations using Vulkan.

Implementation Approach

I decided to wrap the LWJGL Vulkan bindings with a more user-friendly Scala API that handles the boilerplate. This was my first time working with Vulkan, so there was definitely a learning curve, but I think the abstraction turned out well!

Here's how the API layers work:

User API         : Buffer, ComputeContext, Shader, Pipeline
                   ↓
Vulkan wrappers  : VkBuffer, VulkanContext, VkShader, VkComputePipeline
                   ↓
LWJGL bindings   : VK10.*, Vma.*, etc.

The main components are:

  • ComputeContext: Controls the Vulkan environment
  • Buffer: Manages GPU memory
  • Shader: Handles SPIR-V shader code
  • Pipeline: Manages compute pipelines

API Components

Buffer Management (api/Buffer.scala)

Represents memory on the GPU with methods for data transfer between host and device:

class Buffer(
    private[api] val vulkanContext: VulkanContext, 
    val size: Int, 
    val isHostVisible: Boolean = false
) extends AutoCloseable {

  // Copy data from host to device
  def copyFrom(src: ByteBuffer): Try[Unit]
  
  // Copy data from device to host
  def copyToHost(): Try[ByteBuffer]
  
  // Create a duplicate buffer with same contents
  def duplicate(): Try[Buffer]
  
  // Direct memory mapping (for host-visible buffers)
  def map(): Try[ByteBuffer]
  
  // Resource cleanup
  override def close(): Unit
}
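Before calling copyFrom, host data has to be packed into a direct, native-order ByteBuffer (LWJGL expects native byte order for GPU uploads). A minimal sketch using plain JDK NIO; `packInts` is an illustrative helper, not part of the API:

```scala
import java.nio.{ByteBuffer, ByteOrder}

// Pack an Int array into a direct, native-order ByteBuffer,
// ready to hand to Buffer.copyFrom.
def packInts(data: Array[Int]): ByteBuffer = {
  val buf = ByteBuffer
    .allocateDirect(data.length * 4)
    .order(ByteOrder.nativeOrder())
  data.foreach(v => buf.putInt(v))
  buf.flip() // switch from writing to reading: position 0, limit at data end
  buf
}
```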

Compute Context (ComputeContext.scala)

Entry point for all compute operations, manages the Vulkan environment:

class ComputeContext(enableValidation: Boolean = false) extends AutoCloseable {
  private val vulkanContext = new VulkanContext(enableValidation)
  private val executionLock = new ReentrantLock()
  
  // Buffer creation
  def createBuffer(size: Int, isHostVisible: Boolean = false): Try[Buffer]
  def createBufferWithData(data: ByteBuffer, isHostVisible: Boolean): Try[Buffer]
  def createIntBuffer(data: Array[Int], isHostVisible: Boolean): Try[Buffer]
  def createFloatBuffer(data: Array[Float], isHostVisible: Boolean): Try[Buffer]
  
  // Shader creation
  def createShader(
    spirvCode: ByteBuffer, 
    workgroupSize: Vector3i,
    layout: LayoutInfo, 
    entryPoint: String = "main"
  ): Try[Shader]
  
  // Pipeline creation
  def createPipeline(shader: Shader): Try[Pipeline]
  
  // Execution
  def execute(
    pipeline: Pipeline, 
    inputs: Seq[Buffer], 
    outputs: Seq[Buffer], 
    elemCount: Int
  ): Try[Unit]
  
  // Cleanup
  override def close(): Unit
}

Usage example:

// Initialize with validation for debugging
val context = new ComputeContext(enableValidation = true)

Shader Management (Shader.scala)

Encapsulates SPIR-V shader code and metadata:

class Shader(
    private[api] val vulkanContext: VulkanContext,
    spirvCode: ByteBuffer,
    workgroupSize: Vector3i,
    layoutInfo: LayoutInfo,
    entryPoint: String = "main"
) extends AutoCloseable {

  def getWorkgroupDimensions: Vector3i
  def getLayoutInfo: LayoutInfo
  def getEntryPoint: String
  
  override def close(): Unit
}

object Shader {
  def loadFromFile(...): Try[Shader]
  def loadFromResource(...): Try[Shader]
}
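The loader signatures are elided above, but loading presumably boils down to reading the .spv bytes into a direct ByteBuffer that createShader can consume. A hedged sketch of that step (`readSpirv` is illustrative, not the actual implementation):

```scala
import java.nio.{ByteBuffer, ByteOrder}
import java.nio.file.{Files, Paths}

// Read compiled SPIR-V bytes into a direct, native-order ByteBuffer.
def readSpirv(path: String): ByteBuffer = {
  val bytes = Files.readAllBytes(Paths.get(path))
  val buf = ByteBuffer
    .allocateDirect(bytes.length)
    .order(ByteOrder.nativeOrder())
  buf.put(bytes)
  buf.flip()
  buf
}
```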

Pipeline Execution (Pipeline.scala)

Manages compute pipeline creation and dispatch:

class Pipeline(
    private[api] val vulkanContext: VulkanContext,
    shader: Shader
) extends AutoCloseable {

  def getShader: Shader
  
  // Calculate workgroups based on element count
  def calculateWorkgroupCount(elementCount: Int): Vector3i
  
  // Execute pipeline on specified buffers
  def execute(
    context: ComputeContext,
    inputs: Seq[Buffer],
    outputs: Seq[Buffer],
    elementCount: Int
  ): Try[Unit]
  
  override def close(): Unit
}
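For a 1D workload, calculateWorkgroupCount presumably does a ceiling division of the element count by the shader's workgroup width. A stand-alone sketch of that math (`WorkgroupCount` stands in for org.joml.Vector3i, and the exact rounding is my assumption):

```scala
// Stand-in for org.joml.Vector3i so the snippet is self-contained.
case class WorkgroupCount(x: Int, y: Int, z: Int)

def workgroupCountFor(elementCount: Int, workgroupSizeX: Int): WorkgroupCount =
  // Round up so a partial final workgroup still covers trailing elements;
  // the shader is then expected to bounds-check gl_GlobalInvocationID.
  WorkgroupCount((elementCount + workgroupSizeX - 1) / workgroupSizeX, 1, 1)
```

With a 128-wide workgroup, 1024 elements dispatch 8 groups, and 1000 elements also dispatch 8 (the last group is partially idle).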

Layout Models (LayoutModel.scala)

Defines shader interface for buffer binding:

case class LayoutInfo(sets: Seq[LayoutSet])
case class LayoutSet(id: Int, bindings: Seq[Binding])
case class Binding(id: Int, size: Int)

object LayoutInfo {
  // Helper for standard input/output layout
  def standardIOLayout(inputSet: Int, outputSet: Int, elemSize: Int): LayoutInfo = 
    LayoutInfo(Seq(
      LayoutSet(inputSet, Seq(Binding(0, elemSize))),
      LayoutSet(outputSet, Seq(Binding(0, elemSize)))
    ))
}
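For a 4-byte element type (Int or Float), the helper expands to one binding per set. A self-contained sketch (the model is redeclared here so the snippet runs on its own):

```scala
case class Binding(id: Int, size: Int)
case class LayoutSet(id: Int, bindings: Seq[Binding])
case class LayoutInfo(sets: Seq[LayoutSet])

def standardIOLayout(inputSet: Int, outputSet: Int, elemSize: Int): LayoutInfo =
  LayoutInfo(Seq(
    LayoutSet(inputSet, Seq(Binding(0, elemSize))),
    LayoutSet(outputSet, Seq(Binding(0, elemSize)))
  ))

// Input at set 0, output at set 1, 4-byte elements:
val layout = standardIOLayout(0, 1, 4)
```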

Memory Management

We use the Vulkan Memory Allocator (VMA) for efficient GPU memory allocation:

// Inside Buffer.scala constructor
private[api] val vkBuffer: VkBuffer = {
  if (isHostVisible) {
    new VkBuffer(
      size,
      VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_SRC_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT,
      VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT,
      VMA_MEMORY_USAGE_CPU_TO_GPU,
      vulkanContext.allocator
    )
  } else {
    new VkBuffer(
      size,
      VK_BUFFER_USAGE_STORAGE_BUFFER_BIT | VK_BUFFER_USAGE_TRANSFER_SRC_BIT | VK_BUFFER_USAGE_TRANSFER_DST_BIT,
      0,
      VMA_MEMORY_USAGE_GPU_ONLY,
      vulkanContext.allocator
    )
  }
}

Memory types:

  • VMA_MEMORY_USAGE_CPU_TO_GPU: For host-visible buffers that need frequent updates
  • VMA_MEMORY_USAGE_GPU_ONLY: For device-local memory with optimal performance

Command Buffer Lifecycle

Command buffers are created, recorded, submitted, and recycled:

// Recording a command buffer (simplified from AbstractExecutor.scala)
val commandBuffer = commandPool.createCommandBuffer()

val commandBufferBeginInfo = VkCommandBufferBeginInfo
  .calloc(stack)
  .sType$Default()
  .flags(0)

vkBeginCommandBuffer(commandBuffer, commandBufferBeginInfo)
recordCommandBuffer(commandBuffer)  // Custom recording logic
vkEndCommandBuffer(commandBuffer)

// Submitting the command buffer
val fence = new Fence(device)
val submitInfo = VkSubmitInfo
  .calloc(stack)
  .sType$Default()
  .pCommandBuffers(pCommandBuffers)

vkQueueSubmit(queue.get, submitInfo, fence.get)
fence.block().destroy()  // Wait for completion

Thread Safety

The API uses locks to ensure thread-safe execution:

// From ComputeContext.scala
def execute(...): Try[Unit] = Try {
  if (closed.get()) {
    throw new IllegalStateException("ComputeContext has been closed")
  }
  
  executionLock.lock()
  try {
    // Execution code...
  } finally {
    executionLock.unlock()
  }
}
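The guard pattern above can be modeled without any Vulkan: a closed flag checked first, then a lock serializing the critical section. A minimal stand-alone sketch (`GuardedExecutor` is illustrative only):

```scala
import java.util.concurrent.atomic.AtomicBoolean
import java.util.concurrent.locks.ReentrantLock
import scala.util.Try

// Model of the execute() guard: fail fast if closed, otherwise
// run the critical section under the execution lock.
class GuardedExecutor {
  private val closed = new AtomicBoolean(false)
  private val executionLock = new ReentrantLock()
  @volatile var executions = 0

  def execute(): Try[Unit] = Try {
    if (closed.get())
      throw new IllegalStateException("ComputeContext has been closed")
    executionLock.lock()
    try executions += 1 // real code records and submits command buffers here
    finally executionLock.unlock()
  }

  def close(): Unit = closed.set(true)
}
```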

Usage Examples

Basic Compute Operation

import io.computenode.cyfra.api.*
import org.joml.Vector3i
import io.computenode.cyfra.vulkan.compute.Shader.loadShader
import scala.util.{Success, Failure}

// Create context
val context = new ComputeContext(enableValidation = true)

try {
  // Load shader from resources
  val spirvCode = loadShader("copy_test.spv")
  val layoutInfo = LayoutInfo.standardIOLayout(0, 1, 4)
  
  // Create input data
  val inputData = Array.tabulate(1024)(i => i)
  
  // Create buffers, shader and pipeline
  for {
    inputBuffer <- context.createIntBuffer(inputData, isHostVisible = true)
    outputBuffer <- context.createBuffer(4 * 1024)
    shader <- context.createShader(spirvCode, new Vector3i(128, 1, 1), layoutInfo)
    pipeline <- context.createPipeline(shader)
    _ <- context.execute(pipeline, Seq(inputBuffer), Seq(outputBuffer), 1024)
    resultBuffer <- outputBuffer.copyToHost()
  } yield {
    // Process results
    val results = Array.fill(1024)(0)
    for (i <- 0 until 1024) {
      results(i) = resultBuffer.getInt()
    }
    
    println("Results: " + results.take(10).mkString(", "))
    
    // Cleanup resources
    pipeline.close()
    shader.close()
    inputBuffer.close()
    outputBuffer.close()
  }
} finally {
  context.close()
}

Complex Multi-Buffer Example

// Create multiple buffers for complex computation
val inputBuffer1 = context.createIntBuffer(data1, isHostVisible = true).get
val inputBuffer2 = context.createIntBuffer(data2, isHostVisible = true).get
val outputBuffer = context.createBuffer(4 * resultSize).get

// Execute with multiple input buffers
context.execute(
  pipeline,
  Seq(inputBuffer1, inputBuffer2),  // Multiple input buffers
  Seq(outputBuffer),
  elemCount
) match {
  case Success(_) =>
    val result = outputBuffer.copyToHost().get
    // Process result...
    
  case Failure(e) =>
    println(s"Execution failed: ${e.getMessage}")
    e.printStackTrace()
}

Direct Buffer Mapping

// For host-visible buffers, directly map memory
inputBuffer.map() match {
  case Success(mapped) =>
    // Write directly to mapped GPU memory
    for (i <- 0 until 1024) {
      mapped.putInt(i * 4, i * 2)  // Double each value
    }
    // Unmap happens automatically when buffer is closed
    
  case Failure(e) =>
    println(s"Failed to map buffer: ${e.getMessage}")
}

Current Issues I'm Working On

As this is just the first version, there are some issues I'm still figuring out:

  1. Memory Management: Right now you have to close resources in reverse order, which is annoying:

    // Close in reverse order of creation!
    outputBuffer.close()
    inputBuffer.close()
    pipeline.close()
    shader.close()
    context.close()
  2. Shader Support: Only compute shaders supported, but we'll need to expand this for the full rendering pipeline.

  3. Error Reporting: The stack traces for Vulkan errors could be more helpful.

  4. Buffer Sizing: Buffers must be sized correctly up front; there is no resizing support yet.

I'm hoping to get time to work on the resource auto-closure system next week!
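One direction I'm considering for the auto-closure system is scala.util.Using.Manager from the standard library, which releases registered resources in reverse acquisition order automatically. A sketch of the idea; `Tracked` stands in for the context/shader/buffer types, and only the close order matters here:

```scala
import scala.util.Using
import scala.collection.mutable.ListBuffer

val closedOrder = ListBuffer.empty[String]

// Stand-in resource that records when it is closed.
final class Tracked(name: String) extends AutoCloseable {
  def close(): Unit = closedOrder += name
}

// Using.Manager closes everything registered via use(...) in
// reverse order of acquisition, even if the block throws.
Using.Manager { use =>
  val context = use(new Tracked("context"))
  val shader  = use(new Tracked("shader"))
  val buffer  = use(new Tracked("buffer"))
  // ... run the computation ...
}
// closedOrder is now List("buffer", "shader", "context")
```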

Please provide any feedback or suggest fixes! I'm especially looking for insights on improving the API design and memory management approach.
