API reference documentation is available here.
Adding to your Xcode project
- Open project settings > Package Dependencies
- Click the + button to add a package dependency
- Enter the SDK URL (https://github.com/HumeAI/hume-swift-sdk.git)
- Set the version rule (we recommend pinning to a specific version)
- Click "Add Package"
- Add the `Privacy - Microphone Usage Description` entry to your `Info.plist`
- (Optional) If you plan to support background audio, select the "Audio, AirPlay, and Picture in Picture" option in the "Background Modes" section of your project capabilities.
Adding to your Package.swift
dependencies: [
.package(url: "https://github.com/HumeAI/hume-swift-sdk.git", from: "x.x.x")
]
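To make the module importable from your target, also declare the product as a target dependency. A minimal sketch, assuming the product is named `Hume` (it matches the module imported in the examples below; verify it against the SDK's own Package.swift):

// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "MyApp",
    dependencies: [
        .package(url: "https://github.com/HumeAI/hume-swift-sdk.git", from: "x.x.x")
    ],
    targets: [
        .executableTarget(
            name: "MyApp",
            dependencies: [
                // Product name assumed to be "Hume"; confirm it in the SDK's Package.swift.
                .product(name: "Hume", package: "hume-swift-sdk")
            ]
        )
    ]
)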
The SDK provides a VoiceProvider abstraction that manages an active socket connection to the `/chat` endpoint and coordinates the audio stack (microphone capture and playback).
Capabilities
- Pipes output audio from `audio_output` events into `SoundPlayer` to play back in real time.
- `VoiceProvider.connect(...)` opens and connects to the `/chat` socket, waits for the `chat_metadata` event to be received, and starts the microphone.
- `VoiceProvider.disconnect()` closes the socket, stops the microphone, and stops all playback.
Example
import Hume
let token = try await myAccessTokenClient.fetchAccessToken()
let humeClient = HumeClient(options: .accessToken(token: token))
let voiceProvider = VoiceProviderFactory.getVoiceProvider(client: humeClient)
voiceProvider.delegate = myDelegate
// Request permission to record audio. Be sure to add `Privacy - Microphone Usage Description`
// to your Info.plist
if MicrophonePermission.current == .undetermined {
    let granted = await MicrophonePermission.requestPermissions()
    guard granted else {
        print("user declined mic permissions")
        return
    }
} else if MicrophonePermission.current == .denied {
    print("user previously declined mic permissions") // ask user to update in settings
    return
}
let sessionSettings = SessionSettings(
    systemPrompt: "my optional system prompt",
    variables: ["myCustomVariable": myValue, "datetime": Date().formattedForSessionSettings()])

try await voiceProvider.connect(
    configId: myConfigId,
    configVersion: nil,
    sessionSettings: sessionSettings)
// Sending user text input manually
await voiceProvider.sendUserInput(message: "Hey, how are you?")

Implement `VoiceProviderDelegate` methods to be notified of events, errors, meter data, state, etc.
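When the conversation is over, tear down the session with `disconnect()`, which (as described under Capabilities) closes the socket, stops the microphone, and stops all playback. A minimal sketch; whether the call is async or throwing is not shown here, so adjust it to match the API reference:

// End the voice session: closes the /chat socket, stops the microphone,
// and stops all playback.
await voiceProvider.disconnect()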
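The examples construct a `HumeClient` from an access token fetched by a `myAccessTokenClient` helper, which is not part of the SDK. Below is a minimal sketch of such a helper, assuming Hume's client-credentials token endpoint (`https://api.hume.ai/oauth2-cc/token`) and an `access_token` field in the response; verify both against Hume's authentication docs:

import Foundation

// Hypothetical helper; the endpoint and response shape are assumptions.
struct MyAccessTokenClient {
    let apiKey: String
    let secretKey: String

    func fetchAccessToken() async throws -> String {
        var request = URLRequest(url: URL(string: "https://api.hume.ai/oauth2-cc/token")!)
        request.httpMethod = "POST"
        let credentials = Data("\(apiKey):\(secretKey)".utf8).base64EncodedString()
        request.setValue("Basic \(credentials)", forHTTPHeaderField: "Authorization")
        request.setValue("application/x-www-form-urlencoded", forHTTPHeaderField: "Content-Type")
        request.httpBody = Data("grant_type=client_credentials".utf8)

        let (data, _) = try await URLSession.shared.data(for: request)
        struct TokenResponse: Decodable { let access_token: String }
        return try JSONDecoder().decode(TokenResponse.self, from: data).access_token
    }
}

In production, prefer minting access tokens on your own backend so your secret key never ships inside the app.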
Example
import Hume
let token = try await myAccessTokenClient.fetchAccessToken()
let humeClient = HumeClient(options: .accessToken(token: token))
let ttsClient = humeClient.tts
let postedUtterances: [PostedUtterance] = [PostedUtterance(
    description: voiceDescription,
    speed: speed,
    trailingSilence: trailingSilence,
    text: text,
    voice: .postedUtteranceVoiceWithId(PostedUtteranceVoiceWithId(id: "<config ID>", provider: .humeAi))
)]
let fmt = .wav(FormatWav())
let request = PostedTts(
    context: nil,
    numGenerations: 1,
    splitUtterances: nil,
    stripHeaders: nil,
    utterances: postedUtterances,
    instantMode: true,
    format: fmt)
var _data = Data() // accumulates the raw streamed audio, if you need it later
let stream = ttsClient.synthesizeFileStreaming(request: request)
for try await data in stream {
    // convert data to SoundClip
    guard let soundClip = SoundClip.from(data) else {
        print("warn: failed to create sound clip")
        return
    }
    // play SoundClip with ttsPlayer
    try await ttsPlayer.play(soundClip: soundClip, format: fmt)
    _data.append(data)
}

This SDK is in beta, and there may be breaking changes between versions without a major version update. We therefore recommend pinning the package to a specific version, so you install the same version each time and avoid unexpected breaking changes.
- Audio interruptions (e.g. phone calls) are not yet handled.
- Manually starting/stopping `AVAudioSession` will likely break an active voice session. Leave all audio handling to `AudioHub`. If you need to add your own output audio nodes, see `AudioHub.addNode(_:)` (a hedged sketch follows this list).
- Input metering is not yet implemented.
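Below is a minimal sketch of attaching a custom output node, assuming you already hold a reference to the SDK's `AudioHub`; how you obtain that reference, and the exact signature of `addNode(_:)`, should be checked against the API reference:

import AVFoundation
import Hume

// Hypothetical: `audioHub` is whatever AudioHub instance backs your voice session.
func attachReverb(to audioHub: AudioHub) {
    let reverb = AVAudioUnitReverb()
    reverb.loadFactoryPreset(.mediumHall)
    reverb.wetDryMix = 30
    // addNode(_:) is the method mentioned above; verify its exact signature.
    audioHub.addNode(reverb)
}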
