[Feat] Implement AI phone conversation feature over WebSocket with the OpenAI Realtime API #92
base: main
Summary of Changes
Hello @HI-JIN2, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a significant new 'AI Conversation' feature, enabling real-time voice interactions with AI characters. It leverages Ktor WebSockets and the OpenAI real-time API for seamless audio streaming and integrates a new dedicated UI module for character selection and the conversation experience. The application's navigation and onboarding have been updated to highlight this new core functionality.
Highlights
- New AI Conversation Feature: A new 'AI Conversation' module (:presentation:ai_conversation) has been implemented, enabling users to engage in real-time voice conversations with AI characters.
- Real-time Audio Streaming Integration: Ktor WebSockets and the OpenAI real-time API are now integrated to provide bidirectional audio streaming. This includes fetching a real-time API token, sending user audio (PCM data), and playing back AI-generated audio.
- Updated App Navigation and Onboarding: The main application's navigation graph and onboarding flow have been updated to include and prioritize the new AI Conversation feature, making it a primary entry point for users.
- Dependency and Network Configuration: Necessary Ktor client, WebSocket, and serialization dependencies have been added, along with an OpenAI API key configuration, to support the new real-time communication infrastructure. The Ktor HTTP client engine was also switched to CIO for improved performance.
Code Review
This pull request introduces a real-time AI conversation feature. The review identified critical issues such as potential null pointer exceptions, incorrect URL construction, and unsafe type casting. Additionally, architectural concerns were raised regarding duplicated logic and misplaced dependencies. Addressing these points will significantly improve the stability and maintainability of the new feature.
    suspend operator fun invoke(): String {
        val token = assistantRepository.getRealTimeApiToken()
        Log.d("token", token.toString())
        return token.toString()
    }
This implementation has a critical flaw. assistantRepository.getRealTimeApiToken() returns a String?. If it returns null, token.toString() will produce the literal string "null". Returning "null" as a valid token will likely cause failures in subsequent operations.
You should handle the null case explicitly to ensure contract correctness. Throwing an exception is a good way to signal that a required value could not be obtained.
Also, the Log.d call should be removed from production code.
    suspend operator fun invoke(): String {
        return assistantRepository.getRealTimeApiToken()
            ?: throw IllegalStateException("Failed to retrieve real-time API token.")
    }

    val input_audio_format: String,
    val output_audio_format: String,
    val input_audio_transcription: InputAudioTranscription,
    val turn_detection: String, // null allowed
    Spacer(modifier = Modifier.height(60.dp))

    Image(
        painterResource(character!!.img),
The character parameter can be null if the navigation argument is invalid or missing. Using the non-null assertion operator (!!) will cause a NullPointerException and crash the app in this scenario.
You must handle the null case gracefully. For instance, you could show an error message and navigate back, or display a default/loading state. This also applies to other places where character is force-unwrapped.
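One way to avoid the force-unwrap is to resolve the navigation argument into an explicit state before any UI is drawn. The sketch below is plain Kotlin with no Compose dependency. The enum constants `SAEROM` and `GILDONG`, the `ConversationState` type, and the `resolveCharacter` and `describe` functions are hypothetical names chosen for illustration, not code from this PR.

```kotlin
// Hypothetical stand-in for the PR's SaegilCharacter enum; the real constants may differ.
enum class SaegilCharacter { SAEROM, GILDONG }

// Explicit screen states, so the UI never needs `character!!`.
sealed interface ConversationState {
    data class Ready(val character: SaegilCharacter) : ConversationState
    object InvalidCharacter : ConversationState
}

// Maps a raw navigation argument to a state.
// A missing or unknown value becomes InvalidCharacter instead of a crash.
fun resolveCharacter(navArg: String?): ConversationState {
    val match = SaegilCharacter.values().firstOrNull { it.name == navArg }
    return if (match != null) {
        ConversationState.Ready(match)
    } else {
        ConversationState.InvalidCharacter
    }
}

// Illustrates how a screen could branch on the state.
fun describe(state: ConversationState): String = when (state) {
    is ConversationState.Ready -> "show call screen for ${state.character}"
    ConversationState.InvalidCharacter -> "show error and navigate back"
}
```

In a composable, the `InvalidCharacter` branch would be the place to show an error message or trigger back navigation, while the `Ready` branch can use `state.character.img` without any assertion.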
    host = "api.openai.com",
    path = "/v1/realtime/sessions/$secret",
    request = {
        url("wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17&modalities=audio")
The WebSocket URL construction is incorrect. The path parameter, which contains the session secret, is being ignored because the url() function call with a hardcoded string overrides it. This will cause the connection to use the wrong endpoint and fail, as the required session secret is not included in the URL.
Please remove the hardcoded url() call and let Ktor construct the URL from the host and path parameters. If query parameters are needed, they should be added within a url builder block.
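A minimal sketch of that suggestion, assuming Ktor 2.x package names and a client with the WebSockets plugin installed. `applyRealtimeQuery` and `connectRealtime` are hypothetical helpers, and the query parameters are copied from the hardcoded URL in the diff. Whether the session secret belongs in the path, as this comment assumes, or elsewhere (for example in a header) should be confirmed against the OpenAI Realtime API documentation.

```kotlin
import io.ktor.client.HttpClient
import io.ktor.client.plugins.websocket.webSocket
import io.ktor.client.request.HttpRequestBuilder
import io.ktor.http.URLProtocol
import io.ktor.websocket.Frame
import io.ktor.websocket.readText

private const val REALTIME_MODEL = "gpt-4o-realtime-preview-2024-12-17"

// Hypothetical helper: sets the scheme and appends query parameters
// on the existing URL builder, leaving host and path untouched.
fun HttpRequestBuilder.applyRealtimeQuery(model: String = REALTIME_MODEL) {
    url.protocol = URLProtocol.WSS
    url.parameters.append("model", model)
    url.parameters.append("modalities", "audio")
}

// Hypothetical wrapper around the connection shown in the diff.
// The path placement of `secret` follows the review comment and is an assumption.
suspend fun connectRealtime(
    client: HttpClient,
    secret: String,
    onText: (String) -> Unit
) {
    client.webSocket(
        host = "api.openai.com",
        path = "/v1/realtime/sessions/$secret",
        request = { applyRealtimeQuery() }
    ) {
        // Forward every text frame to the caller until the session closes.
        for (frame in incoming) {
            if (frame is Frame.Text) onText(frame.readText())
        }
    }
}
```

Because the request block only mutates `url.protocol` and `url.parameters`, the `host` and `path` arguments are no longer overridden.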
    if (read > 0) {
        val audioChunk = buffer.copyOf(read)

        CoroutineScope(Dispatchers.IO).launch {
Creating a new CoroutineScope for every operation is an anti-pattern that can lead to unmanaged coroutines and potential resource leaks.
Since startAudioStreaming is already running inside a coroutine on Dispatchers.IO, you should declare it as a suspend function. This would allow you to call other suspend functions like RealtimeMessageSender.sendPcmAudio directly, without launching a new, separate coroutine. This advice applies to other similar CoroutineScope usages in this file.
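The shape of that refactor can be sketched as follows, assuming kotlinx.coroutines is on the classpath. `streamAudio`, `readChunk`, and `send` are hypothetical names: `readChunk` stands in for the `AudioRecord` read loop and `send` for a suspend call such as `RealtimeMessageSender.sendPcmAudio`.

```kotlin
import kotlinx.coroutines.currentCoroutineContext
import kotlinx.coroutines.isActive
import kotlinx.coroutines.runBlocking

// A suspend function can call other suspend functions directly,
// so no extra CoroutineScope(...).launch is needed per chunk.
// The loop stops when the caller's coroutine is cancelled or the source is exhausted.
suspend fun streamAudio(
    readChunk: () -> ByteArray?,
    send: suspend (ByteArray) -> Unit
) {
    while (currentCoroutineContext().isActive) {
        val chunk = readChunk() ?: break
        if (chunk.isNotEmpty()) send(chunk)
    }
}

fun main() = runBlocking {
    // Fake audio source with two chunks, for demonstration only.
    val pending = ArrayDeque(listOf(byteArrayOf(1, 2), byteArrayOf(3)))
    streamAudio(readChunk = { pending.removeFirstOrNull() }) { chunk ->
        println("sent ${chunk.size} bytes")
    }
}
```

Since the work runs in the caller's coroutine, cancelling that coroutine (for example when the call screen closes) also stops the streaming, and chunks are sent in order rather than racing across separately launched coroutines.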
    //@Serializable
    //data class TurnDetection(
    //    // null in the current example, so it can be omitted. Add fields later once a structure is defined
    //)
    //
    //@Serializable
    //data class Tool(
    //    // An empty list of objects in the current example; add fields later once a structure is defined
    //)
            }
        }
    } catch (e: Exception) {
        println("❌ WebSocket error: ${e.message}")
Using println for logging is not recommended for production code. It lacks severity levels and can't be easily disabled in release builds. Please use the standard Android Log class (e.g., Log.e(...)) for better error reporting and filtering.
    - println("❌ WebSocket error: ${e.message}")
    + android.util.Log.e("RealTimeService", "WebSocket error", e)
    for (frame in incoming) {
        if (frame is Frame.Text) {
            val json = frame.readText()
    import com.saegil.domain.model.Recruitment
    import com.saegil.domain.repository.AssistantRepository
    import com.saegil.domain.repository.MapRepository
    import kotlinx.coroutines.flow.Flow
    import java.lang.reflect.Constructor
    modifier = modifier
        .fillMaxSize(),
        .verticalScroll(rememberScrollState())
        .fillMaxSize(),
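⭐️ Changes
1. OpenAI Realtime API integration
- Implemented WebSocket-based real-time communication
  - Fetches a client_secret token, then creates a session and connects
- Audio streaming
- Message handling
  - Plays streamed audio by handling response.audio.delta messages
  - Handles various events such as session.created and response.completed
2. AI phone conversation feature
- New module added: presentation/ai_conversation
- Screens
  - AiConversationListScreen: character selection screen (새롬, 길동)
  - AiConversationScreen: the actual phone-call screen with the AI
  - AiConversationEndScreen: the screen shown after the call ends
- Character system
  - Characters are managed through the SaegilCharacter enum
3. Data layer
- Repository pattern
  - Implemented RealTimeRepository and RealTimeRepositoryImpl
  - Added Realtime API token retrieval to AssistantRepository
- Use cases
  - GetRealTimeTokenUsecase: retrieves the Realtime API token
  - StartRealtimeChatUseCase: starts the real-time chat
  - EndRealtimeChatUseCase: ends the real-time chat
- Service layer
  - RealTimeService and RealTimeServiceImpl: WebSocket communication and audio processing
  - RealtimeMessageSender: formats and sends Realtime API messages
  - GetRealTimeApiTokenReponse: API response model
4. Navigation improvements
5. Other changes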
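The response.audio.delta handling listed under message handling can be sketched roughly as below. This is not the PR's implementation: `RealtimeEvent` and `audioBytesOrNull` are hypothetical names, the event is assumed to be already parsed from JSON, and the `delta` field is assumed to carry base64-encoded audio as described in the OpenAI Realtime API documentation.

```kotlin
import java.util.Base64

// Hypothetical, already-parsed view of a server event; real events carry more fields.
data class RealtimeEvent(val type: String, val delta: String? = null)

// Returns decoded audio bytes for audio delta events, or null for every other event.
fun audioBytesOrNull(event: RealtimeEvent): ByteArray? {
    if (event.type != "response.audio.delta") return null
    val encoded = event.delta ?: return null
    return try {
        Base64.getDecoder().decode(encoded)
    } catch (e: IllegalArgumentException) {
        null // malformed base64: skip this chunk rather than crash playback
    }
}

fun main() {
    val pcm = byteArrayOf(0, 1, 2, 3)
    val audioEvent = RealtimeEvent(
        type = "response.audio.delta",
        delta = Base64.getEncoder().encodeToString(pcm)
    )
    println(audioBytesOrNull(audioEvent)?.size)
    println(audioBytesOrNull(RealtimeEvent("session.created")))
}
```

The returned bytes would then be written to the audio player. On Android, `java.util.Base64` requires API level 26 or higher; `android.util.Base64` is the usual alternative on older devices.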
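📌 Please be sure to review this!
- Audio permission required: the AI phone conversation feature requires the RECORD_AUDIO permission.
- API key setup: BuildConfig.OPEN_AI_API_KEY must be set to an OpenAI API key.
- Audio processing:
- WebSocket connection:
  - gpt-4o-realtime-preview-2024-12-17
  - audio only
- Testing required: microphone permission and audio playback need to be tested on a real device.
🏞️ Screenshots (Optional)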