[Feat] Implement AI phone conversation feature using the OpenAI Realtime API over WebSocket #92

HI-JIN2 · 2025-07-15T07:57:40Z

⭐️ Changes

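1. OpenAI Realtime API integration

WebSocket-based real-time communication
- Implemented the WebSocket connection to the OpenAI Realtime API
- Fetches a client_secret token, then creates the session and connects
- Supports real-time audio streaming
Audio streaming
- Records microphone input as PCM and sends it in real time
- Base64-decodes the AI response audio and plays it in real time
- Resolves overlapping playback with an audio queue
- Low-level audio handling with AudioRecord/AudioTrack
Message handling
- Handles response.audio.delta messages to play streamed audio
- Handles various events such as session.created and response.completed
- Implemented append and commit logic for the PCM audio buffer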
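
The playback path described in this section (Base64-decode each response.audio.delta payload, then queue the chunks so they play one at a time) might look roughly like the following. This is a minimal sketch under stated assumptions, not the PR's actual code: `AudioPlaybackQueue` and the `sink` callback are hypothetical names, and the sink stands in for an `AudioTrack.write` call so the example runs without Android.

```kotlin
import java.util.Base64
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical helper: buffers decoded PCM chunks and hands them to a sink in arrival order.
class AudioPlaybackQueue(private val sink: (ByteArray) -> Unit) {
    private val queue = LinkedBlockingQueue<ByteArray>()

    // Decode the Base64 `delta` field of a response.audio.delta event and enqueue the PCM bytes.
    fun onAudioDelta(base64Delta: String) {
        queue.put(Base64.getDecoder().decode(base64Delta))
    }

    // Drain everything queued so far, one chunk at a time, so chunks never overlap.
    // Returns the number of chunks handed to the sink.
    fun drain(): Int {
        var played = 0
        while (true) {
            val chunk = queue.poll() ?: break
            sink(chunk) // in the app this would be something like audioTrack.write(...)
            played++
        }
        return played
    }
}

fun main() {
    val played = mutableListOf<ByteArray>()
    val playback = AudioPlaybackQueue { played.add(it) }
    val encoder = Base64.getEncoder()
    playback.onAudioDelta(encoder.encodeToString(byteArrayOf(1, 2)))
    playback.onAudioDelta(encoder.encodeToString(byteArrayOf(3, 4)))
    println("chunks played: ${playback.drain()}")
    println(played.map { it.toList() })
}
```

In a real service the drain loop would presumably run on a single dedicated coroutine or thread, which is what keeps successive chunks from overlapping.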

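2. AI phone conversation feature

New module added: presentation/ai_conversation
- Created and configured a dedicated module for AI phone conversations
Screens
- List screen (AiConversationListScreen): character selection screen (Saerom, Gildong)
- Call screen (AiConversationScreen): the actual phone call with the AI
- End screen (AiConversationEndScreen): shown after the call ends
Character system
- Characters are managed with the SaegilCharacter enum
- Each character has its own image, nickname, and comment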
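
An enum carrying per-character data, as the character system above describes, could be shaped like this. It is only an illustrative sketch: the property names, the integer image ids, and the comment strings are all assumptions, and only the enum name and the two character names come from the description.

```kotlin
// Hypothetical shape for the character enum; field names and values are placeholders.
enum class SaegilCharacter(
    val imageRes: Int,    // would be an R.drawable.* id in the app; plain ints here
    val nickname: String,
    val comment: String,
) {
    SAEROM(imageRes = 1, nickname = "Saerom", comment = "placeholder comment"),
    GILDONG(imageRes = 2, nickname = "Gildong", comment = "placeholder comment"),
}

fun main() {
    // A list screen can iterate over every character without hard-coding them.
    for (character in SaegilCharacter.values()) {
        println("${character.nickname}: ${character.comment}")
    }
}
```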

3. Data layer

Repository pattern
- Implemented RealTimeRepository and RealTimeRepositoryImpl
- Added Realtime API token retrieval to AssistantRepository
Use cases
- GetRealTimeTokenUsecase: retrieves the Realtime API token
- StartRealtimeChatUseCase: starts a real-time chat
- EndRealtimeChatUseCase: ends a real-time chat
Service layer
- RealTimeService and RealTimeServiceImpl: WebSocket communication and audio processing
- RealtimeMessageSender: formats and sends Realtime API messages
- GetRealTimeApiTokenReponse: API response model

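4. Navigation improvements

Added the AI phone conversation screens to the navigation graph
Implemented navigation between the screens

5. Other changes

Fixed broken UI layout on the News screen
Updated the comment text on the Onboarding screen
Fixed build configuration and navigation bugs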

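📌 Please take a close look at this!

Audio permission required: The AI phone conversation feature requires the RECORD_AUDIO permission.
API key setup: An OpenAI API key must be set in BuildConfig.OPEN_AI_API_KEY.
Audio processing:
- Sample rate: 16 kHz
- Channels: mono
- Format: 16-bit PCM
- The audio buffer is committed every 10 chunks.
WebSocket connection:
- Model: gpt-4o-realtime-preview-2024-12-17
- Modalities: audio only
Testing needed: Microphone permission and audio playback need to be tested on a real device.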
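
To make the audio parameters concrete, here is a rough sketch of how the stated format translates into buffer sizes and how a "commit every 10 chunks" rule could be expressed. The `input_audio_buffer.append` and `input_audio_buffer.commit` event types belong to the OpenAI Realtime API; `AudioBufferBatcher`, `bytesForMillis`, and the constants are hypothetical names introduced for illustration and are not taken from the PR.

```kotlin
import java.util.Base64

const val SAMPLE_RATE_HZ = 16_000   // 16 kHz, as stated in the notes
const val CHANNELS = 1              // mono
const val BYTES_PER_SAMPLE = 2      // 16-bit PCM
const val CHUNKS_PER_COMMIT = 10    // commit after every 10 appended chunks

// Bytes needed to hold `millis` milliseconds of audio in this format.
fun bytesForMillis(millis: Int): Int =
    SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE * millis / 1000

// Hypothetical batching helper: returns the JSON events to send for one recorded chunk.
class AudioBufferBatcher {
    private var pending = 0

    fun eventsFor(chunk: ByteArray): List<String> {
        // Base64 output contains no quotes or backslashes, so it is safe to embed directly.
        val audio = Base64.getEncoder().encodeToString(chunk)
        val events = mutableListOf("""{"type":"input_audio_buffer.append","audio":"$audio"}""")
        pending++
        if (pending == CHUNKS_PER_COMMIT) {
            events.add("""{"type":"input_audio_buffer.commit"}""")
            pending = 0
        }
        return events
    }
}

fun main() {
    println("100 ms of audio = ${bytesForMillis(100)} bytes")
    val batcher = AudioBufferBatcher()
    val eventCounts = (1..10).map { batcher.eventsFor(ByteArray(bytesForMillis(100))).size }
    println(eventCounts) // the 10th chunk also produces a commit event
}
```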

🏞️ Screenshots (Optional)

gemini-code-assist

Summary of Changes

Hello @HI-JIN2, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant new 'AI Conversation' feature, enabling real-time voice interactions with AI characters. It leverages Ktor WebSockets and the OpenAI real-time API for seamless audio streaming and integrates a new dedicated UI module for character selection and the conversation experience. The application's navigation and onboarding have been updated to highlight this new core functionality.

Highlights

New AI Conversation Feature: A new 'AI Conversation' module (:presentation:ai_conversation) has been implemented, enabling users to engage in real-time voice conversations with AI characters.
Real-time Audio Streaming Integration: Ktor WebSockets and the OpenAI real-time API are now integrated to provide bidirectional audio streaming. This includes fetching a real-time API token, sending user audio (PCM data), and playing back AI-generated audio.
Updated App Navigation and Onboarding: The main application's navigation graph and onboarding flow have been updated to include and prioritize the new AI Conversation feature, making it a primary entry point for users.
Dependency and Network Configuration: Necessary Ktor client, WebSocket, and serialization dependencies have been added, along with an OpenAI API key configuration, to support the new real-time communication infrastructure. The Ktor HTTP client engine was also switched to CIO for improved performance.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a real-time AI conversation feature. The review identified critical issues such as potential null pointer exceptions, incorrect URL construction, and unsafe type casting. Additionally, architectural concerns were raised regarding duplicated logic and misplaced dependencies. Addressing these points will significantly improve the stability and maintainability of the new feature.

gemini-code-assist · 2025-07-15T08:01:30Z

domain/src/main/java/com/saegil/domain/usecase/GetRealTimeTokenUsecase.kt

+    suspend operator fun invoke(): String {
+       val token =  assistantRepository.getRealTimeApiToken()
+        Log.d("token", token.toString())
+        return token.toString()
+    }


This implementation has a critical flaw. assistantRepository.getRealTimeApiToken() returns a String?. If it returns null, token.toString() will produce the literal string "null". Returning "null" as a valid token will likely cause failures in subsequent operations.

You should handle the null case explicitly to ensure contract correctness. Throwing an exception is a good way to signal that a required value could not be obtained.

Also, the Log.d call should be removed from production code.

suspend operator fun invoke(): String {
    return assistantRepository.getRealTimeApiToken()
        ?: throw IllegalStateException("Failed to retrieve real-time API token.")
}
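
The pitfall is easy to reproduce in isolation. In the sketch below, `fetchToken` is a hypothetical stand-in for `assistantRepository.getRealTimeApiToken()` and the token value is made up; the point is only to contrast `toString()` on a nullable value with an explicit null check.

```kotlin
// Hypothetical stand-in for assistantRepository.getRealTimeApiToken().
fun fetchToken(available: Boolean): String? = if (available) "ek_example" else null

// Mirrors the original code: a null token silently becomes the text "null".
fun tokenUnsafe(available: Boolean): String = fetchToken(available).toString()

// Mirrors the suggested fix: a null token fails loudly at the source.
fun tokenSafe(available: Boolean): String =
    fetchToken(available) ?: throw IllegalStateException("Failed to retrieve real-time API token.")

fun main() {
    println(tokenUnsafe(false) == "null")               // true: the caller cannot tell this from a real token
    println(runCatching { tokenSafe(false) }.isFailure) // true: the failure is visible to the caller
}
```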

gemini-code-assist · 2025-07-15T08:01:30Z

data/src/main/java/com/saegil/data/model/GetRealTimeApiTokenReponse.kt

+    val input_audio_format: String,
+    val output_audio_format: String,
+    val input_audio_transcription: InputAudioTranscription,
+    val turn_detection: String,  // null allowed


The comment indicates that turn_detection can be null, but its type is a non-nullable String. This will cause a JsonDecodingException if the API returns null for this field. To prevent a crash, the type should be made nullable.

val turn_detection: String?

gemini-code-assist · 2025-07-15T08:01:30Z

...conversation/src/main/java/com/saegil/ai_conversation/aiconversation/AiConversationScreen.kt

+            Spacer(modifier = Modifier.height(60.dp))
+
+            Image(
+                painterResource(character!!.img),


The character parameter can be null if the navigation argument is invalid or missing. Using the non-null assertion operator (!!) will cause a NullPointerException and crash the app in this scenario.

You must handle the null case gracefully. For instance, you could show an error message and navigate back, or display a default/loading state. This also applies to other places where character is force-unwrapped.
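
One way to avoid the force-unwrap is to resolve the navigation argument into an explicit screen state before rendering. The following is a sketch in plain Kotlin so it runs without Compose; `CallScreenState` and `resolveCallScreenState` are hypothetical names, and the enum is a simplified stand-in for the real `SaegilCharacter`.

```kotlin
// Simplified stand-in; the real enum also carries image and text data.
enum class SaegilCharacter { SAEROM, GILDONG }

// Hypothetical screen state: either a valid character or an explicit error case.
sealed interface CallScreenState {
    data class Ready(val character: SaegilCharacter) : CallScreenState
    object InvalidCharacter : CallScreenState
}

// Maps a raw navigation argument to a screen state without ever using `!!`.
fun resolveCallScreenState(characterArg: String?): CallScreenState {
    val character = SaegilCharacter.values().firstOrNull { it.name == characterArg }
    return if (character != null) CallScreenState.Ready(character) else CallScreenState.InvalidCharacter
}

fun main() {
    println(resolveCallScreenState("SAEROM"))
    println(resolveCallScreenState(null) == CallScreenState.InvalidCharacter)
}
```

The composable could then branch on the state with `when`, showing an error and navigating back for `InvalidCharacter` instead of crashing.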

gemini-code-assist · 2025-07-15T08:01:30Z

data/src/main/java/com/saegil/data/remote/RealTimeServiceImpl.kt

+                host = "api.openai.com",
+                path = "/v1/realtime/sessions/$secret",
+                request = {
+                    url("wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17&modalities=audio")


The WebSocket URL construction is incorrect. The path parameter, which contains the session secret, is being ignored because the url() function call with a hardcoded string overrides it. This will cause the connection to use the wrong endpoint and fail, as the required session secret is not included in the URL.

Please remove the hardcoded url() call and let Ktor construct the URL from the host and path parameters. If query parameters are needed, they should be added within a url builder block.

gemini-code-assist · 2025-07-15T08:01:30Z

data/src/main/java/com/saegil/data/remote/RealTimeServiceImpl.kt

+            if (read > 0) {
+                val audioChunk = buffer.copyOf(read)
+
+                CoroutineScope(Dispatchers.IO).launch {


Creating a new CoroutineScope for every operation is an anti-pattern that can lead to unmanaged coroutines and potential resource leaks.

Since startAudioStreaming is already running inside a coroutine on Dispatchers.IO, you should declare it as a suspend function. This would allow you to call other suspend functions like RealtimeMessageSender.sendPcmAudio directly, without launching a new, separate coroutine. This advice applies to other similar CoroutineScope usages in this file.
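
A suspend-based version of the read loop might look like this. It is a sketch that assumes kotlinx.coroutines is on the classpath; `streamAudio`, `readChunk`, and `sendPcmAudio` are hypothetical parameters standing in for `AudioRecord.read` and `RealtimeMessageSender.sendPcmAudio`, whose real signatures are not shown in the review.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.ensureActive
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Reads chunks until `readChunk` returns a negative value, sending each non-empty chunk
// by calling the suspend sender directly instead of launching a new scope per chunk.
suspend fun streamAudio(
    readChunk: (ByteArray) -> Int,               // stand-in for AudioRecord.read(buffer, 0, size)
    sendPcmAudio: suspend (ByteArray) -> Unit,   // stand-in for the message sender
    bufferSize: Int = 3200,                      // 100 ms of 16 kHz mono 16-bit PCM
): Unit = withContext(Dispatchers.IO) {
    val buffer = ByteArray(bufferSize)
    while (true) {
        ensureActive()                           // stop promptly when the caller is cancelled
        val read = readChunk(buffer)
        if (read < 0) break
        if (read > 0) sendPcmAudio(buffer.copyOf(read))
    }
}

fun main() = runBlocking {
    val reads = ArrayDeque(listOf(4, 0, 2, -1))  // simulated read results
    val sentSizes = mutableListOf<Int>()
    streamAudio(
        readChunk = { reads.removeFirst() },
        sendPcmAudio = { sentSizes.add(it.size) },
    )
    println(sentSizes) // prints [4, 2]
}
```

Because the loop runs inside the caller's coroutine, cancelling that coroutine (for example when the call screen closes) also stops the streaming, which is the structured-concurrency benefit the review is pointing at.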

gemini-code-assist · 2025-07-15T08:01:31Z

data/src/main/java/com/saegil/data/model/GetRealTimeApiTokenReponse.kt

+//@Serializable
+//data class TurnDetection(
+//    // Null in the current example, so this can be omitted. Add fields once it has a structure
+//)
+//
+//@Serializable
+//data class Tool(
+//    // An empty object list in the current example; add fields later once it has a structure
+//)


This commented-out code should be removed to improve code clarity and maintainability.

gemini-code-assist · 2025-07-15T08:01:31Z

data/src/main/java/com/saegil/data/remote/RealTimeServiceImpl.kt

+                }
+            }
+        } catch (e: Exception) {
+            println("❌ WebSocket error: ${e.message}")


Using println for logging is not recommended for production code. It lacks severity levels and can't be easily disabled in release builds. Please use the standard Android Log class (e.g., Log.e(...)) for better error reporting and filtering.

Suggested change

-            println("❌ WebSocket error: ${e.message}")
+            android.util.Log.e("RealTimeService", "WebSocket error", e)

gemini-code-assist · 2025-07-15T08:01:31Z

data/src/main/java/com/saegil/data/remote/RealTimeServiceImpl.kt

+
+                for (frame in incoming) {
+                    if (frame is Frame.Text) {
+                         val json = frame.readText()


There appears to be a typo with an extra space before val. Please remove it for correct formatting.

Suggested change

-                         val json = frame.readText()
+                        val json = frame.readText()

gemini-code-assist · 2025-07-15T08:01:31Z

domain/src/main/java/com/saegil/domain/usecase/GetRealTimeTokenUsecase.kt

+import com.saegil.domain.model.Recruitment
+import com.saegil.domain.repository.AssistantRepository
+import com.saegil.domain.repository.MapRepository
+import kotlinx.coroutines.flow.Flow
+import java.lang.reflect.Constructor


These imports are unused and should be removed to keep the code clean.

gemini-code-assist · 2025-07-15T08:01:32Z

presentation/news/src/main/java/com/saegil/news/newsquiz/NewsQuizScreen.kt

        modifier = modifier
-            .fillMaxSize(),
+            .verticalScroll(rememberScrollState())
+        .fillMaxSize(),


The indentation of this modifier is inconsistent with the line above it, which harms code readability. Please align it correctly with the rest of the modifier chain.

Suggested change

-        .fillMaxSize(),
+            .fillMaxSize(),

HI-JIN2 added 15 commits July 15, 2025 03:15

feat: fetch and parse the client_secret value needed to use the OpenAI Realtime API

2121ab0

feat: socket connection

7e27ec9

feat: audio socket communication (+ connect, disconnect)

c5224ba

feat: implement audio playback over the socket

62828fa

fix: resolve audio overlap by queueing

c2e6f6b

fix: make audio playback more natural

82f1e23

feat: call screen details

538290e

feat: list screen details

266fbea

fix: navigation

b7756eb

feat: AI phone conversation list screen details

0871895

fix: key

8af8278

fix: rebase errors

9845e31

chore: build config

3a49ef0

chore: onboarding comment

fb16d31

chore: fix broken news UI

ed04ebf

gemini-code-assist bot reviewed Jul 15, 2025

HI-JIN2 changed the title ~~Feat/realtime api~~ [Feat] Implement AI phone conversation feature using the OpenAI Realtime API over WebSocket Nov 30, 2025

HI-JIN2 marked this pull request as ready for review December 3, 2025 08:10
