@HI-JIN2 HI-JIN2 commented Jul 15, 2025

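⭐️ Changes

1. OpenAI Realtime API integration

  • WebSocket-based real-time communication

    • Implemented the WebSocket connection to the OpenAI Realtime API
    • Fetches a client_secret token, then creates the session and connects
    • Supports real-time audio streaming
  • Audio streaming

    • Records microphone input as PCM and sends it in real time
    • Base64-decodes the AI response audio and plays it back in real time
    • An audio queue keeps playback chunks from overlapping
    • Low-level audio handling with AudioRecord/AudioTrack
  • Message handling

    • Handles response.audio.delta messages to play streamed audio
    • Handles events such as session.created and response.completed
    • Implemented append and commit logic for the PCM audio buffer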
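
The decode-and-queue step described above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `AudioPlaybackQueue` and its `sink` callback are hypothetical names, and the sink stands in for the AudioTrack write so the sketch runs without Android.

```kotlin
import java.util.Base64
import java.util.concurrent.LinkedBlockingQueue

// Hypothetical helper: `sink` stands in for the AudioTrack write used in the PR.
class AudioPlaybackQueue(private val sink: (ByteArray) -> Unit) {
    private val queue = LinkedBlockingQueue<ByteArray>()

    // Called once per response.audio.delta payload: decode Base64 and enqueue the PCM bytes.
    fun onAudioDelta(base64Delta: String) {
        queue.put(Base64.getDecoder().decode(base64Delta))
    }

    // Hand queued chunks to the sink one at a time, in arrival order, so they never overlap.
    fun drain() {
        while (true) {
            val chunk = queue.poll() ?: break
            sink(chunk)
        }
    }
}

fun main() {
    val played = mutableListOf<List<Byte>>()
    val player = AudioPlaybackQueue { played.add(it.toList()) }
    val encoder = Base64.getEncoder()
    player.onAudioDelta(encoder.encodeToString(byteArrayOf(1, 2)))
    player.onAudioDelta(encoder.encodeToString(byteArrayOf(3, 4)))
    player.drain()
    println(played) // prints [[1, 2], [3, 4]]
}
```

In a real player the drain loop would presumably run on a single dedicated thread or coroutine, which is what makes the ordering guarantee hold.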

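2. AI phone conversation feature

  • New module: presentation/ai_conversation

    • Created and configured a dedicated module for AI phone conversations
  • Screens

    • List screen (AiConversationListScreen): character selection (새롬/Saerom, 길동/Gildong)
    • Call screen (AiConversationScreen): the actual phone call with the AI
    • End screen (AiConversationEndScreen): shown after the call ends
  • Character system

    • Characters are managed with the SaegilCharacter enum
    • Each character has its own image, nickname, and comment

3. Data layer

  • Repository pattern

    • Implemented RealTimeRepository / RealTimeRepositoryImpl
    • Added Realtime API token lookup to AssistantRepository
  • Use cases

    • GetRealTimeTokenUsecase: fetches the Realtime API token
    • StartRealtimeChatUseCase: starts the real-time chat
    • EndRealtimeChatUseCase: ends the real-time chat
  • Service layer

    • RealTimeService / RealTimeServiceImpl: WebSocket communication and audio processing
    • RealtimeMessageSender: formats and sends Realtime API messages
    • GetRealTimeApiTokenReponse: API response model

4. Navigation improvements

  • Added the AI phone conversation screens to the navigation graph
  • Implemented navigation between the screens

5. Other fixes

  • Fixed the broken UI on the News screen
  • Updated the comment text on the Onboarding screen
  • Fixed bugs in the build configuration and navigation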

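📌 Please look at this part carefully!

  1. Audio permission: the AI phone conversation feature requires the RECORD_AUDIO permission.

  2. API key: BuildConfig.OPEN_AI_API_KEY must be set to an OpenAI API key.

  3. Audio processing:

    • Sample rate: 16 kHz
    • Channels: mono
    • Format: 16-bit PCM
    • The audio buffer is committed every 10 chunks.
  4. WebSocket connection:

    • Model: gpt-4o-realtime-preview-2024-12-17
    • Modalities: audio only
  5. Testing needed: microphone permission and audio playback still need to be tested on a real device.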
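
A rough sketch of the "commit every 10 chunks" rule from item 3, assuming the `input_audio_buffer.append` and `input_audio_buffer.commit` client event names from the OpenAI Realtime API. `PcmChunkSender` and its `send` callback are hypothetical; the callback stands in for writing a text frame to the WebSocket, and the PR's RealtimeMessageSender may be structured differently.

```kotlin
import java.util.Base64

// 16 kHz * 1 channel * 2 bytes per sample = 32,000 bytes of PCM per second.
const val SAMPLE_RATE_HZ = 16_000
const val BYTES_PER_SECOND = SAMPLE_RATE_HZ * 1 * 2

// Hypothetical sender: `send` stands in for writing a text frame to the socket.
class PcmChunkSender(
    private val commitEvery: Int = 10,
    private val send: (String) -> Unit,
) {
    private var pending = 0

    fun sendChunk(pcm: ByteArray) {
        // Each chunk is Base64-encoded and appended to the server-side input buffer.
        val audio = Base64.getEncoder().encodeToString(pcm)
        send("""{"type":"input_audio_buffer.append","audio":"$audio"}""")
        pending++
        // After every `commitEvery` appended chunks, commit the buffer.
        if (pending == commitEvery) {
            send("""{"type":"input_audio_buffer.commit"}""")
            pending = 0
        }
    }
}

fun main() {
    val frames = mutableListOf<String>()
    val sender = PcmChunkSender { frames.add(it) }
    // 25 chunks of 100 ms each (3,200 bytes at 32,000 bytes per second).
    repeat(25) { sender.sendChunk(ByteArray(BYTES_PER_SECOND / 10)) }
    println(frames.count { it.contains("commit") }) // prints 2
}
```

With 100 ms chunks, a commit every 10 chunks corresponds to roughly one second of audio per commit; the actual chunk size in the PR is not stated.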

🏞️ Screenshots (Optional)

@gemini-code-assist gemini-code-assist bot left a comment

Summary of Changes

Hello @HI-JIN2, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant new 'AI Conversation' feature, enabling real-time voice interactions with AI characters. It leverages Ktor WebSockets and the OpenAI real-time API for seamless audio streaming and integrates a new dedicated UI module for character selection and the conversation experience. The application's navigation and onboarding have been updated to highlight this new core functionality.

Highlights

  • New AI Conversation Feature: A new 'AI Conversation' module (:presentation:ai_conversation) has been implemented, enabling users to engage in real-time voice conversations with AI characters.
  • Real-time Audio Streaming Integration: Ktor WebSockets and the OpenAI real-time API are now integrated to provide bidirectional audio streaming. This includes fetching a real-time API token, sending user audio (PCM data), and playing back AI-generated audio.
  • Updated App Navigation and Onboarding: The main application's navigation graph and onboarding flow have been updated to include and prioritize the new AI Conversation feature, making it a primary entry point for users.
  • Dependency and Network Configuration: Necessary Ktor client, WebSocket, and serialization dependencies have been added, along with an OpenAI API key configuration, to support the new real-time communication infrastructure. The Ktor HTTP client engine was also switched to CIO for improved performance.

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a real-time AI conversation feature. The review identified critical issues such as potential null pointer exceptions, incorrect URL construction, and unsafe type casting. Additionally, architectural concerns were raised regarding duplicated logic and misplaced dependencies. Addressing these points will significantly improve the stability and maintainability of the new feature.

Comment on lines +14 to +18
suspend operator fun invoke(): String {
    val token = assistantRepository.getRealTimeApiToken()
    Log.d("token", token.toString())
    return token.toString()
}

critical

This implementation has a critical flaw. assistantRepository.getRealTimeApiToken() returns a String?. If it returns null, token.toString() will produce the literal string "null". Returning "null" as a valid token will likely cause failures in subsequent operations.

You should handle the null case explicitly to ensure contract correctness. Throwing an exception is a good way to signal that a required value could not be obtained.

Also, the Log.d call should be removed from production code.

    suspend operator fun invoke(): String {
        return assistantRepository.getRealTimeApiToken()
            ?: throw IllegalStateException("Failed to retrieve real-time API token.")
    }
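
The difference between the two behaviours can be checked in isolation. This is a standalone sketch; `unsafeToken` and `safeToken` are hypothetical helper names that take the nullable token as a parameter instead of calling the repository.

```kotlin
// Mirrors the original code: a null token silently becomes the text "null".
fun unsafeToken(token: String?): String = token.toString()

// Mirrors the suggested fix: a null token fails fast with an exception.
fun safeToken(token: String?): String =
    token ?: throw IllegalStateException("Failed to retrieve real-time API token.")

fun main() {
    println(unsafeToken(null) == "null")               // prints true
    println(runCatching { safeToken(null) }.isFailure) // prints true
    println(safeToken("abc"))                          // prints abc
}
```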

val input_audio_format: String,
val output_audio_format: String,
val input_audio_transcription: InputAudioTranscription,
val turn_detection: String, // nullable

critical

The comment indicates that turn_detection can be null, but its type is a non-nullable String. This will cause a JsonDecodingException if the API returns null for this field. To prevent a crash, the type should be made nullable.

    val turn_detection: String?

Spacer(modifier = Modifier.height(60.dp))

Image(
    painterResource(character!!.img),

critical

The character parameter can be null if the navigation argument is invalid or missing. Using the non-null assertion operator (!!) will cause a NullPointerException and crash the app in this scenario.

You must handle the null case gracefully. For instance, you could show an error message and navigate back, or display a default/loading state. This also applies to other places where character is force-unwrapped.
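
One possible way to avoid the force-unwrap is to resolve the navigation argument into an explicit UI state before rendering. The sketch below is an assumption-heavy illustration: the enum constants, `CallUiState`, and `resolveCharacter` are hypothetical, and the real SaegilCharacter also carries an image, nickname, and comment that are omitted here.

```kotlin
// Stand-in for the PR's SaegilCharacter enum; constant names are assumed.
enum class SaegilCharacter { SAEROM, GILDONG }

// Hypothetical UI state: the screen renders Ready, and shows an error or navigates back otherwise.
sealed interface CallUiState {
    data class Ready(val character: SaegilCharacter) : CallUiState
    object InvalidCharacter : CallUiState
}

// Maps a possibly missing or malformed navigation argument to a state without using `!!`.
fun resolveCharacter(navArg: String?): CallUiState {
    val character = SaegilCharacter.values().firstOrNull { it.name == navArg }
    return if (character != null) CallUiState.Ready(character) else CallUiState.InvalidCharacter
}

fun main() {
    println(resolveCharacter("SAEROM"))                              // prints Ready(character=SAEROM)
    println(resolveCharacter(null) == CallUiState.InvalidCharacter)  // prints true
}
```

The composable can then branch on the state with `when`, so the image is only loaded in the `Ready` branch where the character is known to be non-null.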

Comment on lines +41 to +44
host = "api.openai.com",
path = "/v1/realtime/sessions/$secret",
request = {
    url("wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17&modalities=audio")

critical

The WebSocket URL construction is incorrect. The path parameter, which contains the session secret, is being ignored because the url() function call with a hardcoded string overrides it. This will cause the connection to use the wrong endpoint and fail, as the required session secret is not included in the URL.

Please remove the hardcoded url() call and let Ktor construct the URL from the host and path parameters. If query parameters are needed, they should be added within a url builder block.

if (read > 0) {
    val audioChunk = buffer.copyOf(read)

    CoroutineScope(Dispatchers.IO).launch {

high

Creating a new CoroutineScope for every operation is an anti-pattern that can lead to unmanaged coroutines and potential resource leaks.

Since startAudioStreaming is already running inside a coroutine on Dispatchers.IO, you should declare it as a suspend function. This would allow you to call other suspend functions like RealtimeMessageSender.sendPcmAudio directly, without launching a new, separate coroutine. This advice applies to other similar CoroutineScope usages in this file.
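
A minimal sketch of that shape, assuming the kotlinx.coroutines library is on the classpath. `streamAudio`, `readChunk`, and this `sendPcmAudio` are hypothetical stand-ins: the real function reads from AudioRecord and sends over the WebSocket.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.runBlocking
import kotlinx.coroutines.withContext

// Hypothetical stand-in for RealtimeMessageSender.sendPcmAudio; it only records chunk sizes.
suspend fun sendPcmAudio(chunk: ByteArray, sent: MutableList<Int>) {
    sent.add(chunk.size)
}

// The read loop is itself a suspend function, so it calls sendPcmAudio directly
// instead of launching a new CoroutineScope for every chunk.
suspend fun streamAudio(readChunk: () -> ByteArray?, sent: MutableList<Int>) =
    withContext(Dispatchers.IO) {
        while (true) {
            val chunk = readChunk() ?: break
            sendPcmAudio(chunk, sent)
        }
    }

fun main() = runBlocking {
    val chunks = ArrayDeque(listOf(ByteArray(4), ByteArray(2)))
    val sent = mutableListOf<Int>()
    streamAudio({ chunks.removeFirstOrNull() }, sent)
    println(sent) // prints [4, 2]
}
```

Because the loop runs inside the caller's coroutine, cancelling the caller's scope cancels the loop at its next suspension point, and chunks are sent in the order they were read.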

Comment on lines +35 to +43
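//@Serializable
//data class TurnDetection(
// // Null in the current example, so it can be omitted. Add fields once the structure is defined.
//)
//
//@Serializable
//data class Tool(
// // An empty list of objects in the current example; add fields once the structure is defined.
//)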

medium

This commented-out code should be removed to improve code clarity and maintainability.

}
}
} catch (e: Exception) {
println("❌ WebSocket error: ${e.message}")

medium

Using println for logging is not recommended for production code. It lacks severity levels and can't be easily disabled in release builds. Please use the standard Android Log class (e.g., Log.e(...)) for better error reporting and filtering.

Suggested change
- println("❌ WebSocket error: ${e.message}")
+ android.util.Log.e("RealTimeService", "WebSocket error", e)

for (frame in incoming) {
    if (frame is Frame.Text) {
         val json = frame.readText()

medium

There appears to be a typo with an extra space before val. Please remove it for correct formatting.

Suggested change
-  val json = frame.readText()
+ val json = frame.readText()

Comment on lines +4 to +8
import com.saegil.domain.model.Recruitment
import com.saegil.domain.repository.AssistantRepository
import com.saegil.domain.repository.MapRepository
import kotlinx.coroutines.flow.Flow
import java.lang.reflect.Constructor

medium

These imports are unused and should be removed to keep the code clean.

modifier = modifier
.fillMaxSize(),
.verticalScroll(rememberScrollState())
.fillMaxSize(),

medium

The indentation of this modifier is inconsistent with the line above it, which harms code readability. Please align it correctly with the rest of the modifier chain.

Suggested change
-  .fillMaxSize(),
+     .fillMaxSize(),

@HI-JIN2 HI-JIN2 changed the title from "Feat/realtime api" to "[Feat] Implement the AI phone conversation feature over WebSocket with the OpenAI Realtime API" Nov 30, 2025
@HI-JIN2 HI-JIN2 marked this pull request as ready for review December 3, 2025 08:10