[IDLE-531] Fix and optimize batch jobs #260

Merged
merged 7 commits into from
Jan 30, 2025
@@ -1,16 +1,18 @@
-package com.swm.idle.batch.common.scheduler
+package com.swm.idle.batch.common.launcher

-import com.swm.idle.batch.job.CrawlingJobConfig
+import com.swm.idle.batch.job.JobConfig
import org.springframework.batch.core.JobParameters
import org.springframework.batch.core.JobParametersBuilder
+import org.springframework.batch.core.configuration.JobRegistry
import org.springframework.batch.core.launch.JobLauncher
import org.springframework.scheduling.annotation.Scheduled
import org.springframework.stereotype.Component

@Component
-class CrawlingJobScheduler(
+class CrawlingJobLauncher(
private val jobLauncher: JobLauncher,
-private val crawlingJobConfig: CrawlingJobConfig,
+private val jobRegistry: JobRegistry,
+private val crawlingJobConfig: JobConfig,
) {

@Scheduled(cron = "0 0 23 * * *")
@@ -22,4 +24,11 @@ class CrawlingJobScheduler(
jobLauncher.run(crawlingJobConfig.crawlingJob(), jobParameters)
}

+fun jobStart() {
+val jobParameters: JobParameters = JobParametersBuilder()
+.addLong("timestamp", System.currentTimeMillis())
+.toJobParameters()
+
+jobLauncher.run(jobRegistry.getJob("crawlingJob"), jobParameters)
+}
Comment on lines +27 to +33

🛠️ Refactor suggestion

Remove code duplication and improve exception handling

  1. The JobParameters creation logic is duplicated between the scheduleJob() and jobStart() methods.
  2. "crawlingJob" is used as a magic string.
  3. jobStart() has no exception handling.

The following improvements are suggested:

+    private fun createJobParameters(): JobParameters =
+        JobParametersBuilder()
+            .addLong("timestamp", System.currentTimeMillis())
+            .toJobParameters()
+
+    private companion object {
+        const val CRAWLING_JOB_NAME = "crawlingJob"
+    }
+
     fun jobStart() {
-        val jobParameters: JobParameters = JobParametersBuilder()
-            .addLong("timestamp", System.currentTimeMillis())
-            .toJobParameters()
-
-        jobLauncher.run(jobRegistry.getJob("crawlingJob"), jobParameters)
+        runCatching {
+            jobLauncher.run(
+                jobRegistry.getJob(CRAWLING_JOB_NAME),
+                createJobParameters()
+            )
+        }.onFailure { e ->
+            throw BatchJobException("배치 작업 실행 중 오류 발생", e)
+        }
     }
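The three points can be sketched without Spring Batch by substituting a plain function type for `jobLauncher.run(jobRegistry.getJob(...), ...)` and a `Map<String, Long>` for `JobParameters`. All names here are hypothetical: `JobStarter` and `JobStartException` are illustrative (the suggestion's `BatchJobException` is not defined in this PR). The point is the shape: one parameter factory, one job-name constant, and a failure wrapper that keeps the original cause.

```kotlin
// Hypothetical exception type; the review's BatchJobException is not part of this PR.
class JobStartException(message: String, cause: Throwable) : RuntimeException(message, cause)

// Stand-in for the launcher: `launch` plays the role of jobLauncher.run(jobRegistry.getJob(name), params).
class JobStarter(
    private val launch: (jobName: String, params: Map<String, Long>) -> Unit,
    private val clock: () -> Long = System::currentTimeMillis,
) {
    // Single place that builds the parameters, mirroring createJobParameters().
    fun createJobParameters(): Map<String, Long> = mapOf("timestamp" to clock())

    fun jobStart() {
        try {
            launch(CRAWLING_JOB_NAME, createJobParameters())
        } catch (e: Exception) {
            // Wrap with context but keep the cause so the original stack trace is not lost.
            throw JobStartException("Failed to start job '$CRAWLING_JOB_NAME'", e)
        }
    }

    companion object {
        const val CRAWLING_JOB_NAME = "crawlingJob"
    }
}
```

A plain `try/catch (e: Exception)` is used instead of `runCatching { ... }.onFailure { throw ... }` because `runCatching` catches every `Throwable`, including `Error` subclasses such as `OutOfMemoryError`, which should usually propagate unwrapped.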

}
@@ -0,0 +1,62 @@
package com.swm.idle.batch.crawler

enum class CrawlerConsts(val location: String, val value: String) {
CRAWLING_TARGET_URL_FORMAT("CRAWLING_TARGET_URL_FORMAT","https://www.work24.go.kr/wk/a/b/1200/retriveDtlEmpSrchList.do?basicSetupYn=&careerTo=&keywordJobCd=&occupation=&seqNo=&cloDateEndtParam=&payGbn=&templateInfo=&rot2WorkYn=&shsyWorkSecd=&srcKeywordParam=%EC%9A%94%EC%96%91%EB%B3%B4%ED%98%B8%EC%82%AC&resultCnt=50&keywordJobCont=&cert=&moreButtonYn=Y&minPay=&codeDepth2Info=11000&currentPageNo=1&eventNo=&mode=&major=&resrDutyExcYn=&eodwYn=&sortField=DATE&staArea=&sortOrderBy=DESC&keyword=%EC%9A%94%EC%96%91%EB%B3%B4%ED%98%B8%EC%82%AC&termSearchGbn=all&carrEssYns=&benefitSrchAndOr=O&disableEmpHopeGbn=&actServExcYn=&keywordStaAreaNm=&maxPay=&emailApplyYn=&codeDepth1Info=11000&keywordEtcYn=&regDateStdtParam={yesterday}&publDutyExcYn=&keywordJobCdSeqNo=&viewType=&exJobsCd=&templateDepthNmInfo=&region=&employGbn=&empTpGbcd=&computerPreferential=&infaYn=&cloDateStdtParam=&siteClcd=WORK&searchMode=Y&birthFromYY=&indArea=&careerTypes=&subEmpHopeYn=&tlmgYn=&academicGbn=&templateDepthNoInfo=&foriegn=&entryRoute=&mealOfferClcd=&basicSetupYnChk=&station=&holidayGbn=&srcKeyword=%EC%9A%94%EC%96%91%EB%B3%B4%ED%98%B8%EC%82%AC&academicGbnoEdu=noEdu&enterPriseGbn=all&cloTermSearchGbn=all&birthToYY=&keywordWantedTitle=&stationNm=&benefitGbn=&notSrcKeywordParam=&keywordFlag=&notSrcKeyword=&essCertChk=&depth2SelCode=&keywordBusiNm=&preferentialGbn=&rot3WorkYn=&regDateEndtParam={yesterday}&pfMatterPreferential=&pageIndex={pageIndex}&termContractMmcnt=&careerFrom=&laborHrShortYn=#scrollLoc"),
JOB_POSTING_COUNT_PER_PAGE("JOB_POSTING_COUNT_PER_PAGE","50"),
Comment on lines +4 to +5

🛠️ Refactor suggestion

URL and page size are hardcoded

Because the URL and page size are hardcoded, configuring them per environment is difficult.

The configuration should be externalized so that it can differ per environment:

-    CRAWLING_TARGET_URL_FORMAT("CRAWLING_TARGET_URL_FORMAT","https://www.work24.go.kr/wk/a/b/1200/retriveDtlEmpSrchList.do?basicSetupYn=&careerTo=&keywordJobCd=&occupation=&seqNo=&cloDateEndtParam=&payGbn=&templateInfo=&rot2WorkYn=&shsyWorkSecd=&srcKeywordParam=%EC%9A%94%EC%96%91%EB%B3%B4%ED%98%B8%EC%82%AC&resultCnt=50&keywordJobCont=&cert=&moreButtonYn=Y&minPay=&codeDepth2Info=11000&currentPageNo=1&eventNo=&mode=&major=&resrDutyExcYn=&eodwYn=&sortField=DATE&staArea=&sortOrderBy=DESC&keyword=%EC%9A%94%EC%96%91%EB%B3%B4%ED%98%B8%EC%82%AC&termSearchGbn=all&carrEssYns=&benefitSrchAndOr=O&disableEmpHopeGbn=&actServExcYn=&keywordStaAreaNm=&maxPay=&emailApplyYn=&codeDepth1Info=11000&keywordEtcYn=&regDateStdtParam={yesterday}&publDutyExcYn=&keywordJobCdSeqNo=&viewType=&exJobsCd=&templateDepthNmInfo=&region=&employGbn=&empTpGbcd=&computerPreferential=&infaYn=&cloDateStdtParam=&siteClcd=WORK&searchMode=Y&birthFromYY=&indArea=&careerTypes=&subEmpHopeYn=&tlmgYn=&academicGbn=&templateDepthNoInfo=&foriegn=&entryRoute=&mealOfferClcd=&basicSetupYnChk=&station=&holidayGbn=&srcKeyword=%EC%9A%94%EC%96%91%EB%B3%B4%ED%98%B8%EC%82%AC&academicGbnoEdu=noEdu&enterPriseGbn=all&cloTermSearchGbn=all&birthToYY=&keywordWantedTitle=&stationNm=&benefitGbn=&notSrcKeywordParam=&keywordFlag=&notSrcKeyword=&essCertChk=&depth2SelCode=&keywordBusiNm=&preferentialGbn=&rot3WorkYn=&regDateEndtParam={yesterday}&pfMatterPreferential=&pageIndex={pageIndex}&termContractMmcnt=&careerFrom=&laborHrShortYn=#scrollLoc"),
-    JOB_POSTING_COUNT_PER_PAGE("JOB_POSTING_COUNT_PER_PAGE","50"),
+    CRAWLING_TARGET_URL_FORMAT("CRAWLING_TARGET_URL_FORMAT", System.getenv("CRAWLING_TARGET_URL_FORMAT") ?: DEFAULT_URL),
+    JOB_POSTING_COUNT_PER_PAGE("JOB_POSTING_COUNT_PER_PAGE", System.getenv("JOB_POSTING_COUNT_PER_PAGE") ?: "50"),

Committable suggestion skipped: line range outside the PR's diff.
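Note that the suggestion refers to a `DEFAULT_URL` constant it does not define. A minimal sketch of the lookup-with-fallback idea, assuming the values still come from environment variables as in the suggestion. `ConfigSource`, `string`, and `positiveInt` are hypothetical names, and the lookup is injected so it can be tested without touching the real process environment.

```kotlin
// Hypothetical helper: reads settings from an injected lookup (defaults to the process environment).
class ConfigSource(private val lookup: (String) -> String? = { System.getenv(it) }) {

    // Returns the configured value, or the default when the variable is unset or blank.
    fun string(name: String, default: String): String =
        lookup(name)?.takeIf { it.isNotBlank() } ?: default

    // Parses a positive integer setting and fails fast, naming the offending key and value.
    fun positiveInt(name: String, default: Int): Int {
        val raw = lookup(name)?.takeIf { it.isNotBlank() } ?: return default
        val parsed = raw.trim().toIntOrNull()
            ?: throw IllegalArgumentException("$name must be an integer, got '$raw'")
        require(parsed > 0) { "$name must be positive, got $parsed" }
        return parsed
    }
}
```

Reading the value inside an enum constructor fixes it once for the lifetime of the JVM; in a Spring application, binding through `@Value` or `@ConfigurationProperties` is the more conventional way to get per-environment values.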

JOB_POSTING_COUNT("JOB_POSTING_COUNT","//*[@id=\"mForm\"]/div[2]/div/div[1]/div[1]/span/span"),

// Job posting info
TITLE("TITLE", "//*[@id=\"contents\"]/div/div/div/div[1]/div[3]/div[1]/div[1]/strong"),
CONTENT("CONTENT", "//*[@id=\"tab-panel01\"]/div[1]/div"),

// Work info
PAY_INFO("PAY_INFO", "//*[@id=\"tab-panel02\"]/div/table/tbody/tr[1]/td[2]"),
WORK_TIME("WORK_TIME","//*[@id=\"tab-panel02\"]/div/table/tbody/tr[2]/td"),
WORK_SCHEDULE("WORK_SCHEDULE","//*[@id=\"tab-panel02\"]/div/table/tbody/tr[3]/td[2]"),

// Recruitment info
RECRUITMENT_PROCESS("RECRUITMENT_PROCESS","//*[@id=\"tab-panel05\"]/div[2]/div/div[2]/p[1]"),
REQUIRED_DOCUMENT("REQUIRED_DOCUMENT","//*[@id=\"tab-panel05\"]/div[2]/div/div[2]/p[2]"),
APPLY_METHOD("APPLY_METHOD","//*[@id=\"tab-panel05\"]/div[2]/div/div[2]/p[1]"),
APPLY_DEADLINE("APPLY_DEADLINE","//*[@id=\"tab-panel05\"]/div[2]/div/div[1]/div[1]/p"),
CREATED_AT("CREATED_AT","//*[@id=\"contents\"]/div/div/div/div[1]/div[5]/div[11]/div[2]/table/tbody/tr[1]/td[1]"),

// Center info
CENTER_NAME("CENTER_NAME","//*[@id=\"contents\"]/div/div/div/div[1]/div[3]/div[1]/div[1]/p/strong"),
CENTER_ADDRESS1("CENTER_ADDRESS1","//*[@id=\"tab-panel02\"]/div/table/tbody/tr[5]/td/div[1]/p"),
CENTER_ADDRESS2("CENTER_ADDRESs2","//*[@id=\"tab-panel02\"]/div/table/tbody/tr[5]/td/div[1]/p"),
CENTER_ADDRESS3("CENTER_ADDRESS3","//*[@id=\"tab-panel02\"]/div/table/tbody/tr[5]/td/div[1]/p"),
Comment on lines +27 to +28

⚠️ Potential issue

Typo needs fixing

CENTER_ADDRESs2 contains a typo (lowercase "s").

Please fix it as follows:

-    CENTER_ADDRESS2("CENTER_ADDRESs2","//*[@id=\"tab-panel02\"]/div/table/tbody/tr[5]/td/div[1]/p"),
+    CENTER_ADDRESS2("CENTER_ADDRESS2","//*[@id=\"tab-panel02\"]/div/table/tbody/tr[5]/td/div[1]/p"),
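This class of typo can be ruled out by not repeating the constant name at all: every Kotlin enum entry already exposes its declared name through the built-in `name` property. A minimal sketch with a hypothetical, trimmed-down enum (`CrawlerField` is illustrative, not the PR's `CrawlerConsts`):

```kotlin
// Hypothetical, trimmed-down version of CrawlerConsts: the XPath is the only constructor argument.
enum class CrawlerField(val value: String) {
    CENTER_ADDRESS1("//*[@id=\"tab-panel02\"]/div/table/tbody/tr[5]/td/div[1]/p"),
    CENTER_ADDRESS2("//*[@id=\"tab-panel02\"]/div/table/tbody/tr[5]/td/div[1]/p");

    // The error-count key is derived from the entry's declared name, so it cannot drift out of sync.
    val location: String
        get() = name
}
```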


// Client (elderly person) address
CLIENT_ADDRESS1("CLIENT_ADDRESS1","//*[@id=\"tab-panel02\"]/div/table/tbody/tr[5]/td/div[1]/p"),
CLIENT_ADDRESS2("CLIENT_ADDRESS2","//*[@id=\"tab-panel02\"]/div/table/tbody/tr[5]/td/div[1]/p"),

// ChromeDriver options
HEADLESS("HEADLESS","--headless"),
NO_SANDBOX("NO_SANDBOX","--no-sandbox"),
DISABLE_DEV_SHM_USAGE("DISABLE_DEV_SHM_USAGE","--disable-dev-shm-usage"),
DISABLE_GPU("DISABLE_GPU","--disable-gpu"),
WINDOW_SIZE("WINDOW_SIZE","window-size=1920x1080"),
DISABLE_SOFTWARE_RASTERIZER("DISABLE_SOFTWARE_RASTERIZER","--disable-software-rasterizer"),
IGNORE_SSL_ERRORS("IGNORE_SSL_ERRORS","--ignore-ssl-errors=yes"),
IGNORE_CERTIFICATE_ERRORS("IGNORE_CERTIFICATE_ERRORS","--ignore-certificate-errors");

companion object {
fun getChromOptions(): Array<String> {
return arrayOf(
HEADLESS.value,
NO_SANDBOX.value,
DISABLE_DEV_SHM_USAGE.value,
DISABLE_GPU.value,
WINDOW_SIZE.value,
DISABLE_SOFTWARE_RASTERIZER.value,
IGNORE_SSL_ERRORS.value,
IGNORE_CERTIFICATE_ERRORS.value
)
}
}

fun getIntValue(): Int {
return value.toInt()
}
Comment on lines +59 to +61

⚠️ Potential issue

Exception handling needs to be added

The getIntValue method has no handling for values that are not valid integers.

Please add exception handling as follows:

     fun getIntValue(): Int {
-        return value.toInt()
+        return try {
+            value.toInt()
+        } catch (e: NumberFormatException) {
+            throw IllegalStateException("값을 정수로 변환할 수 없습니다: $value", e)
+        }
     }
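The same check can be written without `try/catch` by using the standard library's `toIntOrNull()`, which returns `null` instead of throwing. A minimal sketch, written as a standalone function so it can be exercised outside the enum (`parseIntSetting` is a hypothetical name):

```kotlin
// Converts a configuration string to Int, naming the offending entry when it is not an integer.
fun parseIntSetting(key: String, value: String): Int =
    value.trim().toIntOrNull()
        ?: throw IllegalStateException("$key cannot be converted to an integer: '$value'")
```

Inside `CrawlerConsts` the body would reduce to `value.toIntOrNull() ?: error("$name is not an integer: $value")`, since `error()` throws `IllegalStateException`.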

}
@@ -0,0 +1,29 @@
package com.swm.idle.batch.crawler

import io.github.oshai.kotlinlogging.KotlinLogging
import org.openqa.selenium.chrome.ChromeDriver
import org.openqa.selenium.chrome.ChromeDriverService
import org.openqa.selenium.chrome.ChromeOptions
import java.io.File

object DriverInitializer {
private val logger = KotlinLogging.logger { }

fun init(): ChromeDriver {
return runCatching {
ChromeDriver(
ChromeDriverService.Builder()
.usingDriverExecutable(File(System.getenv("CHROMEDRIVER_BIN")))
.build()
.also { logger.info { System.getenv("CHROMEDRIVER_BIN") } },
ChromeOptions().apply {
addArguments(*CrawlerConsts.getChromOptions())
setBinary(System.getenv("CHROME_BIN"))
}.also { logger.info { System.getenv("CHROME_BIN")} }
)
}.getOrElse {
logger.error { "ChromeDriver initialization failed: ${it.message}" }
throw RuntimeException("ChromeDriver initialization failed, application will exit.") // 이후 코드가 실행되지 않도록 예외 던짐
}
}
Comment on lines +12 to +28

🛠️ Refactor suggestion

Add environment variable validation and resource cleanup logic

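  1. The CHROMEDRIVER_BIN and CHROME_BIN environment variables should be checked for presence and validity.
  2. Cleanup logic is needed for the ChromeDriver instance.
  3. The inline comment on the throw line is written in Korean (it says "throw an exception so that subsequent code does not run"); consider English for internationalization.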

The following improvements are suggested:

+    private fun validateEnvironment() {
+        val requiredEnvVars = listOf("CHROMEDRIVER_BIN", "CHROME_BIN")
+        val missingEnvVars = requiredEnvVars.filter { System.getenv(it).isNullOrBlank() }
+        
+        if (missingEnvVars.isNotEmpty()) {
+            throw IllegalStateException("Required environment variables not set: $missingEnvVars")
+        }
+    }
+
     fun init(): ChromeDriver {
+        validateEnvironment()
         return runCatching {
             ChromeDriver(
                 ChromeDriverService.Builder()
                    .usingDriverExecutable(File(System.getenv("CHROMEDRIVER_BIN")))
                    .build()
                    .also { logger.info { System.getenv("CHROMEDRIVER_BIN") } },
                ChromeOptions().apply {
                    addArguments(*CrawlerConsts.getChromOptions())
                    setBinary(System.getenv("CHROME_BIN"))
                }.also { logger.info { System.getenv("CHROME_BIN")} }
            )
         }.getOrElse {
             logger.error { "ChromeDriver initialization failed: ${it.message}" }
-            throw RuntimeException("ChromeDriver initialization failed, application will exit.") // 이후 코드가 실행되지 않도록 예외 던짐
+            throw RuntimeException("ChromeDriver initialization failed, application will exit.")
         }
     }
+
+    fun shutdown(driver: ChromeDriver) {
+        runCatching {
+            driver.quit()
+        }.onFailure {
+            logger.error { "Failed to shutdown ChromeDriver: ${it.message}" }
+        }
+    }
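The suggested `validateEnvironment()` only checks that the variables are non-blank, while point 1 also asks for validity. A sketch that additionally checks that each path points to an existing, executable file, assuming that is what "valid" should mean here. `findInvalidBinaries` is a hypothetical name, and the environment lookup is passed in so the check can be tested.

```kotlin
import java.io.File

// Returns one problem description per required variable that is unset, blank, or not an executable file.
fun findInvalidBinaries(
    names: List<String>,
    lookup: (String) -> String? = { System.getenv(it) },
): List<String> = names.mapNotNull { name ->
    val path = lookup(name)
    when {
        path.isNullOrBlank() -> "$name is not set"
        !File(path).isFile -> "$name does not point to a file: $path"
        !File(path).canExecute() -> "$name is not executable: $path"
        else -> null
    }
}
```

`init()` could then fail fast with `check(problems.isEmpty()) { problems.joinToString() }` before the `ChromeDriverService` is built, so the error names the misconfigured variable instead of surfacing as a generic driver start-up failure.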

}
@@ -0,0 +1,44 @@
package com.swm.idle.batch.crawler

import com.swm.idle.batch.step.PostingReader
import org.openqa.selenium.By
import org.openqa.selenium.WebDriver
import org.openqa.selenium.support.ui.ExpectedConditions
import org.openqa.selenium.support.ui.WebDriverWait
import java.time.Duration
import java.time.LocalDate
import java.time.format.DateTimeFormatter

class WorknetPageCrawler {
private var driver: WebDriver = DriverInitializer.init()

Comment on lines +12 to +14

🛠️ Refactor suggestion

Implement AutoCloseable for resource management.

Implementing AutoCloseable is recommended so that the WebDriver resource is always released safely.

We suggest the following change:

-class WorknetPageCrawler {
+class WorknetPageCrawler : AutoCloseable {
     private var driver: WebDriver = DriverInitializer.init()
+
+    override fun close() {
+        driver.quit()
+    }
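With `AutoCloseable` in place, callers can rely on Kotlin's standard `use` extension, which calls `close()` whether the block returns normally or throws. A self-contained sketch; `FakeDriver` and `PageCrawler` are hypothetical stand-ins for `WebDriver` and `WorknetPageCrawler` so that it runs without Selenium:

```kotlin
// Stand-in for WebDriver: only records how often quit() was called.
class FakeDriver {
    var quitCalls = 0
        private set

    fun quit() {
        quitCalls++
    }
}

// Stand-in for WorknetPageCrawler implementing AutoCloseable as suggested.
class PageCrawler(val driver: FakeDriver = FakeDriver()) : AutoCloseable {
    fun crawl(fail: Boolean): Int {
        if (fail) throw IllegalStateException("no postings to crawl")
        return 1
    }

    // Called by `use` on both the normal and the exceptional path.
    override fun close() = driver.quit()
}
```

Because `initCounts` currently calls `driver.quit()` itself in two places, moving that call into `close()` and wrapping callers in `use { }` also avoids quitting the same driver twice.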

fun initCounts(reader: PostingReader) {
reader.crawlingUrl = CrawlerConsts.CRAWLING_TARGET_URL_FORMAT.value
.replace("{yesterday}", LocalDate.now().format(DateTimeFormatter.ofPattern("yyyyMMdd")))
.replace("{pageIndex}", "1")

moveToPage(reader)

reader.postingCount = driver
.findElement(By.xpath(CrawlerConsts.JOB_POSTING_COUNT.value))
.text.toInt()
.takeIf { it > 0 }
?: run {
driver.quit()
throw Exception("크롤링 할 공고가 없습니다.")
}

reader.pageCount = (reader.postingCount + CrawlerConsts.JOB_POSTING_COUNT_PER_PAGE.getIntValue() - 1) /
CrawlerConsts.JOB_POSTING_COUNT_PER_PAGE.getIntValue()
reader.lastPageJobPostingCount = reader.postingCount % CrawlerConsts.JOB_POSTING_COUNT_PER_PAGE.getIntValue()
driver.quit()
}
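Two details in `initCounts` are worth checking against intent. First, the `{yesterday}` placeholder is filled with `LocalDate.now()`, i.e. today's date; if the previous day is really meant, `minusDays(1)` is needed. Second, `postingCount % perPage` is 0 whenever the count is an exact multiple of the page size, even though the last page is then full. A sketch of both, using hypothetical helper names `fillUrl` and `paging` and a simplified URL template:

```kotlin
import java.time.LocalDate
import java.time.format.DateTimeFormatter

val DATE_FORMAT: DateTimeFormatter = DateTimeFormatter.ofPattern("yyyyMMdd")

// Fills the {yesterday} and {pageIndex} placeholders; the date is passed in explicitly for testability.
fun fillUrl(template: String, date: LocalDate, pageIndex: Int): String =
    template
        .replace("{yesterday}", date.format(DATE_FORMAT))
        .replace("{pageIndex}", pageIndex.toString())

// Returns (page count, postings on the last page) for a positive posting count.
fun paging(postingCount: Int, perPage: Int): Pair<Int, Int> {
    require(postingCount > 0 && perPage > 0) { "counts must be positive" }
    val pageCount = (postingCount + perPage - 1) / perPage   // ceiling division, as in the PR
    val lastPage = postingCount - (pageCount - 1) * perPage  // never 0, unlike postingCount % perPage
    return pageCount to lastPage
}
```

Calling `fillUrl(template, LocalDate.now().minusDays(1), 1)` would match the placeholder's name; whether today or yesterday is correct depends on which postings the 23:00 run is supposed to collect.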

private fun moveToPage(reader: PostingReader) {
driver.get(reader.crawlingUrl)
WebDriverWait(driver, Duration.ofSeconds(10))
.also {
it.until(ExpectedConditions.visibilityOfElementLocated(By.xpath(CrawlerConsts.JOB_POSTING_COUNT.value)))
}
}
Comment on lines +37 to +43

🛠️ Refactor suggestion

The timeout setting should be improved and exception handling added.

The WebDriver timeout is hardcoded, and there is insufficient handling for the case where the element is not found.

We suggest the following change:

+    companion object {
+        private const val DRIVER_TIMEOUT_SECONDS = 10L
+    }

     private fun moveToPage(reader: PostingReader) {
+        try {
             driver.get(reader.crawlingUrl)
-            WebDriverWait(driver, Duration.ofSeconds(10))
+            WebDriverWait(driver, Duration.ofSeconds(DRIVER_TIMEOUT_SECONDS))
                 .also {
                     it.until(ExpectedConditions.visibilityOfElementLocated(By.xpath(CrawlerConsts.JOB_POSTING_COUNT.value)))
                 }
+        } catch (e: Exception) {
+            throw BatchException("페이지 로딩 중 오류가 발생했습니다: ${reader.crawlingUrl}", e)
+        }
     }
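Note that `BatchException` is not defined anywhere in this PR, so the suggestion needs a project exception type before it compiles. A minimal sketch of such a type plus a small wrapper that attaches the URL and preserves the original cause; `CrawlingException` and `withPageContext` are hypothetical names:

```kotlin
// Hypothetical project exception; the review's BatchException does not exist in this PR.
class CrawlingException(message: String, cause: Throwable? = null) : RuntimeException(message, cause)

// Runs a page operation and rethrows any failure with the URL attached, keeping the cause chain.
inline fun <T> withPageContext(url: String, action: () -> T): T =
    try {
        action()
    } catch (e: Exception) {
        throw CrawlingException("Error while loading page: $url", e)
    }
```

When the wait expires, Selenium's `WebDriverWait.until` throws `org.openqa.selenium.TimeoutException`, which this wrapper would surface as the cause.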

}
@@ -0,0 +1,142 @@
package com.swm.idle.batch.crawler

import com.swm.idle.batch.common.dto.CrawledJobPostingDto
import io.github.oshai.kotlinlogging.KotlinLogging
import org.openqa.selenium.By
import org.openqa.selenium.WebDriver
import org.openqa.selenium.support.ui.ExpectedConditions
import org.openqa.selenium.support.ui.WebDriverWait
import java.time.Duration
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import org.openqa.selenium.WebElement

class WorknetPostCrawler {
private val logger = KotlinLogging.logger { }
private var driver: WebDriver = DriverInitializer.init()
private var errorCountMap: MutableMap<String, Int> = mutableMapOf()
Comment on lines +16 to +17

⚠️ Potential issue

Resource management needs improvement

The WebDriver instance may not be closed explicitly, and the resource can leak if an exception is thrown.

Implement AutoCloseable to improve resource management:

-class WorknetPostCrawler {
+class WorknetPostCrawler : AutoCloseable {
     private val logger = KotlinLogging.logger { }
     private var driver: WebDriver = DriverInitializer.init()
     private var errorCountMap: MutableMap<String, Int> = mutableMapOf()
+
+    override fun close() {
+        try {
+            driver.quit()
+        } catch (e: Exception) {
+            logger.error(e) { "WebDriver 종료 중 오류 발생" }
+        }
+    }
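`crawlPosts` already calls `driver.quit()` before returning, so the suggested `close()` may run against a driver that has already quit, and its `try/catch` would then have to absorb whatever the second `quit()` does. One way to make the second call a no-op is a guard flag. A sketch with a hypothetical `QuitOnce` wrapper around any quit action:

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

// Wraps a quit action so that only the first close() actually runs it.
class QuitOnce(private val quit: () -> Unit) : AutoCloseable {
    private val closed = AtomicBoolean(false)

    override fun close() {
        // compareAndSet flips false -> true exactly once, even if close() is called from several threads.
        if (closed.compareAndSet(false, true)) quit()
    }
}
```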


fun crawlPosts(end: Int, url: String): List<CrawledJobPostingDto> {
moveToPage(url)

val crawledPostings = mutableListOf<CrawledJobPostingDto>()
repeat(end) { i ->
val originalWindow = driver.windowHandle
val titleElement = findElementSafe(By.xpath("//*[@id=\"list${i+1}\"]/td[1]/div/div[2]/a")) ?: return@repeat

moveToPostDetailWindow(titleElement, originalWindow)

try {
val post: CrawledJobPostingDto = createPost()
crawledPostings.add(post)
} catch (e: Exception) {
logger.warn { "실패" }
}
Comment on lines +29 to +34

⚠️ Potential issue

Exception handling needs improvement

When an exception occurs, only the message "실패" ("failure") is logged, which makes debugging difficult.

Improve the exception handling as follows:

             try {
                 val post: CrawledJobPostingDto = createPost()
                 crawledPostings.add(post)
             } catch (e: Exception) {
-                logger.warn { "실패" }
+                logger.error(e) { "게시물 크롤링 실패 - URL: ${driver.currentUrl}" }
+                errorCountMap["crawl_failure"] = errorCountMap.getOrDefault("crawl_failure", 0) + 1
             }

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 detekt (1.23.7)

[warning] 32-32: The caught exception is swallowed. The original exception could be lost.

(detekt.exceptions.SwallowedException)
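The counter update in the suggestion can also be written with `MutableMap.merge`, and actually using the caught exception (as `logger.error(e) { ... }` does) should address detekt's `SwallowedException` warning. A dependency-free sketch; `FailureTracker` is hypothetical and collects messages in a list instead of calling KotlinLogging:

```kotlin
// Hypothetical stand-in for the crawler's error bookkeeping.
class FailureTracker {
    val counts: MutableMap<String, Int> = mutableMapOf()
    val log: MutableList<String> = mutableListOf()

    fun record(key: String, url: String, e: Exception) {
        // merge inserts 1 for a new key, otherwise adds 1 to the existing count.
        counts.merge(key, 1) { old, inc -> old + inc }
        // Keep the exception type and message so the failure is diagnosable later.
        log += "crawl failed at $url: ${e::class.simpleName}: ${e.message}"
    }
}
```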


backWindow(originalWindow)
}
errorCountMap.asSequence().forEach { (key, value) -> println("$key -> $value") }
driver.quit()
return crawledPostings
}

private fun moveToPage(url: String) {
driver.get(url)
WebDriverWait(driver, Duration.ofSeconds(10))
.until(
ExpectedConditions.visibilityOfElementLocated(By.cssSelector("#list1"))
)
}

private fun createPost(): CrawledJobPostingDto {
return CrawledJobPostingDto(
title = extractText(CrawlerConsts.TITLE),
content = extractText(CrawlerConsts.CONTENT),
createdAt = extractText(CrawlerConsts.CREATED_AT),
payInfo = extractText(CrawlerConsts.PAY_INFO),
workSchedule = extractText(CrawlerConsts.WORK_SCHEDULE),
recruitmentProcess = extractText(CrawlerConsts.RECRUITMENT_PROCESS),
applyMethod = extractText(CrawlerConsts.APPLY_METHOD),
requiredDocument = extractText(CrawlerConsts.REQUIRED_DOCUMENT),
centerName = extractText(CrawlerConsts.CENTER_NAME),
applyDeadline = extractApplyDeadline(CrawlerConsts.APPLY_DEADLINE),
workTime = extractWorkTime(CrawlerConsts.WORK_TIME),
centerAddress = extractAddress(
CrawlerConsts.CLIENT_ADDRESS1,
CrawlerConsts.CLIENT_ADDRESS2
),
clientAddress = extractAddress(
CrawlerConsts.CENTER_ADDRESS1,
CrawlerConsts.CENTER_ADDRESS2,
CrawlerConsts.CENTER_ADDRESS3
),
directUrl = driver.currentUrl
)
}


private inline fun <T> errorRecord(location: String, action: () -> T): T {
return runCatching { action() }
.getOrElse { e ->
logError(location)
throw e
}
}

private fun findElementSafe(by: By): WebElement? {
return runCatching { driver.findElement(by) }.getOrNull()
}

private fun moveToPostDetailWindow(titleElement: WebElement, originalWindow: String) {
titleElement.click()
WebDriverWait(driver, Duration.ofSeconds(10))
.until(ExpectedConditions.numberOfWindowsToBe(2))
driver.switchTo().window(driver.windowHandles.first { it != originalWindow })
}

private fun extractText(con: CrawlerConsts): String {
return errorRecord(con.location) { driver.findElement(By.xpath(con.value)).text }
}

private fun extractApplyDeadline(con: CrawlerConsts): String {
return errorRecord(con.location) {
driver.findElement(By.xpath(con.value)).text.let {
if (it.contains("채용시까지"))
LocalDate.now().plusDays(15).format(DateTimeFormatter.ofPattern("yyyyMMdd"))
else
it
}
}
}

private fun extractAddress(vararg cons: CrawlerConsts): String {
for (con in cons) {
runCatching {
val address = driver.findElement(By.xpath(con.value)).text
return address.replace("지도보기", "").trim().replace(Regex("\\(\\d{5}\\)"), "").trim()
} .getOrElse { e ->
logError(con.location)
throw e
}
}
throw NoSuchElementException("Center address not found using any of the provided XPaths")
}
Comment on lines +112 to +123

🛠️ Refactor suggestion

Address extraction logic needs improvement

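If an XPath lookup fails, address extraction throws (the getOrElse branch rethrows immediately, so the remaining XPaths are never tried), and the regex cleanup is convoluted.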

Improve the logic as follows:

     private fun extractAddress(vararg cons: CrawlerConsts): String {
+        val addressPattern = Regex("\\(\\d{5}\\)|지도보기")
         for (con in cons) {
             runCatching {
                 val address = driver.findElement(By.xpath(con.value)).text
-                return address.replace("지도보기", "").trim().replace(Regex("\\(\\d{5}\\)"), "").trim()
+                return address.replace(addressPattern, "").trim()
             } .getOrElse { e ->
                 logError(con.location)
-                throw e
+                logger.warn(e) { "${con.location} 주소 추출 실패" }
+                continue
             }
         }
-        throw NoSuchElementException("Center address not found using any of the provided XPaths")
+        return "주소 정보 없음"
     }
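One caveat with the suggestion: `continue` inside the `getOrElse` lambda is a non-local jump, which depending on the Kotlin version may not compile (non-local `break`/`continue` in inline lambdas only arrived recently, first as a preview). A version that avoids jumping out of lambdas could look like the sketch below, assuming each XPath lookup is modeled as a function that either returns text or throws. `cleanAddress` and `firstAddress` are hypothetical names, and collapsing repeated whitespace is an addition not present in the original.

```kotlin
// Matches a five-digit postal code in parentheses or the "지도보기" ("view map") link text.
val ADDRESS_NOISE = Regex("\\(\\d{5}\\)|지도보기")
val WHITESPACE = Regex("\\s+")

// Removes the noise and collapses the whitespace left behind.
fun cleanAddress(raw: String): String =
    raw.replace(ADDRESS_NOISE, "").replace(WHITESPACE, " ").trim()

// Tries each lookup in order and returns the first non-empty cleaned address, or null if none succeeds.
fun firstAddress(lookups: List<() -> String>): String? =
    lookups.firstNotNullOfOrNull { lookup ->
        runCatching { lookup() }.getOrNull()
            ?.let { cleanAddress(it) }
            ?.takeIf { it.isNotEmpty() }
    }
```

Returning `null` instead of a placeholder such as "주소 정보 없음" ("no address information") is one design option: it leaves the decision about missing addresses to the caller.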


private fun extractWorkTime(con: CrawlerConsts): String {
return errorRecord(con.location) {
driver.findElement(By.xpath(con.value)).text
.replace("도움말", "")
.replace("(근무시간)", "")
.replace("\n", "")
}
}

private fun logError(location: String) {
errorCountMap[location] = errorCountMap.getOrDefault(location, 0) + 1
}

private fun backWindow(originalWindow: String?) {
driver.close()
driver.switchTo().window(originalWindow)
}
Comment on lines +138 to +141

⚠️ Potential issue

Window handling safety needs improvement

There is no handling for the case where originalWindow is null when switching windows.

Improve safety as follows:

     private fun backWindow(originalWindow: String?) {
+        if (originalWindow == null) {
+            logger.error { "원본 윈도우 핸들이 null입니다" }
+            return
+        }
         driver.close()
-        driver.switchTo().window(originalWindow)
+        runCatching {
+            driver.switchTo().window(originalWindow)
+        }.onFailure { e ->
+            logger.error(e) { "원본 윈도우로 전환 실패" }
+            throw IllegalStateException("원본 윈도우로 전환할 수 없습니다", e)
+        }
     }

}

This file was deleted.