andy1li
Member

@andy1li andy1li commented Jul 30, 2025

  • The tester can hang in two ways, both related to companies.db:

    • copying it with cp
    • querying it to prepare the expected result
  • Fix / Workaround:

    • use a hard link instead of cp
    • hardcode the expected results

- Added a new test file `stage_index_scan_test.go` to validate the expected results of SQL queries against a SQLite database.
- Introduced a map `expectedQueryResultMap` in `stage_index_scan.go` to store expected results for various queries, enhancing the test coverage and ensuring accuracy of query outputs.
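A minimal sketch of what hardcoding the expected results could look like, so that the tester never has to query companies.db itself. Only the name `expectedQueryResultMap` comes from the commit message above; the query, the rows, and the helper functions are hypothetical placeholders, not the actual contents of `stage_index_scan.go`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// exampleQuery is a placeholder query, not one taken from the real tester.
const exampleQuery = "SELECT id, name FROM companies WHERE country = 'example'"

// expectedQueryResultMap maps a SQL query to the rows the tester expects.
// The rows below are illustrative placeholders, not real data from companies.db.
var expectedQueryResultMap = map[string][]string{
	exampleQuery: {
		"1|placeholder company a",
		"2|placeholder company b",
	},
}

// expectedRows returns a sorted copy of the hardcoded rows for a query.
func expectedRows(query string) ([]string, error) {
	rows, ok := expectedQueryResultMap[query]
	if !ok {
		return nil, fmt.Errorf("no hardcoded result for query: %s", query)
	}
	sorted := append([]string(nil), rows...)
	sort.Strings(sorted)
	return sorted, nil
}

// matchesExpected compares program output (one row per line) with the
// hardcoded rows, ignoring row order.
func matchesExpected(query, output string) (bool, error) {
	want, err := expectedRows(query)
	if err != nil {
		return false, err
	}
	got := strings.Split(strings.TrimSpace(output), "\n")
	sort.Strings(got)
	if len(got) != len(want) {
		return false, nil
	}
	for i := range want {
		if got[i] != want[i] {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	ok, err := matchesExpected(exampleQuery, "2|placeholder company b\n1|placeholder company a\n")
	fmt.Println(ok, err)
}
```

The trade-off of this approach is that the map has to be regenerated whenever companies.db or the set of queries changes, which is presumably what a test like `stage_index_scan_test.go` guards against.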
@andy1li andy1li requested a review from rohitpaulk July 30, 2025 21:45
@andy1li andy1li self-assigned this Jul 30, 2025
- Replaced the use of `exec.Command` to copy the test database with `os.Symlink` for improved efficiency and clarity in the test setup process.
- Updated error logging to reflect the change in method for creating the test database symlink.
@rohitpaulk
Member

@andy1li can you elaborate a bit more on how long each of these actions takes (copying vs. querying)? Which is the dominant factor?

It's weird that this hasn't happened so far but is suddenly a problem. Wonder if I should contact fly to see if it's an issue on their end.

@andy1li
Member Author

andy1li commented Jul 30, 2025

@rohitpaulk Both copying and querying could time out (20s) on their own.

One inconsistency: the user's code can query the db without any issue.

Yeah, fly might provide more insights.

Just FYI, here's some disk and memory usage info that I gathered during debugging (nothing out of the ordinary):

df -h:
Filesystem                Size      Used Available Use% Mounted on
devtmpfs                  1.9G         0      1.9G   0% /dev
none                      7.8G     35.3M      7.3G   0% /
/dev/vdb                  7.8G     35.3M      7.3G   0% /.fly-upper-layer
shm                       1.9G         0      1.9G   0% /dev/shm
tmpfs                     1.9G         0      1.9G   0% /sys/fs/cgroup
free -h:
              total        used        free      shared  buff/cache   available
Mem:           3.8G      131.0M        3.6G       21.9M      121.1M        3.5G
Swap:             0           0           0

- Changed the method for creating the test database link from `os.Symlink` to `os.Link` for improved functionality.
- Updated error logging to reflect the change in method for creating the test database link.
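As a rough illustration of the change described in this commit, the sketch below creates the test database as a hard link with `os.Link` instead of copying it. The function names and the temp-directory demo are hypothetical; only `os.Link` and the file names companies.db and test.db come from the thread.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// linkDatabase makes dst refer to the same file as src without copying bytes.
func linkDatabase(src, dst string) error {
	// Remove a stale link from a previous run; a missing file is fine.
	if err := os.Remove(dst); err != nil && !os.IsNotExist(err) {
		return fmt.Errorf("remove stale %s: %w", dst, err)
	}
	if err := os.Link(src, dst); err != nil {
		return fmt.Errorf("hard link %s to %s: %w", dst, src, err)
	}
	return nil
}

// runLinkDemo links a placeholder file inside a temp directory and reports
// whether both names refer to the same underlying file.
func runLinkDemo() (bool, error) {
	dir, err := os.MkdirTemp("", "hard-link-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "companies.db")
	dst := filepath.Join(dir, "test.db")
	if err := os.WriteFile(src, []byte("placeholder bytes"), 0o644); err != nil {
		return false, err
	}
	if err := linkDatabase(src, dst); err != nil {
		return false, err
	}

	srcInfo, err := os.Stat(src)
	if err != nil {
		return false, err
	}
	dstInfo, err := os.Stat(dst)
	if err != nil {
		return false, err
	}
	return os.SameFile(srcInfo, dstInfo), nil
}

func main() {
	same, err := runLinkDemo()
	fmt.Println(same, err)
}
```

Two caveats apply to any hard-link approach: both paths must be on the same filesystem, and a write through either name changes the shared file, so it only suits databases that the program under test opens read-only.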
@rohitpaulk
Member

rohitpaulk commented Jul 30, 2025

Cool, let's work with Fly on this; it definitely seems off. We'll focus on the copy issue first.

@andy1li could you help me out here a bit, please, before I contact them:

  • Add 2 simple logs:
    • One before copy
    • One after copy (that also includes the time elapsed)
  • Remove all extra logs; we only want these two
  • Deploy and then run tests ~10 times to get values of how long it took
  • I'll take it from there - will gather screenshots and fly machine IDs and contact them
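The two requested logs could be sketched roughly as follows. This is an assumption-laden illustration, not the tester's code: the tester shells out to `cp`, whereas this sketch copies with `io.Copy` to stay self-contained, and `copyFile`, `timedCopy`, and `runTimedCopyDemo` are hypothetical names.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
	"time"
)

// copyFile copies src to dst and returns the number of bytes written.
func copyFile(src, dst string) (int64, error) {
	in, err := os.Open(src)
	if err != nil {
		return 0, err
	}
	defer in.Close()

	out, err := os.Create(dst)
	if err != nil {
		return 0, err
	}
	n, err := io.Copy(out, in)
	if closeErr := out.Close(); err == nil {
		err = closeErr
	}
	return n, err
}

// timedCopy emits exactly two log lines: one before the copy and one after
// it (on success or failure) that includes the elapsed time.
func timedCopy(src, dst string, logf func(format string, args ...any)) error {
	logf("Before copying %s to %s", src, dst)
	start := time.Now()
	n, err := copyFile(src, dst)
	elapsed := time.Since(start)
	if err != nil {
		logf("Copy failed after %s: %v", elapsed, err)
		return err
	}
	logf("After copying %d bytes (took %s)", n, elapsed)
	return nil
}

// runTimedCopyDemo copies a small placeholder file in a temp directory and
// returns the log lines that timedCopy produced.
func runTimedCopyDemo() ([]string, error) {
	dir, err := os.MkdirTemp("", "timed-copy-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(dir)

	src := filepath.Join(dir, "companies.db")
	if err := os.WriteFile(src, []byte("placeholder bytes"), 0o644); err != nil {
		return nil, err
	}

	var lines []string
	logf := func(format string, args ...any) {
		lines = append(lines, fmt.Sprintf(format, args...))
	}
	err = timedCopy(src, filepath.Join(dir, "test.db"), logf)
	return lines, err
}

func main() {
	lines, err := runTimedCopyDemo()
	for _, line := range lines {
		fmt.Println(line)
	}
	if err != nil {
		fmt.Println("error:", err)
	}
}
```

If the copy never returns, only the first line is printed, which is itself a useful signal when comparing runs.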

@andy1li
Member Author

andy1li commented Jul 30, 2025

On it! Just to clarify:

> One after copy (that also includes the time elapsed)

@rohitpaulk Will our current time-out err msg suffice, or do I need something special here?

image

@andy1li
Member Author

andy1li commented Jul 30, 2025

10 runs completed:

https://app-staging.codecrafters.io/courses/sqlite/admin/submissions

image

Oops, it would be more accurate to log `cp /var/opt/tester/companies.db test.db`.

@rohitpaulk
Member

@andy1li so all of them failed because cp exceeded the allotted time? Did cp never go through at all, or does it occasionally go through quickly and then fail on the next run?

(Trying to ascertain if this is intermittent or not - it's harder to make a case with fly if it isn't intermittent)

@andy1li
Member Author

andy1li commented Aug 1, 2025

@rohitpaulk Yes, all 10 of my recent submissions look like this (cp never went through):

⛳ Before running `cp companies.db test.db`.
[tester::#NZ8] timed out, test exceeded 20 seconds

Furthermore, not a single submission (out of about 100) has shown signs of getting past cp since July 24 (except for a couple where I replaced cp with a symlink while debugging).

(For the user submissions, it's possible that cp went through but their own code hung before outputting anything. But it's impossible to tell for sure.)

@rohitpaulk
Member

I've reached out to Fly.io about this with a repro: https://github.com/codecrafters-io/flyio-disk-performance-test

@rohitpaulk
Member

Also pushed a commit here that uses symlinks for now: c2598c7

Checked major languages and all of them seem to follow symlinks
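For reference, a small Go sketch of what "following symlinks" means here: opening the link path reads the target's bytes, while `os.Lstat` still reports the path as a symlink. The helper name and the placeholder file contents are hypothetical, and the sketch only covers Go, not the other languages that were checked.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// runSymlinkDemo creates a symlink to a placeholder file in a temp directory,
// reads the file through the link, and reports whether the link path itself
// is a symlink.
func runSymlinkDemo() (string, bool, error) {
	dir, err := os.MkdirTemp("", "symlink-demo")
	if err != nil {
		return "", false, err
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "companies.db")
	link := filepath.Join(dir, "test.db")
	if err := os.WriteFile(target, []byte("placeholder bytes"), 0o644); err != nil {
		return "", false, err
	}
	if err := os.Symlink(target, link); err != nil {
		return "", false, err
	}

	// Reading the link path follows the symlink and returns the target's bytes.
	data, err := os.ReadFile(link)
	if err != nil {
		return "", false, err
	}

	// Lstat does not follow the link, so it describes the link itself.
	info, err := os.Lstat(link)
	if err != nil {
		return "", false, err
	}
	isLink := info.Mode()&os.ModeSymlink != 0
	return string(data), isLink, nil
}

func main() {
	content, isLink, err := runSymlinkDemo()
	fmt.Println(content, isLink, err)
}
```

A program would only notice the difference if it inspected the path with an lstat-style call before opening it.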
