Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simulator CPU utilization at 100% but nothing happens #548

Closed
penberg opened this issue Dec 25, 2024 · 12 comments
Closed

Simulator CPU utilization at 100% but nothing happens #548

penberg opened this issue Dec 25, 2024 · 12 comments
Labels
bug Something isn't working help wanted Extra attention is needed simulator

Comments

@penberg
Copy link
Collaborator

penberg commented Dec 25, 2024

I am seeing the simulator hit 100% CPU utilization but with nothing happening at all:

$ cargo run
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.37s
     Running `/home/penberg/src/penberg/limbo/target/debug/limbo_sim`
^C

I don't see this behavior on macOS.

@penberg penberg added bug Something isn't working help wanted Extra attention is needed simulator labels Dec 25, 2024
@penberg
Copy link
Collaborator Author

penberg commented Dec 25, 2024

The perf top shows we're apparently in step() function:

  16.77%  limbo_sim                                       [.] limbo_core::vdbe::Program::step
   6.84%  [kernel]                                        [k] io_idle
   3.35%  [kernel]                                        [k] __lock_acquire
   2.91%  limbo_sim                                       [.] limbo_core::storage::pager::Pager::cacheflush
   2.45%  limbo_sim                                       [.] <limbo_core::storage::wal::WalFile as limbo_core::storage::wal::Wal>::sync
   2.43%  [kernel]                                        [k] lock_is_held_type
   2.37%  [kernel]                                        [k] lock_release
   2.30%  [kernel]                                        [k] unwind_next_frame
   2.28%  limbo_sim                                       [.] limbo_core::Statement::step
   1.68%  [kernel]                                        [k] kasan_check_range
   1.57%  [kernel]                                        [k] rcu_is_watching

@penberg
Copy link
Collaborator Author

penberg commented Dec 25, 2024

RUST_LOG=trace claims we're busy looping in Halt:

[2024-12-25T10:00:19Z TRACE limbo_core::vdbe] 15    Halt               0     0     0                    0   
[2024-12-25T10:00:19Z TRACE limbo_core::vdbe] Halt auto_commit true
[2024-12-25T10:00:19Z TRACE limbo_core::vdbe] 15    Halt               0     0     0                    0   
[2024-12-25T10:00:19Z TRACE limbo_core::vdbe] Halt auto_commit true
[2024-12-25T10:00:19Z TRACE limbo_core::vdbe] 15    Halt               0     0     0                    0   
[2024-12-25T10:00:19Z TRACE limbo_core::vdbe] Halt auto_commit true
[2024-12-25T10:00:19Z TRACE limbo_core::vdbe] 15    Halt               0     0     0                    0   
[2024-12-25T10:00:19Z TRACE limbo_core::vdbe] Halt auto_commit true

@penberg
Copy link
Collaborator Author

penberg commented Dec 25, 2024

Something funny going on:

This passes:

$ RUST_LOG=trace cargo run -- --seed=13207426936588942312 --maximum-size=5

This panics:

$ RUST_LOG=trace cargo run -- --seed=13207426936588942312 --maximum-size=6
[2024-12-25T10:04:36Z ERROR limbo_sim] panic occurred                                                                                                       ?????????????d                                                                                                                                                  
[2024-12-25T10:04:36Z ERROR limbo_sim] cannot sample empty range                                                                                            ????????????????                                                                                                                                                
[2024-12-25T10:04:36Z ERROR limbo_sim] captured backtrace:                                                                                                  ???????????????                                                                                                                                                 
       0: limbo_sim::main::{{closure}}                                                                                                                      ??????????                                                                                                                                              
                 at ./main.rs:58:18                                                                                                                         ?????????                                                                                                                                   
       1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call                                                                                   ??????????????????l                                                                                                                                             
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/alloc/src/boxed.rs:2468:9                                                       ?????????????????????????9                                                                                                                                      
       2: std::panicking::rust_panic_with_hook                                                                                                              ????????????                                                                                                                                                    
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:809:13                                                     ??????????????????????????                                                                                                                                      
       3: std::panicking::begin_panic_handler::{{closure}}                                                                                                  ???????????????                                                                                                                                                 
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:667:13                                                     ??????????????????????????                                                                                                                                      
       4: std::sys::backtrace::__rust_end_short_backtrace                                                                                                   ??????????????e                                                                                                                                                 
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/sys/backtrace.rs:170:18                                                 ???????????????????????????                                                                                                                                     
       5: rust_begin_unwind                                                                                                                                 ???????                                                                                                     
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:665:5                                                      ??????????????????????????                                                                                                                                      
       6: core::panicking::panic_fmt                                                                                                                        ?????????                                                                                                                                       
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/panicking.rs:74:14                                                     ??????????????????????????                                                                                                                                      
       7: core::panicking::panic
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/panicking.rs:148:5
       8: rand::rng::Rng::gen_range
                 at /home/penberg/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rand-0.8.5/src/rng.rs:134:9
       9: <limbo_sim::generation::plan::InteractionPlan as limbo_sim::generation::ArbitraryFrom<limbo_sim::runner::env::SimulatorEnv>>::arbitrary_from
                 at ./generation/plan.rs:173:32
      10: limbo_sim::run_simulation::{{closure}}
                 at ./main.rs:183:18
      11: core::iter::adapters::map::map_fold::{{closure}}
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/iter/adapters/map.rs:88:28
      12: core::ops::try_trait::NeverShortCircuit<T>::wrap_mut_2::{{closure}}
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/ops/try_trait.rs:392:39
      13: <core::ops::range::RangeInclusive<T> as core::iter::range::RangeInclusiveIteratorImpl>::spec_try_fold
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/iter/range.rs:1189:21
      14: core::iter::range::<impl core::iter::traits::iterator::Iterator for core::ops::range::RangeInclusive<A>>::try_fold
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/iter/range.rs:1308:9
      15: core::iter::range::<impl core::iter::traits::iterator::Iterator for core::ops::range::RangeInclusive<A>>::fold
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/iter/mod.rs:378:18
      16: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/iter/adapters/map.rs:128:9
      17: core::iter::traits::iterator::Iterator::for_each
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/iter/traits/iterator.rs:813:9
      18: alloc::vec::Vec<T,A>::extend_trusted
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/alloc/src/vec/mod.rs:3524:17
      19: <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/alloc/src/vec/spec_extend.rs:26:9
      20: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/alloc/src/vec/spec_from_iter_nested.rs:60:9
      21: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/alloc/src/vec/spec_from_iter.rs:33:9
      22: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/alloc/src/vec/mod.rs:3388:9
      23: core::iter::traits::iterator::Iterator::collect
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/iter/traits/iterator.rs:2001:9
      24: limbo_sim::run_simulation
                 at ./main.rs:182:21
      25: limbo_sim::main::{{closure}}
                 at ./main.rs:62:46
      26: std::panicking::try::do_call
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:557:40
      27: __rust_try
      28: std::panicking::try
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:520:19
      29: std::panic::catch_unwind
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panic.rs:358:14
      30: limbo_sim::main
                 at ./main.rs:62:18
      31: core::ops::function::FnOnce::call_once
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/ops/function.rs:250:5
      32: std::sys::backtrace::__rust_begin_short_backtrace
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/sys/backtrace.rs:154:18
      33: std::rt::lang_start::{{closure}}
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/rt.rs:195:18
      34: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/core/src/ops/function.rs:284:13
      35: std::panicking::try::do_call
                 at /rustc/90b35a6239c3d8bdabc530a6a0816f7ff89a0aaf/library/std/src/panicking.rs:557:40

This hits the infinite loop:

$ RUST_LOG=trace cargo run -- --seed=13207426936588942312 --maximum-size=9

@alpaylan
Copy link
Contributor

Do you think the behavior is a bug in the simulator, or if the simulator discovered a bug in Limbo?

penberg added a commit that referenced this issue Dec 25, 2024
…ting at least 1 interaction' from Alperen Keleş

Fixes the panicking case in
#548

Closes #549
@penberg
Copy link
Collaborator Author

penberg commented Dec 25, 2024

@alpaylan I am not sure. @jussisaurio commented that #533 fixes some issues that sound similar to what we're seeing here

@jussisaurio
Copy link
Collaborator

jussisaurio commented Dec 25, 2024

As I describe in #533 inserting more values than a table has results in an infinite loop. Initially I didn't check why, but now I did -- see addr 3:

limbo> create table b(b integer primary key);
limbo> explain insert into b values(1,2);
addr  opcode             p1    p2    p3    p4             p5  comment
----  -----------------  ----  ----  ----  -------------  --  -------
0     Init               0     21    0                    0   Start at 21
1     InitCoroutine      3     6     2                    0   
2     Integer            1     2     0                    0   r[2]=1
3     Integer            2     3     0                    0   r[3]=2 <-- register allocated for the extra column overwrites the initcoroutine register, messing up the jump target pc
4     Yield              3     0     0                    0   
5     EndCoroutine       3     0     0                    0   
6     OpenWriteAsync     0     2     0                    0   
7     OpenWriteAwait     0     0     0                    0   
8     Yield              3     20    0                    0   
9     Copy               2     1     0                    0   r[1]=r[2]
10    SoftNull           2     0     0                    0   
11    NotNull            1     13    0                    0   r[1]!=NULL -> goto 13
12    NewRowId           0     1     0                    0   
13    MustBeInt          1     0     0                    0   
14    NotExists          0     16    1                    0   
15    Halt               1555  0     0                    0   
16    MakeRecord         2     1     4                    0   r[4]=mkrec(r[2..2])
17    InsertAsync        0     4     1                    0   
18    InsertAwait        0     0     0                    0   
19    Goto               0     8     0                    0   
20    Halt               0     0     0                    0   
21    Transaction        0     1     0                    0   
22    Goto               0     1     0                    0 

this might still be an entirely separate issue (this one oscillates between two instructions forever)

@penberg
Copy link
Collaborator Author

penberg commented Dec 27, 2024

With 548f66e and cf1a3fb, the infinite loop now always reproduces on Linux.

@alpaylan
Copy link
Contributor

Once I'm done with #554, I can implement;

  • maximum time per interaction
  • maximum time for the whole simulation

so we can timeout at certain points, debug the output and understand the reason for the infinite loop bugs. The next big thing would be adding minimizing counterexamples, which requires a bit of thinking and architectural changes, but shouldn't be hard to add. That would give us the minimal sequence of interactions that cause a given bug.

@jussisaurio
Copy link
Collaborator

since this is a linux specific issue, it must be something related to flushing to disk on linux, right?

    pub fn end_tx(&self) -> Result<CheckpointStatus> {
        match self.cacheflush()? {
            CheckpointStatus::Done => {}
            CheckpointStatus::IO => return Ok(CheckpointStatus::IO),
        };
        self.wal.borrow().end_read_tx()?;
        Ok(CheckpointStatus::Done)
    }

this essentially fsyncs the wal, then checkpoints and fsyncs the db file. probably adding some logs in relevant places and running the simulation on linux would reveal what stage of self.cacheflush() it's actually stuck at

@alpaylan
Copy link
Contributor

alpaylan commented Jan 1, 2025

Can we make something that will produce log differentials for such cases?

Basically something that runs the code in both environments, collects logs, points out where and how they differ, so that we pinpoint the root causes.

@alpaylan
Copy link
Contributor

alpaylan commented Jan 6, 2025

@penberg I added the timeout for the whole simulator by aggregating time between interactions, but it would still fall into an infinite loop if a specific query runs forever. I'm not sure if it's possible to timeout sync code, any ideas?

@penberg
Copy link
Collaborator Author

penberg commented Jan 14, 2025

This no longer reproduces so closing.

@penberg penberg closed this as completed Jan 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working help wanted Extra attention is needed simulator
Projects
None yet
Development

No branches or pull requests

3 participants