Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

It seems that the lock acquired with GET_LOCK is being released after 1 hour. #58625

Open
negachov opened this issue Dec 30, 2024 · 9 comments
Open
Labels
type/question The issue belongs to a question.

Comments

@negachov
Copy link
Contributor

1. Minimal reproduce step (Required)

The timeout specified for GET_LOCK is the waiting time to acquire the lock, and there is no timeout for the acquired lock itself, is there?

step 1. [console 1]

$ tiup playground v8.5.0 --db 8 --tiflash 0 --without-monitor

step 2. [console 2]

$ ruby ./get_lock_test.rb

get_lock_test.rb

require "active_record"

TIDB_SERVER_NUM = 8

def lock_test
  ActiveRecord::Base.establish_connection(
    adapter:  "mysql2",
    host:     "127.0.0.1",
    port:     rand(4000..(4000 + TIDB_SERVER_NUM - 1)),
    username: "root",
    password: "",
    database: "test",
  )

  lock_name = "get_lock_test_lock_name"
  loop do
    break if ActiveRecord::Base.connection.get_advisory_lock(lock_name)
    sleep 0.1
  end

  puts "lock pid:#{Process.pid}, now:#{Time.now}"
  loop do
    sleep 10
  end
  puts "unlock pid:#{Process.pid}, now:#{Time.now}"
end

$stdout.sync = true

pids = []
10.times do
  pid = fork do
    lock_test
    exit(0)
  end
  Process.setpgid(pid, pid)
  pids << pid
end

puts "pids: #{pids}"
[:INT, :TERM].each do |signal|
  trap(signal) do
    pids.each do |pid|
      Process.kill("TERM", pid)
    end
  end
end

loop do
  pid = Process.wait
  pids.delete(pid)

  break if pids.empty?
end

2. What did you expect to see? (Required)

[console 2]

$ ruby ./get_lock_test.rb
pids: [95164, 95165, 95166, 95167, 95168, 95169, 95170, 95171, 95172, 95173]
lock pid:95164, now:2024-12-30 22:38:46 +0900
lock pid:95169, now:2024-12-30 23:38:57 +0900
lock pid:95168, now:2024-12-31 00:39:07 +0900
lock pid:95166, now:2024-12-31 01:39:18 +0900

3. What did you see instead (Required)

The acquired lock is not released based on time.

[console 2]

$ ruby ./get_lock_test.rb
pids: [95164, 95165, 95166, 95167, 95168, 95169, 95170, 95171, 95172, 95173]
lock pid:95164, now:2024-12-30 22:38:46 +0900

4. What is your TiDB version? (Required)

mysql> select tidb_version() \G
*************************** 1. row ***************************
tidb_version(): Release Version: v8.5.0
Edition: Community
Git Commit Hash: d13e52ed6e22cc5789bed7c64c861578cd2ed55b
Git Branch: HEAD
UTC Build Time: 2024-12-18 02:27:10
GoVersion: go1.23.2
Race Enabled: false
Check Table Before Drop: false
Store: tikv
1 row in set (0.01 sec)
@negachov negachov added the type/bug The issue is confirmed as a bug. label Dec 30, 2024
@lance6716
Copy link
Contributor

@jebter jebter added type/question The issue belongs to a question. and removed type/bug The issue is confirmed as a bug. labels Dec 31, 2024
@negachov
Copy link
Contributor Author

@lance6716 That is the waiting time to acquire the lock, and the correct behavior should be that the acquired lock is not released until the session ends, right?

@lance6716
Copy link
Contributor

I'm not sure what's the behaviour of get_advisory_lock in ruby. Can you use a pure SQL example to example? I just thought the break if ActiveRecord::Base.connection.get_advisory_lock(lock_name) line blocks for 3600 seconds and breaks the loop.

@negachov
Copy link
Contributor Author

negachov commented Jan 1, 2025

@lance6716 How about this?

The following is a method to verify using the MySQL client.

step 1

Preparation

[console 1]

$ tiup playground v8.5.0 --db 1 --tiflash 0 --without-monitor

[console 2]

$ mysql --comments --host 127.0.0.1 --port 4000 -u root

[console 3]

$ mysql --comments --host 127.0.0.1 --port 4000 -u root

step 2

The session in Console 2 acquires the lock.

[console 3]

mysql> SELECT IS_USED_LOCK('my_lock'), NOW();
+-------------------------+---------------------+
| IS_USED_LOCK('my_lock') | NOW()               |
+-------------------------+---------------------+
|                    NULL | 2025-01-01 17:41:46 |
+-------------------------+---------------------+
1 row in set (0.01 sec)

[console 2]

mysql> SELECT GET_LOCK('my_lock', 1);
+------------------------+
| GET_LOCK('my_lock', 1) |
+------------------------+
|                      1 |
+------------------------+
1 row in set (0.01 sec)

mysql> SELECT IS_USED_LOCK('my_lock'), NOW();
+-------------------------+---------------------+
| IS_USED_LOCK('my_lock') | NOW()               |
+-------------------------+---------------------+
|               264241164 | 2025-01-01 17:45:41 |
+-------------------------+---------------------+
1 row in set (0.00 sec)

[console 3]

mysql> SELECT IS_USED_LOCK('my_lock'), NOW();
+-------------------------+---------------------+
| IS_USED_LOCK('my_lock') | NOW()               |
+-------------------------+---------------------+
|                       1 | 2025-01-01 17:45:51 |
+-------------------------+---------------------+
1 row in set (1.01 sec)

※The connection ID is different from what was expected (it is supposed to be the connection ID of the session in Console 2).

step 3(Execute 61 minutes after step 2)

The session in Console 2 still holds the lock, but it cannot be verified from the session in Console 3, and the lock can be acquired there.
[console 2]

mysql> SELECT IS_USED_LOCK('my_lock'), NOW();
+-------------------------+---------------------+
| IS_USED_LOCK('my_lock') | NOW()               |
+-------------------------+---------------------+
|               264241164 | 2025-01-01 18:48:28 |
+-------------------------+---------------------+
1 row in set (0.00 sec)

[console 3]

mysql> SELECT IS_USED_LOCK('my_lock'), NOW();
+-------------------------+---------------------+
| IS_USED_LOCK('my_lock') | NOW()               |
+-------------------------+---------------------+
|                    NULL | 2025-01-01 18:48:35 |
+-------------------------+---------------------+
1 row in set (1.02 sec)

mysql> SELECT GET_LOCK('my_lock', 1);
+------------------------+
| GET_LOCK('my_lock', 1) |
+------------------------+
|                      1 |
+------------------------+
1 row in set (0.00 sec)

@lance6716
Copy link
Contributor

/cc @morgo PTAL

@lance6716
Copy link
Contributor

/cc @bb7133

@morgo
Copy link
Contributor

morgo commented Jan 1, 2025

The implemementation of GET_LOCK is to use a transaction underneath, so it will be limited to 1hr by default.

I agree that there are actually two separate issues:

  1. The timeout to acquire the lock is limited to 1hr (this is well documented)
  2. The maximum time the lock is held for (this is not well documented, and should be documented as a separate bullet point to make it clear). This can be extended with max-txn-ttl.

I checked the pessimistic transactions page to make sure it is documented there too. It is (see point #7).

@negachov
Copy link
Contributor Author

negachov commented Jan 3, 2025

@morgo I see.
The application considering migration to TiDB is a continuously running process. It uses get_lock for redundancy, allowing a single process to handle tasks. In the event of a failure where the process dies, a standby process successfully acquires the lock with get_lock and continues the operation.

While deployments occur daily under normal circumstances, the system may run continuously during long holidays or similar periods.

So, migration options would include:

Setting max-txn-ttl to 25,920,000 (30 days)
Switching to an alternative method
Is that correct?

@morgo
Copy link
Contributor

morgo commented Jan 3, 2025

Setting max-txn-ttl to 25,920,000 (30 days)
Switching to an alternative method
Is that correct?

Yes that's right. Someone from PingCAP may have comments on what the performance downsides you need to consider of a long max-txn-ttl. It is not something I have tested myself.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
type/question The issue belongs to a question.
Projects
None yet
Development

No branches or pull requests

4 participants