Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lkl: add LKL_HIJACK_SYSCTL to configure sysctl values #319

Merged
merged 1 commit into from
Feb 10, 2017

Conversation

thehajime
Copy link
Member

@thehajime thehajime commented Feb 9, 2017

Signed-off-by: Hajime Tazaki [email protected]


This change is Reviewable

*
* @sysctls - Configure sysctl parameters as the form of "key|value;..."
*/
void lkl_sysctl_parse_write(char* sysctls);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this function be just in init.c? I think only hijack needs it and it's confusing to have it in lkl.h

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this function be just in init.c? I think only hijack needs it and it's confusing to have it in lkl.h

right now, yes. but I want to use those functions from #255 so, exposing via lkl.h is useful I believe. a custom application without hijack lib also can use this.

int ret;
int fd;
char *delim, *p;
char full_path[256];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it better to copy path into full_path first and then replace "." so that lkl_sysctl can be called with const char.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agree. fixed this.

@thehajime thehajime force-pushed the feature-conf-sysctl branch 2 times, most recently from 20315d2 to c828b47 Compare February 9, 2017 08:21
Copy link
Member

@tavip tavip left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would better to use the sysctl(8) syntax for lkl_sysctl_parse_write instead of our own?

@thehajime
Copy link
Member Author

Do you think it would better to use the sysctl(8) syntax for lkl_sysctl_parse_write instead of our own?

Do you mean like this

LKL_HIJACK_SYSCTL="net.ipv4.tcp_wmem=4096 87380 2147483647" instead of LKL_HIJACK_SYSCTL="net.ipv4.tcp_wmem|4096 87380 2147483647" ?

@tavip
Copy link
Member

tavip commented Feb 9, 2017

Do you mean like this

LKL_HIJACK_SYSCTL="net.ipv4.tcp_wmem=4096 87380 2147483647" instead of LKL_HIJACK_SYSCTL="net.ipv4.tcp_wmem|4096 87380 2147483647" ?

Yes, but we would also need " like:

LKL_HIJACK_SYSCTL="net.ipv4.tcp_wmem=\"4096 87380 2147483647\""

which makes it harder to implement because we need to take " into account.

However, I think we can use the current format and move the lkl_syscatl_parse_write to libhijack and leave the implementation for the format I proposed for later, at which time we can add it as an LKL API.

@thehajime
Copy link
Member Author

Yes, but we would also need " like:
LKL_HIJACK_SYSCTL="net.ipv4.tcp_wmem="4096 87380 2147483647""

which makes it harder to implement because we need to take " into account.

sorry, I didn't understand why we need ".
current implementation should recognize the above multi-value parameter (i.e., 4096 87380 2147483647) without ".

However, I think we can use the current format and move the lkl_syscatl_parse_write to libhijack and leave the implementation for the format I proposed for later, at which time we can add it as an LKL API.

hmm, as I said, I'd like to use this from external programs (not hijack lib, #255) so, I would improve now if needed.

@tavip
Copy link
Member

tavip commented Feb 9, 2017

Yes, but we would also need " like:
LKL_HIJACK_SYSCTL="net.ipv4.tcp_wmem="4096 87380 2147483647""

which makes it harder to implement because we need to take " into account.
sorry, I didn't understand why we need ".
current implementation should recognize the above multi-value parameter (i.e., 4096 87380 2147483647) without ".

But what happens if you have multiple sysctl options? Like

LKL_HIJACK_SYSCTL="net.ipv4.tcp_wmem="4096 87380 2147483647" net.ipv4.tcp_rmem="4096 87380 2147483647""

@thehajime
Copy link
Member Author

But what happens if you have multiple sysctl options? Like

LKL_HIJACK_SYSCTL="net.ipv4.tcp_wmem="4096 87380 2147483647" net.ipv4.tcp_rmem="4096 87380 2147483647""

like LKL_HIJACK_NET_NEIGHBOR, it should have multiple sysctl options; in this case,

LKL_HIJACK_SYSCTL="net.ipv4.tcp_wmem|4096 87380 2147483647;net.ipv4.tcp_rmem|4096 87380 2147483647"

(though, the delimiters | and ; should be escaped if we want to use them as a key or a value)

I tested it and found there is a bug :)
I almost copied the code from add_neighbor() from hijack/init.c, but I replaced strtok_r to strtok since mingw doesn't have strtok_r.

but I forgot that strtok can't be nested, and thus the above multi-sysctl options case wasn't handled properly..

I will think about this and fix it.

@thehajime thehajime force-pushed the feature-conf-sysctl branch from c828b47 to 41ebb56 Compare February 9, 2017 14:29
@thehajime
Copy link
Member Author

now it should be okay on both linux and mingw.


/* Configure sysctl parameters as the form of "key=value;key=value;..." */
void lkl_sysctl_parse_write(char *sysctls)
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should copy the source string to allow const strings, as above.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, should be fixed now.

there would be several similar fixes in hijack/init.c with strtok_r involved, but I expect other PRs.

This variable can be used like this:

$ LKL_HIJACK_SYSCTL="net.ipv4.tcp_wmem=4096 87380 2147483647" \
  ./bin/lkl-hijack.sh sleep 1000

Signed-off-by: Hajime Tazaki <[email protected]>
@thehajime thehajime force-pushed the feature-conf-sysctl branch from 41ebb56 to 40badbc Compare February 9, 2017 23:58
@thehajime thehajime merged commit 45caa74 into lkl:master Feb 10, 2017
@thehajime thehajime deleted the feature-conf-sysctl branch February 10, 2017 08:42
thehajime pushed a commit to thehajime/linux that referenced this pull request Oct 24, 2024
Commit 76d54bf ("nvme-tcp: don't access released socket during
error recovery") added a mutex_lock() call for the queue->queue_lock
in nvme_tcp_get_address(). However, the mutex_lock() races with
mutex_destroy() in nvme_tcp_free_queue(), and causes the WARN below.

DEBUG_LOCKS_WARN_ON(lock->magic != lock)
WARNING: CPU: 3 PID: 34077 at kernel/locking/mutex.c:587 __mutex_lock+0xcf0/0x1220
Modules linked in: nvmet_tcp nvmet nvme_tcp nvme_fabrics iw_cm ib_cm ib_core pktcdvd nft_fib_inet nft_fib_ipv4 nft_fib_ipv6 nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nf_tables qrtr sunrpc ppdev 9pnet_virtio 9pnet pcspkr netfs parport_pc parport e1000 i2c_piix4 i2c_smbus loop fuse nfnetlink zram bochs drm_vram_helper drm_ttm_helper ttm drm_kms_helper xfs drm sym53c8xx floppy nvme scsi_transport_spi nvme_core nvme_auth serio_raw ata_generic pata_acpi dm_multipath qemu_fw_cfg [last unloaded: ib_uverbs]
CPU: 3 UID: 0 PID: 34077 Comm: udisksd Not tainted 6.11.0-rc7 lkl#319
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
RIP: 0010:__mutex_lock+0xcf0/0x1220
Code: 08 84 d2 0f 85 c8 04 00 00 8b 15 ef b6 c8 01 85 d2 0f 85 78 f4 ff ff 48 c7 c6 20 93 ee af 48 c7 c7 60 91 ee af e8 f0 a7 6d fd <0f> 0b e9 5e f4 ff ff 48 b8 00 00 00 00 00 fc ff df 4c 89 f2 48 c1
RSP: 0018:ffff88811305f760 EFLAGS: 00010286
RAX: 0000000000000000 RBX: ffff88812c652058 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000004 RDI: 0000000000000001
RBP: ffff88811305f8b0 R08: 0000000000000001 R09: ffffed1075c36341
R10: ffff8883ae1b1a0b R11: 0000000000010498 R12: 0000000000000000
R13: 0000000000000000 R14: dffffc0000000000 R15: ffff88812c652058
FS:  00007f9713ae4980(0000) GS:ffff8883ae180000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fcd78483c7c CR3: 0000000122c38000 CR4: 00000000000006f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
 <TASK>
 ? __warn.cold+0x5b/0x1af
 ? __mutex_lock+0xcf0/0x1220
 ? report_bug+0x1ec/0x390
 ? handle_bug+0x3c/0x80
 ? exc_invalid_op+0x13/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? __mutex_lock+0xcf0/0x1220
 ? nvme_tcp_get_address+0xc2/0x1e0 [nvme_tcp]
 ? __pfx___mutex_lock+0x10/0x10
 ? __lock_acquire+0xd6a/0x59e0
 ? nvme_tcp_get_address+0xc2/0x1e0 [nvme_tcp]
 nvme_tcp_get_address+0xc2/0x1e0 [nvme_tcp]
 ? __pfx_nvme_tcp_get_address+0x10/0x10 [nvme_tcp]
 nvme_sysfs_show_address+0x81/0xc0 [nvme_core]
 dev_attr_show+0x42/0x80
 ? __asan_memset+0x1f/0x40
 sysfs_kf_seq_show+0x1f0/0x370
 seq_read_iter+0x2cb/0x1130
 ? rw_verify_area+0x3b1/0x590
 ? __mutex_lock+0x433/0x1220
 vfs_read+0x6a6/0xa20
 ? lockdep_hardirqs_on+0x78/0x100
 ? __pfx_vfs_read+0x10/0x10
 ksys_read+0xf7/0x1d0
 ? __pfx_ksys_read+0x10/0x10
 ? __x64_sys_openat+0x105/0x1d0
 do_syscall_64+0x93/0x180
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x180
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x180
 ? __pfx_ksys_read+0x10/0x10
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x180
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x180
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x180
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x180
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x180
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x180
 ? lockdep_hardirqs_on_prepare+0x16d/0x400
 ? do_syscall_64+0x9f/0x180
 ? lockdep_hardirqs_on+0x78/0x100
 ? do_syscall_64+0x9f/0x180
 ? do_syscall_64+0x9f/0x180
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x7f9713f55cfa
Code: 55 48 89 e5 48 83 ec 20 48 89 55 e8 48 89 75 f0 89 7d f8 e8 e8 74 f8 ff 48 8b 55 e8 48 8b 75 f0 41 89 c0 8b 7d f8 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 2e 44 89 c7 48 89 45 f8 e8 42 75 f8 ff 48 8b
RSP: 002b:00007ffd7f512e70 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
RAX: ffffffffffffffda RBX: 000055c38f316859 RCX: 00007f9713f55cfa
RDX: 0000000000000fff RSI: 00007ffd7f512eb0 RDI: 0000000000000011
RBP: 00007ffd7f512e90 R08: 0000000000000000 R09: 00000000ffffffff
R10: 0000000000000000 R11: 0000000000000246 R12: 000055c38f317148
R13: 0000000000000000 R14: 00007f96f4004f30 R15: 000055c3b6b623c0
 </TASK>

The WARN is observed when the blktests test case nvme/014 is repeated
with tcp transport. It is rare, and 200 times repeat is required to
recreate in some test environments.

To avoid the WARN, check the NVME_TCP_Q_LIVE flag before locking
queue->queue_lock. The flag is cleared long time before the lock gets
destroyed.

Signed-off-by: Hannes Reinecke <[email protected]>
Signed-off-by: Shin'ichiro Kawasaki <[email protected]>
Signed-off-by: Keith Busch <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants