* Re: [PATCH 1/3] ima: Remove inode lock [not found] ` <15bb94a306d3432de55c0a12f29e7ed2b5fa3ba1.camel@huaweicloud.com> @ 2024-10-16 9:59 ` Roberto Sassu 2024-10-18 9:24 ` Roberto Sassu 0 siblings, 1 reply; 13+ messages in thread From: Roberto Sassu @ 2024-10-16 9:59 UTC (permalink / raw) To: Paul Moore Cc: zohar, dmitry.kasatkin, eric.snowberg, jmorris, serge, linux-integrity, linux-security-module, linux-kernel, bpf, ebpqwerty472123, Roberto Sassu, linux-mm, akpm, vbabka, lorenzo.stoakes On Fri, 2024-10-11 at 15:30 +0200, Roberto Sassu wrote: > On Wed, 2024-10-09 at 18:43 +0200, Roberto Sassu wrote: > > On Wed, 2024-10-09 at 18:25 +0200, Roberto Sassu wrote: > > > On Wed, 2024-10-09 at 11:37 -0400, Paul Moore wrote: > > > > On Wed, Oct 9, 2024 at 11:36 AM Paul Moore <paul@paul-moore.com> wrote: > > > > > On Tue, Oct 8, 2024 at 12:57 PM Roberto Sassu > > > > > <roberto.sassu@huaweicloud.com> wrote: > > > > > > > > > > > > From: Roberto Sassu <roberto.sassu@huawei.com> > > > > > > > > > > > > Move out the mutex in the ima_iint_cache structure to a new structure > > > > > > called ima_iint_cache_lock, so that a lock can be taken regardless of > > > > > > whether or not inode integrity metadata are stored in the inode. > > > > > > > > > > > > Introduce ima_inode_security() to simplify accessing the new structure in > > > > > > the inode security blob. > > > > > > > > > > > > Move the mutex initialization and annotation in the new function > > > > > > ima_inode_alloc_security() and introduce ima_iint_lock() and > > > > > > ima_iint_unlock() to respectively lock and unlock the mutex. > > > > > > > > > > > > Finally, expand the critical region in process_measurement() guarded by > > > > > > iint->mutex up to where the inode was locked, use only one iint lock in > > > > > > __ima_inode_hash(), since the mutex is now in the inode security blob, and > > > > > > replace the inode_lock()/inode_unlock() calls in ima_check_last_writer(). > > > > > > > > > > > > Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> > > > > > > --- > > > > > > security/integrity/ima/ima.h | 26 ++++++++--- > > > > > > security/integrity/ima/ima_api.c | 4 +- > > > > > > security/integrity/ima/ima_iint.c | 77 ++++++++++++++++++++++++++----- > > > > > > security/integrity/ima/ima_main.c | 39 +++++++--------- > > > > > > 4 files changed, 104 insertions(+), 42 deletions(-) > > > > > > > > > > I'm not an IMA expert, but it looks reasonable to me, although > > > > > shouldn't this carry a stable CC in the patch metadata? > > > > > > > > > > Reviewed-by: Paul Moore <paul@paul-moore.com> > > > > > > > > Sorry, one more thing ... did you verify this patchset resolves the > > > > syzbot problem? I saw at least one reproducer. > > > > > > Uhm, could not reproduce the deadlock with the reproducer. However, > > > without the patch I have a lockdep warning, and with I don't. > > > > > > I asked syzbot to try the patches. Let's see. > > > > I actually got a different lockdep warning: > > > > [ 904.603365] ====================================================== > > [ 904.604264] WARNING: possible circular locking dependency detected > > [ 904.605141] 6.12.0-rc2+ #20 Not tainted > > [ 904.605697] ------------------------------------------------------ > > I can reproduce by executing the syzbot reproducer and in another > terminal by logging in with SSH (not all the times). > > If I understood what the lockdep warning means, this is the scenario. > > A task accesses a seq_file which is in the IMA policy, so we enter the > critical region guarded by the iint lock. But before we get the chance > to measure the file, a second task calls remap_file_pages() on the same > seq_file. > > remap_file_pages() takes the mmap lock and, if the event matches the > IMA policy too, the second task waits for the iint lock to be released. > > Now, the first task starts to measure the seq_file and takes the > seq_file lock. I don't know if 3 processes must be involved, but I was > thinking that reading the seq_file from the first task can trigger a > page fault, which requires to take the mmap lock. > > At this point, we reach a deadlock. The first task waits for the mmap > lock to be released, and the second waits for the iint lock to be > released, which both cannot happen. + mm, mm folks (I leave the lockdep warning down for the new people included in the thread). I think I understood what the problem is. Any lock that is taken inside security_file_mmap() would cause lock inversion since: vm_mmap_pgoff(): ret = security_mmap_file(file, prot, flag); if (!ret) { if (mmap_write_lock_killable(mm)) SYSCALL_DEFINE5(remap_file_pages, ...): if (mmap_write_lock_killable(mm)) return -EINTR; [...] file = get_file(vma->vm_file); ret = security_mmap_file(vma->vm_file, prot, flags); if (ret) goto out_fput; Since I didn't see a good way to change the order of the second, I changed the order of the first, i.e. I put security_file_mmap() under the mmap lock. (I don't know if this is a good idea.) I cannot reproduce the lockdep warning anymore. @mm folks, what is your opinion? Thanks Roberto > Roberto > > > [ 904.606577] systemd/1 is trying to acquire lock: > > [ 904.607227] ffff88810e5c2580 (&p->lock){+.+.}-{4:4}, at: seq_read_iter+0x62/0x6b0 > > [ 904.608290] > > [ 904.608290] but task is already holding lock: > > [ 904.609105] ffff88810f4abf20 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}, at: ima_iint_lock+0x24/0x40 > > [ 904.610429] > > [ 904.610429] which lock already depends on the new lock. > > [ 904.610429] > > [ 904.611574] > > [ 904.611574] the existing dependency chain (in reverse order) is: > > [ 904.612628] > > [ 904.612628] -> #2 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}: > > [ 904.613681] __mutex_lock+0xaf/0x760 > > [ 904.614266] mutex_lock_nested+0x27/0x40 > > [ 904.614897] ima_iint_lock+0x24/0x40 > > [ 904.615490] process_measurement+0x176/0xef0 > > [ 904.616168] ima_file_mmap+0x98/0x120 > > [ 904.616767] security_mmap_file+0x408/0x560 > > [ 904.617444] __do_sys_remap_file_pages+0x2fa/0x4c0 > > [ 904.618194] __x64_sys_remap_file_pages+0x29/0x40 > > [ 904.618937] x64_sys_call+0x6e8/0x4550 > > [ 904.619546] do_syscall_64+0x71/0x180 > > [ 904.620155] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > [ 904.620952] > > [ 904.620952] -> #1 (&mm->mmap_lock){++++}-{4:4}: > > [ 904.621813] __might_fault+0x6f/0xb0 > > [ 904.622400] _copy_to_iter+0x12e/0xa80 > > [ 904.623009] seq_read_iter+0x593/0x6b0 > > [ 904.623629] proc_reg_read_iter+0x31/0xe0 > > [ 904.624276] vfs_read+0x256/0x3d0 > > [ 904.624822] ksys_read+0x6d/0x160 > > [ 904.625372] __x64_sys_read+0x1d/0x30 > > [ 904.625964] x64_sys_call+0x1068/0x4550 > > [ 904.626594] do_syscall_64+0x71/0x180 > > [ 904.627188] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > [ 904.627975] > > [ 904.627975] -> #0 (&p->lock){+.+.}-{4:4}: > > [ 904.628787] __lock_acquire+0x17f3/0x2320 > > [ 904.629432] lock_acquire+0xf2/0x420 > > [ 904.630013] __mutex_lock+0xaf/0x760 > > [ 904.630596] mutex_lock_nested+0x27/0x40 > > [ 904.631225] seq_read_iter+0x62/0x6b0 > > [ 904.631831] kernfs_fop_read_iter+0x1ef/0x2c0 > > [ 904.632599] __kernel_read+0x113/0x350 > > [ 904.633206] integrity_kernel_read+0x23/0x40 > > [ 904.633902] ima_calc_file_hash_tfm+0x14e/0x230 > > [ 904.634621] ima_calc_file_hash+0x97/0x250 > > [ 904.635281] ima_collect_measurement+0x4be/0x530 > > [ 904.636008] process_measurement+0x7c0/0xef0 > > [ 904.636689] ima_file_check+0x65/0x80 > > [ 904.637295] security_file_post_open+0xb1/0x1b0 > > [ 904.638008] path_openat+0x216/0x1280 > > [ 904.638605] do_filp_open+0xab/0x140 > > [ 904.639185] do_sys_openat2+0xba/0x120 > > [ 904.639805] do_sys_open+0x4c/0x80 > > [ 904.640366] __x64_sys_openat+0x23/0x30 > > [ 904.640992] x64_sys_call+0x2575/0x4550 > > [ 904.641616] do_syscall_64+0x71/0x180 > > [ 904.642207] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > [ 904.643003] > > [ 904.643003] other info that might help us debug this: > > [ 904.643003] > > [ 904.644149] Chain exists of: > > [ 904.644149] &p->lock --> &mm->mmap_lock --> &ima_iint_lock_mutex_key[depth] > > [ 904.644149] > > [ 904.645763] Possible unsafe locking scenario: > > [ 904.645763] > > [ 904.646614] CPU0 CPU1 > > [ 904.647264] ---- ---- > > [ 904.647909] lock(&ima_iint_lock_mutex_key[depth]); > > [ 904.648617] lock(&mm->mmap_lock); > > [ 904.649479] lock(&ima_iint_lock_mutex_key[depth]); > > [ 904.650543] lock(&p->lock); > > [ 904.650974] > > [ 904.650974] *** DEADLOCK *** > > [ 904.650974] > > [ 904.651826] 1 lock held by systemd/1: > > [ 904.652376] #0: ffff88810f4abf20 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}, at: ima_iint_lock+0x24/0x40 > > [ 904.653759] > > [ 904.653759] stack backtrace: > > [ 904.654391] CPU: 2 UID: 0 PID: 1 Comm: systemd Not tainted 6.12.0-rc2+ #20 > > [ 904.655360] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 > > [ 904.656497] Call Trace: > > [ 904.656856] <TASK> > > [ 904.657166] dump_stack_lvl+0x134/0x1a0 > > [ 904.657728] dump_stack+0x14/0x30 > > [ 904.658206] print_circular_bug+0x38d/0x450 > > [ 904.658812] check_noncircular+0xed/0x120 > > [ 904.659396] ? srso_return_thunk+0x5/0x5f > > [ 904.659972] ? srso_return_thunk+0x5/0x5f > > [ 904.660569] __lock_acquire+0x17f3/0x2320 > > [ 904.661145] lock_acquire+0xf2/0x420 > > [ 904.661664] ? seq_read_iter+0x62/0x6b0 > > [ 904.662217] ? srso_return_thunk+0x5/0x5f > > [ 904.662886] __mutex_lock+0xaf/0x760 > > [ 904.663408] ? seq_read_iter+0x62/0x6b0 > > [ 904.663961] ? seq_read_iter+0x62/0x6b0 > > [ 904.664525] ? srso_return_thunk+0x5/0x5f > > [ 904.665098] ? mark_lock+0x4e/0x750 > > [ 904.665610] ? mutex_lock_nested+0x27/0x40 > > [ 904.666194] ? find_held_lock+0x3a/0x100 > > [ 904.666770] mutex_lock_nested+0x27/0x40 > > [ 904.667337] seq_read_iter+0x62/0x6b0 > > [ 904.667869] kernfs_fop_read_iter+0x1ef/0x2c0 > > [ 904.668536] __kernel_read+0x113/0x350 > > [ 904.669079] integrity_kernel_read+0x23/0x40 > > [ 904.669698] ima_calc_file_hash_tfm+0x14e/0x230 > > [ 904.670349] ? __lock_acquire+0xc32/0x2320 > > [ 904.670937] ? srso_return_thunk+0x5/0x5f > > [ 904.671525] ? __lock_acquire+0xfbb/0x2320 > > [ 904.672113] ? srso_return_thunk+0x5/0x5f > > [ 904.672693] ? srso_return_thunk+0x5/0x5f > > [ 904.673280] ? lock_acquire+0xf2/0x420 > > [ 904.673818] ? kernfs_iop_getattr+0x4a/0xb0 > > [ 904.674424] ? srso_return_thunk+0x5/0x5f > > [ 904.674997] ? find_held_lock+0x3a/0x100 > > [ 904.675564] ? srso_return_thunk+0x5/0x5f > > [ 904.676156] ? srso_return_thunk+0x5/0x5f > > [ 904.676740] ? srso_return_thunk+0x5/0x5f > > [ 904.677322] ? local_clock_noinstr+0x9/0xb0 > > [ 904.677923] ? srso_return_thunk+0x5/0x5f > > [ 904.678502] ? srso_return_thunk+0x5/0x5f > > [ 904.679075] ? lock_release+0x4e2/0x570 > > [ 904.679639] ima_calc_file_hash+0x97/0x250 > > [ 904.680227] ima_collect_measurement+0x4be/0x530 > > [ 904.680901] ? srso_return_thunk+0x5/0x5f > > [ 904.681496] ? srso_return_thunk+0x5/0x5f > > [ 904.682070] ? __kernfs_iattrs+0x4a/0x140 > > [ 904.682658] ? srso_return_thunk+0x5/0x5f > > [ 904.683242] ? process_measurement+0x7c0/0xef0 > > [ 904.683876] ? srso_return_thunk+0x5/0x5f > > [ 904.684462] process_measurement+0x7c0/0xef0 > > [ 904.685078] ? srso_return_thunk+0x5/0x5f > > [ 904.685654] ? srso_return_thunk+0x5/0x5f > > [ 904.686228] ? _raw_spin_unlock_irqrestore+0x5d/0xd0 > > [ 904.686938] ? srso_return_thunk+0x5/0x5f > > [ 904.687523] ? srso_return_thunk+0x5/0x5f > > [ 904.688098] ? srso_return_thunk+0x5/0x5f > > [ 904.688672] ? local_clock_noinstr+0x9/0xb0 > > [ 904.689273] ? srso_return_thunk+0x5/0x5f > > [ 904.689846] ? srso_return_thunk+0x5/0x5f > > [ 904.690430] ? srso_return_thunk+0x5/0x5f > > [ 904.691005] ? srso_return_thunk+0x5/0x5f > > [ 904.691583] ? srso_return_thunk+0x5/0x5f > > [ 904.692180] ? local_clock_noinstr+0x9/0xb0 > > [ 904.692841] ? srso_return_thunk+0x5/0x5f > > [ 904.693419] ? srso_return_thunk+0x5/0x5f > > [ 904.693990] ? lock_release+0x4e2/0x570 > > [ 904.694544] ? srso_return_thunk+0x5/0x5f > > [ 904.695115] ? kernfs_put_active+0x5d/0xc0 > > [ 904.695708] ? srso_return_thunk+0x5/0x5f > > [ 904.696286] ? kernfs_fop_open+0x376/0x6b0 > > [ 904.696872] ima_file_check+0x65/0x80 > > [ 904.697409] security_file_post_open+0xb1/0x1b0 > > [ 904.698058] path_openat+0x216/0x1280 > > [ 904.698589] do_filp_open+0xab/0x140 > > [ 904.699106] ? srso_return_thunk+0x5/0x5f > > [ 904.699693] ? lock_release+0x554/0x570 > > [ 904.700264] ? srso_return_thunk+0x5/0x5f > > [ 904.700836] ? do_raw_spin_unlock+0x76/0x140 > > [ 904.701450] ? srso_return_thunk+0x5/0x5f > > [ 904.702021] ? _raw_spin_unlock+0x3f/0xa0 > > [ 904.702606] ? srso_return_thunk+0x5/0x5f > > [ 904.703178] ? alloc_fd+0x1ca/0x3b0 > > [ 904.703685] do_sys_openat2+0xba/0x120 > > [ 904.704223] ? file_free+0x8d/0x110 > > [ 904.704729] do_sys_open+0x4c/0x80 > > [ 904.705221] __x64_sys_openat+0x23/0x30 > > [ 904.705784] x64_sys_call+0x2575/0x4550 > > [ 904.706337] do_syscall_64+0x71/0x180 > > [ 904.706863] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > [ 904.707587] RIP: 0033:0x7f3462123037 > > [ 904.708120] Code: 55 9c 48 89 75 a0 89 7d a8 44 89 55 ac e8 a1 7a f8 ff 44 8b 55 ac 8b 55 9c 41 89 c0 48 8b 75 a0 8b 7d a8 b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 89 45 ac e8 f6 7a f8 ff 8b 45 ac > > [ 904.710744] RSP: 002b:00007ffdd1a79370 EFLAGS: 00000293 ORIG_RAX: 0000000000000101 > > [ 904.711821] RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 00007f3462123037 > > [ 904.712829] RDX: 0000000000080100 RSI: 00007ffdd1a79420 RDI: 00000000ffffff9c > > [ 904.713839] RBP: 00007ffdd1a793e0 R08: 0000000000000000 R09: 0000000000000000 > > [ 904.714848] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffdd1a794c8 > > [ 904.715855] R13: 00007ffdd1a794b8 R14: 00007ffdd1a79690 R15: 00007ffdd1a79690 > > [ 904.716877] </TASK> > > > > Roberto > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] ima: Remove inode lock 2024-10-16 9:59 ` [PATCH 1/3] ima: Remove inode lock Roberto Sassu @ 2024-10-18 9:24 ` Roberto Sassu 2024-10-18 10:49 ` Kirill A. Shutemov 0 siblings, 1 reply; 13+ messages in thread From: Roberto Sassu @ 2024-10-18 9:24 UTC (permalink / raw) To: Paul Moore, ebpqwerty472123, kirill.shutemov Cc: zohar, dmitry.kasatkin, eric.snowberg, jmorris, serge, linux-integrity, linux-security-module, linux-kernel, bpf, Roberto Sassu, linux-mm, akpm, vbabka, lorenzo.stoakes, linux-fsdevel On Wed, 2024-10-16 at 11:59 +0200, Roberto Sassu wrote: > On Fri, 2024-10-11 at 15:30 +0200, Roberto Sassu wrote: > > On Wed, 2024-10-09 at 18:43 +0200, Roberto Sassu wrote: > > > On Wed, 2024-10-09 at 18:25 +0200, Roberto Sassu wrote: > > > > On Wed, 2024-10-09 at 11:37 -0400, Paul Moore wrote: > > > > > On Wed, Oct 9, 2024 at 11:36 AM Paul Moore <paul@paul-moore.com> wrote: > > > > > > On Tue, Oct 8, 2024 at 12:57 PM Roberto Sassu > > > > > > <roberto.sassu@huaweicloud.com> wrote: > > > > > > > > > > > > > > From: Roberto Sassu <roberto.sassu@huawei.com> > > > > > > > > > > > > > > Move out the mutex in the ima_iint_cache structure to a new structure > > > > > > > called ima_iint_cache_lock, so that a lock can be taken regardless of > > > > > > > whether or not inode integrity metadata are stored in the inode. > > > > > > > > > > > > > > Introduce ima_inode_security() to simplify accessing the new structure in > > > > > > > the inode security blob. > > > > > > > > > > > > > > Move the mutex initialization and annotation in the new function > > > > > > > ima_inode_alloc_security() and introduce ima_iint_lock() and > > > > > > > ima_iint_unlock() to respectively lock and unlock the mutex. > > > > > > > > > > > > > > Finally, expand the critical region in process_measurement() guarded by > > > > > > > iint->mutex up to where the inode was locked, use only one iint lock in > > > > > > > __ima_inode_hash(), since the mutex is now in the inode security blob, and > > > > > > > replace the inode_lock()/inode_unlock() calls in ima_check_last_writer(). > > > > > > > > > > > > > > Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> > > > > > > > --- > > > > > > > security/integrity/ima/ima.h | 26 ++++++++--- > > > > > > > security/integrity/ima/ima_api.c | 4 +- > > > > > > > security/integrity/ima/ima_iint.c | 77 ++++++++++++++++++++++++++----- > > > > > > > security/integrity/ima/ima_main.c | 39 +++++++--------- > > > > > > > 4 files changed, 104 insertions(+), 42 deletions(-) > > > > > > > > > > > > I'm not an IMA expert, but it looks reasonable to me, although > > > > > > shouldn't this carry a stable CC in the patch metadata? > > > > > > > > > > > > Reviewed-by: Paul Moore <paul@paul-moore.com> > > > > > > > > > > Sorry, one more thing ... did you verify this patchset resolves the > > > > > syzbot problem? I saw at least one reproducer. > > > > > > > > Uhm, could not reproduce the deadlock with the reproducer. However, > > > > without the patch I have a lockdep warning, and with I don't. > > > > > > > > I asked syzbot to try the patches. Let's see. > > > > > > I actually got a different lockdep warning: > > > > > > [ 904.603365] ====================================================== > > > [ 904.604264] WARNING: possible circular locking dependency detected > > > [ 904.605141] 6.12.0-rc2+ #20 Not tainted > > > [ 904.605697] ------------------------------------------------------ > > > > I can reproduce by executing the syzbot reproducer and in another > > terminal by logging in with SSH (not all the times). > > > > If I understood what the lockdep warning means, this is the scenario. > > > > A task accesses a seq_file which is in the IMA policy, so we enter the > > critical region guarded by the iint lock. But before we get the chance > > to measure the file, a second task calls remap_file_pages() on the same > > seq_file. > > > > remap_file_pages() takes the mmap lock and, if the event matches the > > IMA policy too, the second task waits for the iint lock to be released. > > > > Now, the first task starts to measure the seq_file and takes the > > seq_file lock. I don't know if 3 processes must be involved, but I was > > thinking that reading the seq_file from the first task can trigger a > > page fault, which requires to take the mmap lock. > > > > At this point, we reach a deadlock. The first task waits for the mmap > > lock to be released, and the second waits for the iint lock to be > > released, which both cannot happen. > > + mm, mm folks > > (I leave the lockdep warning down for the new people included in the > thread). > > I think I understood what the problem is. Any lock that is taken inside > security_file_mmap() would cause lock inversion since: + Kirill Ok, to be clear, we are talking about a regression introduced by commit ea7e2d5e49c05 ("mm: call the security_mmap_file() LSM hook in remap_file_pages()"). Originally, Mimi asked this patch to be included: https://lore.kernel.org/lkml/1336963631-3541-1-git-send-email-zohar@us.ibm.com/ This was due to the commit 196f518 ("IMA: explicit IMA i_flag to remove global lock on inode_delete"), which added an inode lock to protect the new flag S_IMA, used to track integrity metadata only for the set of inodes evaluated by IMA. The problem was that, for mmapped files, the lock is taken in the opposite order than the convention (i_mutex and then mmap_sem), causing a lock inversion and a lockdep warning. Linus proposed to split security_file_mmap() into security_mmap_file() and security_mmap_addr(), so that the former can be taken out of the mmap_sem lock and remove the lock inversion. The final commit was made by Al Viro, 8b3ec6814c83 ("take security_mmap_file() outside of - >mmap_sem"). Commit ea7e2d5e49c05 put again a security_mmap_file() call inside the mmap_sem (now mmap_lock) lock, causing the recent syzbot reports. Paul asked if the inode lock can be removed from IMA, and actually partially can be done: https://lore.kernel.org/linux-integrity/20241008165732.2603647-1-roberto.sassu@huaweicloud.com/ The patch is good in its own, since it merges two critical sections in one. However, the inode lock cannot be removed completely: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/security/integrity/ima/ima_main.c?h=v6.12-rc3#n386 This one is required to set the security.ima xattr in ima_appraise_measurement() -> ima_fix_xattr(), when IMA-Appraise is in fix mode (ima_appraise=fix in the kernel command line). I would say that my original suggestion of putting security_mmap_file() back to the mmap_lock lock probably is not the optimal solution. Maybe we could get around removing the remaining inode lock in IMA, but we would also limit future LSMs to not use the inode lock if they need. Probably it is hard, @Kirill would there be any way to safely move security_mmap_file() out of the mmap_lock lock? Thanks Roberto PS: I'm aware that Shu Han proposed a solution to the lock inversion: https://lore.kernel.org/linux-security-module/20240928180044.50-1-ebpqwerty472123@gmail.com/ but I think anyway it boils down to either moving security_mmap_file() to the mmap_lock lock again for all calls, or viceversa, moving security_mmap_file() out of the remap_file_pages() system call. > vm_mmap_pgoff(): > > ret = security_mmap_file(file, prot, flag); > if (!ret) { > if (mmap_write_lock_killable(mm)) > > > > SYSCALL_DEFINE5(remap_file_pages, ...): > > if (mmap_write_lock_killable(mm)) > return -EINTR; > > [...] > > file = get_file(vma->vm_file); > ret = security_mmap_file(vma->vm_file, prot, flags); > if (ret) > goto out_fput; > > > Since I didn't see a good way to change the order of the second, I > changed the order of the first, i.e. I put security_file_mmap() under > the mmap lock. > > (I don't know if this is a good idea.) > > I cannot reproduce the lockdep warning anymore. > > @mm folks, what is your opinion? > > Thanks > > Roberto > > > Roberto > > > > > [ 904.606577] systemd/1 is trying to acquire lock: > > > [ 904.607227] ffff88810e5c2580 (&p->lock){+.+.}-{4:4}, at: seq_read_iter+0x62/0x6b0 > > > [ 904.608290] > > > [ 904.608290] but task is already holding lock: > > > [ 904.609105] ffff88810f4abf20 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}, at: ima_iint_lock+0x24/0x40 > > > [ 904.610429] > > > [ 904.610429] which lock already depends on the new lock. > > > [ 904.610429] > > > [ 904.611574] > > > [ 904.611574] the existing dependency chain (in reverse order) is: > > > [ 904.612628] > > > [ 904.612628] -> #2 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}: > > > [ 904.613681] __mutex_lock+0xaf/0x760 > > > [ 904.614266] mutex_lock_nested+0x27/0x40 > > > [ 904.614897] ima_iint_lock+0x24/0x40 > > > [ 904.615490] process_measurement+0x176/0xef0 > > > [ 904.616168] ima_file_mmap+0x98/0x120 > > > [ 904.616767] security_mmap_file+0x408/0x560 > > > [ 904.617444] __do_sys_remap_file_pages+0x2fa/0x4c0 > > > [ 904.618194] __x64_sys_remap_file_pages+0x29/0x40 > > > [ 904.618937] x64_sys_call+0x6e8/0x4550 > > > [ 904.619546] do_syscall_64+0x71/0x180 > > > [ 904.620155] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > > [ 904.620952] > > > [ 904.620952] -> #1 (&mm->mmap_lock){++++}-{4:4}: > > > [ 904.621813] __might_fault+0x6f/0xb0 > > > [ 904.622400] _copy_to_iter+0x12e/0xa80 > > > [ 904.623009] seq_read_iter+0x593/0x6b0 > > > [ 904.623629] proc_reg_read_iter+0x31/0xe0 > > > [ 904.624276] vfs_read+0x256/0x3d0 > > > [ 904.624822] ksys_read+0x6d/0x160 > > > [ 904.625372] __x64_sys_read+0x1d/0x30 > > > [ 904.625964] x64_sys_call+0x1068/0x4550 > > > [ 904.626594] do_syscall_64+0x71/0x180 > > > [ 904.627188] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > > [ 904.627975] > > > [ 904.627975] -> #0 (&p->lock){+.+.}-{4:4}: > > > [ 904.628787] __lock_acquire+0x17f3/0x2320 > > > [ 904.629432] lock_acquire+0xf2/0x420 > > > [ 904.630013] __mutex_lock+0xaf/0x760 > > > [ 904.630596] mutex_lock_nested+0x27/0x40 > > > [ 904.631225] seq_read_iter+0x62/0x6b0 > > > [ 904.631831] kernfs_fop_read_iter+0x1ef/0x2c0 > > > [ 904.632599] __kernel_read+0x113/0x350 > > > [ 904.633206] integrity_kernel_read+0x23/0x40 > > > [ 904.633902] ima_calc_file_hash_tfm+0x14e/0x230 > > > [ 904.634621] ima_calc_file_hash+0x97/0x250 > > > [ 904.635281] ima_collect_measurement+0x4be/0x530 > > > [ 904.636008] process_measurement+0x7c0/0xef0 > > > [ 904.636689] ima_file_check+0x65/0x80 > > > [ 904.637295] security_file_post_open+0xb1/0x1b0 > > > [ 904.638008] path_openat+0x216/0x1280 > > > [ 904.638605] do_filp_open+0xab/0x140 > > > [ 904.639185] do_sys_openat2+0xba/0x120 > > > [ 904.639805] do_sys_open+0x4c/0x80 > > > [ 904.640366] __x64_sys_openat+0x23/0x30 > > > [ 904.640992] x64_sys_call+0x2575/0x4550 > > > [ 904.641616] do_syscall_64+0x71/0x180 > > > [ 904.642207] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > > [ 904.643003] > > > [ 904.643003] other info that might help us debug this: > > > [ 904.643003] > > > [ 904.644149] Chain exists of: > > > [ 904.644149] &p->lock --> &mm->mmap_lock --> &ima_iint_lock_mutex_key[depth] > > > [ 904.644149] > > > [ 904.645763] Possible unsafe locking scenario: > > > [ 904.645763] > > > [ 904.646614] CPU0 CPU1 > > > [ 904.647264] ---- ---- > > > [ 904.647909] lock(&ima_iint_lock_mutex_key[depth]); > > > [ 904.648617] lock(&mm->mmap_lock); > > > [ 904.649479] lock(&ima_iint_lock_mutex_key[depth]); > > > [ 904.650543] lock(&p->lock); > > > [ 904.650974] > > > [ 904.650974] *** DEADLOCK *** > > > [ 904.650974] > > > [ 904.651826] 1 lock held by systemd/1: > > > [ 904.652376] #0: ffff88810f4abf20 (&ima_iint_lock_mutex_key[depth]){+.+.}-{4:4}, at: ima_iint_lock+0x24/0x40 > > > [ 904.653759] > > > [ 904.653759] stack backtrace: > > > [ 904.654391] CPU: 2 UID: 0 PID: 1 Comm: systemd Not tainted 6.12.0-rc2+ #20 > > > [ 904.655360] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 > > > [ 904.656497] Call Trace: > > > [ 904.656856] <TASK> > > > [ 904.657166] dump_stack_lvl+0x134/0x1a0 > > > [ 904.657728] dump_stack+0x14/0x30 > > > [ 904.658206] print_circular_bug+0x38d/0x450 > > > [ 904.658812] check_noncircular+0xed/0x120 > > > [ 904.659396] ? srso_return_thunk+0x5/0x5f > > > [ 904.659972] ? srso_return_thunk+0x5/0x5f > > > [ 904.660569] __lock_acquire+0x17f3/0x2320 > > > [ 904.661145] lock_acquire+0xf2/0x420 > > > [ 904.661664] ? seq_read_iter+0x62/0x6b0 > > > [ 904.662217] ? srso_return_thunk+0x5/0x5f > > > [ 904.662886] __mutex_lock+0xaf/0x760 > > > [ 904.663408] ? seq_read_iter+0x62/0x6b0 > > > [ 904.663961] ? seq_read_iter+0x62/0x6b0 > > > [ 904.664525] ? srso_return_thunk+0x5/0x5f > > > [ 904.665098] ? mark_lock+0x4e/0x750 > > > [ 904.665610] ? mutex_lock_nested+0x27/0x40 > > > [ 904.666194] ? find_held_lock+0x3a/0x100 > > > [ 904.666770] mutex_lock_nested+0x27/0x40 > > > [ 904.667337] seq_read_iter+0x62/0x6b0 > > > [ 904.667869] kernfs_fop_read_iter+0x1ef/0x2c0 > > > [ 904.668536] __kernel_read+0x113/0x350 > > > [ 904.669079] integrity_kernel_read+0x23/0x40 > > > [ 904.669698] ima_calc_file_hash_tfm+0x14e/0x230 > > > [ 904.670349] ? __lock_acquire+0xc32/0x2320 > > > [ 904.670937] ? srso_return_thunk+0x5/0x5f > > > [ 904.671525] ? __lock_acquire+0xfbb/0x2320 > > > [ 904.672113] ? srso_return_thunk+0x5/0x5f > > > [ 904.672693] ? srso_return_thunk+0x5/0x5f > > > [ 904.673280] ? lock_acquire+0xf2/0x420 > > > [ 904.673818] ? kernfs_iop_getattr+0x4a/0xb0 > > > [ 904.674424] ? srso_return_thunk+0x5/0x5f > > > [ 904.674997] ? find_held_lock+0x3a/0x100 > > > [ 904.675564] ? srso_return_thunk+0x5/0x5f > > > [ 904.676156] ? srso_return_thunk+0x5/0x5f > > > [ 904.676740] ? srso_return_thunk+0x5/0x5f > > > [ 904.677322] ? local_clock_noinstr+0x9/0xb0 > > > [ 904.677923] ? srso_return_thunk+0x5/0x5f > > > [ 904.678502] ? srso_return_thunk+0x5/0x5f > > > [ 904.679075] ? lock_release+0x4e2/0x570 > > > [ 904.679639] ima_calc_file_hash+0x97/0x250 > > > [ 904.680227] ima_collect_measurement+0x4be/0x530 > > > [ 904.680901] ? srso_return_thunk+0x5/0x5f > > > [ 904.681496] ? srso_return_thunk+0x5/0x5f > > > [ 904.682070] ? __kernfs_iattrs+0x4a/0x140 > > > [ 904.682658] ? srso_return_thunk+0x5/0x5f > > > [ 904.683242] ? process_measurement+0x7c0/0xef0 > > > [ 904.683876] ? srso_return_thunk+0x5/0x5f > > > [ 904.684462] process_measurement+0x7c0/0xef0 > > > [ 904.685078] ? srso_return_thunk+0x5/0x5f > > > [ 904.685654] ? srso_return_thunk+0x5/0x5f > > > [ 904.686228] ? _raw_spin_unlock_irqrestore+0x5d/0xd0 > > > [ 904.686938] ? srso_return_thunk+0x5/0x5f > > > [ 904.687523] ? srso_return_thunk+0x5/0x5f > > > [ 904.688098] ? srso_return_thunk+0x5/0x5f > > > [ 904.688672] ? local_clock_noinstr+0x9/0xb0 > > > [ 904.689273] ? srso_return_thunk+0x5/0x5f > > > [ 904.689846] ? srso_return_thunk+0x5/0x5f > > > [ 904.690430] ? srso_return_thunk+0x5/0x5f > > > [ 904.691005] ? srso_return_thunk+0x5/0x5f > > > [ 904.691583] ? srso_return_thunk+0x5/0x5f > > > [ 904.692180] ? local_clock_noinstr+0x9/0xb0 > > > [ 904.692841] ? srso_return_thunk+0x5/0x5f > > > [ 904.693419] ? srso_return_thunk+0x5/0x5f > > > [ 904.693990] ? lock_release+0x4e2/0x570 > > > [ 904.694544] ? srso_return_thunk+0x5/0x5f > > > [ 904.695115] ? kernfs_put_active+0x5d/0xc0 > > > [ 904.695708] ? srso_return_thunk+0x5/0x5f > > > [ 904.696286] ? kernfs_fop_open+0x376/0x6b0 > > > [ 904.696872] ima_file_check+0x65/0x80 > > > [ 904.697409] security_file_post_open+0xb1/0x1b0 > > > [ 904.698058] path_openat+0x216/0x1280 > > > [ 904.698589] do_filp_open+0xab/0x140 > > > [ 904.699106] ? srso_return_thunk+0x5/0x5f > > > [ 904.699693] ? lock_release+0x554/0x570 > > > [ 904.700264] ? srso_return_thunk+0x5/0x5f > > > [ 904.700836] ? do_raw_spin_unlock+0x76/0x140 > > > [ 904.701450] ? srso_return_thunk+0x5/0x5f > > > [ 904.702021] ? _raw_spin_unlock+0x3f/0xa0 > > > [ 904.702606] ? srso_return_thunk+0x5/0x5f > > > [ 904.703178] ? alloc_fd+0x1ca/0x3b0 > > > [ 904.703685] do_sys_openat2+0xba/0x120 > > > [ 904.704223] ? file_free+0x8d/0x110 > > > [ 904.704729] do_sys_open+0x4c/0x80 > > > [ 904.705221] __x64_sys_openat+0x23/0x30 > > > [ 904.705784] x64_sys_call+0x2575/0x4550 > > > [ 904.706337] do_syscall_64+0x71/0x180 > > > [ 904.706863] entry_SYSCALL_64_after_hwframe+0x76/0x7e > > > [ 904.707587] RIP: 0033:0x7f3462123037 > > > [ 904.708120] Code: 55 9c 48 89 75 a0 89 7d a8 44 89 55 ac e8 a1 7a f8 ff 44 8b 55 ac 8b 55 9c 41 89 c0 48 8b 75 a0 8b 7d a8 b8 01 01 00 00 0f 05 <48> 3d 00 f0 ff ff 77 31 44 89 c7 89 45 ac e8 f6 7a f8 ff 8b 45 ac > > > [ 904.710744] RSP: 002b:00007ffdd1a79370 EFLAGS: 00000293 ORIG_RAX: 0000000000000101 > > > [ 904.711821] RAX: ffffffffffffffda RBX: ffffffffffffffff RCX: 00007f3462123037 > > > [ 904.712829] RDX: 0000000000080100 RSI: 00007ffdd1a79420 RDI: 00000000ffffff9c > > > [ 904.713839] RBP: 00007ffdd1a793e0 R08: 0000000000000000 R09: 0000000000000000 > > > [ 904.714848] R10: 0000000000000000 R11: 0000000000000293 R12: 00007ffdd1a794c8 > > > [ 904.715855] R13: 00007ffdd1a794b8 R14: 00007ffdd1a79690 R15: 00007ffdd1a79690 > > > [ 904.716877] </TASK> > > > > > > Roberto > > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] ima: Remove inode lock 2024-10-18 9:24 ` Roberto Sassu @ 2024-10-18 10:49 ` Kirill A. Shutemov 2024-10-18 10:52 ` Kirill A. Shutemov 2024-10-18 11:00 ` Lorenzo Stoakes 0 siblings, 2 replies; 13+ messages in thread From: Kirill A. Shutemov @ 2024-10-18 10:49 UTC (permalink / raw) To: Roberto Sassu Cc: Paul Moore, ebpqwerty472123, kirill.shutemov, zohar, dmitry.kasatkin, eric.snowberg, jmorris, serge, linux-integrity, linux-security-module, linux-kernel, bpf, Roberto Sassu, linux-mm, akpm, vbabka, lorenzo.stoakes, linux-fsdevel On Fri, Oct 18, 2024 at 11:24:06AM +0200, Roberto Sassu wrote: > Probably it is hard, @Kirill would there be any way to safely move > security_mmap_file() out of the mmap_lock lock? What about something like this (untested): diff --git a/mm/mmap.c b/mm/mmap.c index dd4b35a25aeb..03473e77d356 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1646,6 +1646,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, if (pgoff + (size >> PAGE_SHIFT) < pgoff) return ret; + if (mmap_read_lock_killable(mm)) + return -EINTR; + + vma = vma_lookup(mm, start); + + if (!vma || !(vma->vm_flags & VM_SHARED)) { + mmap_read_unlock(mm); + return -EINVAL; + } + + file = get_file(vma->vm_file); + + mmap_read_unlock(mm); + + ret = security_mmap_file(vma->vm_file, prot, flags); + if (ret) { + fput(file); + return ret; + } + if (mmap_write_lock_killable(mm)) return -EINTR; @@ -1654,6 +1674,9 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, if (!vma || !(vma->vm_flags & VM_SHARED)) goto out; + if (vma->vm_file != file) + goto out; + if (start + size > vma->vm_end) { VMA_ITERATOR(vmi, mm, vma->vm_end); struct vm_area_struct *next, *prev = vma; @@ -1688,16 +1711,11 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, if (vma->vm_flags & VM_LOCKED) flags |= MAP_LOCKED; - file = get_file(vma->vm_file); - ret = security_mmap_file(vma->vm_file, prot, flags); - if (ret) - goto out_fput; ret = do_mmap(vma->vm_file, start, size, prot, flags, 0, pgoff, &populate, NULL); -out_fput: - fput(file); out: mmap_write_unlock(mm); + fput(file); if (populate) mm_populate(ret, populate); if (!IS_ERR_VALUE(ret)) -- Kiryl Shutsemau / Kirill A. Shutemov ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] ima: Remove inode lock 2024-10-18 10:49 ` Kirill A. Shutemov @ 2024-10-18 10:52 ` Kirill A. Shutemov 2024-10-18 10:55 ` Kirill A. Shutemov 2024-10-18 11:00 ` Lorenzo Stoakes 1 sibling, 1 reply; 13+ messages in thread From: Kirill A. Shutemov @ 2024-10-18 10:52 UTC (permalink / raw) To: Roberto Sassu Cc: Paul Moore, ebpqwerty472123, kirill.shutemov, zohar, dmitry.kasatkin, eric.snowberg, jmorris, serge, linux-integrity, linux-security-module, linux-kernel, bpf, Roberto Sassu, linux-mm, akpm, vbabka, lorenzo.stoakes, linux-fsdevel On Fri, Oct 18, 2024 at 01:49:21PM +0300, Kirill A. Shutemov wrote: > On Fri, Oct 18, 2024 at 11:24:06AM +0200, Roberto Sassu wrote: > > Probably it is hard, @Kirill would there be any way to safely move > > security_mmap_file() out of the mmap_lock lock? > > What about something like this (untested): > > diff --git a/mm/mmap.c b/mm/mmap.c > index dd4b35a25aeb..03473e77d356 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -1646,6 +1646,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, > if (pgoff + (size >> PAGE_SHIFT) < pgoff) > return ret; > > + if (mmap_read_lock_killable(mm)) > + return -EINTR; > + > + vma = vma_lookup(mm, start); > + > + if (!vma || !(vma->vm_flags & VM_SHARED)) { > + mmap_read_unlock(mm); > + return -EINVAL; > + } > + > + file = get_file(vma->vm_file); > + > + mmap_read_unlock(mm); > + > + ret = security_mmap_file(vma->vm_file, prot, flags); > + if (ret) { > + fput(file); > + return ret; > + } > + Emm. We need to restore 'ret' to -EINVAL here: + + ret = -EINVAL; + > if (mmap_write_lock_killable(mm)) > return -EINTR; > > @@ -1654,6 +1674,9 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, > if (!vma || !(vma->vm_flags & VM_SHARED)) > goto out; > > + if (vma->vm_file != file) > + goto out; > + > if (start + size > vma->vm_end) { > VMA_ITERATOR(vmi, mm, vma->vm_end); > struct vm_area_struct *next, *prev = vma; > @@ -1688,16 +1711,11 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, > if (vma->vm_flags & VM_LOCKED) > flags |= MAP_LOCKED; > > - file = get_file(vma->vm_file); > - ret = security_mmap_file(vma->vm_file, prot, flags); > - if (ret) > - goto out_fput; > ret = do_mmap(vma->vm_file, start, size, > prot, flags, 0, pgoff, &populate, NULL); > -out_fput: > - fput(file); > out: > mmap_write_unlock(mm); > + fput(file); > if (populate) > mm_populate(ret, populate); > if (!IS_ERR_VALUE(ret)) > -- > Kiryl Shutsemau / Kirill A. Shutemov -- Kiryl Shutsemau / Kirill A. Shutemov ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] ima: Remove inode lock 2024-10-18 10:52 ` Kirill A. Shutemov @ 2024-10-18 10:55 ` Kirill A. Shutemov 0 siblings, 0 replies; 13+ messages in thread From: Kirill A. Shutemov @ 2024-10-18 10:55 UTC (permalink / raw) To: Roberto Sassu Cc: Paul Moore, ebpqwerty472123, kirill.shutemov, zohar, dmitry.kasatkin, eric.snowberg, jmorris, serge, linux-integrity, linux-security-module, linux-kernel, bpf, Roberto Sassu, linux-mm, akpm, vbabka, lorenzo.stoakes, linux-fsdevel On Fri, Oct 18, 2024 at 01:53:03PM +0300, Kirill A. Shutemov wrote: > > + mmap_read_unlock(mm); > > + > > + ret = security_mmap_file(vma->vm_file, prot, flags); > > + if (ret) { > > + fput(file); > > + return ret; > > + } > > + > > Emm. We need to restore 'ret' to -EINVAL here: > > + > + ret = -EINVAL; > + > > > if (mmap_write_lock_killable(mm)) > > return -EINTR; > > And fput() here on error. Updated patch: diff --git a/mm/mmap.c b/mm/mmap.c index dd4b35a25aeb..7c1b73a79937 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1646,14 +1646,41 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, if (pgoff + (size >> PAGE_SHIFT) < pgoff) return ret; - if (mmap_write_lock_killable(mm)) + if (mmap_read_lock_killable(mm)) return -EINTR; vma = vma_lookup(mm, start); + if (!vma || !(vma->vm_flags & VM_SHARED)) { + mmap_read_unlock(mm); + return -EINVAL; + } + + file = get_file(vma->vm_file); + + mmap_read_unlock(mm); + + ret = security_mmap_file(vma->vm_file, prot, flags); + if (ret) { + fput(file); + return ret; + } + + ret = -EINVAL; + + if (mmap_write_lock_killable(mm)) { + fput(file); + return -EINTR; + } + + vma = vma_lookup(mm, start); + if (!vma || !(vma->vm_flags & VM_SHARED)) goto out; + if (vma->vm_file != file) + goto out; + if (start + size > vma->vm_end) { VMA_ITERATOR(vmi, mm, vma->vm_end); struct vm_area_struct *next, *prev = vma; @@ -1688,16 +1715,11 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, if (vma->vm_flags & VM_LOCKED) flags |= MAP_LOCKED; - file = get_file(vma->vm_file); - ret = security_mmap_file(vma->vm_file, prot, flags); - if (ret) - goto out_fput; ret = do_mmap(vma->vm_file, start, size, prot, flags, 0, pgoff, &populate, NULL); -out_fput: - fput(file); out: mmap_write_unlock(mm); + fput(file); if (populate) mm_populate(ret, populate); If (!IS_ERR_VALUE(ret)) -- Kiryl Shutsemau / Kirill A. Shutemov ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] ima: Remove inode lock 2024-10-18 10:49 ` Kirill A. Shutemov 2024-10-18 10:52 ` Kirill A. Shutemov @ 2024-10-18 11:00 ` Lorenzo Stoakes 2024-10-18 11:05 ` Kirill A. Shutemov 2024-10-18 14:36 ` Jann Horn 1 sibling, 2 replies; 13+ messages in thread From: Lorenzo Stoakes @ 2024-10-18 11:00 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Roberto Sassu, Paul Moore, ebpqwerty472123, kirill.shutemov, zohar, dmitry.kasatkin, eric.snowberg, jmorris, serge, linux-integrity, linux-security-module, linux-kernel, bpf, Roberto Sassu, linux-mm, akpm, vbabka, linux-fsdevel, Liam Howlett, Jann Horn + Liam, Jann On Fri, Oct 18, 2024 at 01:49:06PM +0300, Kirill A. Shutemov wrote: > On Fri, Oct 18, 2024 at 11:24:06AM +0200, Roberto Sassu wrote: > > Probably it is hard, @Kirill would there be any way to safely move > > security_mmap_file() out of the mmap_lock lock? > > What about something like this (untested): > > diff --git a/mm/mmap.c b/mm/mmap.c > index dd4b35a25aeb..03473e77d356 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -1646,6 +1646,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, > if (pgoff + (size >> PAGE_SHIFT) < pgoff) > return ret; > > + if (mmap_read_lock_killable(mm)) > + return -EINTR; > + > + vma = vma_lookup(mm, start); > + > + if (!vma || !(vma->vm_flags & VM_SHARED)) { > + mmap_read_unlock(mm); > + return -EINVAL; > + } > + > + file = get_file(vma->vm_file); > + > + mmap_read_unlock(mm); > + > + ret = security_mmap_file(vma->vm_file, prot, flags); Accessing VMA fields without any kind of lock is... very much not advised. I'm guessing you meant to say: ret = security_mmap_file(file, prot, flags); Here? :) I see the original code did this, but obviously was under an mmap lock. I guess given you check that the file is the same below this.... should be fine? Assuming nothing can come in and invalidate the security_mmap_file() check in the mean time somehow? Jann any thoughts? > + if (ret) { > + fput(file); > + return ret; > + } > + > if (mmap_write_lock_killable(mm)) > return -EINTR; > > @@ -1654,6 +1674,9 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, > if (!vma || !(vma->vm_flags & VM_SHARED)) > goto out; > > + if (vma->vm_file != file) > + goto out; > + > if (start + size > vma->vm_end) { > VMA_ITERATOR(vmi, mm, vma->vm_end); > struct vm_area_struct *next, *prev = vma; > @@ -1688,16 +1711,11 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, > if (vma->vm_flags & VM_LOCKED) > flags |= MAP_LOCKED; > > - file = get_file(vma->vm_file); > - ret = security_mmap_file(vma->vm_file, prot, flags); > - if (ret) > - goto out_fput; > ret = do_mmap(vma->vm_file, start, size, > prot, flags, 0, pgoff, &populate, NULL); > -out_fput: > - fput(file); > out: > mmap_write_unlock(mm); > + fput(file); > if (populate) > mm_populate(ret, populate); > if (!IS_ERR_VALUE(ret)) > -- > Kiryl Shutsemau / Kirill A. Shutemov ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] ima: Remove inode lock 2024-10-18 11:00 ` Lorenzo Stoakes @ 2024-10-18 11:05 ` Kirill A. Shutemov 2024-10-18 11:22 ` Roberto Sassu 2024-10-18 13:29 ` Roberto Sassu 2024-10-18 14:36 ` Jann Horn 1 sibling, 2 replies; 13+ messages in thread From: Kirill A. Shutemov @ 2024-10-18 11:05 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Roberto Sassu, Paul Moore, ebpqwerty472123, kirill.shutemov, zohar, dmitry.kasatkin, eric.snowberg, jmorris, serge, linux-integrity, linux-security-module, linux-kernel, bpf, Roberto Sassu, linux-mm, akpm, vbabka, linux-fsdevel, Liam Howlett, Jann Horn On Fri, Oct 18, 2024 at 12:00:22PM +0100, Lorenzo Stoakes wrote: > + Liam, Jann > > On Fri, Oct 18, 2024 at 01:49:06PM +0300, Kirill A. Shutemov wrote: > > On Fri, Oct 18, 2024 at 11:24:06AM +0200, Roberto Sassu wrote: > > > Probably it is hard, @Kirill would there be any way to safely move > > > security_mmap_file() out of the mmap_lock lock? > > > > What about something like this (untested): > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > index dd4b35a25aeb..03473e77d356 100644 > > --- a/mm/mmap.c > > +++ b/mm/mmap.c > > @@ -1646,6 +1646,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, > > if (pgoff + (size >> PAGE_SHIFT) < pgoff) > > return ret; > > > > + if (mmap_read_lock_killable(mm)) > > + return -EINTR; > > + > > + vma = vma_lookup(mm, start); > > + > > + if (!vma || !(vma->vm_flags & VM_SHARED)) { > > + mmap_read_unlock(mm); > > + return -EINVAL; > > + } > > + > > + file = get_file(vma->vm_file); > > + > > + mmap_read_unlock(mm); > > + > > + ret = security_mmap_file(vma->vm_file, prot, flags); > > Accessing VMA fields without any kind of lock is... very much not advised. > > I'm guessing you meant to say: > > ret = security_mmap_file(file, prot, flags); > > Here? :) Sure. My bad. Patch with all fixups: diff --git a/mm/mmap.c b/mm/mmap.c index dd4b35a25aeb..541787d526b6 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1646,14 +1646,41 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, if (pgoff + (size >> PAGE_SHIFT) < pgoff) return ret; - if (mmap_write_lock_killable(mm)) + if (mmap_read_lock_killable(mm)) return -EINTR; vma = vma_lookup(mm, start); + if (!vma || !(vma->vm_flags & VM_SHARED)) { + mmap_read_unlock(mm); + return -EINVAL; + } + + file = get_file(vma->vm_file); + + mmap_read_unlock(mm); + + ret = security_mmap_file(file, prot, flags); + if (ret) { + fput(file); + return ret; + } + + ret = -EINVAL; + + if (mmap_write_lock_killable(mm)) { + fput(file); + return -EINTR; + } + + vma = vma_lookup(mm, start); + if (!vma || !(vma->vm_flags & VM_SHARED)) goto out; + if (vma->vm_file != file) + goto out; + if (start + size > vma->vm_end) { VMA_ITERATOR(vmi, mm, vma->vm_end); struct vm_area_struct *next, *prev = vma; @@ -1688,16 +1715,11 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, if (vma->vm_flags & VM_LOCKED) flags |= MAP_LOCKED; - file = get_file(vma->vm_file); - ret = security_mmap_file(vma->vm_file, prot, flags); - if (ret) - goto out_fput; ret = do_mmap(vma->vm_file, start, size, prot, flags, 0, pgoff, &populate, NULL); -out_fput: - fput(file); out: mmap_write_unlock(mm); + fput(file); if (populate) mm_populate(ret, populate); if (!IS_ERR_VALUE(ret)) -- Kiryl Shutsemau / Kirill A. Shutemov ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] ima: Remove inode lock 2024-10-18 11:05 ` Kirill A. Shutemov @ 2024-10-18 11:22 ` Roberto Sassu 2024-10-18 11:54 ` Kirill A. Shutemov 2024-10-18 13:29 ` Roberto Sassu 1 sibling, 1 reply; 13+ messages in thread From: Roberto Sassu @ 2024-10-18 11:22 UTC (permalink / raw) To: Kirill A. Shutemov, Lorenzo Stoakes Cc: Paul Moore, ebpqwerty472123, kirill.shutemov, zohar, dmitry.kasatkin, eric.snowberg, jmorris, serge, linux-integrity, linux-security-module, linux-kernel, bpf, Roberto Sassu, linux-mm, akpm, vbabka, linux-fsdevel, Liam Howlett, Jann Horn On Fri, 2024-10-18 at 14:05 +0300, Kirill A. Shutemov wrote: > On Fri, Oct 18, 2024 at 12:00:22PM +0100, Lorenzo Stoakes wrote: > > + Liam, Jann > > > > On Fri, Oct 18, 2024 at 01:49:06PM +0300, Kirill A. Shutemov wrote: > > > On Fri, Oct 18, 2024 at 11:24:06AM +0200, Roberto Sassu wrote: > > > > Probably it is hard, @Kirill would there be any way to safely move > > > > security_mmap_file() out of the mmap_lock lock? > > > > > > What about something like this (untested): > > > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > index dd4b35a25aeb..03473e77d356 100644 > > > --- a/mm/mmap.c > > > +++ b/mm/mmap.c > > > @@ -1646,6 +1646,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, > > > if (pgoff + (size >> PAGE_SHIFT) < pgoff) > > > return ret; > > > > > > + if (mmap_read_lock_killable(mm)) > > > + return -EINTR; > > > + > > > + vma = vma_lookup(mm, start); > > > + > > > + if (!vma || !(vma->vm_flags & VM_SHARED)) { > > > + mmap_read_unlock(mm); > > > + return -EINVAL; > > > + } > > > + > > > + file = get_file(vma->vm_file); > > > + > > > + mmap_read_unlock(mm); > > > + > > > + ret = security_mmap_file(vma->vm_file, prot, flags); > > > > Accessing VMA fields without any kind of lock is... very much not advised. > > > > I'm guessing you meant to say: > > > > ret = security_mmap_file(file, prot, flags); > > > > Here? :) > > Sure. My bad. > > Patch with all fixups: Thanks a lot! Let's wait a bit until the others have a chance to comment. Meanwhile, I will test it. Do you want me to do the final patch, or will you be proposing it? Roberto > diff --git a/mm/mmap.c b/mm/mmap.c > index dd4b35a25aeb..541787d526b6 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -1646,14 +1646,41 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, > if (pgoff + (size >> PAGE_SHIFT) < pgoff) > return ret; > > - if (mmap_write_lock_killable(mm)) > + if (mmap_read_lock_killable(mm)) > return -EINTR; > > vma = vma_lookup(mm, start); > > + if (!vma || !(vma->vm_flags & VM_SHARED)) { > + mmap_read_unlock(mm); > + return -EINVAL; > + } > + > + file = get_file(vma->vm_file); > + > + mmap_read_unlock(mm); > + > + ret = security_mmap_file(file, prot, flags); > + if (ret) { > + fput(file); > + return ret; > + } > + > + ret = -EINVAL; > + > + if (mmap_write_lock_killable(mm)) { > + fput(file); > + return -EINTR; > + } > + > + vma = vma_lookup(mm, start); > + > if (!vma || !(vma->vm_flags & VM_SHARED)) > goto out; > > + if (vma->vm_file != file) > + goto out; > + > if (start + size > vma->vm_end) { > VMA_ITERATOR(vmi, mm, vma->vm_end); > struct vm_area_struct *next, *prev = vma; > @@ -1688,16 +1715,11 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, > if (vma->vm_flags & VM_LOCKED) > flags |= MAP_LOCKED; > > - file = get_file(vma->vm_file); > - ret = security_mmap_file(vma->vm_file, prot, flags); > - if (ret) > - goto out_fput; > ret = do_mmap(vma->vm_file, start, size, > prot, flags, 0, pgoff, &populate, NULL); > -out_fput: > - fput(file); > out: > mmap_write_unlock(mm); > + fput(file); > if (populate) > mm_populate(ret, populate); > if (!IS_ERR_VALUE(ret)) ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] ima: Remove inode lock 2024-10-18 11:22 ` Roberto Sassu @ 2024-10-18 11:54 ` Kirill A. Shutemov 2024-10-18 11:56 ` Roberto Sassu 0 siblings, 1 reply; 13+ messages in thread From: Kirill A. Shutemov @ 2024-10-18 11:54 UTC (permalink / raw) To: Roberto Sassu Cc: Lorenzo Stoakes, Paul Moore, ebpqwerty472123, kirill.shutemov, zohar, dmitry.kasatkin, eric.snowberg, jmorris, serge, linux-integrity, linux-security-module, linux-kernel, bpf, Roberto Sassu, linux-mm, akpm, vbabka, linux-fsdevel, Liam Howlett, Jann Horn On Fri, Oct 18, 2024 at 01:22:35PM +0200, Roberto Sassu wrote: > On Fri, 2024-10-18 at 14:05 +0300, Kirill A. Shutemov wrote: > > On Fri, Oct 18, 2024 at 12:00:22PM +0100, Lorenzo Stoakes wrote: > > > + Liam, Jann > > > > > > On Fri, Oct 18, 2024 at 01:49:06PM +0300, Kirill A. Shutemov wrote: > > > > On Fri, Oct 18, 2024 at 11:24:06AM +0200, Roberto Sassu wrote: > > > > > Probably it is hard, @Kirill would there be any way to safely move > > > > > security_mmap_file() out of the mmap_lock lock? > > > > > > > > What about something like this (untested): > > > > > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > > index dd4b35a25aeb..03473e77d356 100644 > > > > --- a/mm/mmap.c > > > > +++ b/mm/mmap.c > > > > @@ -1646,6 +1646,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, > > > > if (pgoff + (size >> PAGE_SHIFT) < pgoff) > > > > return ret; > > > > > > > > + if (mmap_read_lock_killable(mm)) > > > > + return -EINTR; > > > > + > > > > + vma = vma_lookup(mm, start); > > > > + > > > > + if (!vma || !(vma->vm_flags & VM_SHARED)) { > > > > + mmap_read_unlock(mm); > > > > + return -EINVAL; > > > > + } > > > > + > > > > + file = get_file(vma->vm_file); > > > > + > > > > + mmap_read_unlock(mm); > > > > + > > > > + ret = security_mmap_file(vma->vm_file, prot, flags); > > > > > > Accessing VMA fields without any kind of lock is... very much not advised. > > > > > > I'm guessing you meant to say: > > > > > > ret = security_mmap_file(file, prot, flags); > > > > > > Here? :) > > > > Sure. My bad. > > > > Patch with all fixups: > > Thanks a lot! Let's wait a bit until the others have a chance to > comment. Meanwhile, I will test it. > > Do you want me to do the final patch, or will you be proposing it? You can post it if it works: Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> -- Kiryl Shutsemau / Kirill A. Shutemov ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] ima: Remove inode lock 2024-10-18 11:54 ` Kirill A. Shutemov @ 2024-10-18 11:56 ` Roberto Sassu 0 siblings, 0 replies; 13+ messages in thread From: Roberto Sassu @ 2024-10-18 11:56 UTC (permalink / raw) To: Kirill A. Shutemov Cc: Lorenzo Stoakes, Paul Moore, ebpqwerty472123, kirill.shutemov, zohar, dmitry.kasatkin, eric.snowberg, jmorris, serge, linux-integrity, linux-security-module, linux-kernel, bpf, Roberto Sassu, linux-mm, akpm, vbabka, linux-fsdevel, Liam Howlett, Jann Horn On Fri, 2024-10-18 at 14:54 +0300, Kirill A. Shutemov wrote: > On Fri, Oct 18, 2024 at 01:22:35PM +0200, Roberto Sassu wrote: > > On Fri, 2024-10-18 at 14:05 +0300, Kirill A. Shutemov wrote: > > > On Fri, Oct 18, 2024 at 12:00:22PM +0100, Lorenzo Stoakes wrote: > > > > + Liam, Jann > > > > > > > > On Fri, Oct 18, 2024 at 01:49:06PM +0300, Kirill A. Shutemov wrote: > > > > > On Fri, Oct 18, 2024 at 11:24:06AM +0200, Roberto Sassu wrote: > > > > > > Probably it is hard, @Kirill would there be any way to safely move > > > > > > security_mmap_file() out of the mmap_lock lock? > > > > > > > > > > What about something like this (untested): > > > > > > > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > > > index dd4b35a25aeb..03473e77d356 100644 > > > > > --- a/mm/mmap.c > > > > > +++ b/mm/mmap.c > > > > > @@ -1646,6 +1646,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, > > > > > if (pgoff + (size >> PAGE_SHIFT) < pgoff) > > > > > return ret; > > > > > > > > > > + if (mmap_read_lock_killable(mm)) > > > > > + return -EINTR; > > > > > + > > > > > + vma = vma_lookup(mm, start); > > > > > + > > > > > + if (!vma || !(vma->vm_flags & VM_SHARED)) { > > > > > + mmap_read_unlock(mm); > > > > > + return -EINVAL; > > > > > + } > > > > > + > > > > > + file = get_file(vma->vm_file); > > > > > + > > > > > + mmap_read_unlock(mm); > > > > > + > > > > > + ret = security_mmap_file(vma->vm_file, prot, flags); > > > > > > > > Accessing VMA fields without any kind of lock is... very much not advised. > > > > > > > > I'm guessing you meant to say: > > > > > > > > ret = security_mmap_file(file, prot, flags); > > > > > > > > Here? :) > > > > > > Sure. My bad. > > > > > > Patch with all fixups: > > > > Thanks a lot! Let's wait a bit until the others have a chance to > > comment. Meanwhile, I will test it. > > > > Do you want me to do the final patch, or will you be proposing it? > > You can post it if it works: > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Thanks! Ok. Roberto ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] ima: Remove inode lock 2024-10-18 11:05 ` Kirill A. Shutemov 2024-10-18 11:22 ` Roberto Sassu @ 2024-10-18 13:29 ` Roberto Sassu 1 sibling, 0 replies; 13+ messages in thread From: Roberto Sassu @ 2024-10-18 13:29 UTC (permalink / raw) To: Kirill A. Shutemov, Lorenzo Stoakes Cc: Paul Moore, ebpqwerty472123, kirill.shutemov, zohar, dmitry.kasatkin, eric.snowberg, jmorris, serge, linux-integrity, linux-security-module, linux-kernel, bpf, Roberto Sassu, linux-mm, akpm, vbabka, linux-fsdevel, Liam Howlett, Jann Horn On Fri, 2024-10-18 at 14:05 +0300, Kirill A. Shutemov wrote: > On Fri, Oct 18, 2024 at 12:00:22PM +0100, Lorenzo Stoakes wrote: > > + Liam, Jann > > > > On Fri, Oct 18, 2024 at 01:49:06PM +0300, Kirill A. Shutemov wrote: > > > On Fri, Oct 18, 2024 at 11:24:06AM +0200, Roberto Sassu wrote: > > > > Probably it is hard, @Kirill would there be any way to safely move > > > > security_mmap_file() out of the mmap_lock lock? > > > > > > What about something like this (untested): > > > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > index dd4b35a25aeb..03473e77d356 100644 > > > --- a/mm/mmap.c > > > +++ b/mm/mmap.c > > > @@ -1646,6 +1646,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, > > > if (pgoff + (size >> PAGE_SHIFT) < pgoff) > > > return ret; > > > > > > + if (mmap_read_lock_killable(mm)) > > > + return -EINTR; > > > + > > > + vma = vma_lookup(mm, start); > > > + > > > + if (!vma || !(vma->vm_flags & VM_SHARED)) { > > > + mmap_read_unlock(mm); > > > + return -EINVAL; > > > + } > > > + > > > + file = get_file(vma->vm_file); > > > + > > > + mmap_read_unlock(mm); > > > + > > > + ret = security_mmap_file(vma->vm_file, prot, flags); > > > > Accessing VMA fields without any kind of lock is... very much not advised. > > > > I'm guessing you meant to say: > > > > ret = security_mmap_file(file, prot, flags); > > > > Here? :) > > Sure. My bad. > > Patch with all fixups: > > diff --git a/mm/mmap.c b/mm/mmap.c > index dd4b35a25aeb..541787d526b6 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -1646,14 +1646,41 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, > if (pgoff + (size >> PAGE_SHIFT) < pgoff) > return ret; > > - if (mmap_write_lock_killable(mm)) > + if (mmap_read_lock_killable(mm)) > return -EINTR; > > vma = vma_lookup(mm, start); > > + if (!vma || !(vma->vm_flags & VM_SHARED)) { > + mmap_read_unlock(mm); > + return -EINVAL; > + } > + > + file = get_file(vma->vm_file); > + > + mmap_read_unlock(mm); > + > + ret = security_mmap_file(file, prot, flags); Uhm, I have to calculate prot and flags before. I can check if what I used here changed in the next lock, and refuse. Roberto > + if (ret) { > + fput(file); > + return ret; > + } > + > + ret = -EINVAL; > + > + if (mmap_write_lock_killable(mm)) { > + fput(file); > + return -EINTR; > + } > + > + vma = vma_lookup(mm, start); > + > if (!vma || !(vma->vm_flags & VM_SHARED)) > goto out; > > + if (vma->vm_file != file) > + goto out; > + > if (start + size > vma->vm_end) { > VMA_ITERATOR(vmi, mm, vma->vm_end); > struct vm_area_struct *next, *prev = vma; > @@ -1688,16 +1715,11 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, > if (vma->vm_flags & VM_LOCKED) > flags |= MAP_LOCKED; > > - file = get_file(vma->vm_file); > - ret = security_mmap_file(vma->vm_file, prot, flags); > - if (ret) > - goto out_fput; > ret = do_mmap(vma->vm_file, start, size, > prot, flags, 0, pgoff, &populate, NULL); > -out_fput: > - fput(file); > out: > mmap_write_unlock(mm); > + fput(file); > if (populate) > mm_populate(ret, populate); > if (!IS_ERR_VALUE(ret)) ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] ima: Remove inode lock 2024-10-18 11:00 ` Lorenzo Stoakes 2024-10-18 11:05 ` Kirill A. Shutemov @ 2024-10-18 14:36 ` Jann Horn 2024-10-18 14:48 ` Lorenzo Stoakes 1 sibling, 1 reply; 13+ messages in thread From: Jann Horn @ 2024-10-18 14:36 UTC (permalink / raw) To: Lorenzo Stoakes Cc: Kirill A. Shutemov, Roberto Sassu, Paul Moore, ebpqwerty472123, kirill.shutemov, zohar, dmitry.kasatkin, eric.snowberg, jmorris, serge, linux-integrity, linux-security-module, linux-kernel, bpf, Roberto Sassu, linux-mm, akpm, vbabka, linux-fsdevel, Liam Howlett On Fri, Oct 18, 2024 at 1:00 PM Lorenzo Stoakes <lorenzo.stoakes@oracle.com> wrote: > On Fri, Oct 18, 2024 at 01:49:06PM +0300, Kirill A. Shutemov wrote: > > On Fri, Oct 18, 2024 at 11:24:06AM +0200, Roberto Sassu wrote: > > > Probably it is hard, @Kirill would there be any way to safely move > > > security_mmap_file() out of the mmap_lock lock? > > > > What about something like this (untested): > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > index dd4b35a25aeb..03473e77d356 100644 > > --- a/mm/mmap.c > > +++ b/mm/mmap.c > > @@ -1646,6 +1646,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, > > if (pgoff + (size >> PAGE_SHIFT) < pgoff) > > return ret; > > > > + if (mmap_read_lock_killable(mm)) > > + return -EINTR; > > + > > + vma = vma_lookup(mm, start); > > + > > + if (!vma || !(vma->vm_flags & VM_SHARED)) { > > + mmap_read_unlock(mm); > > + return -EINVAL; > > + } > > + > > + file = get_file(vma->vm_file); > > + > > + mmap_read_unlock(mm); > > + > > + ret = security_mmap_file(vma->vm_file, prot, flags); > > Accessing VMA fields without any kind of lock is... very much not advised. > > I'm guessing you meant to say: > > ret = security_mmap_file(file, prot, flags); > > Here? :) > > I see the original code did this, but obviously was under an mmap lock. > > I guess given you check that the file is the same below this.... should be > fine? Assuming nothing can come in and invalidate the security_mmap_file() > check in the mean time somehow? > > Jann any thoughts? The overall approach seems reasonable to me - it aligns this path with the other security_mmap_file() checks, which also don't happen under the lock. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [PATCH 1/3] ima: Remove inode lock 2024-10-18 14:36 ` Jann Horn @ 2024-10-18 14:48 ` Lorenzo Stoakes 0 siblings, 0 replies; 13+ messages in thread From: Lorenzo Stoakes @ 2024-10-18 14:48 UTC (permalink / raw) To: Jann Horn Cc: Kirill A. Shutemov, Roberto Sassu, Paul Moore, ebpqwerty472123, kirill.shutemov, zohar, dmitry.kasatkin, eric.snowberg, jmorris, serge, linux-integrity, linux-security-module, linux-kernel, bpf, Roberto Sassu, linux-mm, akpm, vbabka, linux-fsdevel, Liam Howlett On Fri, Oct 18, 2024 at 04:36:16PM +0200, Jann Horn wrote: > On Fri, Oct 18, 2024 at 1:00 PM Lorenzo Stoakes > <lorenzo.stoakes@oracle.com> wrote: > > On Fri, Oct 18, 2024 at 01:49:06PM +0300, Kirill A. Shutemov wrote: > > > On Fri, Oct 18, 2024 at 11:24:06AM +0200, Roberto Sassu wrote: > > > > Probably it is hard, @Kirill would there be any way to safely move > > > > security_mmap_file() out of the mmap_lock lock? > > > > > > What about something like this (untested): > > > > > > diff --git a/mm/mmap.c b/mm/mmap.c > > > index dd4b35a25aeb..03473e77d356 100644 > > > --- a/mm/mmap.c > > > +++ b/mm/mmap.c > > > @@ -1646,6 +1646,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size, > > > if (pgoff + (size >> PAGE_SHIFT) < pgoff) > > > return ret; > > > > > > + if (mmap_read_lock_killable(mm)) > > > + return -EINTR; > > > + > > > + vma = vma_lookup(mm, start); > > > + > > > + if (!vma || !(vma->vm_flags & VM_SHARED)) { > > > + mmap_read_unlock(mm); > > > + return -EINVAL; > > > + } > > > + > > > + file = get_file(vma->vm_file); > > > + > > > + mmap_read_unlock(mm); > > > + > > > + ret = security_mmap_file(vma->vm_file, prot, flags); > > > > Accessing VMA fields without any kind of lock is... very much not advised. > > > > I'm guessing you meant to say: > > > > ret = security_mmap_file(file, prot, flags); > > > > Here? :) > > > > I see the original code did this, but obviously was under an mmap lock. > > > > I guess given you check that the file is the same below this.... should be > > fine? Assuming nothing can come in and invalidate the security_mmap_file() > > check in the mean time somehow? > > > > Jann any thoughts? > > The overall approach seems reasonable to me - it aligns this path with > the other security_mmap_file() checks, which also don't happen under > the lock. Yeah equally I find this pattern fine as we check that the file is the same after we reacquire the lock (a common pattern in mm), so if there's no objections on security side I don't really see any issue myself - Roberto if you want to go ahead and post the patch (separately perhaps?) we can toss some tags your way :) ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2024-10-18 14:48 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <20241008165732.2603647-1-roberto.sassu@huaweicloud.com>
[not found] ` <CAHC9VhSyWNKqustrTjA1uUaZa_jA-KjtzpKdJ4ikSUKoi7iV0Q@mail.gmail.com>
[not found] ` <CAHC9VhQR2JbB7ni2yX_U8TWE0PcQQkm_pBCuG3nYN7qO15nNjg@mail.gmail.com>
[not found] ` <7358f12d852964d9209492e337d33b8880234b74.camel@huaweicloud.com>
[not found] ` <593282dbc9f48673c8f3b8e0f28e100f34141115.camel@huaweicloud.com>
[not found] ` <15bb94a306d3432de55c0a12f29e7ed2b5fa3ba1.camel@huaweicloud.com>
2024-10-16 9:59 ` [PATCH 1/3] ima: Remove inode lock Roberto Sassu
2024-10-18 9:24 ` Roberto Sassu
2024-10-18 10:49 ` Kirill A. Shutemov
2024-10-18 10:52 ` Kirill A. Shutemov
2024-10-18 10:55 ` Kirill A. Shutemov
2024-10-18 11:00 ` Lorenzo Stoakes
2024-10-18 11:05 ` Kirill A. Shutemov
2024-10-18 11:22 ` Roberto Sassu
2024-10-18 11:54 ` Kirill A. Shutemov
2024-10-18 11:56 ` Roberto Sassu
2024-10-18 13:29 ` Roberto Sassu
2024-10-18 14:36 ` Jann Horn
2024-10-18 14:48 ` Lorenzo Stoakes
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox