linux-mm.kvack.org archive mirror
* not issuing vma_start_write() in dup_mmap() if the caller is single-threaded
@ 2025-03-27  5:46 Mateusz Guzik
  2025-03-29  0:56 ` Suren Baghdasaryan
  0 siblings, 1 reply; 13+ messages in thread
From: Mateusz Guzik @ 2025-03-27  5:46 UTC (permalink / raw)
  To: Suren Baghdasaryan; +Cc: linux-mm, Liam R. Howlett

Hi Suren,

I brought this up a long time ago; you agreed the idea works and said
you would sort it out, but it apparently fell by the wayside (no hard
feelings though :>)

Here is the original:
https://lore.kernel.org/all/20230804214620.btgwhsszsd7rh6nf@f/

The thread is weirdly long and I recommend not opening it without a
good reason; I link it only for reference.

At the time there were concerns about the stability of the new locking
scheme, resulting in a rather excessive CC list. As such I'm starting
a new thread with a modest list; not sure who to add though.

So dup_mmap() holds the caller's mmap_sem for writing and calls
vma_start_write() to protect against fault handling in other threads
using the same mm.

If this is the only thread with this ->mm in place, there are no
sibling threads to worry about, and this can be checked with mm_users
== 1.

AFAICS all remote accesses require the mmap_sem to be held as well,
which provides exclusion against dup_mmap(), meaning they don't pose a
problem either.

The patch to merely skip the locking is a few-liner and I would
officially submit it myself, but for my taste an assert is needed in
fault handling to runtime-test the above invariant. It so happens I
really can't be bothered to figure out how to sort that out and was
hoping you would step in. ;) Alternatively, if you guys don't think
the assert is warranted, that's your business.
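For illustration, the skip itself would be on the order of the
following sketch (derived from the probe diff further down; this is
hypothetical and untested, and the assert question is left open):

```c
/* Sketch only, mirroring the probe patch below: with oldmm's mmap_sem
 * write-held, sample mm_users once and skip the per-VMA write lock
 * when no sibling threads share the mm.
 */
only_user = atomic_read(&oldmm->mm_users) == 1;
...
for_each_vma(vmi, mpnt) {
        if (!only_user)
                vma_start_write(mpnt);
        /* rest of the copy loop unchanged */
}
```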

As for whether this can save any locking -- yes:

I added a probe (below, for reference) with two args: whether we are
single-threaded, and whether vma_start_write() took the down/up cycle.
Then I ran make -j 20 in the kernel dir.

The lock was taken for every single vma (377913 in total), while all
forking processes were single-threaded. To put it differently, every
one of these lock acquisitions was skippable.

the probe (total hack):
bpftrace -e 'kprobe:dup_probe { @[arg0, arg1] = count(); }'

probe diff:
diff --git a/fs/namei.c b/fs/namei.c
index ecb7b95c2ca3..d6cde76eda81 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -5459,3 +5459,7 @@ const struct inode_operations page_symlink_inode_operations = {
        .get_link       = page_get_link,
 };
 EXPORT_SYMBOL(page_symlink_inode_operations);
+
+void dup_probe(int, int);
+void dup_probe(int, int) { }
+EXPORT_SYMBOL(dup_probe);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1f80baddacc5..f7b1f0a02f2e 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -760,12 +760,12 @@ static bool __is_vma_write_locked(struct vm_area_struct *vma, unsigned int *mm_l
  * Exclude concurrent readers under the per-VMA lock until the currently
  * write-locked mmap_lock is dropped or downgraded.
  */
-static inline void vma_start_write(struct vm_area_struct *vma)
+static inline bool vma_start_write(struct vm_area_struct *vma)
 {
        unsigned int mm_lock_seq;

        if (__is_vma_write_locked(vma, &mm_lock_seq))
-               return;
+               return false;

        down_write(&vma->vm_lock->lock);
        /*
@@ -776,6 +776,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
         */
        WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
        up_write(&vma->vm_lock->lock);
+       return true;
 }

 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
diff --git a/kernel/fork.c b/kernel/fork.c
index 735405a9c5f3..0cc56255a339 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -629,6 +629,8 @@ static void dup_mm_exe_file(struct mm_struct *mm, struct mm_struct *oldmm)
                pr_warn_once("exe_file_deny_write_access() failed in %s\n", __func__);
 }

+void dup_probe(int, int);
+
 #ifdef CONFIG_MMU
 static __latent_entropy int dup_mmap(struct mm_struct *mm,
                                        struct mm_struct *oldmm)
@@ -638,9 +640,11 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
        unsigned long charge = 0;
        LIST_HEAD(uf);
        VMA_ITERATOR(vmi, mm, 0);
+       bool only_user;

        if (mmap_write_lock_killable(oldmm))
                return -EINTR;
+       only_user = atomic_read(&oldmm->mm_users) == 1;
        flush_cache_dup_mm(oldmm);
        uprobe_dup_mmap(oldmm, mm);
        /*
@@ -664,8 +668,11 @@ static __latent_entropy int dup_mmap(struct mm_struct *mm,
        mt_clear_in_rcu(vmi.mas.tree);
        for_each_vma(vmi, mpnt) {
                struct file *file;
+               bool locked;
+
+               locked = vma_start_write(mpnt);
+               dup_probe(only_user ? 1 : 0, locked ? 1 : 0);

-               vma_start_write(mpnt);
                if (mpnt->vm_flags & VM_DONTCOPY) {
                        retval = vma_iter_clear_gfp(&vmi, mpnt->vm_start,
                                                    mpnt->vm_end, GFP_KERNEL);

-- 
Mateusz Guzik <mjguzik gmail.com>



Thread overview: 13+ messages
2025-03-27  5:46 not issuing vma_start_write() in dup_mmap() if the caller is single-threaded Mateusz Guzik
2025-03-29  0:56 ` Suren Baghdasaryan
2025-03-29  1:15   ` Mateusz Guzik
2025-03-29  1:35     ` Suren Baghdasaryan
2025-03-29  1:51       ` Mateusz Guzik
2025-03-30 19:23         ` Suren Baghdasaryan
2025-03-30 19:25           ` Suren Baghdasaryan
2025-03-30 19:43           ` Mateusz Guzik
2025-03-31 16:43             ` Liam R. Howlett
2025-03-31 17:50               ` Mateusz Guzik
2025-03-31 18:42                 ` Suren Baghdasaryan
2025-03-31 19:24                 ` Liam R. Howlett
2025-03-31 20:27                   ` Mateusz Guzik
