linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Suren Baghdasaryan <surenb@google.com>
Cc: akpm@linux-foundation.org, jannh@google.com, willy@infradead.org,
	 liam.howlett@oracle.com, david@redhat.com, peterx@redhat.com,
	 ldufour@linux.ibm.com, vbabka@suse.cz, michel@lespinasse.org,
	 jglisse@google.com, mhocko@suse.com, hannes@cmpxchg.org,
	dave@stgolabs.net,  hughd@google.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	 stable@vger.kernel.org
Subject: Re: [PATCH 1/6] mm: enable page walking API to lock vmas during the walk
Date: Mon, 31 Jul 2023 11:02:36 -0700	[thread overview]
Message-ID: <CAHk-=wjEbJS3OhUu+2sV8Kft8GnGcsNFOhYhXYQuk5nvvqR-NQ@mail.gmail.com> (raw)
In-Reply-To: <20230731171233.1098105-2-surenb@google.com>

On Mon, 31 Jul 2023 at 10:12, Suren Baghdasaryan <surenb@google.com> wrote:
>
> -               walk_page_vma(vma, &subpage_walk_ops, NULL);
> +               walk_page_vma(vma, &subpage_walk_ops, true, NULL);

Rather than add a new argument to the walk_page_*() functions, I
really think you should just add the locking rule to the 'const struct
mm_walk_ops' structure.

The locking rule goes along with the rules for what you are actually
doing, after all. Plus it would actually make it all much more legible
when it's not just some random "true/false" argument, but a an actual

        .write_lock = 1

in the ops definition.

Yes, yes, that might mean that some ops might need duplicating in case
you really have a walk that sometimes takes the lock, and sometimes
doesn't, but that is odd to begin with.

The only such case I found from a quick look was the very strange
queue_pages_range() case. Is it really true that do_mbind() needs the
write-lock, but do_migrate_pages() does not?

And if they really are that different maybe they should have different walk_ops?

Maybe there were other cases that I didn't notice.

>                 error = walk_page_range(current->mm, start, end,
> -                               &prot_none_walk_ops, &new_pgprot);
> +                               &prot_none_walk_ops, true, &new_pgprot);

This looks odd. You're adding vma locking to a place that didn't do it before.

Yes, the mmap semaphore is held for writing, but this particular walk
doesn't need it as far as I can tell.

In fact, this feels like that walker should maybe *verify* that it's
held for writing, but not try to write it again?

Maybe the "lock_vma" flag should be a tri-state:

 - lock for reading (no-op per vma), verify that the mmap sem is held
for reading

 - lock for reading (no-op per vma), but with mmap sem held for
writing (this kind of "check before doing changes" walker)

 - lock for writing (with mmap sem obviously needs to be held for writing)

>         mmap_assert_locked(walk.mm);
> +       if (lock_vma)
> +               vma_start_write(vma);

So I think this should also be tightened up, and something like

        switch (ops->locking) {
        case WRLOCK:
                vma_start_write(vma);
                fallthrough;
        case WRLOCK_VERIFY:
                mmap_assert_write_locked(mm);
                break;
        case RDLOCK:
                mmap_assert_locked(walk.mm);
        }

because we shouldn't have a 'vma_start_write()' without holding the
mmap sem for *writing*, and the above would also allow that
mprotect_fixup() "walk to see if we can merge, verify that it was
already locked" thing.

Hmm?

NOTE! The above names are just completely made up. I dcon't think it
should actually be some "WRLOCK" enum. There are probably much better
names. Take the above as a "maybe something kind of in this direction"
rather than "do it exactly like this".

            Linus


  reply	other threads:[~2023-07-31 18:03 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-31 17:12 [PATCH 0/6] make vma locking more obvious Suren Baghdasaryan
2023-07-31 17:12 ` [PATCH 1/6] mm: enable page walking API to lock vmas during the walk Suren Baghdasaryan
2023-07-31 18:02   ` Linus Torvalds [this message]
2023-07-31 19:30     ` Suren Baghdasaryan
2023-07-31 19:33       ` Linus Torvalds
2023-07-31 20:24         ` Suren Baghdasaryan
2023-08-01 20:28           ` Suren Baghdasaryan
2023-08-01 21:34             ` Peter Xu
2023-08-01 21:46               ` Suren Baghdasaryan
2023-08-01 22:13                 ` Suren Baghdasaryan
2023-07-31 17:12 ` [PATCH 2/6] mm: for !CONFIG_PER_VMA_LOCK equate write lock assertion for vma and mmap Suren Baghdasaryan
2023-07-31 17:12 ` [PATCH 3/6] mm: replace mmap with vma write lock assertions when operating on a vma Suren Baghdasaryan
2023-07-31 17:12 ` [PATCH 4/6] mm: lock vma explicitly before doing vm_flags_reset and vm_flags_reset_once Suren Baghdasaryan
2023-07-31 17:12 ` [PATCH 5/6] mm: always lock new vma before inserting into vma tree Suren Baghdasaryan
2023-07-31 17:12 ` [PATCH 6/6] mm: move vma locking out of vma_prepare Suren Baghdasaryan
2023-07-31 20:30   ` Liam R. Howlett
2023-07-31 20:38     ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHk-=wjEbJS3OhUu+2sV8Kft8GnGcsNFOhYhXYQuk5nvvqR-NQ@mail.gmail.com' \
    --to=torvalds@linux-foundation.org \
    --cc=akpm@linux-foundation.org \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hughd@google.com \
    --cc=jannh@google.com \
    --cc=jglisse@google.com \
    --cc=ldufour@linux.ibm.com \
    --cc=liam.howlett@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=michel@lespinasse.org \
    --cc=peterx@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox