linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Peter Xu <peterx@redhat.com>
To: David Hildenbrand <david@redhat.com>
Cc: Oscar Salvador <osalvador@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Muchun Song <muchun.song@linux.dev>,
	James Houghton <jthoughton@google.com>,
	Gavin Guo <gavinguo@igalia.com>,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH 1/3] mm, hugetlb: Clean up locking in hugetlb_fault and hugetlb_wp
Date: Tue, 3 Jun 2025 11:46:19 -0400	[thread overview]
Message-ID: <aD8Yy0xfJdvLKp4X@x1.local> (raw)
In-Reply-To: <fd10b2b3-064c-4ee2-ad7d-e30a0e194533@redhat.com>

On Tue, Jun 03, 2025 at 05:08:55PM +0200, David Hildenbrand wrote:
> On 03.06.25 16:57, Peter Xu wrote:
> > On Tue, Jun 03, 2025 at 03:50:54PM +0200, Oscar Salvador wrote:
> > > On Mon, Jun 02, 2025 at 05:30:19PM -0400, Peter Xu wrote:
> > > > Right, and thanks for the git digging as usual.  I would agree hugetlb is
> > > > more challenge than many other modules on git archaeology. :)
> > > > 
> > > > Even if I mentioned the invalidate_lock, I don't think I thought deeper
> > > > than that. I just wished whenever possible we still move hugetlb code
> > > > closer to generic code, so if that's the goal we may still want to one day
> > > > have a closer look at whether hugetlb can also use invalidate_lock.  Maybe
> > > > it isn't worthwhile at last: invalidate_lock is currently a rwsem, which
> > > > normally at least allows concurrent fault, but that's currently what isn't
> > > > allowed in hugetlb anyway..
> > > > 
> > > > If we start to remove finer grained locks that work will be even harder,
> > > > and removing folio lock in this case in fault path also brings hugetlbfs
> > > > even further from other file systems.  That might be slightly against what
> > > > we used to wish to do, which is to make it closer to others.  Meanwhile I'm
> > > > also not yet sure the benefit of not taking folio lock all across, e.g. I
> > > > don't expect perf would change at all even if lock is avoided.  We may want
> > > > to think about that too when doing so.
> > > 
> > > Ok, I have to confess I was not looking things from this perspective,
> > > but when doing so, yes, you are right, we should strive to find
> > > replacements wherever we can for not using hugetlb-specific code.
> > > 
> > > I do not know about this case though, not sure what other options do we
> > > have when trying to shut concurrent faults while doing other operation.
> > > But it is something we should definitely look at.
> > > 
> > > Wrt. to the lock.
> > > There were two locks, old_folio (taken in hugetlb_fault) and
> > > pagecache_folio one.
> > 
> > There're actually three places this patch touched, the 3rd one is
> > hugetlb_no_page(), in which case I also think we should lock it, not only
> > because file folios normally does it (see do_fault(), for example), but
> > also that's exactly what James mentioned I believe on possible race of
> > !uptodate hugetlb folio being injected by UFFDIO_CONTINUE, along the lines:
> > 
> > 		folio = alloc_hugetlb_folio(vma, vmf->address, false);
> >                  ...
> > 		folio_zero_user(folio, vmf->real_address);
> > 		__folio_mark_uptodate(folio);
> > 
> > > The thing was not about worry as how much perf we leave on the table
> > > because of these locks, as I am pretty sure is next to 0, but my drive
> > > was to understand what are protection and why, because as the discussion
> > > showed, none of us really had a good idea about it and it turns out that this
> > > goes back more than ~20 years ago.
> > > 
> > > Another topic for the lock (old_folio, so the one we copy from),
> > > when we compare it to generic code, we do not take the lock there.
> > > Looking at do_wp_page(), we do __get__ a reference on the folio we copy
> > > from, but not the lock, so AFAIU, the lock seems only to please
> > 
> > Yes this is a good point; for CoW path alone maybe we don't need to lock
> > old_folio.
> > 
> > > folio_move_anon_rmap() from hugetlb_wp.
> > > 
> > > Taking a look at do_wp_page()->wp_can_reuse_anon_folio() which also
> > > calls folio_move_anon_rmap() in case we can re-use the folio, it only
> > > takes the lock before the call to folio_move_anon_rmap(), and then
> > > unlocks it.
> > 
> > IMHO, do_wp_page() took the folio lock not for folio_move_anon_rmap(), but
> > for checking swapcache/ksm stuff which needs to be serialized with folio
> > lock.
> > 
> > So I'm not 100% confident on the folio_move_anon_rmap(), but I _think_ it
> > deserves a data_race() and IIUC it only work not because of the folio lock,
> > but because of how anon_vma is managed as a tree as of now, so that as long
> > as WRITE_ONCE() even a race is benign (because the rmap walker will either
> > see a complete old anon_vma that includes the parent process's anon_vma, or
> > the child's).  What really protects the anon_vma should really be anon_vma
> > lock.. That can definitely be a separate topic.  I'm not sure whether you'd
> > like to dig this part out, but if you do I'd also be more than happy to
> > know whether my understanding needs correction here.. :)
> > 
> > In general, I still agree with you that if hugetlb CoW path can look closer
> > to do_wp_page then it's great.
> 
> 
> As stated elsewhere, the mapcount check + folio_move_anon_rmap need the
> folio lock.

Could you elaborate what would go wrong if we do folio_move_anon_rmap()
without folio lock here?  Just to make sure we're on the same page: we
already have pgtable lock held, and we decided to reuse an anonymous
hugetlb page.

Thanks,

-- 
Peter Xu



  reply	other threads:[~2025-06-03 15:46 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-06-02 14:16 [RFC PATCH 0/3] Clean up locking in hugetlb faulting code Oscar Salvador
2025-06-02 14:16 ` [RFC PATCH 1/3] mm, hugetlb: Clean up locking in hugetlb_fault and hugetlb_wp Oscar Salvador
2025-06-02 15:14   ` Peter Xu
2025-06-02 20:47     ` Oscar Salvador
2025-06-02 21:30       ` Peter Xu
2025-06-03 13:50         ` Oscar Salvador
2025-06-03 14:57           ` Peter Xu
2025-06-03 15:08             ` David Hildenbrand
2025-06-03 15:46               ` Peter Xu [this message]
2025-06-03 17:19                 ` David Hildenbrand
2025-06-03 19:11                   ` Peter Xu
2025-06-03 18:31             ` Peter Xu
2025-06-10 14:13               ` Oscar Salvador
2025-06-10 15:57                 ` Peter Xu
2025-06-03 15:12           ` David Hildenbrand
2025-06-02 14:16 ` [RFC PATCH 2/3] mm, hugetlb: Update comments in hugetlb_fault Oscar Salvador
2025-06-02 14:16 ` [RFC PATCH 3/3] mm, hugetlb: Drop unlikelys from hugetlb_fault Oscar Salvador
2025-06-16  3:21 ` [RFC PATCH 0/3] Clean up locking in hugetlb faulting code Gavin Guo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aD8Yy0xfJdvLKp4X@x1.local \
    --to=peterx@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=gavinguo@igalia.com \
    --cc=jthoughton@google.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=muchun.song@linux.dev \
    --cc=osalvador@suse.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox