From: Steven Sistare <steven.sistare@oracle.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: linux-mm@kvack.org, Vivek Kasireddy <vivek.kasireddy@intel.com>,
Muchun Song <muchun.song@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
Peter Xu <peterx@redhat.com>,
David Hildenbrand <david@redhat.com>,
Jason Gunthorpe <jgg@nvidia.com>
Subject: Re: [PATCH] mm/hugetlb: simplify refs in memfd_alloc_folio
Date: Wed, 4 Sep 2024 16:20:50 -0400 [thread overview]
Message-ID: <af4466b0-e9ff-4fab-9cc7-5c3c08f34cab@oracle.com> (raw)
In-Reply-To: <Zti-7nPVMcGgpcbi@casper.infradead.org>

On 9/4/2024 4:11 PM, Matthew Wilcox wrote:
> On Wed, Sep 04, 2024 at 09:02:59PM +0100, Matthew Wilcox wrote:
>> On Wed, Sep 04, 2024 at 12:41:08PM -0700, Steve Sistare wrote:
>>> The folio_try_get in memfd_alloc_folio is not necessary. Delete it, and
>>> delete the matching folio_put in memfd_pin_folios. This also avoids
>>> leaking a ref if the memfd_alloc_folio call to hugetlb_add_to_page_cache
>>> fails, which would otherwise need an additional folio_put. This is a
>>> continuation of the fix
>>> "mm/hugetlb: fix memfd_pin_folios free_huge_pages leak"
>>
>> I think you're right, but don't we also need to get rid of the
>> folio_put() call in the 'if (err)' case after calling
>> hugetlb_add_to_page_cache()?
>
> After scratching my head about this a bit more, I was trying to preserve
> the existing semantics of the code, but I think the code was always
> buggy.
>
> The correct code would be:
>
> 	folio = alloc_hugetlb_folio_nodemask(...);
> 	folio_put(folio);
>
> The code as in tree today would trip an assertion:
>
> 	VM_BUG_ON_FOLIO(folio_ref_count(folio), folio);
>
> as alloc_hugetlb_folio_nodemask() returns a folio with a refcount 1,
> folio_try_get() would increment it to 2, folio_put() would decrement
> it to 1, and so we'd call free_huge_folio() with a refcount of 1.
>
> But after your patch, the code _is_ still wrong because we'll
> start with a refcount of 1, fail to add to the page cache, call
> folio_put() which will decrement the refcount to 0 _and call
> free_huge_folio() itself_. Then we'll call free_huge_folio()
> on an already freed and possibly reallocated folio.

Indeed. The explicit call to free_huge_folio() must be deleted, as you
coded below. I'll send a v2 of the patch.

- Steve

> So every version suggested so far (current, yours, mine) is wrong,
> and the right code looks like:
>
> 	folio = alloc_hugetlb_folio_nodemask(...);
> 	if (folio) {
> 		err = hugetlb_add_to_page_cache(...);
> 		if (err) {
> 			folio_put(folio);
> 			return ERR_PTR(err);
> 		}
> 		...
>
> Or have I got something wrong in that analysis?
>
>>> Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
>>>
>>> Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>> ---
>>> mm/gup.c | 4 +---
>>> mm/memfd.c | 2 +-
>>> 2 files changed, 2 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index bccabaa..947881ff 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -3618,7 +3618,7 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>>>  	pgoff_t start_idx, end_idx, next_idx;
>>>  	struct folio *folio = NULL;
>>>  	struct folio_batch fbatch;
>>> -	struct hstate *h = NULL;
>>> +	struct hstate *h;
>>>  	long ret = -EINVAL;
>>>
>>>  	if (start < 0 || start > end || !max_folios)
>>> @@ -3662,8 +3662,6 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>>>  						       &fbatch);
>>>  			if (folio) {
>>>  				folio_put(folio);
>>> -				if (h)
>>> -					folio_put(folio);
>>>  				folio = NULL;
>>>  			}
>>>
>>> diff --git a/mm/memfd.c b/mm/memfd.c
>>> index bcb131d..f715301 100644
>>> --- a/mm/memfd.c
>>> +++ b/mm/memfd.c
>>> @@ -89,7 +89,7 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
>>>  						     numa_node_id(),
>>>  						     NULL,
>>>  						     gfp_mask);
>>> -		if (folio && folio_try_get(folio)) {
>>> +		if (folio) {
>>>  			err = hugetlb_add_to_page_cache(folio,
>>>  							memfd->f_mapping,
>>>  							idx);
>>> --
>>> 1.8.3.1
>>>
>>