* [PATCH] mm/hugetlb: simplify refs in memfd_alloc_folio
@ 2024-09-04 19:41 Steve Sistare
2024-09-04 20:02 ` Matthew Wilcox
0 siblings, 1 reply; 5+ messages in thread
From: Steve Sistare @ 2024-09-04 19:41 UTC (permalink / raw)
To: linux-mm
Cc: Vivek Kasireddy, Muchun Song, Andrew Morton, Matthew Wilcox,
Peter Xu, David Hildenbrand, Jason Gunthorpe, Steve Sistare
The folio_try_get in memfd_alloc_folio is not necessary. Delete it, and
delete the matching folio_put in memfd_pin_folios. This also avoids
leaking a ref if the memfd_alloc_folio call to hugetlb_add_to_page_cache
fails, which would otherwise need an additional folio_put. This is a
continuation of the fix
"mm/hugetlb: fix memfd_pin_folios free_huge_pages leak"
Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
---
mm/gup.c | 4 +---
mm/memfd.c | 2 +-
2 files changed, 2 insertions(+), 4 deletions(-)
diff --git a/mm/gup.c b/mm/gup.c
index bccabaa..947881ff 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -3618,7 +3618,7 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
pgoff_t start_idx, end_idx, next_idx;
struct folio *folio = NULL;
struct folio_batch fbatch;
- struct hstate *h = NULL;
+ struct hstate *h;
long ret = -EINVAL;
if (start < 0 || start > end || !max_folios)
@@ -3662,8 +3662,6 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
&fbatch);
if (folio) {
folio_put(folio);
- if (h)
- folio_put(folio);
folio = NULL;
}
diff --git a/mm/memfd.c b/mm/memfd.c
index bcb131d..f715301 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -89,7 +89,7 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
numa_node_id(),
NULL,
gfp_mask);
- if (folio && folio_try_get(folio)) {
+ if (folio) {
err = hugetlb_add_to_page_cache(folio,
memfd->f_mapping,
idx);
--
1.8.3.1
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] mm/hugetlb: simplify refs in memfd_alloc_folio
2024-09-04 19:41 [PATCH] mm/hugetlb: simplify refs in memfd_alloc_folio Steve Sistare
@ 2024-09-04 20:02 ` Matthew Wilcox
2024-09-04 20:11 ` Matthew Wilcox
2024-09-04 20:13 ` Steven Sistare
0 siblings, 2 replies; 5+ messages in thread
From: Matthew Wilcox @ 2024-09-04 20:02 UTC (permalink / raw)
To: Steve Sistare
Cc: linux-mm, Vivek Kasireddy, Muchun Song, Andrew Morton, Peter Xu,
David Hildenbrand, Jason Gunthorpe
On Wed, Sep 04, 2024 at 12:41:08PM -0700, Steve Sistare wrote:
> The folio_try_get in memfd_alloc_folio is not necessary. Delete it, and
> delete the matching folio_put in memfd_pin_folios. This also avoids
> leaking a ref if the memfd_alloc_folio call to hugetlb_add_to_page_cache
> fails, which would otherwise need an additional folio_put. This is a
> continuation of the fix
> "mm/hugetlb: fix memfd_pin_folios free_huge_pages leak"
I think you're right, but don't we also need to get rid of the
folio_put() call in the 'if (err)' case after calling
hugetlb_add_to_page_cache()?
> Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
>
> Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> ---
> mm/gup.c | 4 +---
> mm/memfd.c | 2 +-
> 2 files changed, 2 insertions(+), 4 deletions(-)
>
> diff --git a/mm/gup.c b/mm/gup.c
> index bccabaa..947881ff 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -3618,7 +3618,7 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
> pgoff_t start_idx, end_idx, next_idx;
> struct folio *folio = NULL;
> struct folio_batch fbatch;
> - struct hstate *h = NULL;
> + struct hstate *h;
> long ret = -EINVAL;
>
> if (start < 0 || start > end || !max_folios)
> @@ -3662,8 +3662,6 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
> &fbatch);
> if (folio) {
> folio_put(folio);
> - if (h)
> - folio_put(folio);
> folio = NULL;
> }
>
> diff --git a/mm/memfd.c b/mm/memfd.c
> index bcb131d..f715301 100644
> --- a/mm/memfd.c
> +++ b/mm/memfd.c
> @@ -89,7 +89,7 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
> numa_node_id(),
> NULL,
> gfp_mask);
> - if (folio && folio_try_get(folio)) {
> + if (folio) {
> err = hugetlb_add_to_page_cache(folio,
> memfd->f_mapping,
> idx);
> --
> 1.8.3.1
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] mm/hugetlb: simplify refs in memfd_alloc_folio
2024-09-04 20:02 ` Matthew Wilcox
@ 2024-09-04 20:11 ` Matthew Wilcox
2024-09-04 20:20 ` Steven Sistare
2024-09-04 20:13 ` Steven Sistare
1 sibling, 1 reply; 5+ messages in thread
From: Matthew Wilcox @ 2024-09-04 20:11 UTC (permalink / raw)
To: Steve Sistare
Cc: linux-mm, Vivek Kasireddy, Muchun Song, Andrew Morton, Peter Xu,
David Hildenbrand, Jason Gunthorpe
On Wed, Sep 04, 2024 at 09:02:59PM +0100, Matthew Wilcox wrote:
> On Wed, Sep 04, 2024 at 12:41:08PM -0700, Steve Sistare wrote:
> > The folio_try_get in memfd_alloc_folio is not necessary. Delete it, and
> > delete the matching folio_put in memfd_pin_folios. This also avoids
> > leaking a ref if the memfd_alloc_folio call to hugetlb_add_to_page_cache
> > fails, which would otherwise need an additional folio_put. This is a
> > continuation of the fix
> > "mm/hugetlb: fix memfd_pin_folios free_huge_pages leak"
>
> I think you're right, but don't we also need to get rid of the
> folio_put() call in the 'if (err)' case after calling
> hugetlb_add_to_page_cache()?
After scratching my head about this a bit more, I was trying to preserve
the existing semantics of the code, but I think the code was always
buggy.
The correct code would be:
folio = alloc_hugetlb_folio_nodemask(...);
folio_put(folio);
The code as in tree today would trip an assertion:
VM_BUG_ON_FOLIO(folio_ref_count(folio), folio);
as alloc_hugetlb_folio_nodemask() returns a folio with a refcount 1,
folio_try_get() would increment it to 2, folio_put() would decrement
it to 1, and so we'd call free_huge_folio() with a refcount of 1.
But after your patch, the code _is_ still wrong because we'll
start with a refcount of 1, fail to add to the page cache, call
folio_put() which will decrement the refcount to 0 _and call
free_huge_folio() itself_. Then we'll call free_huge_folio()
on an already freed and possibly reallocated folio.
So every version suggested so far (current, yours, mine) is wrong,
and the right code looks like:
folio = alloc_hugetlb_folio_nodemask(...);
if (folio) {
err = hugetlb_add_to_page_cache(...);
if (err) {
folio_put(folio);
return ERR_PTR(err);
}
...
Or have I got something wrong in that analysis?
> > Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
> >
> > Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
> > Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
> > ---
> > mm/gup.c | 4 +---
> > mm/memfd.c | 2 +-
> > 2 files changed, 2 insertions(+), 4 deletions(-)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index bccabaa..947881ff 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -3618,7 +3618,7 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
> > pgoff_t start_idx, end_idx, next_idx;
> > struct folio *folio = NULL;
> > struct folio_batch fbatch;
> > - struct hstate *h = NULL;
> > + struct hstate *h;
> > long ret = -EINVAL;
> >
> > if (start < 0 || start > end || !max_folios)
> > @@ -3662,8 +3662,6 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
> > &fbatch);
> > if (folio) {
> > folio_put(folio);
> > - if (h)
> > - folio_put(folio);
> > folio = NULL;
> > }
> >
> > diff --git a/mm/memfd.c b/mm/memfd.c
> > index bcb131d..f715301 100644
> > --- a/mm/memfd.c
> > +++ b/mm/memfd.c
> > @@ -89,7 +89,7 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
> > numa_node_id(),
> > NULL,
> > gfp_mask);
> > - if (folio && folio_try_get(folio)) {
> > + if (folio) {
> > err = hugetlb_add_to_page_cache(folio,
> > memfd->f_mapping,
> > idx);
> > --
> > 1.8.3.1
> >
>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] mm/hugetlb: simplify refs in memfd_alloc_folio
2024-09-04 20:02 ` Matthew Wilcox
2024-09-04 20:11 ` Matthew Wilcox
@ 2024-09-04 20:13 ` Steven Sistare
1 sibling, 0 replies; 5+ messages in thread
From: Steven Sistare @ 2024-09-04 20:13 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-mm, Vivek Kasireddy, Muchun Song, Andrew Morton, Peter Xu,
David Hildenbrand, Jason Gunthorpe
On 9/4/2024 4:02 PM, Matthew Wilcox wrote:
> On Wed, Sep 04, 2024 at 12:41:08PM -0700, Steve Sistare wrote:
>> The folio_try_get in memfd_alloc_folio is not necessary. Delete it, and
>> delete the matching folio_put in memfd_pin_folios. This also avoids
>> leaking a ref if the memfd_alloc_folio call to hugetlb_add_to_page_cache
>> fails, which would otherwise need an additional folio_put. This is a
>> continuation of the fix
>> "mm/hugetlb: fix memfd_pin_folios free_huge_pages leak"
>
> I think you're right, but don't we also need to get rid of the
> folio_put() call in the 'if (err)' case after calling
> hugetlb_add_to_page_cache()?
That folio_put drops the ref obtained by unfreeze:
memfd_alloc_folio()
alloc_hugetlb_folio_nodemask()
dequeue_hugetlb_folio_nodemask()
dequeue_hugetlb_folio_node_exact()
folio_ref_unfreeze(folio, 1); ; adds 1 refcount
- Steve
>> Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
>>
>> Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>> ---
>> mm/gup.c | 4 +---
>> mm/memfd.c | 2 +-
>> 2 files changed, 2 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/gup.c b/mm/gup.c
>> index bccabaa..947881ff 100644
>> --- a/mm/gup.c
>> +++ b/mm/gup.c
>> @@ -3618,7 +3618,7 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>> pgoff_t start_idx, end_idx, next_idx;
>> struct folio *folio = NULL;
>> struct folio_batch fbatch;
>> - struct hstate *h = NULL;
>> + struct hstate *h;
>> long ret = -EINVAL;
>>
>> if (start < 0 || start > end || !max_folios)
>> @@ -3662,8 +3662,6 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>> &fbatch);
>> if (folio) {
>> folio_put(folio);
>> - if (h)
>> - folio_put(folio);
>> folio = NULL;
>> }
>>
>> diff --git a/mm/memfd.c b/mm/memfd.c
>> index bcb131d..f715301 100644
>> --- a/mm/memfd.c
>> +++ b/mm/memfd.c
>> @@ -89,7 +89,7 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
>> numa_node_id(),
>> NULL,
>> gfp_mask);
>> - if (folio && folio_try_get(folio)) {
>> + if (folio) {
>> err = hugetlb_add_to_page_cache(folio,
>> memfd->f_mapping,
>> idx);
>> --
>> 1.8.3.1
>>
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH] mm/hugetlb: simplify refs in memfd_alloc_folio
2024-09-04 20:11 ` Matthew Wilcox
@ 2024-09-04 20:20 ` Steven Sistare
0 siblings, 0 replies; 5+ messages in thread
From: Steven Sistare @ 2024-09-04 20:20 UTC (permalink / raw)
To: Matthew Wilcox
Cc: linux-mm, Vivek Kasireddy, Muchun Song, Andrew Morton, Peter Xu,
David Hildenbrand, Jason Gunthorpe
On 9/4/2024 4:11 PM, Matthew Wilcox wrote:
> On Wed, Sep 04, 2024 at 09:02:59PM +0100, Matthew Wilcox wrote:
>> On Wed, Sep 04, 2024 at 12:41:08PM -0700, Steve Sistare wrote:
>>> The folio_try_get in memfd_alloc_folio is not necessary. Delete it, and
>>> delete the matching folio_put in memfd_pin_folios. This also avoids
>>> leaking a ref if the memfd_alloc_folio call to hugetlb_add_to_page_cache
>>> fails, which would otherwise need an additional folio_put. This is a
>>> continuation of the fix
>>> "mm/hugetlb: fix memfd_pin_folios free_huge_pages leak"
>>
>> I think you're right, but don't we also need to get rid of the
>> folio_put() call in the 'if (err)' case after calling
>> hugetlb_add_to_page_cache()?
>
> After scratching my head about this a bit more, I was trying to preserve
> the existing semantics of the code, but I think the code was always
> buggy.
>
> The correct code would be:
>
> folio = alloc_hugetlb_folio_nodemask(...);
> folio_put(folio);
>
> The code as in tree today would trip an assertion:
>
> VM_BUG_ON_FOLIO(folio_ref_count(folio), folio);
>
> as alloc_hugetlb_folio_nodemask() returns a folio with a refcount 1,
> folio_try_get() would increment it to 2, folio_put() would decrement
> it to 1, and so we'd call free_huge_folio() with a refcount of 1.
>
> But after your patch, the code _is_ still wrong because we'll
> start with a refcount of 1, fail to add to the page cache, call
> folio_put() which will decrement the refcount to 0 _and call
> free_huge_folio() itself_. Then we'll call free_huge_folio()
> on an already freed and possibly reallocated folio.
Indeed. The explicit call to free_huge_folio must be deleted, as you
coded below. I'll send a V2 of the patch.
- Steve
> So every version suggested so far (current, yours, mine) is wrong,
> and the right code looks like:
>
> folio = alloc_hugetlb_folio_nodemask(...);
> if (folio) {
> err = hugetlb_add_to_page_cache(...);
> if (err) {
> folio_put(folio);
> return ERR_PTR(err);
> }
> ...
>
> Or have I got something wrong in that analysis?
>
>>> Fixes: 89c1905d9c14 ("mm/gup: introduce memfd_pin_folios() for pinning memfd folios")
>>>
>>> Suggested-by: Vivek Kasireddy <vivek.kasireddy@intel.com>
>>> Signed-off-by: Steve Sistare <steven.sistare@oracle.com>
>>> ---
>>> mm/gup.c | 4 +---
>>> mm/memfd.c | 2 +-
>>> 2 files changed, 2 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/mm/gup.c b/mm/gup.c
>>> index bccabaa..947881ff 100644
>>> --- a/mm/gup.c
>>> +++ b/mm/gup.c
>>> @@ -3618,7 +3618,7 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>>> pgoff_t start_idx, end_idx, next_idx;
>>> struct folio *folio = NULL;
>>> struct folio_batch fbatch;
>>> - struct hstate *h = NULL;
>>> + struct hstate *h;
>>> long ret = -EINVAL;
>>>
>>> if (start < 0 || start > end || !max_folios)
>>> @@ -3662,8 +3662,6 @@ long memfd_pin_folios(struct file *memfd, loff_t start, loff_t end,
>>> &fbatch);
>>> if (folio) {
>>> folio_put(folio);
>>> - if (h)
>>> - folio_put(folio);
>>> folio = NULL;
>>> }
>>>
>>> diff --git a/mm/memfd.c b/mm/memfd.c
>>> index bcb131d..f715301 100644
>>> --- a/mm/memfd.c
>>> +++ b/mm/memfd.c
>>> @@ -89,7 +89,7 @@ struct folio *memfd_alloc_folio(struct file *memfd, pgoff_t idx)
>>> numa_node_id(),
>>> NULL,
>>> gfp_mask);
>>> - if (folio && folio_try_get(folio)) {
>>> + if (folio) {
>>> err = hugetlb_add_to_page_cache(folio,
>>> memfd->f_mapping,
>>> idx);
>>> --
>>> 1.8.3.1
>>>
>>
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-09-04 20:21 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-09-04 19:41 [PATCH] mm/hugetlb: simplify refs in memfd_alloc_folio Steve Sistare
2024-09-04 20:02 ` Matthew Wilcox
2024-09-04 20:11 ` Matthew Wilcox
2024-09-04 20:20 ` Steven Sistare
2024-09-04 20:13 ` Steven Sistare
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox