linux-mm.kvack.org archive mirror
* [PATCH] mm/codetag: sub in advance when free non-compound high order pages
@ 2025-05-04  6:19 David Wang
  2025-05-05 13:12 ` Vlastimil Babka
  0 siblings, 1 reply; 12+ messages in thread
From: David Wang @ 2025-05-04  6:19 UTC (permalink / raw)
  To: akpm, vbabka, surenb, mhocko, jackmanb, hannes, ziy
  Cc: linux-mm, linux-kernel, David Wang

When a page is non-compound, page[0] can be released by another
thread right after put_page_testzero() fails in the current thread;
pgalloc_tag_sub_pages() would then manipulate an invalid page when
accounting the remaining pages:

[timeline]   [thread1]                     [thread2]
  |          alloc_page non-compound
  V
  |                                        get_page, ref counter inc
  V
  |          in ___free_pages
  |          put_page_testzero fails
  V
  |                                        put_page, page released
  V
  |          in ___free_pages,
  |          pgalloc_tag_sub_pages
  |          manipulate an invalid page
  V
  V

Move the tag page accounting ahead, and only account the remaining
pages for non-compound allocations with non-zero order.
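The pre-subtract/add-back scheme can be modelled in plain userspace C
(a hypothetical model with stand-in types and illustrative names; the
real kernel refcount and tag helpers differ):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MODEL_PAGE_SIZE 4096L

/* Stand-ins for page[0]'s refcount and tag->counters->bytes. */
static atomic_int page_ref;
static atomic_long tag_bytes;

static bool put_page_testzero_model(void)
{
	return atomic_fetch_sub(&page_ref, 1) == 1;
}

/* Free path for a non-compound, order > 0 allocation, mirroring the
 * patch: subtract the tail pages' bytes before dropping our reference,
 * and add them back if we turn out to hold the last reference. */
static void free_pages_model(unsigned int order)
{
	unsigned int remaining = 0;

	if (order > 0 && atomic_load(&page_ref) > 1) {
		remaining = (1u << order) - 1;
		atomic_fetch_sub(&tag_bytes, MODEL_PAGE_SIZE * remaining);
	}

	if (put_page_testzero_model()) {
		/* Last reference: the whole allocation is freed at once,
		 * so the early subtraction must be restored first. */
		if (remaining)
			atomic_fetch_add(&tag_bytes, MODEL_PAGE_SIZE * remaining);
		atomic_fetch_sub(&tag_bytes, MODEL_PAGE_SIZE << order);
	}
	/* Otherwise another thread still holds page[0]: the tail pages
	 * we freed were already accounted above, and page[0]'s share is
	 * subtracted later by whoever drops the final reference. */
}
```

Either way, no tag accounting touches the page after a failed
put_page_testzero(), which is the window the race diagram shows.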

Signed-off-by: David Wang <00107082@163.com>
---
 mm/page_alloc.c | 36 +++++++++++++++++++++++++++++++++---
 1 file changed, 33 insertions(+), 3 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5669baf2a6fe..c42e41ed35fe 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1163,12 +1163,25 @@ static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr)
 		this_cpu_sub(tag->counters->bytes, PAGE_SIZE * nr);
 }
 
+static inline void pgalloc_tag_add_pages(struct page *page, unsigned int nr)
+{
+	struct alloc_tag *tag;
+
+	if (!mem_alloc_profiling_enabled())
+		return;
+
+	tag = __pgalloc_tag_get(page);
+	if (tag)
+		this_cpu_add(tag->counters->bytes, PAGE_SIZE * nr);
+}
+
 #else /* CONFIG_MEM_ALLOC_PROFILING */
 
 static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
 				   unsigned int nr) {}
 static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {}
 static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr) {}
+static inline void pgalloc_tag_add_pages(struct page *page, unsigned int nr) {}
 
 #endif /* CONFIG_MEM_ALLOC_PROFILING */
 
@@ -5065,11 +5078,28 @@ static void ___free_pages(struct page *page, unsigned int order,
 {
 	/* get PageHead before we drop reference */
 	int head = PageHead(page);
+	/*
+	 * For the remaining pages beyond the first page of a
+	 * non-compound allocation, decrease the tag accounting
+	 * in advance, in case the first page is released by
+	 * another thread between our put_page_testzero() and
+	 * any accounting done afterwards.
+	 */
+	unsigned int remaining_tag_pages = 0;
 
-	if (put_page_testzero(page))
+	if (order > 0 && !head) {
+		if (unlikely(page_ref_count(page) > 1)) {
+			remaining_tag_pages = (1 << order) - 1;
+			pgalloc_tag_sub_pages(page, remaining_tag_pages);
+		}
+	}
+
+	if (put_page_testzero(page)) {
+		/* no special treatment needed for the remaining pages; add them back */
+		if (unlikely(remaining_tag_pages > 0))
+			pgalloc_tag_add_pages(page, remaining_tag_pages);
 		__free_frozen_pages(page, order, fpi_flags);
-	else if (!head) {
-		pgalloc_tag_sub_pages(page, (1 << order) - 1);
+	} else if (!head) {
 		while (order-- > 0)
 			__free_frozen_pages(page + (1 << order), order,
 					    fpi_flags);
-- 
2.39.2




* Re: [PATCH] mm/codetag: sub in advance when free non-compound high order pages
  2025-05-04  6:19 [PATCH] mm/codetag: sub in advance when free non-compound high order pages David Wang
@ 2025-05-05 13:12 ` Vlastimil Babka
  2025-05-05 14:31   ` David Wang
  0 siblings, 1 reply; 12+ messages in thread
From: Vlastimil Babka @ 2025-05-05 13:12 UTC (permalink / raw)
  To: David Wang, akpm, surenb, mhocko, jackmanb, hannes, ziy
  Cc: linux-mm, linux-kernel, Shakeel Butt

On 5/4/25 08:19, David Wang wrote:
> When page is non-compound, page[0] could be released by other
> thread right after put_page_testzero failed in current thread,
> pgalloc_tag_sub_pages afterwards would manipulate an invalid
> page for accounting remaining pages:
> 
> [timeline]   [thread1]                     [thread2]
>   |          alloc_page non-compound
>   V
>   |                                        get_page, rf counter inc
>   V
>   |          in ___free_pages
>   |          put_page_testzero fails
>   V
>   |                                        put_page, page released
>   V
>   |          in ___free_pages,
>   |          pgalloc_tag_sub_pages
>   |          manipulate an invalid page
>   V
>   V
> 
> Move the tag page accounting ahead, and only account remaining pages
> for non-compound pages with non-zero order.
> 
> Signed-off-by: David Wang <00107082@163.com>

Hmm, I think the problem was introduced by 51ff4d7486f0 ("mm: avoid extra
mem_alloc_profiling_enabled() checks"). Previously we'd get the tag pointer
upfront and avoid the page use-after-free.

It would likely be nicer to fix it by going back to that approach for
___free_pages(), while hopefully keeping the optimisations of 51ff4d7486f0
for the other call sites where it applies?

> ---
>  mm/page_alloc.c | 36 +++++++++++++++++++++++++++++++++---
>  1 file changed, 33 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5669baf2a6fe..c42e41ed35fe 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1163,12 +1163,25 @@ static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr)
>  		this_cpu_sub(tag->counters->bytes, PAGE_SIZE * nr);
>  }
>  
> +static inline void pgalloc_tag_add_pages(struct page *page, unsigned int nr)
> +{
> +	struct alloc_tag *tag;
> +
> +	if (!mem_alloc_profiling_enabled())
> +		return;
> +
> +	tag = __pgalloc_tag_get(page);
> +	if (tag)
> +		this_cpu_add(tag->counters->bytes, PAGE_SIZE * nr);
> +}
> +
>  #else /* CONFIG_MEM_ALLOC_PROFILING */
>  
>  static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
>  				   unsigned int nr) {}
>  static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {}
>  static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr) {}
> +static inline void pgalloc_tag_add_pages(struct page *page, unsigned int nr) {}
>  
>  #endif /* CONFIG_MEM_ALLOC_PROFILING */
>  
> @@ -5065,11 +5078,28 @@ static void ___free_pages(struct page *page, unsigned int order,
>  {
>  	/* get PageHead before we drop reference */
>  	int head = PageHead(page);
> +	/*
> +	 * For remaining pages other than the first page of
> +	 * a non-compound allocation, we decrease its tag
> +	 * pages in advance, in case the first page is released
> +	 * by other thread inbetween our put_page_testzero and any
> +	 * accounting behavior afterwards.
> +	 */
> +	unsigned int remaining_tag_pages = 0;
>  
> -	if (put_page_testzero(page))
> +	if (order > 0 && !head) {
> +		if (unlikely(page_ref_count(page) > 1)) {
> +			remaining_tag_pages = (1 << order) - 1;
> +			pgalloc_tag_sub_pages(page, remaining_tag_pages);
> +		}
> +	}
> +
> +	if (put_page_testzero(page)) {
> +		/* no need special treat for remaining pages, add it back. */
> +		if (unlikely(remaining_tag_pages > 0))
> +			pgalloc_tag_add_pages(page, remaining_tag_pages);
>  		__free_frozen_pages(page, order, fpi_flags);
> -	else if (!head) {
> -		pgalloc_tag_sub_pages(page, (1 << order) - 1);
> +	} else if (!head) {
>  		while (order-- > 0)
>  			__free_frozen_pages(page + (1 << order), order,
>  					    fpi_flags);




* Re: [PATCH] mm/codetag: sub in advance when free non-compound high order pages
  2025-05-05 13:12 ` Vlastimil Babka
@ 2025-05-05 14:31   ` David Wang
  2025-05-05 14:55     ` Vlastimil Babka
  0 siblings, 1 reply; 12+ messages in thread
From: David Wang @ 2025-05-05 14:31 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: akpm, surenb, mhocko, jackmanb, hannes, ziy, linux-mm,
	linux-kernel, Shakeel Butt



At 2025-05-05 21:12:55, "Vlastimil Babka" <vbabka@suse.cz> wrote:
>On 5/4/25 08:19, David Wang wrote:
>> When page is non-compound, page[0] could be released by other
>> thread right after put_page_testzero failed in current thread,
>> pgalloc_tag_sub_pages afterwards would manipulate an invalid
>> page for accounting remaining pages:
>> 
>> [timeline]   [thread1]                     [thread2]
>>   |          alloc_page non-compound
>>   V
>>   |                                        get_page, rf counter inc
>>   V
>>   |          in ___free_pages
>>   |          put_page_testzero fails
>>   V
>>   |                                        put_page, page released
>>   V
>>   |          in ___free_pages,
>>   |          pgalloc_tag_sub_pages
>>   |          manipulate an invalid page
>>   V
>>   V
>> 
>> Move the tag page accounting ahead, and only account remaining pages
>> for non-compound pages with non-zero order.
>> 
>> Signed-off-by: David Wang <00107082@163.com>
>
>Hmm, I think the problem was introduced by 51ff4d7486f0 ("mm: avoid extra
>mem_alloc_profiling_enabled() checks"). Previously we'd get the tag pointer
>upfront and avoid the page use-after-free.


Oh, you're right. I forgot to check the history...


>
>It would likely be nicer to fix it by going back to that approach for
>___free_pages(), while hopefully keeping the optimisations of 51ff4d7486f0
>for the other call sites where it applies?

After checking that commit, I feel the changes in __free_pages() are
the major optimization of the commit.
What about reverting that commit and making the optimization via
condition checks, similar to what this patch did?
 


David



* Re: [PATCH] mm/codetag: sub in advance when free non-compound high order pages
  2025-05-05 14:31   ` David Wang
@ 2025-05-05 14:55     ` Vlastimil Babka
  2025-05-05 15:33       ` Suren Baghdasaryan
  0 siblings, 1 reply; 12+ messages in thread
From: Vlastimil Babka @ 2025-05-05 14:55 UTC (permalink / raw)
  To: David Wang
  Cc: akpm, surenb, mhocko, jackmanb, hannes, ziy, linux-mm,
	linux-kernel, Shakeel Butt

On 5/5/25 16:31, David Wang wrote:
> 
> 
> At 2025-05-05 21:12:55, "Vlastimil Babka" <vbabka@suse.cz> wrote:
>>On 5/4/25 08:19, David Wang wrote:
>>> When page is non-compound, page[0] could be released by other
>>> thread right after put_page_testzero failed in current thread,
>>> pgalloc_tag_sub_pages afterwards would manipulate an invalid
>>> page for accounting remaining pages:
>>> 
>>> [timeline]   [thread1]                     [thread2]
>>>   |          alloc_page non-compound
>>>   V
>>>   |                                        get_page, rf counter inc
>>>   V
>>>   |          in ___free_pages
>>>   |          put_page_testzero fails
>>>   V
>>>   |                                        put_page, page released
>>>   V
>>>   |          in ___free_pages,
>>>   |          pgalloc_tag_sub_pages
>>>   |          manipulate an invalid page
>>>   V
>>>   V
>>> 
>>> Move the tag page accounting ahead, and only account remaining pages
>>> for non-compound pages with non-zero order.
>>> 
>>> Signed-off-by: David Wang <00107082@163.com>
>>
>>Hmm, I think the problem was introduced by 51ff4d7486f0 ("mm: avoid extra
>>mem_alloc_profiling_enabled() checks"). Previously we'd get the tag pointer
>>upfront and avoid the page use-after-free.
> 
> 
> Oh, you're right. I forgot to check history......
> 
> 
>>
>>It would likely be nicer to fix it by going back to that approach for
>>___free_pages(), while hopefully keeping the optimisations of 51ff4d7486f0
>>for the other call sites where it applies?
> 
> After checking that commit, I kind of feels the changes in __free_pages are
>  the major optimization of the commit....

We could have both pgalloc_tag_get() to use in __free_page() as before
51ff4d7486f0, and keep __pgalloc_tag_get() to use in pgalloc_tag_split() and
pgalloc_tag_swap().

I think __free_page() didn't benefit from the stated purpose of "avoiding
mem_alloc_profiling_enabled() ... which is often called after that check was
already done"

> What about revert that commit and make optimization by condition checks,
> similar to what this patch did?

The downside of the condition checks is they make the code more complex and
might actually increase overhead when mem_alloc_profiling_enabled() is
false, as those checks add non-static branches outside of the static branch
that's mem_alloc_profiling_enabled().

I think __free_pages() before 51ff4d7486f0 was quite ok.

- pgalloc_tag_get() is done unconditionally, but its code is all inside the
mem_alloc_profiling_enabled() static branch so that's a no-op when profiling
is not enabled

- pgalloc_tag_sub_pages() is also all behind the static branch inside. Also
it's a very rare path anyway, most freeing should go through the
put_page_testzero() being true.
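The shape described above, resolve the tag up front while the page is
known valid, with all tag work behind the single enabled check, can be
sketched in userspace C (names mirror the kernel helpers, but the types
and bodies here are stand-ins, not the real implementations):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct alloc_tag { long bytes; };

/* Stand-in for the mem_alloc_profiling_enabled() static branch. */
static bool profiling_enabled = true;
static struct alloc_tag page_tag = { .bytes = 4 * 4096L };

/* Would read the tag reference stored alongside the page. */
static struct alloc_tag *__pgalloc_tag_get_model(void)
{
	return &page_tag;
}

/* Resolve the tag before put_page_testzero(), while the page is still
 * valid; NULL when profiling is off, so later helpers are no-ops. */
static struct alloc_tag *pgalloc_tag_get_model(void)
{
	if (profiling_enabled)
		return __pgalloc_tag_get_model();
	return NULL;
}

/* Accounting after the refcount drop touches only the saved tag,
 * never the (possibly already freed) page. */
static void pgalloc_tag_sub_pages_model(struct alloc_tag *tag,
					unsigned int nr)
{
	if (tag)
		tag->bytes -= 4096L * nr;
}
```

With profiling disabled the get returns NULL and the sub is a no-op,
without any extra non-static branches on the free path.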

> David
> 




* Re: [PATCH] mm/codetag: sub in advance when free non-compound high order pages
  2025-05-05 14:55     ` Vlastimil Babka
@ 2025-05-05 15:33       ` Suren Baghdasaryan
  2025-05-05 16:42         ` David Wang
  0 siblings, 1 reply; 12+ messages in thread
From: Suren Baghdasaryan @ 2025-05-05 15:33 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: David Wang, akpm, mhocko, jackmanb, hannes, ziy, linux-mm,
	linux-kernel, Shakeel Butt

On Mon, May 5, 2025 at 7:55 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 5/5/25 16:31, David Wang wrote:
> >
> >
> > At 2025-05-05 21:12:55, "Vlastimil Babka" <vbabka@suse.cz> wrote:
> >>On 5/4/25 08:19, David Wang wrote:
> >>> When page is non-compound, page[0] could be released by other
> >>> thread right after put_page_testzero failed in current thread,
> >>> pgalloc_tag_sub_pages afterwards would manipulate an invalid
> >>> page for accounting remaining pages:
> >>>
> >>> [timeline]   [thread1]                     [thread2]
> >>>   |          alloc_page non-compound
> >>>   V
> >>>   |                                        get_page, rf counter inc
> >>>   V
> >>>   |          in ___free_pages
> >>>   |          put_page_testzero fails
> >>>   V
> >>>   |                                        put_page, page released
> >>>   V
> >>>   |          in ___free_pages,
> >>>   |          pgalloc_tag_sub_pages
> >>>   |          manipulate an invalid page
> >>>   V
> >>>   V
> >>>
> >>> Move the tag page accounting ahead, and only account remaining pages
> >>> for non-compound pages with non-zero order.
> >>>
> >>> Signed-off-by: David Wang <00107082@163.com>

Thanks for reporting!

> >>
> >>Hmm, I think the problem was introduced by 51ff4d7486f0 ("mm: avoid extra
> >>mem_alloc_profiling_enabled() checks"). Previously we'd get the tag pointer
> >>upfront and avoid the page use-after-free.

Right, sorry I missed that.

> >
> >
> > Oh, you're right. I forgot to check history......
> >
> >
> >>
> >>It would likely be nicer to fix it by going back to that approach for
> >>___free_pages(), while hopefully keeping the optimisations of 51ff4d7486f0
> >>for the other call sites where it applies?
> >
> > After checking that commit, I kind of feels the changes in __free_pages are
> >  the major optimization of the commit....
>
> We could have both pgalloc_tag_get() to use in __free_page() as before
> 51ff4d7486f0, and keep __pgalloc_tag_get() to use in pgalloc_tag_split() and
> pgalloc_tag_swap().

Yes, we can add back pgalloc_tag_get() which would call
__pgalloc_tag_get() if mem_alloc_profiling_enabled() is true and
change pgalloc_tag_sub_pages() back to use tags instead of pages.
__free_pages() is the only user of that function, so that change
should not affect anything else.

>
> I think __free_page() didn't benefit from the stated purpose of "avoiding
> mem_alloc_profiling_enabled() ... which is often called after that check was
> already done"
>
> > What about revert that commit and make optimization by condition checks,
> > similar to what this patch did?
>
> The downside of the condition checks is they make the code more complex and
> might actually increase overhead when mem_alloc_profiling_enabled() is
> false, as those checks add non-static branches outside of the static branch
> that's mem_alloc_profiling_enabled().
>
> I think __free_pages() before 51ff4d7486f0 was quite ok.
>
> - pgalloc_tag_get() is done unconditionally, but its code is all inside the
> mem_alloc_profiling_enabled() static branch so that's a no-op when profiling
> is not enabled
>
> - pgalloc_tag_sub_pages() is also all behind the static branch inside. Also
> it's a very rare path anyway, most freeing should go through the
> put_page_testzero() being true.

Yeah, the main goal of that change in __free_page() was to make
__pgalloc_tag_get() a local function for alloc_tags and to limit the
direct use of struct alloc_tag in the core mm code. Obviously I
screwed up, forgetting why we had to store the tag in the first place.
An additional comment in __free_page() is probably a good idea to
avoid confusion in the future.
Thanks,
Suren.

>
> > David
> >
>



* Re: [PATCH] mm/codetag: sub in advance when free non-compound high order pages
  2025-05-05 15:33       ` Suren Baghdasaryan
@ 2025-05-05 16:42         ` David Wang
  2025-05-05 16:53           ` Suren Baghdasaryan
  0 siblings, 1 reply; 12+ messages in thread
From: David Wang @ 2025-05-05 16:42 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Vlastimil Babka, akpm, mhocko, jackmanb, hannes, ziy, linux-mm,
	linux-kernel, Shakeel Butt




At 2025-05-05 23:33:50, "Suren Baghdasaryan" <surenb@google.com> wrote:
>On Mon, May 5, 2025 at 7:55 AM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> On 5/5/25 16:31, David Wang wrote:
>> >
>> >
>> > At 2025-05-05 21:12:55, "Vlastimil Babka" <vbabka@suse.cz> wrote:
>> >>On 5/4/25 08:19, David Wang wrote:
>> >>> When page is non-compound, page[0] could be released by other
>> >>> thread right after put_page_testzero failed in current thread,
>> >>> pgalloc_tag_sub_pages afterwards would manipulate an invalid
>> >>> page for accounting remaining pages:
>> >>>
>> >>> [timeline]   [thread1]                     [thread2]
>> >>>   |          alloc_page non-compound
>> >>>   V
>> >>>   |                                        get_page, rf counter inc
>> >>>   V
>> >>>   |          in ___free_pages
>> >>>   |          put_page_testzero fails
>> >>>   V
>> >>>   |                                        put_page, page released
>> >>>   V
>> >>>   |          in ___free_pages,
>> >>>   |          pgalloc_tag_sub_pages
>> >>>   |          manipulate an invalid page
>> >>>   V
>> >>>   V
>> >>>
>> >>> Move the tag page accounting ahead, and only account remaining pages
>> >>> for non-compound pages with non-zero order.
>> >>>
>> >>> Signed-off-by: David Wang <00107082@163.com>
>
>Thanks for reporting!
>
>> >>
>> >>Hmm, I think the problem was introduced by 51ff4d7486f0 ("mm: avoid extra
>> >>mem_alloc_profiling_enabled() checks"). Previously we'd get the tag pointer
>> >>upfront and avoid the page use-after-free.
>
>Right, sorry I missed that.
>
>> >
>> >
>> > Oh, you're right. I forgot to check history......
>> >
>> >
>> >>
>> >>It would likely be nicer to fix it by going back to that approach for
>> >>___free_pages(), while hopefully keeping the optimisations of 51ff4d7486f0
>> >>for the other call sites where it applies?
>> >
>> > After checking that commit, I kind of feels the changes in __free_pages are
>> >  the major optimization of the commit....
>>
>> We could have both pgalloc_tag_get() to use in __free_page() as before
>> 51ff4d7486f0, and keep __pgalloc_tag_get() to use in pgalloc_tag_split() and
>> pgalloc_tag_swap().
>
>Yes, we can add back pgalloc_tag_get() which would call
>__pgalloc_tag_get() if mem_alloc_profiling_enabled() is true and
>change pgalloc_tag_sub_pages() back to use tags instead of pages.
>__free_pages() is the only user of that function, so that change
>should not affect anything else.


Adding back pgalloc_tag_get() seems like just reverting 51ff4d7486f0...
Do you want me to do it, or will you take over and make further adjustments?



>
>>
>> I think __free_page() didn't benefit from the stated purpose of "avoiding
>> mem_alloc_profiling_enabled() ... which is often called after that check was
>> already done"
>>
>> > What about revert that commit and make optimization by condition checks,
>> > similar to what this patch did?
>>
>> The downside of the condition checks is they make the code more complex and
>> might actually increase overhead when mem_alloc_profiling_enabled() is
>> false, as those checks add non-static branches outside of the static branch
>> that's mem_alloc_profiling_enabled().
>>
>> I think __free_pages() before 51ff4d7486f0 was quite ok.
>>
>> - pgalloc_tag_get() is done unconditionally, but its code is all inside the
>> mem_alloc_profiling_enabled() static branch so that's a no-op when profiling
>> is not enabled
>>
>> - pgalloc_tag_sub_pages() is also all behind the static branch inside. Also
>> it's a very rare path anyway, most freeing should go through the
>> put_page_testzero() being true.
>
>Yeah, the main goal of that change in __free_page() was to make
>__pgalloc_tag_get() a local function for alloc_tags and limiting the
>direct use of struct alloc_tag in the core mm code. Obviously I
>screwed up forgetting why we had to store the tag in the first place.
>An additional comment in __free_page() is probably a good idea to
>avoid confusion in the future.
>Thanks,
>Suren.
>
>>
>> > David
>> >
>>


* Re: [PATCH] mm/codetag: sub in advance when free non-compound high order pages
  2025-05-05 16:42         ` David Wang
@ 2025-05-05 16:53           ` Suren Baghdasaryan
  2025-05-05 18:34             ` [PATCH v2] mm/codetag: move tag retrieval back upfront in __free_pages() David Wang
  2025-05-05 19:30             ` [PATCH v3] " David Wang
  0 siblings, 2 replies; 12+ messages in thread
From: Suren Baghdasaryan @ 2025-05-05 16:53 UTC (permalink / raw)
  To: David Wang
  Cc: Vlastimil Babka, akpm, mhocko, jackmanb, hannes, ziy, linux-mm,
	linux-kernel, Shakeel Butt

On Mon, May 5, 2025 at 9:43 AM David Wang <00107082@163.com> wrote:
>
>
>
>
> At 2025-05-05 23:33:50, "Suren Baghdasaryan" <surenb@google.com> wrote:
> >On Mon, May 5, 2025 at 7:55 AM Vlastimil Babka <vbabka@suse.cz> wrote:
> >>
> >> On 5/5/25 16:31, David Wang wrote:
> >> >
> >> >
> >> > At 2025-05-05 21:12:55, "Vlastimil Babka" <vbabka@suse.cz> wrote:
> >> >>On 5/4/25 08:19, David Wang wrote:
> >> >>> When page is non-compound, page[0] could be released by other
> >> >>> thread right after put_page_testzero failed in current thread,
> >> >>> pgalloc_tag_sub_pages afterwards would manipulate an invalid
> >> >>> page for accounting remaining pages:
> >> >>>
> >> >>> [timeline]   [thread1]                     [thread2]
> >> >>>   |          alloc_page non-compound
> >> >>>   V
> >> >>>   |                                        get_page, rf counter inc
> >> >>>   V
> >> >>>   |          in ___free_pages
> >> >>>   |          put_page_testzero fails
> >> >>>   V
> >> >>>   |                                        put_page, page released
> >> >>>   V
> >> >>>   |          in ___free_pages,
> >> >>>   |          pgalloc_tag_sub_pages
> >> >>>   |          manipulate an invalid page
> >> >>>   V
> >> >>>   V
> >> >>>
> >> >>> Move the tag page accounting ahead, and only account remaining pages
> >> >>> for non-compound pages with non-zero order.
> >> >>>
> >> >>> Signed-off-by: David Wang <00107082@163.com>
> >
> >Thanks for reporting!
> >
> >> >>
> >> >>Hmm, I think the problem was introduced by 51ff4d7486f0 ("mm: avoid extra
> >> >>mem_alloc_profiling_enabled() checks"). Previously we'd get the tag pointer
> >> >>upfront and avoid the page use-after-free.
> >
> >Right, sorry I missed that.
> >
> >> >
> >> >
> >> > Oh, you're right. I forgot to check history......
> >> >
> >> >
> >> >>
> >> >>It would likely be nicer to fix it by going back to that approach for
> >> >>___free_pages(), while hopefully keeping the optimisations of 51ff4d7486f0
> >> >>for the other call sites where it applies?
> >> >
> >> > After checking that commit, I kind of feels the changes in __free_pages are
> >> >  the major optimization of the commit....
> >>
> >> We could have both pgalloc_tag_get() to use in __free_page() as before
> >> 51ff4d7486f0, and keep __pgalloc_tag_get() to use in pgalloc_tag_split() and
> >> pgalloc_tag_swap().
> >
> >Yes, we can add back pgalloc_tag_get() which would call
> >__pgalloc_tag_get() if mem_alloc_profiling_enabled() is true and
> >change pgalloc_tag_sub_pages() back to use tags instead of pages.
> >__free_pages() is the only user of that function, so that change
> >should not affect anything else.
>
>
> Adding back pgalloc_tag_get() seems just reverting 51ff4d7486f0.....

Not quite. pgalloc_tag_split() and pgalloc_tag_swap() will still be
using __pgalloc_tag_get(), so would avoid the extra checks.

> Do you want me to do it or you take over and make further adjustments?

The change I suggested should be simple, so take a stab at it and I'll
review and ack. If you prefer me to make the change, let me know.

>
>
>
> >
> >>
> >> I think __free_page() didn't benefit from the stated purpose of "avoiding
> >> mem_alloc_profiling_enabled() ... which is often called after that check was
> >> already done"
> >>
> >> > What about revert that commit and make optimization by condition checks,
> >> > similar to what this patch did?
> >>
> >> The downside of the condition checks is they make the code more complex and
> >> might actually increase overhead when mem_alloc_profiling_enabled() is
> >> false, as those checks add non-static branches outside of the static branch
> >> that's mem_alloc_profiling_enabled().
> >>
> >> I think __free_pages() before 51ff4d7486f0 was quite ok.
> >>
> >> - pgalloc_tag_get() is done unconditionally, but its code is all inside the
> >> mem_alloc_profiling_enabled() static branch so that's a no-op when profiling
> >> is not enabled
> >>
> >> - pgalloc_tag_sub_pages() is also all behind the static branch inside. Also
> >> it's a very rare path anyway, most freeing should go through the
> >> put_page_testzero() being true.
> >
> >Yeah, the main goal of that change in __free_page() was to make
> >__pgalloc_tag_get() a local function for alloc_tags and limiting the
> >direct use of struct alloc_tag in the core mm code. Obviously I
> >screwed up forgetting why we had to store the tag in the first place.
> >An additional comment in __free_page() is probably a good idea to
> >avoid confusion in the future.
> >Thanks,
> >Suren.
> >
> >>
> >> > David
> >> >
> >>



* [PATCH v2] mm/codetag: move tag retrieval back upfront in __free_pages()
  2025-05-05 16:53           ` Suren Baghdasaryan
@ 2025-05-05 18:34             ` David Wang
  2025-05-05 19:17               ` David Wang
  2025-05-05 19:30             ` [PATCH v3] " David Wang
  1 sibling, 1 reply; 12+ messages in thread
From: David Wang @ 2025-05-05 18:34 UTC (permalink / raw)
  To: akpm, vbabka, surenb, mhocko, jackmanb, hannes, ziy
  Cc: linux-mm, linux-kernel, David Wang

Commit 51ff4d7486f0 ("mm: avoid extra mem_alloc_profiling_enabled()
checks") introduced a possible use-after-free scenario: when a page
is non-compound, page[0] can be released by another thread right
after put_page_testzero() fails in the current thread, and
pgalloc_tag_sub_pages() would then manipulate an invalid page when
accounting the remaining pages:

[timeline]   [thread1]                     [thread2]
  |          alloc_page non-compound
  V
  |                                        get_page, ref counter inc
  V
  |          in ___free_pages
  |          put_page_testzero fails
  V
  |                                        put_page, page released
  V
  |          in ___free_pages,
  |          pgalloc_tag_sub_pages
  |          manipulate an invalid page
  V

Restore __free_pages() to its previous behavior: retrieve the alloc
tag before dropping the reference.

Fixes: 51ff4d7486f0 ("mm: avoid extra mem_alloc_profiling_enabled() checks")
Signed-off-by: David Wang <00107082@163.com>
---
 include/linux/pgalloc_tag.h |  7 +++++++
 mm/page_alloc.c             | 15 ++++++---------
 2 files changed, 13 insertions(+), 9 deletions(-)

diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index c74077977830..97eb4835568e 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -188,6 +188,13 @@ static inline struct alloc_tag *__pgalloc_tag_get(struct page *page)
 	return tag;
 }
 
+static inline struct alloc_tag *pgalloc_tag_get(struct page *page)
+{
+	if (mem_alloc_profiling_enabled())
+		return __pgalloc_tag_get(page);
+	return NULL;
+}
+
 void pgalloc_tag_split(struct folio *folio, int old_order, int new_order);
 void pgalloc_tag_swap(struct folio *new, struct folio *old);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5669baf2a6fe..1b00e14a9780 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1151,14 +1151,9 @@ static inline void pgalloc_tag_sub(struct page *page, unsigned int nr)
 		__pgalloc_tag_sub(page, nr);
 }
 
-static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr)
+/* A non-NULL tag implies mem_alloc_profiling_enabled() */
+static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr)
 {
-	struct alloc_tag *tag;
-
-	if (!mem_alloc_profiling_enabled())
-		return;
-
-	tag = __pgalloc_tag_get(page);
 	if (tag)
 		this_cpu_sub(tag->counters->bytes, PAGE_SIZE * nr);
 }
@@ -1168,7 +1163,7 @@ static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr)
 static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
 				   unsigned int nr) {}
 static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {}
-static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr) {}
+static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr) {}
 
 #endif /* CONFIG_MEM_ALLOC_PROFILING */
 
@@ -5065,11 +5060,13 @@ static void ___free_pages(struct page *page, unsigned int order,
 {
 	/* get PageHead before we drop reference */
 	int head = PageHead(page);
+	/* get alloc tag in case the page is released by others */
+	struct alloc_tag *tag = pgalloc_tag_get(page);
 
 	if (put_page_testzero(page))
 		__free_frozen_pages(page, order, fpi_flags);
 	else if (!head) {
-		pgalloc_tag_sub_pages(page, (1 << order) - 1);
+		pgalloc_tag_sub_pages(tag, (1 << order) - 1);
 		while (order-- > 0)
 			__free_frozen_pages(page + (1 << order), order,
 					    fpi_flags);
-- 
2.39.2




* Re: [PATCH v2] mm/codetag: move tag retrieval back upfront in __free_pages()
  2025-05-05 18:34             ` [PATCH v2] mm/codetag: move tag retrieval back upfront in __free_pages() David Wang
@ 2025-05-05 19:17               ` David Wang
  0 siblings, 0 replies; 12+ messages in thread
From: David Wang @ 2025-05-05 19:17 UTC (permalink / raw)
  To: akpm, vbabka, surenb, mhocko, jackmanb, hannes, ziy
  Cc: linux-mm, linux-kernel

Sorry, I made a mistake: pgalloc_tag_get() should have a dummy definition
when CONFIG_MEM_ALLOC_PROFILING is not defined. I will send another patch.
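The missing piece is the usual dual-definition pattern; a compilable
stand-alone sketch (the struct definitions and the enabled check are
stand-ins for the real kernel types, and the config macro is faked so
the snippet builds outside the kernel):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct page { int order; };
struct alloc_tag { long bytes; };

#define CONFIG_MEM_ALLOC_PROFILING 1  /* set to 0 to model the !CONFIG build */

#if CONFIG_MEM_ALLOC_PROFILING

static bool mem_alloc_profiling_enabled(void) { return true; }
static struct alloc_tag model_tag;

static struct alloc_tag *__pgalloc_tag_get(struct page *page)
{
	(void)page;
	return &model_tag;
}

static inline struct alloc_tag *pgalloc_tag_get(struct page *page)
{
	if (mem_alloc_profiling_enabled())
		return __pgalloc_tag_get(page);
	return NULL;
}

#else /* CONFIG_MEM_ALLOC_PROFILING */

/* The dummy definition v2 was missing: callers still compile, and the
 * helper folds away entirely when profiling is configured out. */
static inline struct alloc_tag *pgalloc_tag_get(struct page *page)
{
	(void)page;
	return NULL;
}

#endif /* CONFIG_MEM_ALLOC_PROFILING */
```

Flipping the macro to 0 exercises the branch that broke the v2 build.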


At 2025-05-06 02:34:23, "David Wang" <00107082@163.com> wrote:
>Commit 51ff4d7486f0 ("mm: avoid extra mem_alloc_profiling_enabled()
> checks") introduces a possible use-after-free scenario, when page
>is non-compound, page[0] could be released by other thread right
>after put_page_testzero failed in current thread, pgalloc_tag_sub_pages
>afterwards would manipulate an invalid page for accounting remaining
>pages:
>
>[timeline]   [thread1]                     [thread2]
>  |          alloc_page non-compound
>  V
>  |                                        get_page, ref counter inc
>  V
>  |          in ___free_pages
>  |          put_page_testzero fails
>  V
>  |                                        put_page, page released
>  V
>  |          in ___free_pages,
>  |          pgalloc_tag_sub_pages
>  |          manipulate an invalid page
>  V
>
>Restore __free_pages() to its previous behavior: retrieve the alloc
>tag before dropping the reference.
>
>Fixes: 51ff4d7486f0 ("mm: avoid extra mem_alloc_profiling_enabled() checks")
>Signed-off-by: David Wang <00107082@163.com>
>---
> include/linux/pgalloc_tag.h |  7 +++++++
> mm/page_alloc.c             | 15 ++++++---------
> 2 files changed, 13 insertions(+), 9 deletions(-)
>
>diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
>index c74077977830..97eb4835568e 100644
>--- a/include/linux/pgalloc_tag.h
>+++ b/include/linux/pgalloc_tag.h
>@@ -188,6 +188,13 @@ static inline struct alloc_tag *__pgalloc_tag_get(struct page *page)
> 	return tag;
> }
> 
>+static inline struct alloc_tag *pgalloc_tag_get(struct page *page)
>+{
>+	if (mem_alloc_profiling_enabled())
>+		return __pgalloc_tag_get(page);
>+	return NULL;
>+}
>+
> void pgalloc_tag_split(struct folio *folio, int old_order, int new_order);
> void pgalloc_tag_swap(struct folio *new, struct folio *old);
> 
>diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>index 5669baf2a6fe..1b00e14a9780 100644
>--- a/mm/page_alloc.c
>+++ b/mm/page_alloc.c
>@@ -1151,14 +1151,9 @@ static inline void pgalloc_tag_sub(struct page *page, unsigned int nr)
> 		__pgalloc_tag_sub(page, nr);
> }
> 
>-static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr)
>+/* When tag is not NULL, assuming mem_alloc_profiling_enabled */
>+static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr)
> {
>-	struct alloc_tag *tag;
>-
>-	if (!mem_alloc_profiling_enabled())
>-		return;
>-
>-	tag = __pgalloc_tag_get(page);
> 	if (tag)
> 		this_cpu_sub(tag->counters->bytes, PAGE_SIZE * nr);
> }
>@@ -1168,7 +1163,7 @@ static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr)
> static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
> 				   unsigned int nr) {}
> static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {}
>-static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr) {}
>+static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr) {}
> 
> #endif /* CONFIG_MEM_ALLOC_PROFILING */
> 
>@@ -5065,11 +5060,13 @@ static void ___free_pages(struct page *page, unsigned int order,
> {
> 	/* get PageHead before we drop reference */
> 	int head = PageHead(page);
>+	/* get alloc tag in case the page is released by others */
>+	struct alloc_tag *tag = pgalloc_tag_get(page);
> 
> 	if (put_page_testzero(page))
> 		__free_frozen_pages(page, order, fpi_flags);
> 	else if (!head) {
>-		pgalloc_tag_sub_pages(page, (1 << order) - 1);
>+		pgalloc_tag_sub_pages(tag, (1 << order) - 1);
> 		while (order-- > 0)
> 			__free_frozen_pages(page + (1 << order), order,
> 					    fpi_flags);
>-- 
>2.39.2

^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v3] mm/codetag: move tag retrieval back upfront in __free_pages()
  2025-05-05 16:53           ` Suren Baghdasaryan
  2025-05-05 18:34             ` [PATCH v2] mm/codetag: move tag retrieval back upfront in __free_pages() David Wang
@ 2025-05-05 19:30             ` David Wang
  2025-05-05 20:32               ` Suren Baghdasaryan
  2025-05-06  7:58               ` Vlastimil Babka
  1 sibling, 2 replies; 12+ messages in thread
From: David Wang @ 2025-05-05 19:30 UTC (permalink / raw)
  To: akpm, vbabka, surenb, mhocko, jackmanb, hannes, ziy
  Cc: linux-mm, linux-kernel, David Wang

Commit 51ff4d7486f0 ("mm: avoid extra mem_alloc_profiling_enabled()
checks") introduces a possible use-after-free scenario: when a page
is non-compound, page[0] could be released by another thread right
after put_page_testzero() fails in the current thread, and
pgalloc_tag_sub_pages() would afterwards manipulate an invalid page
while accounting for the remaining pages:

[timeline]   [thread1]                     [thread2]
  |          alloc_page non-compound
  V
  |                                        get_page, ref counter inc
  V
  |          in ___free_pages
  |          put_page_testzero fails
  V
  |                                        put_page, page released
  V
  |          in ___free_pages,
  |          pgalloc_tag_sub_pages
  |          manipulate an invalid page
  V

Restore __free_pages() to its previous behavior: retrieve the alloc
tag before dropping the reference.

Fixes: 51ff4d7486f0 ("mm: avoid extra mem_alloc_profiling_enabled() checks")
Signed-off-by: David Wang <00107082@163.com>
---
 include/linux/pgalloc_tag.h |  8 ++++++++
 mm/page_alloc.c             | 15 ++++++---------
 2 files changed, 14 insertions(+), 9 deletions(-)

diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
index c74077977830..8a7f4f802c57 100644
--- a/include/linux/pgalloc_tag.h
+++ b/include/linux/pgalloc_tag.h
@@ -188,6 +188,13 @@ static inline struct alloc_tag *__pgalloc_tag_get(struct page *page)
 	return tag;
 }
 
+static inline struct alloc_tag *pgalloc_tag_get(struct page *page)
+{
+	if (mem_alloc_profiling_enabled())
+		return __pgalloc_tag_get(page);
+	return NULL;
+}
+
 void pgalloc_tag_split(struct folio *folio, int old_order, int new_order);
 void pgalloc_tag_swap(struct folio *new, struct folio *old);
 
@@ -199,6 +206,7 @@ static inline void clear_page_tag_ref(struct page *page) {}
 static inline void alloc_tag_sec_init(void) {}
 static inline void pgalloc_tag_split(struct folio *folio, int old_order, int new_order) {}
 static inline void pgalloc_tag_swap(struct folio *new, struct folio *old) {}
+static inline struct alloc_tag *pgalloc_tag_get(struct page *page) { return NULL; }
 
 #endif /* CONFIG_MEM_ALLOC_PROFILING */
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5669baf2a6fe..1b00e14a9780 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1151,14 +1151,9 @@ static inline void pgalloc_tag_sub(struct page *page, unsigned int nr)
 		__pgalloc_tag_sub(page, nr);
 }
 
-static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr)
+/* When tag is not NULL, assuming mem_alloc_profiling_enabled */
+static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr)
 {
-	struct alloc_tag *tag;
-
-	if (!mem_alloc_profiling_enabled())
-		return;
-
-	tag = __pgalloc_tag_get(page);
 	if (tag)
 		this_cpu_sub(tag->counters->bytes, PAGE_SIZE * nr);
 }
@@ -1168,7 +1163,7 @@ static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr)
 static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
 				   unsigned int nr) {}
 static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {}
-static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr) {}
+static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr) {}
 
 #endif /* CONFIG_MEM_ALLOC_PROFILING */
 
@@ -5065,11 +5060,13 @@ static void ___free_pages(struct page *page, unsigned int order,
 {
 	/* get PageHead before we drop reference */
 	int head = PageHead(page);
+	/* get alloc tag in case the page is released by others */
+	struct alloc_tag *tag = pgalloc_tag_get(page);
 
 	if (put_page_testzero(page))
 		__free_frozen_pages(page, order, fpi_flags);
 	else if (!head) {
-		pgalloc_tag_sub_pages(page, (1 << order) - 1);
+		pgalloc_tag_sub_pages(tag, (1 << order) - 1);
 		while (order-- > 0)
 			__free_frozen_pages(page + (1 << order), order,
 					    fpi_flags);
-- 
2.39.2



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3] mm/codetag: move tag retrieval back upfront in __free_pages()
  2025-05-05 19:30             ` [PATCH v3] " David Wang
@ 2025-05-05 20:32               ` Suren Baghdasaryan
  2025-05-06  7:58               ` Vlastimil Babka
  1 sibling, 0 replies; 12+ messages in thread
From: Suren Baghdasaryan @ 2025-05-05 20:32 UTC (permalink / raw)
  To: David Wang
  Cc: akpm, vbabka, mhocko, jackmanb, hannes, ziy, linux-mm, linux-kernel

On Mon, May 5, 2025 at 12:31 PM David Wang <00107082@163.com> wrote:
>
> Commit 51ff4d7486f0 ("mm: avoid extra mem_alloc_profiling_enabled()
> checks") introduces a possible use-after-free scenario: when a page
> is non-compound, page[0] could be released by another thread right
> after put_page_testzero() fails in the current thread, and
> pgalloc_tag_sub_pages() would afterwards manipulate an invalid page
> while accounting for the remaining pages:
>
> [timeline]   [thread1]                     [thread2]
>   |          alloc_page non-compound
>   V
>   |                                        get_page, ref counter inc
>   V
>   |          in ___free_pages
>   |          put_page_testzero fails
>   V
>   |                                        put_page, page released
>   V
>   |          in ___free_pages,
>   |          pgalloc_tag_sub_pages
>   |          manipulate an invalid page
>   V
>
> Restore __free_pages() to its previous behavior: retrieve the alloc
> tag before dropping the reference.
>
> Fixes: 51ff4d7486f0 ("mm: avoid extra mem_alloc_profiling_enabled() checks")
> Signed-off-by: David Wang <00107082@163.com>

Acked-by: Suren Baghdasaryan <surenb@google.com>

Thanks!


> ---
>  include/linux/pgalloc_tag.h |  8 ++++++++
>  mm/page_alloc.c             | 15 ++++++---------
>  2 files changed, 14 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
> index c74077977830..8a7f4f802c57 100644
> --- a/include/linux/pgalloc_tag.h
> +++ b/include/linux/pgalloc_tag.h
> @@ -188,6 +188,13 @@ static inline struct alloc_tag *__pgalloc_tag_get(struct page *page)
>         return tag;
>  }
>
> +static inline struct alloc_tag *pgalloc_tag_get(struct page *page)
> +{
> +       if (mem_alloc_profiling_enabled())
> +               return __pgalloc_tag_get(page);
> +       return NULL;
> +}
> +
>  void pgalloc_tag_split(struct folio *folio, int old_order, int new_order);
>  void pgalloc_tag_swap(struct folio *new, struct folio *old);
>
> @@ -199,6 +206,7 @@ static inline void clear_page_tag_ref(struct page *page) {}
>  static inline void alloc_tag_sec_init(void) {}
>  static inline void pgalloc_tag_split(struct folio *folio, int old_order, int new_order) {}
>  static inline void pgalloc_tag_swap(struct folio *new, struct folio *old) {}
> +static inline struct alloc_tag *pgalloc_tag_get(struct page *page) { return NULL; }
>
>  #endif /* CONFIG_MEM_ALLOC_PROFILING */
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5669baf2a6fe..1b00e14a9780 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1151,14 +1151,9 @@ static inline void pgalloc_tag_sub(struct page *page, unsigned int nr)
>                 __pgalloc_tag_sub(page, nr);
>  }
>
> -static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr)
> +/* When tag is not NULL, assuming mem_alloc_profiling_enabled */
> +static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr)
>  {
> -       struct alloc_tag *tag;
> -
> -       if (!mem_alloc_profiling_enabled())
> -               return;
> -
> -       tag = __pgalloc_tag_get(page);
>         if (tag)
>                 this_cpu_sub(tag->counters->bytes, PAGE_SIZE * nr);
>  }
> @@ -1168,7 +1163,7 @@ static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr)
>  static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
>                                    unsigned int nr) {}
>  static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {}
> -static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr) {}
> +static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr) {}
>
>  #endif /* CONFIG_MEM_ALLOC_PROFILING */
>
> @@ -5065,11 +5060,13 @@ static void ___free_pages(struct page *page, unsigned int order,
>  {
>         /* get PageHead before we drop reference */
>         int head = PageHead(page);
> +       /* get alloc tag in case the page is released by others */
> +       struct alloc_tag *tag = pgalloc_tag_get(page);
>
>         if (put_page_testzero(page))
>                 __free_frozen_pages(page, order, fpi_flags);
>         else if (!head) {
> -               pgalloc_tag_sub_pages(page, (1 << order) - 1);
> +               pgalloc_tag_sub_pages(tag, (1 << order) - 1);
>                 while (order-- > 0)
>                         __free_frozen_pages(page + (1 << order), order,
>                                             fpi_flags);
> --
> 2.39.2
>


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v3] mm/codetag: move tag retrieval back upfront in __free_pages()
  2025-05-05 19:30             ` [PATCH v3] " David Wang
  2025-05-05 20:32               ` Suren Baghdasaryan
@ 2025-05-06  7:58               ` Vlastimil Babka
  1 sibling, 0 replies; 12+ messages in thread
From: Vlastimil Babka @ 2025-05-06  7:58 UTC (permalink / raw)
  To: David Wang, akpm, surenb, mhocko, jackmanb, hannes, ziy
  Cc: linux-mm, linux-kernel

On 5/5/25 21:30, David Wang wrote:
> Commit 51ff4d7486f0 ("mm: avoid extra mem_alloc_profiling_enabled()
> checks") introduces a possible use-after-free scenario: when a page
> is non-compound, page[0] could be released by another thread right
> after put_page_testzero() fails in the current thread, and
> pgalloc_tag_sub_pages() would afterwards manipulate an invalid page
> while accounting for the remaining pages:
> 
> [timeline]   [thread1]                     [thread2]
>   |          alloc_page non-compound
>   V
>   |                                        get_page, ref counter inc
>   V
>   |          in ___free_pages
>   |          put_page_testzero fails
>   V
>   |                                        put_page, page released
>   V
>   |          in ___free_pages,
>   |          pgalloc_tag_sub_pages
>   |          manipulate an invalid page
>   V
> 
> Restore __free_pages() to its previous behavior: retrieve the alloc
> tag before dropping the reference.
> 
> Fixes: 51ff4d7486f0 ("mm: avoid extra mem_alloc_profiling_enabled() checks")

That's a 6.15-rc1 commit, so no cc: stable needed, but should go to mm-hotfixes.

> Signed-off-by: David Wang <00107082@163.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Thanks!

> ---
>  include/linux/pgalloc_tag.h |  8 ++++++++
>  mm/page_alloc.c             | 15 ++++++---------
>  2 files changed, 14 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/pgalloc_tag.h b/include/linux/pgalloc_tag.h
> index c74077977830..8a7f4f802c57 100644
> --- a/include/linux/pgalloc_tag.h
> +++ b/include/linux/pgalloc_tag.h
> @@ -188,6 +188,13 @@ static inline struct alloc_tag *__pgalloc_tag_get(struct page *page)
>  	return tag;
>  }
>  
> +static inline struct alloc_tag *pgalloc_tag_get(struct page *page)
> +{
> +	if (mem_alloc_profiling_enabled())
> +		return __pgalloc_tag_get(page);
> +	return NULL;
> +}
> +
>  void pgalloc_tag_split(struct folio *folio, int old_order, int new_order);
>  void pgalloc_tag_swap(struct folio *new, struct folio *old);
>  
> @@ -199,6 +206,7 @@ static inline void clear_page_tag_ref(struct page *page) {}
>  static inline void alloc_tag_sec_init(void) {}
>  static inline void pgalloc_tag_split(struct folio *folio, int old_order, int new_order) {}
>  static inline void pgalloc_tag_swap(struct folio *new, struct folio *old) {}
> +static inline struct alloc_tag *pgalloc_tag_get(struct page *page) { return NULL; }
>  
>  #endif /* CONFIG_MEM_ALLOC_PROFILING */
>  
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5669baf2a6fe..1b00e14a9780 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1151,14 +1151,9 @@ static inline void pgalloc_tag_sub(struct page *page, unsigned int nr)
>  		__pgalloc_tag_sub(page, nr);
>  }
>  
> -static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr)
> +/* When tag is not NULL, assuming mem_alloc_profiling_enabled */
> +static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr)
>  {
> -	struct alloc_tag *tag;
> -
> -	if (!mem_alloc_profiling_enabled())
> -		return;
> -
> -	tag = __pgalloc_tag_get(page);
>  	if (tag)
>  		this_cpu_sub(tag->counters->bytes, PAGE_SIZE * nr);
>  }
> @@ -1168,7 +1163,7 @@ static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr)
>  static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
>  				   unsigned int nr) {}
>  static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {}
> -static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr) {}
> +static inline void pgalloc_tag_sub_pages(struct alloc_tag *tag, unsigned int nr) {}
>  
>  #endif /* CONFIG_MEM_ALLOC_PROFILING */
>  
> @@ -5065,11 +5060,13 @@ static void ___free_pages(struct page *page, unsigned int order,
>  {
>  	/* get PageHead before we drop reference */
>  	int head = PageHead(page);
> +	/* get alloc tag in case the page is released by others */
> +	struct alloc_tag *tag = pgalloc_tag_get(page);
>  
>  	if (put_page_testzero(page))
>  		__free_frozen_pages(page, order, fpi_flags);
>  	else if (!head) {
> -		pgalloc_tag_sub_pages(page, (1 << order) - 1);
> +		pgalloc_tag_sub_pages(tag, (1 << order) - 1);
>  		while (order-- > 0)
>  			__free_frozen_pages(page + (1 << order), order,
>  					    fpi_flags);



^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2025-05-06  7:58 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-05-04  6:19 [PATCH] mm/codetag: sub in advance when free non-compound high order pages David Wang
2025-05-05 13:12 ` Vlastimil Babka
2025-05-05 14:31   ` David Wang
2025-05-05 14:55     ` Vlastimil Babka
2025-05-05 15:33       ` Suren Baghdasaryan
2025-05-05 16:42         ` David Wang
2025-05-05 16:53           ` Suren Baghdasaryan
2025-05-05 18:34             ` [PATCH v2] mm/codetag: move tag retrieval back upfront in __free_pages() David Wang
2025-05-05 19:17               ` David Wang
2025-05-05 19:30             ` [PATCH v3] " David Wang
2025-05-05 20:32               ` Suren Baghdasaryan
2025-05-06  7:58               ` Vlastimil Babka

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox