From: Vlastimil Babka <vbabka@suse.cz>
To: David Wang <00107082@163.com>,
	akpm@linux-foundation.org, surenb@google.com, mhocko@suse.com,
	jackmanb@google.com, hannes@cmpxchg.org, ziy@nvidia.com
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Shakeel Butt <shakeel.butt@linux.dev>
Subject: Re: [PATCH] mm/codetag: sub in advance when free non-compound high order pages
Date: Mon, 5 May 2025 15:12:55 +0200
Message-ID: <8edbd2be-d495-4bfc-a9f3-6eaae7a66d91@suse.cz>
In-Reply-To: <20250504061923.66914-1-00107082@163.com>

On 5/4/25 08:19, David Wang wrote:
> When the page is non-compound, page[0] could be released by another
> thread right after put_page_testzero() fails in the current thread;
> pgalloc_tag_sub_pages() would then manipulate an invalid page when
> accounting for the remaining pages:
> 
> [timeline]   [thread1]                     [thread2]
>   |          alloc_page non-compound
>   V
>   |                                        get_page, ref counter inc
>   V
>   |          in ___free_pages
>   |          put_page_testzero fails
>   V
>   |                                        put_page, page released
>   V
>   |          in ___free_pages,
>   |          pgalloc_tag_sub_pages
>   |          manipulate an invalid page
>   V
>   V
> 
> Move the tag page accounting ahead, and only account remaining pages
> for non-compound pages with non-zero order.
> 
> Signed-off-by: David Wang <00107082@163.com>

Hmm, I think the problem was introduced by 51ff4d7486f0 ("mm: avoid extra
mem_alloc_profiling_enabled() checks"). Previously we'd get the tag pointer
upfront, before put_page_testzero(), and so avoid the page use-after-free.

It would likely be nicer to fix it by going back to that approach for
___free_pages(), while hopefully keeping the optimisations of 51ff4d7486f0
for the other call sites where it applies?
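
To illustrate what I mean by getting the tag upfront, here is a rough
userspace sketch, not kernel code. All sketch_* names are hypothetical
stand-ins (for struct page, struct alloc_tag, put_page_testzero() and
pgalloc_tag_sub_pages()), and it only models the non-compound case. The point
is just the ordering: read the tag while we still hold our reference, and
never dereference the page again once the put has failed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096L

/* Hypothetical stand-in for struct alloc_tag and its byte counter. */
struct sketch_tag {
	long bytes;
};

/* Hypothetical stand-in for struct page: a refcount plus a tag reference. */
struct sketch_page {
	atomic_int refcount;
	struct sketch_tag *tag;
};

/* Drop one reference; nonzero means it was the last one. */
static int sketch_put_testzero(struct sketch_page *page)
{
	return atomic_fetch_sub(&page->refcount, 1) == 1;
}

/* Account nr pages as freed against a tag pointer saved earlier. */
static void sketch_tag_sub_pages(struct sketch_tag *tag, unsigned int nr)
{
	if (tag)
		tag->bytes -= SKETCH_PAGE_SIZE * (long)nr;
}

/*
 * Free a non-compound allocation of the given order.
 * Returns how many pages this caller released and accounted for.
 */
static unsigned int sketch_free_pages(struct sketch_page *page,
				      unsigned int order)
{
	/* Read what we need from the page while we still hold a reference. */
	struct sketch_tag *tag = page->tag;

	assert(page != NULL);

	if (sketch_put_testzero(page)) {
		/* Last reference: the whole allocation goes away here. */
		sketch_tag_sub_pages(tag, 1u << order);
		return 1u << order;
	}

	/*
	 * Someone else still holds page[0] and may free it at any moment,
	 * so from here on only the saved tag pointer is used, never *page.
	 */
	sketch_tag_sub_pages(tag, (1u << order) - 1);
	return (1u << order) - 1;
}
```

With an order-2 allocation and an extra reference held elsewhere, this caller
accounts for the three tail pages and leaves page[0] to whoever drops the
last reference.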

> ---
>  mm/page_alloc.c | 36 +++++++++++++++++++++++++++++++++---
>  1 file changed, 33 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 5669baf2a6fe..c42e41ed35fe 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1163,12 +1163,25 @@ static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr)
>  		this_cpu_sub(tag->counters->bytes, PAGE_SIZE * nr);
>  }
>  
> +static inline void pgalloc_tag_add_pages(struct page *page, unsigned int nr)
> +{
> +	struct alloc_tag *tag;
> +
> +	if (!mem_alloc_profiling_enabled())
> +		return;
> +
> +	tag = __pgalloc_tag_get(page);
> +	if (tag)
> +		this_cpu_add(tag->counters->bytes, PAGE_SIZE * nr);
> +}
> +
>  #else /* CONFIG_MEM_ALLOC_PROFILING */
>  
>  static inline void pgalloc_tag_add(struct page *page, struct task_struct *task,
>  				   unsigned int nr) {}
>  static inline void pgalloc_tag_sub(struct page *page, unsigned int nr) {}
>  static inline void pgalloc_tag_sub_pages(struct page *page, unsigned int nr) {}
> +static inline void pgalloc_tag_add_pages(struct page *page, unsigned int nr) {}
>  
>  #endif /* CONFIG_MEM_ALLOC_PROFILING */
>  
> @@ -5065,11 +5078,28 @@ static void ___free_pages(struct page *page, unsigned int order,
>  {
>  	/* get PageHead before we drop reference */
>  	int head = PageHead(page);
> +	/*
> +	 * For remaining pages other than the first page of
> +	 * a non-compound allocation, we decrease its tag
> +	 * pages in advance, in case the first page is released
> +	 * by other thread inbetween our put_page_testzero and any
> +	 * accounting behavior afterwards.
> +	 */
> +	unsigned int remaining_tag_pages = 0;
>  
> -	if (put_page_testzero(page))
> +	if (order > 0 && !head) {
> +		if (unlikely(page_ref_count(page) > 1)) {
> +			remaining_tag_pages = (1 << order) - 1;
> +			pgalloc_tag_sub_pages(page, remaining_tag_pages);
> +		}
> +	}
> +
> +	if (put_page_testzero(page)) {
> +		/* no need special treat for remaining pages, add it back. */
> +		if (unlikely(remaining_tag_pages > 0))
> +			pgalloc_tag_add_pages(page, remaining_tag_pages);
>  		__free_frozen_pages(page, order, fpi_flags);
> -	else if (!head) {
> -		pgalloc_tag_sub_pages(page, (1 << order) - 1);
> +	} else if (!head) {
>  		while (order-- > 0)
>  			__free_frozen_pages(page + (1 << order), order,
>  					    fpi_flags);




Thread overview: 12+ messages
2025-05-04  6:19 David Wang
2025-05-05 13:12 ` Vlastimil Babka [this message]
2025-05-05 14:31   ` David Wang
2025-05-05 14:55     ` Vlastimil Babka
2025-05-05 15:33       ` Suren Baghdasaryan
2025-05-05 16:42         ` David Wang
2025-05-05 16:53           ` Suren Baghdasaryan
2025-05-05 18:34             ` [PATCH v2] mm/codetag: move tag retrieval back upfront in __free_pages() David Wang
2025-05-05 19:17               ` David Wang
2025-05-05 19:30             ` [PATCH v3] " David Wang
2025-05-05 20:32               ` Suren Baghdasaryan
2025-05-06  7:58               ` Vlastimil Babka
