From: yangge1116 <yangge1116@126.com>
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, baolin.wang@linux.alibaba.com,
liuzixing@hygon.cn
Subject: Re: [PATCH] mm/page_alloc: skip THP-sized PCP list when allocating non-CMA THP-sized page
Date: Tue, 18 Jun 2024 11:31:00 +0800
Message-ID: <4482bf69-eb07-0ec9-f777-28ce40f96589@126.com>
In-Reply-To: <CAGsJ_4ynfvjXsr6QFBA_7Gzk3PaO1pk+6ErKZaNCt4H+nuwiJw@mail.gmail.com>

On 2024/6/18 at 9:55 AM, Barry Song wrote:
> On Tue, Jun 18, 2024 at 9:36 AM yangge1116 <yangge1116@126.com> wrote:
>>
>>
>>
>> On 2024/6/17 at 8:47 PM, yangge1116 wrote:
>>>
>>>
>>> On 2024/6/17 at 6:26 PM, Barry Song wrote:
>>>> On Tue, Jun 4, 2024 at 9:15 PM <yangge1116@126.com> wrote:
>>>>>
>>>>> From: yangge <yangge1116@126.com>
>>>>>
>>>>> Since commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for
>>>>> THP-sized allocations"), the THP-sized PCP list no longer
>>>>> differentiates pages by migration type, so it is possible to get a
>>>>> CMA page from the list. In some cases this is not acceptable; for
>>>>> example, allocating a non-CMA page with the PF_MEMALLOC_PIN flag
>>>>> can return a CMA page.
>>>>>
>>>>> This patch forbids allocating non-CMA THP-sized pages from the
>>>>> THP-sized PCP list to avoid the issue above.
>>>>
>>>> Could you please describe the impact on users in the commit log?
>>>
>>> If a large amount of CMA memory is configured in the system (for
>>> example, CMA memory accounts for 50% of system memory), starting a
>>> virtual machine with device passthrough will get stuck.
>>>
>>> While the virtual machine is starting, pin_user_pages_remote(...,
>>> FOLL_LONGTERM, ...) is called to pin memory. If a page is in a CMA
>>> area, pin_user_pages_remote() migrates the page from the CMA area to a
>>> non-CMA area because of the FOLL_LONGTERM flag. If non-movable
>>> allocation requests return CMA memory, pin_user_pages_remote() enters
>>> an endless loop.
>>>
>>> backtrace:
>>> pin_user_pages_remote
>>> ----__gup_longterm_locked //cause endless loops in this function
>>> --------__get_user_pages_locked
>>> --------check_and_migrate_movable_pages //always check fail and continue to migrate
>>> ------------migrate_longterm_unpinnable_pages
>>> ----------------alloc_migration_target // non-movable allocation
>>>
>>>> Is it possible that some CMA memory might be used by non-movable
>>>> allocation requests?
>>>
>>> Yes.
>>>
>>>
>>>> If so, will CMA somehow become unable to migrate, causing cma_alloc()
>>>> to fail?
>>>
>>>
>>> No, it will cause an endless loop in __gup_longterm_locked(). If
>>> non-movable allocation requests return CMA memory,
>>> migrate_longterm_unpinnable_pages() will migrate a CMA page to another
>>> CMA page, which is useless and causes an endless loop in
>>> __gup_longterm_locked().
>
> This is only one perspective. We also need to consider the impact on
> CMA itself. For example,
> when CMA is borrowed by THP, and we need to reclaim it through
> cma_alloc() or dma_alloc_coherent(),
> we must move those pages out to ensure CMA's users can retrieve that
> contiguous memory.
>
> Currently, CMA's memory is occupied by non-movable pages, meaning we
> can't relocate them.
> As a result, cma_alloc() is more likely to fail.
>
>>>
>>> backtrace:
>>> pin_user_pages_remote
>>> ----__gup_longterm_locked //cause endless loops in this function
>>> --------__get_user_pages_locked
>>> --------check_and_migrate_movable_pages //always check fail and continue to migrate
>>> ------------migrate_longterm_unpinnable_pages
>>>
>>>
>>>
>>>
>>>
>>>>>
>>>>> Fixes: 5d0a661d808f ("mm/page_alloc: use only one PCP list for
>>>>> THP-sized allocations")
>>>>> Signed-off-by: yangge <yangge1116@126.com>
>>>>> ---
>>>>> mm/page_alloc.c | 10 ++++++++++
>>>>> 1 file changed, 10 insertions(+)
>>>>>
>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>> index 2e22ce5..0bdf471 100644
>>>>> --- a/mm/page_alloc.c
>>>>> +++ b/mm/page_alloc.c
>>>>> @@ -2987,10 +2987,20 @@ struct page *rmqueue(struct zone *preferred_zone,
>>>>> WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
>>>>>
>>>>> if (likely(pcp_allowed_order(order))) {
>>>>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>>> + if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
>>>>> + order != HPAGE_PMD_ORDER) {
>>>>> + page = rmqueue_pcplist(preferred_zone, zone, order,
>>>>> + migratetype, alloc_flags);
>>>>> + if (likely(page))
>>>>> + goto out;
>>>>> + }
>>>>
>>>> This seems not ideal, because non-CMA THP gets no chance to use PCP.
>>>> But it still seems better than causing the failure of CMA allocation.
>>>>
>>>> Is there a possible approach to avoid adding CMA THP into pcp from
>>>> the very beginning? Otherwise, we might need a separate PCP for CMA.
>>>>
>>
>> The vast majority of THP-sized allocations are GFP_MOVABLE, so avoiding
>> adding CMA THP into pcp may incur a slight performance penalty.
>>
>
> But the majority of movable pages aren't CMA, right?
> Do we have an estimate for
> adding back a CMA THP PCP? Will per_cpu_pages introduce a new cacheline, which
> the original intention for THP was to avoid by having only one PCP[1]?
>
> [1] https://patchwork.kernel.org/project/linux-mm/patch/20220624125423.6126-3-mgorman@techsingularity.net/
>
In the current code, which contains commit 5d0a661d808f ("mm/page_alloc:
use only one PCP list for THP-sized allocations"), the size of struct
per_cpu_pages is 256 bytes:

crash> struct per_cpu_pages
struct per_cpu_pages {
    spinlock_t lock;
    int count;
    int high;
    int high_min;
    int high_max;
    int batch;
    u8 flags;
    u8 alloc_factor;
    u8 expire;
    short free_count;
    struct list_head lists[13];
}
SIZE: 256

After reverting commit 5d0a661d808f ("mm/page_alloc: use only one PCP list
for THP-sized allocations"), the size of struct per_cpu_pages is 272 bytes:

crash> struct per_cpu_pages
struct per_cpu_pages {
    spinlock_t lock;
    int count;
    int high;
    int high_min;
    int high_max;
    int batch;
    u8 flags;
    u8 alloc_factor;
    u8 expire;
    short free_count;
    struct list_head lists[15];
}
SIZE: 272

It seems that commit 5d0a661d808f ("mm/page_alloc: use only one PCP list
for THP-sized allocations") reduces the size of struct per_cpu_pages by
one cacheline.
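
The cacheline count behind the two sizes can be checked with a small
helper. This is only a sketch: cache_lines_spanned() is a hypothetical
name, not a kernel function, and it assumes 64-byte cachelines and a
structure that starts on a cacheline boundary, neither of which is stated
in the crash output.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of cachelines needed to hold an object of `size` bytes,
 * assuming the object starts on a cacheline boundary.
 * Hypothetical helper for illustration; not part of the kernel.
 */
static size_t cache_lines_spanned(size_t size, size_t line_size)
{
	assert(line_size > 0);
	/* Round up to whole cachelines. */
	return (size + line_size - 1) / line_size;
}
```

With an assumed 64-byte line, cache_lines_spanned(256, 64) gives 4 and
cache_lines_spanned(272, 64) gives 5, which is the one-cacheline
difference.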
>
>> Commit 1d91df85f399 takes a similar filtering approach, and I mainly
>> referred to it.
>>
>>
>>>>> +#else
>>>>> page = rmqueue_pcplist(preferred_zone, zone, order,
>>>>> migratetype, alloc_flags);
>>>>> if (likely(page))
>>>>> goto out;
>>>>> +#endif
>>>>> }
>>>>>
>>>>> page = rmqueue_buddy(preferred_zone, zone, order, alloc_flags,
>>>>> --
>>>>> 2.7.4
>>>>
>>>> Thanks
>>>> Barry
>>>>
>>
>>
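
For reference, the condition added by the quoted patch can be written as a
standalone predicate. This is a userspace sketch, not kernel code:
may_use_pcplist() and the SKETCH_* constants are hypothetical names, and
the constant values are assumptions chosen for illustration (the kernel
defines the real ALLOC_CMA and HPAGE_PMD_ORDER).

```c
#include <stdbool.h>

/* Illustrative values only; assumptions, not taken from the patch. */
#define SKETCH_ALLOC_CMA	0x80u	/* assumed flag bit for ALLOC_CMA */
#define SKETCH_HPAGE_PMD_ORDER	9u	/* assumed: 2 MiB THP with 4 KiB pages */

/*
 * Mirrors the condition in the quoted diff: the PCP list is used unless
 * CMA is configured, the request may not use CMA pages, and the request
 * is THP-sized. In that one case the allocation falls through to the
 * buddy allocator.
 */
static bool may_use_pcplist(bool cma_enabled, unsigned int alloc_flags,
			    unsigned int order)
{
	return !cma_enabled ||
	       (alloc_flags & SKETCH_ALLOC_CMA) ||
	       order != SKETCH_HPAGE_PMD_ORDER;
}
```

Only a THP-sized request without ALLOC_CMA on a CMA-enabled configuration
skips the PCP list; every other combination keeps using it.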