From: Lance Yang
Date: Thu, 18 Jan 2024 09:46:50 +0800
Subject: Re: [PATCH v1 1/2] mm/madvise: introduce MADV_TRY_COLLAPSE for attempted synchronous hugepage collapse
To: "Zach O'Keefe"
Cc: akpm@linux-foundation.org, songmuchun@bytedance.com, linux-kernel@vger.kernel.org, Yang Shi, Peter Xu, David Hildenbrand, Michael Knyszek, Minchan Kim, Michal Hocko, linux-mm@kvack.org
References: <20240117050217.43610-1-ioworker0@gmail.com>

Hey Zach,

Thanks for taking the time to review!

On Thu, Jan 18, 2024 at 01:11 Zach O'Keefe wrote:
>
> [+linux-mm & others]
>
> On Tue, Jan 16, 2024 at 9:02 PM Lance Yang wrote:
> >
> > This idea was inspired by MADV_COLLAPSE introduced by Zach O'Keefe[1].
> >
> > Introduce a new madvise mode, MADV_TRY_COLLAPSE, that allows users to
> > make a least-effort attempt at a synchronous collapse of memory at
> > their own expense.
> >
> > The only difference from MADV_COLLAPSE is that the new hugepage allocation
> > avoids direct reclaim and/or compaction, quickly failing on allocation errors.
> >
> > The benefits of this approach are:
> >
> > * CPU is charged to the process that wants to spend the cycles for the THP
> > * Avoid unpredictable timing of khugepaged collapse
> > * Prevent unpredictable stalls caused by direct reclaim and/or compaction
> >
> > Semantics
> >
> > This call is independent of the system-wide THP sysfs settings, but will
> > fail for memory marked VM_NOHUGEPAGE. If the ranges provided span
> > multiple VMAs, the semantics of the collapse over each VMA are independent
> > of the others. This implies a hugepage cannot cross a VMA boundary. If
> > collapse of a given hugepage-aligned/sized region fails, the operation may
> > continue to attempt collapsing the remainder of memory specified.
> >
> > The memory ranges provided must be page-aligned, but are not required to
> > be hugepage-aligned. If the memory ranges are not hugepage-aligned, the
> > start/end of the range will be clamped to the first/last hugepage-aligned
> > address covered by said range. The memory ranges must span at least one
> > hugepage-sized region.
> >
> > All non-resident pages covered by the range will first be
> > swapped/faulted-in, before being internally copied onto a freshly
> > allocated hugepage. Unmapped pages will have their data directly
> > initialized to 0 in the new hugepage. However, for every eligible
> > hugepage-aligned/sized region to be collapsed, at least one page must
> > currently be backed by memory (a PMD covering the address range must
> > already exist).
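The clamping rule in the quoted Semantics section can be illustrated with a small user-space sketch. It assumes 4 KiB base pages and 2 MiB PMD-sized hugepages (the x86-64 defaults); `clamp_to_hugepages()` and the `SKETCH_*` constants are hypothetical names, not kernel or libc APIs. The start is rounded up and the end rounded down to hugepage boundaries, and the helper reports whether at least one full hugepage-sized region remains.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB base pages and 2 MiB PMD-sized hugepages (x86-64). */
#define SKETCH_PAGE_SIZE  ((uint64_t)4096)
#define SKETCH_HPAGE_SIZE ((uint64_t)2 * 1024 * 1024)
#define SKETCH_HPAGE_MASK (~(SKETCH_HPAGE_SIZE - 1))

/*
 * Hypothetical helper: clamp [start, end) inward to hugepage boundaries.
 * The sketch treats both ends as required to be page-aligned. Returns
 * false if alignment is wrong or no full hugepage-sized region is covered.
 */
bool clamp_to_hugepages(uint64_t start, uint64_t end,
			uint64_t *hstart, uint64_t *hend)
{
	if (start % SKETCH_PAGE_SIZE != 0 || end % SKETCH_PAGE_SIZE != 0 ||
	    end <= start)
		return false;

	/* Round start up and end down to the nearest hugepage boundary. */
	*hstart = (start + SKETCH_HPAGE_SIZE - 1) & SKETCH_HPAGE_MASK;
	*hend = end & SKETCH_HPAGE_MASK;

	/* At least one hugepage-sized region must lie inside the range. */
	return *hstart < *hend;
}
```

For example, a 1 MiB to 5 MiB range clamps to 2 MiB to 4 MiB (one hugepage), while a range lying entirely inside one hugepage is rejected.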
> >
> > Allocation for the new hugepage will not enter direct reclaim and/or
> > compaction, quickly failing if allocation fails. When the system has
> > multiple NUMA nodes, the hugepage will be allocated from the node providing
> > the most native pages. This operation operates on the current state of the
> > specified process and makes no persistent changes or guarantees on how pages
> > will be mapped, constructed, or faulted in the future.
> >
> > Return Value
> >
> > If all hugepage-sized/aligned regions covered by the provided range were
> > either successfully collapsed, or were already PMD-mapped THPs, this
> > operation will be deemed successful. On success, madvise(2) returns 0.
> > Else, -1 is returned and errno is set to indicate the error for the
> > most-recently attempted hugepage collapse. Note that many failures might
> > have occurred, since the operation may continue to collapse in the event a
> > single hugepage-sized/aligned region fails.
> >
> >         ENOMEM  Memory allocation failed or VMA not found
> >         EBUSY   Memcg charging failed
> >         EAGAIN  Required resource temporarily unavailable. Trying again
> >                 might succeed.
> >         EINVAL  Other error: No PMD found, subpage doesn't have Present
> >                 bit set, "Special" page not backed by struct page, VMA
> >                 incorrectly sized, address not page-aligned, ...
> >
> > Use Cases
> >
> > An immediate user of this new functionality is the Go runtime heap allocator
> > that manages memory in hugepage-sized chunks. In the past, whether it was a
> > newly allocated chunk through mmap() or a reused chunk released by
> > madvise(MADV_DONTNEED), the allocator attempted to eagerly back memory with
> > huge pages using madvise(MADV_HUGEPAGE)[2] and madvise(MADV_COLLAPSE)[3]
> > respectively. However, both approaches resulted in performance issues; for
> > both scenarios, there could be entries into direct reclaim and/or compaction,
> > leading to unpredictable stalls[4].
> > Now, the allocator can confidently use
> > madvise(MADV_TRY_COLLAPSE) to attempt the allocation of huge pages.
> >
> > [1] https://github.com/torvalds/linux/commit/7d8faaf155454f8798ec56404faca29a82689c77
> > [2] https://github.com/golang/go/commit/8fa9e3beee8b0e6baa7333740996181268b60a3a
> > [3] https://github.com/golang/go/commit/9f9bb26880388c5bead158e9eca3be4b3a9bd2af
> > [4] https://github.com/golang/go/issues/63334
>
> Thanks for the patch, Lance, and thanks for providing the links above,
> referring to issues Go has seen.
>
> I've reached out to the Go team to try and understand their use case,
> and how we could help. It's not immediately clear whether a
> lighter-weight MADV_COLLAPSE is the answer, but it could turn out to
> be.
>
> That said, with respect to the implementation, should a need for a
> lighter-weight MADV_COLLAPSE be warranted, I'd personally like to see
> process_madvise(2) be the "v2" of madvise(2), where we can start

I agree with you; it's a good idea!

> leveraging the forward-facing flags argument for these different
> advice flavors. We'd need to safely revert v5.10 commit a68a0262abdaa
> ("mm/madvise: remove racy mm ownership check") so that
> process_madvise(2) can always operate on self. IIRC, this was ~ the
> plan we landed on during MADV_COLLAPSE dev discussions (i.e. pick a
> sane default, and implement options in flags down the line).
>
> That flag could be a MADV_F_COLLAPSE_LIGHT, where we use a lighter

The name MADV_F_COLLAPSE_LIGHT sounds great for the flag, and its
semantics are very clear.

Thanks again for your review and your suggestion!
Lance

> allocation context, as well as, for example, only do a local
> lru_add_drain() vs lru_add_drain_all(). But I'll refrain from thinking
> too hard about it just yet.
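For a caller such as a heap allocator, the quoted Return Value section suggests a simple policy: treat EAGAIN as worth retrying and any other errno as a reason to skip the range. The sketch below assumes a kernel carrying this v1 patch. MADV_TRY_COLLAPSE is defined locally as 26 only because that is the value the patch proposes; on a kernel without it, madvise(2) would be expected to fail with EINVAL, which the sketch simply treats as giving up. `classify_collapse_errno()`, `try_collapse()` and `enum collapse_outcome` are hypothetical helpers, and the flag-based process_madvise(2) variant discussed above would need a different call.

```c
#define _DEFAULT_SOURCE /* expose madvise() in glibc under strict -std modes */
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Assumption: MADV_TRY_COLLAPSE is not in released uapi headers; 26 is the
 * value proposed by this v1 patch. Unpatched kernels should reject it.
 */
#ifndef MADV_TRY_COLLAPSE
#define MADV_TRY_COLLAPSE 26
#endif

/* Hypothetical caller-side outcomes. */
enum collapse_outcome {
	COLLAPSE_OK,          /* all covered regions collapsed or already PMD-mapped */
	COLLAPSE_RETRY_LATER, /* EAGAIN: trying again might succeed */
	COLLAPSE_GIVE_UP,     /* ENOMEM, EBUSY, EINVAL, ...: skip this range */
};

/* Map the errno of the most recently attempted collapse to a caller policy. */
enum collapse_outcome classify_collapse_errno(int err)
{
	switch (err) {
	case 0:
		return COLLAPSE_OK;
	case EAGAIN:
		return COLLAPSE_RETRY_LATER;
	default:
		return COLLAPSE_GIVE_UP;
	}
}

/* addr must be page-aligned; the kernel clamps the range to hugepage boundaries. */
enum collapse_outcome try_collapse(void *addr, size_t len)
{
	if (madvise(addr, len, MADV_TRY_COLLAPSE) == 0)
		return COLLAPSE_OK;
	return classify_collapse_errno(errno);
}
```

Whether ENOMEM (no hugepage readily available without reclaim) should also be retried later is a policy choice left to the caller.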
>
> Best,
> Zach
>
> >
> > Signed-off-by: Lance Yang
> > ---
> >  arch/alpha/include/uapi/asm/mman.h           |  1 +
> >  arch/mips/include/uapi/asm/mman.h            |  1 +
> >  arch/parisc/include/uapi/asm/mman.h          |  1 +
> >  arch/xtensa/include/uapi/asm/mman.h          |  1 +
> >  include/linux/huge_mm.h                      |  5 +++--
> >  include/uapi/asm-generic/mman-common.h       |  1 +
> >  mm/khugepaged.c                              | 19 ++++++++++++++++---
> >  mm/madvise.c                                 |  8 +++++++-
> >  tools/include/uapi/asm-generic/mman-common.h |  1 +
> >  9 files changed, 32 insertions(+), 6 deletions(-)
> >
> > diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
> > index 763929e814e9..44aa1f57a982 100644
> > --- a/arch/alpha/include/uapi/asm/mman.h
> > +++ b/arch/alpha/include/uapi/asm/mman.h
> > @@ -77,6 +77,7 @@
> >  #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
> >
> >  #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
> > +#define MADV_TRY_COLLAPSE	26	/* Similar to COLLAPSE, but avoids direct reclaim and/or compaction */
> >
> >  /* compatibility flags */
> >  #define MAP_FILE	0
> > diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
> > index c6e1fc77c996..1ae16e5d7dfc 100644
> > --- a/arch/mips/include/uapi/asm/mman.h
> > +++ b/arch/mips/include/uapi/asm/mman.h
> > @@ -104,6 +104,7 @@
> >  #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
> >
> >  #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
> > +#define MADV_TRY_COLLAPSE	26	/* Similar to COLLAPSE, but avoids direct reclaim and/or compaction */
> >
> >  /* compatibility flags */
> >  #define MAP_FILE	0
> > diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
> > index 68c44f99bc93..f8d016ee1f98 100644
> > --- a/arch/parisc/include/uapi/asm/mman.h
> > +++ b/arch/parisc/include/uapi/asm/mman.h
> > @@ -71,6 +71,7 @@
> >  #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
> >
> >  #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
> > +#define MADV_TRY_COLLAPSE	26	/* Similar to COLLAPSE, but avoids direct reclaim and/or compaction */
> >
> >  #define MADV_HWPOISON	100		/* poison a page for testing */
> >  #define MADV_SOFT_OFFLINE	101	/* soft offline page for testing */
> > diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
> > index 1ff0c858544f..c495d1b39c83 100644
> > --- a/arch/xtensa/include/uapi/asm/mman.h
> > +++ b/arch/xtensa/include/uapi/asm/mman.h
> > @@ -112,6 +112,7 @@
> >  #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
> >
> >  #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
> > +#define MADV_TRY_COLLAPSE	26	/* Similar to COLLAPSE, but avoids direct reclaim and/or compaction */
> >
> >  /* compatibility flags */
> >  #define MAP_FILE	0
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 5adb86af35fc..e1af75aa18fb 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -303,7 +303,7 @@ int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags,
> >  		     int advice);
> >  int madvise_collapse(struct vm_area_struct *vma,
> >  		     struct vm_area_struct **prev,
> > -		     unsigned long start, unsigned long end);
> > +		     unsigned long start, unsigned long end, bool is_try);
> >  void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start,
> >  			   unsigned long end, long adjust_next);
> >  spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma);
> > @@ -450,7 +450,8 @@ static inline int hugepage_madvise(struct vm_area_struct *vma,
> >
> >  static inline int madvise_collapse(struct vm_area_struct *vma,
> >  				   struct vm_area_struct **prev,
> > -				   unsigned long start, unsigned long end)
> > +				   unsigned long start, unsigned long end,
> > +				   bool is_try)
> >  {
> >  	return -EINVAL;
> >  }
> > diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
> > index 6ce1f1ceb432..a9e5273db5f6 100644
> > --- a/include/uapi/asm-generic/mman-common.h
> > +++ b/include/uapi/asm-generic/mman-common.h
> > @@ -78,6 +78,7 @@
> >  #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
> >
> >  #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
> > +#define MADV_TRY_COLLAPSE	26	/* Similar to COLLAPSE, but avoids direct reclaim and/or compaction */
> >
> >  /* compatibility flags */
> >  #define MAP_FILE	0
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index 2b219acb528e..c22703155b6e 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -96,6 +96,7 @@ static struct kmem_cache *mm_slot_cache __ro_after_init;
> >
> >  struct collapse_control {
> >  	bool is_khugepaged;
> > +	bool is_try;
> >
> >  	/* Num pages scanned per node */
> >  	u32 node_load[MAX_NUMNODES];
> > @@ -1058,10 +1059,14 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
> >  static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm,
> >  			      struct collapse_control *cc)
> >  {
> > -	gfp_t gfp = (cc->is_khugepaged ? alloc_hugepage_khugepaged_gfpmask() :
> > -		     GFP_TRANSHUGE);
> >  	int node = hpage_collapse_find_target_node(cc);
> >  	struct folio *folio;
> > +	gfp_t gfp;
> > +
> > +	if (cc->is_khugepaged)
> > +		gfp = alloc_hugepage_khugepaged_gfpmask();
> > +	else
> > +		gfp = cc->is_try ? GFP_TRANSHUGE_LIGHT : GFP_TRANSHUGE;
> >
> >  	if (!hpage_collapse_alloc_folio(&folio, gfp, node, &cc->alloc_nmask)) {
> >  		*hpage = NULL;
> > @@ -2697,7 +2702,7 @@ static int madvise_collapse_errno(enum scan_result r)
> >  }
> >
> >  int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
> > -		     unsigned long start, unsigned long end)
> > +		     unsigned long start, unsigned long end, bool is_try)
> >  {
> >  	struct collapse_control *cc;
> >  	struct mm_struct *mm = vma->vm_mm;
> > @@ -2718,6 +2723,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
> >  	if (!cc)
> >  		return -ENOMEM;
> >  	cc->is_khugepaged = false;
> > +	cc->is_try = is_try;
> >
> >  	mmgrab(mm);
> >  	lru_add_drain_all();
> > @@ -2773,6 +2779,13 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
> >  			result = collapse_pte_mapped_thp(mm, addr, true);
> >  			mmap_read_unlock(mm);
> >  			goto handle_result;
> > +		/* MADV_TRY_COLLAPSE: fail quickly */
> > +		case SCAN_ALLOC_HUGE_PAGE_FAIL:
> > +		case SCAN_CGROUP_CHARGE_FAIL:
> > +			if (cc->is_try) {
> > +				last_fail = result;
> > +				goto out_maybelock;
> > +			}
> >  		/* Whitelisted set of results where continuing OK */
> >  		case SCAN_PMD_NULL:
> >  		case SCAN_PTE_NON_PRESENT:
> > diff --git a/mm/madvise.c b/mm/madvise.c
> > index 912155a94ed5..5a359bcd286c 100644
> > --- a/mm/madvise.c
> > +++ b/mm/madvise.c
> > @@ -60,6 +60,7 @@ static int madvise_need_mmap_write(int behavior)
> >  	case MADV_POPULATE_READ:
> >  	case MADV_POPULATE_WRITE:
> >  	case MADV_COLLAPSE:
> > +	case MADV_TRY_COLLAPSE:
> >  		return 0;
> >  	default:
> >  		/* be safe, default to 1. list exceptions explicitly */
> > @@ -1082,8 +1083,10 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
> >  		if (error)
> >  			goto out;
> >  		break;
> > +	case MADV_TRY_COLLAPSE:
> > +		return madvise_collapse(vma, prev, start, end, true);
> >  	case MADV_COLLAPSE:
> > -		return madvise_collapse(vma, prev, start, end);
> > +		return madvise_collapse(vma, prev, start, end, false);
> >  	}
> >
> >  	anon_name = anon_vma_name(vma);
> > @@ -1178,6 +1181,7 @@ madvise_behavior_valid(int behavior)
> >  	case MADV_HUGEPAGE:
> >  	case MADV_NOHUGEPAGE:
> >  	case MADV_COLLAPSE:
> > +	case MADV_TRY_COLLAPSE:
> >  #endif
> >  	case MADV_DONTDUMP:
> >  	case MADV_DODUMP:
> > @@ -1368,6 +1372,8 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
> >   *		transparent huge pages so the existing pages will not be
> >   *		coalesced into THP and new pages will not be allocated as THP.
> >   *  MADV_COLLAPSE - synchronously coalesce pages into new THP.
> > + *  MADV_TRY_COLLAPSE - similar to COLLAPSE, but avoids direct reclaim
> > + *		and/or compaction.
> >   *  MADV_DONTDUMP - the application wants to prevent pages in the given range
> >   *		from being included in its core dump.
> >   *  MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump.
> > diff --git a/tools/include/uapi/asm-generic/mman-common.h b/tools/include/uapi/asm-generic/mman-common.h
> > index 6ce1f1ceb432..a9e5273db5f6 100644
> > --- a/tools/include/uapi/asm-generic/mman-common.h
> > +++ b/tools/include/uapi/asm-generic/mman-common.h
> > @@ -78,6 +78,7 @@
> >  #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
> >
> >  #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
> > +#define MADV_TRY_COLLAPSE	26	/* Similar to COLLAPSE, but avoids direct reclaim and/or compaction */
> >
> >  /* compatibility flags */
> >  #define MAP_FILE	0
> > --
> > 2.33.1
> >
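To make the alloc_charge_hpage() change in the mm/khugepaged.c hunk concrete, here is a toy user-space model of its branch structure. In the kernel, GFP_TRANSHUGE is GFP_TRANSHUGE_LIGHT plus __GFP_DIRECT_RECLAIM, and direct compaction is only attempted when direct reclaim is allowed, so the MADV_TRY_COLLAPSE path differs from MADV_COLLAPSE by that single bit. The TOY_* flag values and the toy_* names are invented for illustration and do not match the real gfp_t bits; the khugepaged mask is passed in as a parameter because in the kernel it comes from alloc_hugepage_khugepaged_gfpmask() (which, as far as I know, follows the khugepaged defrag setting).

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy flag bits for illustration only; real gfp_t values differ. */
#define TOY_GFP_BASE_THP       (1u << 0) /* stands in for the LIGHT base mask */
#define TOY_GFP_DIRECT_RECLAIM (1u << 1) /* stands in for __GFP_DIRECT_RECLAIM */

#define TOY_GFP_TRANSHUGE_LIGHT TOY_GFP_BASE_THP
#define TOY_GFP_TRANSHUGE       (TOY_GFP_TRANSHUGE_LIGHT | TOY_GFP_DIRECT_RECLAIM)

/* Models only the two fields of collapse_control that the patch consults. */
struct toy_collapse_control {
	bool is_khugepaged;
	bool is_try;
};

/* Mirrors the if/else added to alloc_charge_hpage() by the patch. */
uint32_t toy_pick_gfp(const struct toy_collapse_control *cc,
		      uint32_t khugepaged_mask)
{
	if (cc->is_khugepaged)
		return khugepaged_mask;
	/* MADV_TRY_COLLAPSE drops direct reclaim (and thus direct compaction). */
	return cc->is_try ? TOY_GFP_TRANSHUGE_LIGHT : TOY_GFP_TRANSHUGE;
}
```

XORing the masks picked for MADV_COLLAPSE and MADV_TRY_COLLAPSE leaves exactly the direct-reclaim bit, which is the whole allocation-side difference between the two modes in this version of the patch.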