Date: Thu, 18 Jan 2024 14:28:43 +0100
From: Michal Hocko
To: Lance Yang
Cc: akpm@linux-foundation.org, zokeefe@google.com, david@redhat.com, songmuchun@bytedance.com, shy828301@gmail.com, peterx@redhat.com, mknyszek@google.com, minchan@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org
Subject: Re: [PATCH v2 1/1] mm/madvise: add MADV_F_COLLAPSE_LIGHT to process_madvise()
References: <20240118120347.61817-1-ioworker0@gmail.com>
In-Reply-To: <20240118120347.61817-1-ioworker0@gmail.com>

[CC linux-api]

On Thu 18-01-24 20:03:46, Lance Yang wrote:
> This idea was inspired by MADV_COLLAPSE introduced by Zach O'Keefe[1].
>
> Allow MADV_F_COLLAPSE_LIGHT behavior for process_madvise(2) if the caller
> has CAP_SYS_ADMIN or is requesting the collapse of its own memory.
>
> The semantics of MADV_F_COLLAPSE_LIGHT are similar to MADV_COLLAPSE, but
> it avoids direct reclaim and/or compaction, quickly failing on allocation
> errors.
>
> This change enables a more flexible and efficient usage of memory collapse
> operations, providing additional control to userspace applications for
> system-wide THP optimization.
>
> Semantics
>
> This call is independent of the system-wide THP sysfs settings, but will
> fail for memory marked VM_NOHUGEPAGE. If the ranges provided span
> multiple VMAs, the semantics of the collapse over each VMA is independent
> from the others. This implies a hugepage cannot cross a VMA boundary. If
> collapse of a given hugepage-aligned/sized region fails, the operation may
> continue to attempt collapsing the remainder of memory specified.
>
> The memory ranges provided must be page-aligned, but are not required to
> be hugepage-aligned. If the memory ranges are not hugepage-aligned, the
> start/end of the range will be clamped to the first/last hugepage-aligned
> address covered by said range. The memory ranges must span at least one
> hugepage-sized region.
>
> All non-resident pages covered by the range will first be
> swapped/faulted-in, before being internally copied onto a freshly
> allocated hugepage. Unmapped pages will have their data directly
> initialized to 0 in the new hugepage. However, for every eligible
> hugepage aligned/sized region to-be collapsed, at least one page must
> currently be backed by memory (a PMD covering the address range must
> already exist).
>
> Allocation for the new hugepage will not enter direct reclaim and/or
> compaction, quickly failing if allocation fails. When the system has
> multiple NUMA nodes, the hugepage will be allocated from the node providing
> the most native pages. This operation operates on the current state of the
> specified process and makes no persistent changes or guarantees on how pages
> will be mapped, constructed, or faulted in the future.
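
For readers on linux-api, the range clamping described in the quoted
Semantics text can be illustrated with a small userspace sketch. This is
only an illustration under stated assumptions, not code from the patch:
the 2 MiB hugepage size is an assumption (typical for PMD-sized THP on
x86-64), and the names EX_HPAGE_SIZE, struct ex_hrange,
ex_clamp_to_hpages() and ex_range_is_collapsible() are hypothetical
helpers invented here. The start is rounded up and the end rounded down
to a hugepage boundary, and a range that does not cover at least one
full hugepage-sized region yields nothing to collapse.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption for illustration: 2 MiB PMD-sized huge pages (typical on x86-64). */
#define EX_HPAGE_SIZE ((uintptr_t)2 << 20)
#define EX_HPAGE_MASK (~(EX_HPAGE_SIZE - 1))

/* Hypothetical helper type, not a kernel API: result of clamping a range. */
struct ex_hrange {
	uintptr_t hstart;   /* first hugepage-aligned address at or after start */
	uintptr_t hend;     /* last hugepage-aligned address at or before end */
	uintptr_t nregions; /* hugepage-sized regions in [hstart, hend), 0 if none */
};

/*
 * Clamp [start, end) to hugepage boundaries: round start up, round end down.
 * Overflow near the top of the address space is ignored in this sketch.
 */
static struct ex_hrange ex_clamp_to_hpages(uintptr_t start, uintptr_t end)
{
	struct ex_hrange r;

	r.hstart = (start + EX_HPAGE_SIZE - 1) & EX_HPAGE_MASK;
	r.hend = end & EX_HPAGE_MASK;
	r.nregions = r.hend > r.hstart ? (r.hend - r.hstart) / EX_HPAGE_SIZE : 0;
	return r;
}

/* True if the range spans at least one hugepage-sized, hugepage-aligned region. */
static bool ex_range_is_collapsible(uintptr_t start, uintptr_t end)
{
	return ex_clamp_to_hpages(start, end).nregions > 0;
}
```

Under these assumptions, a page-aligned range 0x1000..0x600000 clamps to
0x200000..0x600000, which gives two candidate regions, while
0x201000..0x3ff000 covers no complete region and would not be eligible.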
>
> Use Cases
>
> An immediate user of this new functionality is the Go runtime heap allocator
> that manages memory in hugepage-sized chunks. In the past, whether it was a
> newly allocated chunk through mmap() or a reused chunk released by
> madvise(MADV_DONTNEED), the allocator attempted to eagerly back memory with
> huge pages using madvise(MADV_HUGEPAGE)[2] and madvise(MADV_COLLAPSE)[3]
> respectively. However, both approaches resulted in performance issues; for
> both scenarios, there could be entries into direct reclaim and/or compaction,
> leading to unpredictable stalls[4]. Now, the allocator can confidently use
> process_madvise(MADV_F_COLLAPSE_LIGHT) to attempt the allocation of huge pages.
>
> [1] https://github.com/torvalds/linux/commit/7d8faaf155454f8798ec56404faca29a82689c77
> [2] https://github.com/golang/go/commit/8fa9e3beee8b0e6baa7333740996181268b60a3a
> [3] https://github.com/golang/go/commit/9f9bb26880388c5bead158e9eca3be4b3a9bd2af
> [4] https://github.com/golang/go/issues/63334
>
> [v1] https://lore.kernel.org/lkml/20240117050217.43610-1-ioworker0@gmail.com/
>
> Signed-off-by: Lance Yang
> Suggested-by: Zach O'Keefe
> Suggested-by: David Hildenbrand
> ---
> V1 -> V2: Treat process_madvise(MADV_F_COLLAPSE_LIGHT) as the lighter-weight alternative
>           to madvise(MADV_COLLAPSE)
>
>  arch/alpha/include/uapi/asm/mman.h           |  1 +
>  arch/mips/include/uapi/asm/mman.h            |  1 +
>  arch/parisc/include/uapi/asm/mman.h          |  1 +
>  arch/xtensa/include/uapi/asm/mman.h          |  1 +
>  include/linux/huge_mm.h                      |  5 +--
>  include/uapi/asm-generic/mman-common.h       |  1 +
>  mm/khugepaged.c                              | 15 ++++++--
>  mm/madvise.c                                 | 36 +++++++++++++++++---
>  tools/include/uapi/asm-generic/mman-common.h |  1 +
>  9 files changed, 52 insertions(+), 10 deletions(-)
>
> diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h
> index 763929e814e9..22f23ca04f1a 100644
> --- a/arch/alpha/include/uapi/asm/mman.h
> +++ b/arch/alpha/include/uapi/asm/mman.h
> @@ -77,6 +77,7 @@
>  #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
>
>  #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
> +#define MADV_F_COLLAPSE_LIGHT	26	/* Similar to COLLAPSE, but avoids direct reclaim and/or compaction */
>
>  /* compatibility flags */
>  #define MAP_FILE	0
> diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h
> index c6e1fc77c996..acec0b643e9c 100644
> --- a/arch/mips/include/uapi/asm/mman.h
> +++ b/arch/mips/include/uapi/asm/mman.h
> @@ -104,6 +104,7 @@
>  #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
>
>  #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
> +#define MADV_F_COLLAPSE_LIGHT	26	/* Similar to COLLAPSE, but avoids direct reclaim and/or compaction */
>
>  /* compatibility flags */
>  #define MAP_FILE	0
> diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h
> index 68c44f99bc93..812029c98cd7 100644
> --- a/arch/parisc/include/uapi/asm/mman.h
> +++ b/arch/parisc/include/uapi/asm/mman.h
> @@ -71,6 +71,7 @@
>  #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
>
>  #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
> +#define MADV_F_COLLAPSE_LIGHT	26	/* Similar to COLLAPSE, but avoids direct reclaim and/or compaction */
>
>  #define MADV_HWPOISON	100		/* poison a page for testing */
>  #define MADV_SOFT_OFFLINE 101		/* soft offline page for testing */
> diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h
> index 1ff0c858544f..52ef463dd5b6 100644
> --- a/arch/xtensa/include/uapi/asm/mman.h
> +++ b/arch/xtensa/include/uapi/asm/mman.h
> @@ -112,6 +112,7 @@
>  #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
>
>  #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
> +#define MADV_F_COLLAPSE_LIGHT	26	/* Similar to COLLAPSE, but avoids direct reclaim and/or compaction */
>
>  /* compatibility flags */
>  #define MAP_FILE	0
> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> index 5adb86af35fc..075fdb5d481a 100644
> --- a/include/linux/huge_mm.h
> +++ b/include/linux/huge_mm.h
> @@ -303,7 +303,7 @@ int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags,
>  		     int advice);
>  int madvise_collapse(struct vm_area_struct *vma,
>  		     struct vm_area_struct **prev,
> -		     unsigned long start, unsigned long end);
> +		     unsigned long start, unsigned long end, int behavior);
>  void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start,
>  			   unsigned long end, long adjust_next);
>  spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma);
> @@ -450,7 +450,8 @@ static inline int hugepage_madvise(struct vm_area_struct *vma,
>
>  static inline int madvise_collapse(struct vm_area_struct *vma,
>  				   struct vm_area_struct **prev,
> -				   unsigned long start, unsigned long end)
> +				   unsigned long start, unsigned long end,
> +				   int behavior)
>  {
>  	return -EINVAL;
>  }
> diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
> index 6ce1f1ceb432..92c67bc755da 100644
> --- a/include/uapi/asm-generic/mman-common.h
> +++ b/include/uapi/asm-generic/mman-common.h
> @@ -78,6 +78,7 @@
>  #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
>
>  #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
> +#define MADV_F_COLLAPSE_LIGHT	26	/* Similar to COLLAPSE, but avoids direct reclaim and/or compaction */
>
>  /* compatibility flags */
>  #define MAP_FILE	0
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 2b219acb528e..2840051c0ae2 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -97,6 +97,8 @@ static struct kmem_cache *mm_slot_cache __ro_after_init;
>  struct collapse_control {
>  	bool is_khugepaged;
>
> +	int behavior;
> +
>  	/* Num pages scanned per node */
>  	u32 node_load[MAX_NUMNODES];
>
> @@ -1058,10 +1060,16 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
>  static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm,
>  			      struct collapse_control *cc)
>  {
> -	gfp_t gfp = (cc->is_khugepaged ? alloc_hugepage_khugepaged_gfpmask() :
> -		     GFP_TRANSHUGE);
>  	int node = hpage_collapse_find_target_node(cc);
>  	struct folio *folio;
> +	gfp_t gfp;
> +
> +	if (cc->is_khugepaged)
> +		gfp = alloc_hugepage_khugepaged_gfpmask();
> +	else
> +		gfp = (cc->behavior == MADV_F_COLLAPSE_LIGHT ?
> +			GFP_TRANSHUGE_LIGHT :
> +			GFP_TRANSHUGE);
>
>  	if (!hpage_collapse_alloc_folio(&folio, gfp, node, &cc->alloc_nmask)) {
>  		*hpage = NULL;
> @@ -2697,7 +2705,7 @@ static int madvise_collapse_errno(enum scan_result r)
>  }
>
>  int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
> -		     unsigned long start, unsigned long end)
> +		     unsigned long start, unsigned long end, int behavior)
>  {
>  	struct collapse_control *cc;
>  	struct mm_struct *mm = vma->vm_mm;
> @@ -2718,6 +2726,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev,
>  	if (!cc)
>  		return -ENOMEM;
>  	cc->is_khugepaged = false;
> +	cc->behavior = behavior;
>
>  	mmgrab(mm);
>  	lru_add_drain_all();
> diff --git a/mm/madvise.c b/mm/madvise.c
> index 912155a94ed5..9c40226505aa 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -60,6 +60,7 @@ static int madvise_need_mmap_write(int behavior)
>  	case MADV_POPULATE_READ:
>  	case MADV_POPULATE_WRITE:
>  	case MADV_COLLAPSE:
> +	case MADV_F_COLLAPSE_LIGHT:
>  		return 0;
>  	default:
>  		/* be safe, default to 1. list exceptions explicitly */
> @@ -1082,8 +1083,9 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
>  		if (error)
>  			goto out;
>  		break;
> +	case MADV_F_COLLAPSE_LIGHT:
>  	case MADV_COLLAPSE:
> -		return madvise_collapse(vma, prev, start, end);
> +		return madvise_collapse(vma, prev, start, end, behavior);
>  	}
>
>  	anon_name = anon_vma_name(vma);
> @@ -1178,6 +1180,7 @@ madvise_behavior_valid(int behavior)
>  	case MADV_HUGEPAGE:
>  	case MADV_NOHUGEPAGE:
>  	case MADV_COLLAPSE:
> +	case MADV_F_COLLAPSE_LIGHT:
>  #endif
>  	case MADV_DONTDUMP:
>  	case MADV_DODUMP:
> @@ -1194,6 +1197,17 @@ madvise_behavior_valid(int behavior)
>  	}
>  }
>
> +
> +static bool process_madvise_behavior_only(int behavior)
> +{
> +	switch (behavior) {
> +	case MADV_F_COLLAPSE_LIGHT:
> +		return true;
> +	default:
> +		return false;
> +	}
> +}
> +
>  static bool process_madvise_behavior_valid(int behavior)
>  {
>  	switch (behavior) {
> @@ -1201,6 +1215,7 @@ static bool process_madvise_behavior_valid(int behavior)
>  	case MADV_PAGEOUT:
>  	case MADV_WILLNEED:
>  	case MADV_COLLAPSE:
> +	case MADV_F_COLLAPSE_LIGHT:
>  		return true;
>  	default:
>  		return false;
> @@ -1368,6 +1383,8 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
>   *		transparent huge pages so the existing pages will not be
>   *		coalesced into THP and new pages will not be allocated as THP.
>   *  MADV_COLLAPSE - synchronously coalesce pages into new THP.
> + *  MADV_F_COLLAPSE_LIGHT - only for process_madvise, avoids direct reclaim and/or
> + *		compaction.
>   *  MADV_DONTDUMP - the application wants to prevent pages in the given range
>   *		from being included in its core dump.
>   *  MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump.
> @@ -1394,7 +1411,8 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
>   *  -EBADF  - map exists, but area maps something that isn't a file.
>   *  -EAGAIN - a kernel resource was temporarily unavailable.
>   */
> -int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior)
> +int _do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in,
> +		int behavior, bool is_process_madvise)
>  {
>  	unsigned long end;
>  	int error;
> @@ -1405,6 +1423,9 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
>  	if (!madvise_behavior_valid(behavior))
>  		return -EINVAL;
>
> +	if (!is_process_madvise && process_madvise_behavior_only(behavior))
> +		return -EINVAL;
> +
>  	if (!PAGE_ALIGNED(start))
>  		return -EINVAL;
>  	len = PAGE_ALIGN(len_in);
> @@ -1448,9 +1469,14 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
>  	return error;
>  }
>
> +int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior)
> +{
> +	return _do_madvise(mm, start, len_in, behavior, false);
> +}
> +
>  SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
>  {
> -	return do_madvise(current->mm, start, len_in, behavior);
> +	return _do_madvise(current->mm, start, len_in, behavior, false);
>  }
>
>  SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
> @@ -1504,8 +1530,8 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
>  	total_len = iov_iter_count(&iter);
>
>  	while (iov_iter_count(&iter)) {
> -		ret = do_madvise(mm, (unsigned long)iter_iov_addr(&iter),
> -					iter_iov_len(&iter), behavior);
> +		ret = _do_madvise(mm, (unsigned long)iter_iov_addr(&iter),
> +					iter_iov_len(&iter), behavior, true);
>  		if (ret < 0)
>  			break;
>  		iov_iter_advance(&iter, iter_iov_len(&iter));
> diff --git a/tools/include/uapi/asm-generic/mman-common.h b/tools/include/uapi/asm-generic/mman-common.h
> index 6ce1f1ceb432..92c67bc755da 100644
> --- a/tools/include/uapi/asm-generic/mman-common.h
> +++ b/tools/include/uapi/asm-generic/mman-common.h
> @@ -78,6 +78,7 @@
>  #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
>
>  #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
> +#define MADV_F_COLLAPSE_LIGHT	26	/* Similar to COLLAPSE, but avoids direct reclaim and/or compaction */
>
>  /* compatibility flags */
>  #define MAP_FILE	0
> --
> 2.33.1

-- 
Michal Hocko
SUSE Labs