From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CA632C54E65 for ; Tue, 20 May 2025 14:57:41 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4C1436B009F; Tue, 20 May 2025 10:57:41 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 499676B00A0; Tue, 20 May 2025 10:57:41 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3D60D6B00A1; Tue, 20 May 2025 10:57:41 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 1F7B26B009F for ; Tue, 20 May 2025 10:57:41 -0400 (EDT) Received: from smtpin07.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id C9364BA926 for ; Tue, 20 May 2025 14:57:40 +0000 (UTC) X-FDA: 83463590280.07.0DE5CBE Received: from mail-ej1-f54.google.com (mail-ej1-f54.google.com [209.85.218.54]) by imf19.hostedemail.com (Postfix) with ESMTP id C1C791A0008 for ; Tue, 20 May 2025 14:57:38 +0000 (UTC) Authentication-Results: imf19.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=X6iKVLwV; spf=pass (imf19.hostedemail.com: domain of usamaarif642@gmail.com designates 209.85.218.54 as permitted sender) smtp.mailfrom=usamaarif642@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1747753058; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=IajgCekFJONzao3c/Mri5HJ7aoxC+foi/7AqMFd+OWU=; b=wcotP7iZAz/Y9KOOPI0/KPUXLdjso5oJYjDJxwh1ZDBQhnPJvHtpcxEiE780XaosBxdUoW mxYZc7zsho1lTmUpcMm6gz+bk3MVKgvFomlHbLID0+GsTqQ8L1hjjQ6n6UA+zYb3vHlyc2 xx0GXzt/Bh11W/doQUwYyVrPmAXgy28= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1747753058; a=rsa-sha256; cv=none; b=VRNEYLPFl+L4PiaO/3pPg9vFGjba/MiBwu0JQioqvLTaRhN/Acurg9CUvuiAqmj+665+r0 Rf4p9ySpFAUXisoMkTLjnDIl/zmax7zRCeMTH4fuTIDECN5QT0Ik8MSK6xB+eErTxTe9Zq Q793DOt1kUs+SqafqWHHVCMDt2Bb4kk= ARC-Authentication-Results: i=1; imf19.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=X6iKVLwV; spf=pass (imf19.hostedemail.com: domain of usamaarif642@gmail.com designates 209.85.218.54 as permitted sender) smtp.mailfrom=usamaarif642@gmail.com; dmarc=pass (policy=none) header.from=gmail.com Received: by mail-ej1-f54.google.com with SMTP id a640c23a62f3a-ad56cbc7b07so336018666b.0 for ; Tue, 20 May 2025 07:57:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1747753057; x=1748357857; darn=kvack.org; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=IajgCekFJONzao3c/Mri5HJ7aoxC+foi/7AqMFd+OWU=; b=X6iKVLwVZ+Uz5RKJG48Up1oXy/xZqrB4eoUUxjaV5b6eqcbORy0SXLZv66qf0T5rNK dWA2Zl2JrpRB9tGr5iaLg3yQws7Tt6wyUF7DWAXFfwugCFxSbuUvJmGM/5Z0xC69CTz6 SBfbQCEuWQ+0i4PLrr8apfgvl2Y2ChahP6iu+Rrz4ybaeBQliiqO0XUEmcfwuy535Cre uHg0CQiDhyj06/SknzocZ3QTpFzMZpe4FnBB5uIeEiFLVE7RZoYtR53adaXHuVy+4701 ms61IbtINKBn8v4gILkfZZl5fPsNMKciFXUpms3h4sXZiVRVz4tlfdQsrvciDupBhy+z NsFA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1747753057; x=1748357857; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=IajgCekFJONzao3c/Mri5HJ7aoxC+foi/7AqMFd+OWU=; b=PvF+Vt0IjbFPzlm/gO96ow9CmSxnVnnRHERSg6382L+WqM6rnNPcYe3oX9dYpsS6us xP1gtgmq+31UHEujTPS2jcZGvbvVVTECC1x+UOKxDNg2xf4ytciRw6jYAQCUgwH3BXGo 89+YKZZ2u6Busp4tzwXUPhxzzTy7G5LqKT9zCDdix0PhZPNg6YQRkEfHh6U6mJVXHYwi LSQvM7o5xuucCOlyJoj6MB5isJGwcWwg5lgyZ9mkK/e9IGjGXo9VYWRQ+0xRyvwF53cE m+PULIIcLFAcTpLVAAkeza2WH0d3FMPR8kqzFReW5gkIMqpdnp89FRh8mKVoCfZgdxCb DaTA== X-Forwarded-Encrypted: i=1; AJvYcCUh4Ly6L9DFd90GaGPeqF5DNo9C4Fucd626TxuURiLSdf4+PG6s/yIAYlN1Au+0pQUc6+vA5RQ8iA==@kvack.org X-Gm-Message-State: AOJu0Yzuyd4JOlCqbl9VRSoSAmS26YaYXLtGHMLgucd3CwoPqXetkx6P OuNN0NPMDKg5J6fAArGuHSVxn9zW756W0Ta+VXkTlh4LaBmkN0t3cEHP X-Gm-Gg: ASbGncvMiMJJ99E6Kv5gwKL8pZWLmuYz8Y9x99MOjhwyf+n4GiJTDaq16AGat2A1QZf AKTTR9LoKuGcu3Qgc+3xAmaLAUxOOVYvjPZC18ffYsjmxXF0jW6ggZWIWYZTBvyA3WhqEtUKRsa Bte14AXPoNyInH7P+V2eHKfuFIwsgt17uSkSGiTC6PJMS86J1wqRfoNmGycLo68F8QX8C0VGZvb Xokwgjh9KFhB4QYxcrvbYbARXj5qX/doWujV5o4r2VjJNNee3RStaeNO1S9ProzU+hWIzj9TapA 31rq3r07ZpsELe4EQmmtFh+ZETllrW3GESyYJtC4dnf1Ag5f8Helv7URAh8bh1blkd1orrvJdL/ xl04qtuXXIZleDpBy4BfBCTDc X-Google-Smtp-Source: AGHT+IFUKv/+NdaIlzfh/ttE9lZCb6A245ofH9pugxyKdQVM+d5CP7HrC+FNTyOKq0BC3a57Lx+N+w== X-Received: by 2002:a17:907:7294:b0:ad5:6337:ba42 with SMTP id a640c23a62f3a-ad56337c96dmr941030966b.46.1747753057000; Tue, 20 May 2025 07:57:37 -0700 (PDT) Received: from ?IPV6:2a03:83e0:1126:4:1c0a:f3ac:4087:51c8? ([2620:10d:c092:500::7:66a9]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-ad52d438c3csm735693066b.88.2025.05.20.07.57.36 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 20 May 2025 07:57:36 -0700 (PDT) Message-ID: Date: Tue, 20 May 2025 15:57:35 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v3 1/7] mm: khugepaged: extract vm flag setting outside of hugepage_madvise To: Lorenzo Stoakes Cc: Andrew Morton , david@redhat.com, linux-mm@kvack.org, hannes@cmpxchg.org, shakeel.butt@linux.dev, riel@surriel.com, ziy@nvidia.com, laoar.shao@gmail.com, baolin.wang@linux.alibaba.com, Liam.Howlett@oracle.com, npache@redhat.com, ryan.roberts@arm.com, vbabka@suse.cz, jannh@google.com, Arnd Bergmann , linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com References: <20250519223307.3601786-1-usamaarif642@gmail.com> <20250519223307.3601786-2-usamaarif642@gmail.com> <18be636f-d6a7-4270-b324-22750b3a358c@lucifer.local> Content-Language: en-US From: Usama Arif In-Reply-To: <18be636f-d6a7-4270-b324-22750b3a358c@lucifer.local> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Stat-Signature: ua9a5pih1yud77ayrinjxosnzscgczwz X-Rspamd-Queue-Id: C1C791A0008 X-Rspam-User: X-Rspamd-Server: rspam02 X-HE-Tag: 1747753058-703745 X-HE-Meta: U2FsdGVkX18+8JBQokYix5XBMM9jIiJBWc/xdrN1GWVS6n5exLL5hfMU41L1OiYiOvZPO/M6gFdvEqN4VD4upXdNXtH7irQFpSqfv3P1i4vkTubGHLIzkjEas68FzfE0K6TV+3UxRp+mEelT9KhChTS2VJLTVh83G0/xPAXQURIIUWsJglBqzupVeyQdWrorvThe7VVs/2CnjNrhdlM7ZG2JY4447SlXZ530Hi4GVpNedYCYltre++oRvOECxZvXq1t+IEOiKI1fHJmyJ7n1aM9G+EM1j9/NRvPNwLkiB5+BaaRLdFtUZ7ksiFmjEDmzfcwqmGsfrDxkmGdTEvSUu+5ima2BSwMUPIPKC2JSpJtPMYU+ywdjkqF80LkqyO3dgetqJK5jfSa2Sw9SVONXGdtfz30FPnKICR/wL5zzIwvFXbWBY6NcgUxopw34+OORweLVImJDhAcWECejYCTqBHaGRZhh+bk+CS3MKfK+IfA9DXQ1UmV/3kOPBF+YrtpbNEgqszh4vfheSEWIgPxvkHIb47feGZQ8LQw4WDy4Muf+ZyHDcpaGqoXMz5ajH39QffL2rbQoG11sjxLlyeSXvPYzFuG3ZOUudUEvBtY4ocGynl0CFTOccIa5WqLrf12zsYy7BT+7/1TPYfUGLOlOcXOmHhibn7TrJMdMUVa/fDJPZxm6AgoPxPvxYi3iKlbFl+WX71OchdPhEe+51lmnpj8UaWyj0Zq4ycT+xKwGrNoBBoHrGEik2P5yN4kxTj/cksFVvWHWufgijUjcbpDDjfdzRGVlT/06RxAVYT3fWl4F7cQ20LoBg6XVgiGTx4pvoLIbjCja91qbeWzr0197u19tpTpWTVt8Ow/CLoH/41E/BjXnjMcSkCNEbfFXEcNvk5xuZfhPZqv7WxNgxpSHwezLUN1I3FxgTT1i4ZNdmpu0Rr5lbFokRqFsrEc/EarbhP7o3hLQ7hVNqtxN/st ki90JWPV 3e9kTHM/5rBPuwno4QdJnnRAeXMZZUwAqyUKzZ0C+0SIUx6b+XKWC0zbkdWVBGPG6L3l32iJyTk5qXZABsIzabWun/gPss9UehUg/pUQ1Jfu3D0sDYgSVkMmWg0uH0eX3MS//VnNO1Mpr8u3uOlihUMeeDtmwDyvkA8uTSsifWWVFu7rZ0fDZTikMoCAhTV2GgYyTekvZgNq/DgNnvb4/2l62uPMmoNTyPcZbYE6rEWypRAcp1kx4U+qYTvMsm7wZgT8Z1JjlBDZUVXVITHHqabhQAouYyKLLxn+lfGHfvBimmB2ADcwuFDi18+GN3Sy8ixtGrF72YNqwdogsN4OL6NjWX/Tyuo4LN3/ytOe0FgIPQIi6Y1ovl5vtPZNnStM9wJwAsloAKeXXMTMlzT4MdCIlgsBvjeyt135P2Zgnpxe4pDUpEFcnwjMbn+JzyrJm1xSGrKu4jKU33p+e9TBmgp3sOkljDKyQvRSX1Qu8+Gx5DBIjAO4Ovw6ybgr2RvhVJySrb/jH1xUbA++ZY1fIrJD+Bha/3XUTUB5NjtgQHsmStICePFXZL9i8AlN36Po0pNiW X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 20/05/2025 15:43, Lorenzo Stoakes wrote: > This commit message is really poor. You're also not mentioning that you're > changing s390 behaviour? > > On Mon, May 19, 2025 at 11:29:53PM +0100, Usama Arif wrote: >> This is so that flag setting can be resused later in other functions, > > Typo. > >> to reduce code duplication (including the s390 exception). >> >> No functional change intended with this patch. > > I'm pretty sure somebody reviewed that this should just be merged with whatever > uses this? I'm not sure this is all that valuable as you're not really changing > this structurally very much. > Please see patch 2 where hugepage_set_vmflags is reused. I was just trying to follow your feedback from previous revision that the flag setting and s390 code part is duplicate code and should be common in the prctl and madvise function. I realize I messed up the arg not having vma and the order of the if statement. >> >> Signed-off-by: Usama Arif > > Yeah I'm not a fan of this patch, it's buggy and really unclear what the > purpose is here. No functional change was intended (I realized the order below broke it but can be fixed). In the previous revision it was: + case PR_SET_THP_POLICY: + if (arg3 || arg4 || arg5) + return -EINVAL; + if (mmap_write_lock_killable(me->mm)) + return -EINTR; + switch (arg2) { + case PR_DEFAULT_MADV_HUGEPAGE: + if (!hugepage_global_enabled()) + error = -EPERM; +#ifdef CONFIG_S390 + /* + * qemu blindly sets MADV_HUGEPAGE on all allocations, but s390 + * can't handle this properly after s390_enable_sie, so we simply + * ignore the madvise to prevent qemu from causing a SIGSEGV. + */ + else if (mm_has_pgste(vma->vm_mm)) + error = -EPERM; +#endif + else { + me->mm->def_flags &= ~VM_NOHUGEPAGE; + me->mm->def_flags |= VM_HUGEPAGE; + process_default_madv_hugepage(me->mm, MADV_HUGEPAGE); + } + break; ... Now with this hugepage_set_vmflags, it would be + case PR_SET_THP_POLICY: + if (arg3 || arg4 || arg5) + return -EINVAL; + if (mmap_write_lock_killable(mm)) + return -EINTR; + switch (arg2) { + case PR_DEFAULT_MADV_HUGEPAGE: + if (!hugepage_global_enabled()) + error = -EPERM; + error = hugepage_set_vmflags(&mm->def_flags, MADV_HUGEPAGE); + if (!error) + process_default_madv_hugepage(mm, MADV_HUGEPAGE); + break; I am happy to go with either of the methods above, but was just trying to incorporate your feedback :) Would you like the method from previous version? > >> --- >> include/linux/huge_mm.h | 1 + >> mm/khugepaged.c | 26 +++++++++++++++++--------- >> 2 files changed, 18 insertions(+), 9 deletions(-) >> >> diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h >> index 2f190c90192d..23580a43787c 100644 >> --- a/include/linux/huge_mm.h >> +++ b/include/linux/huge_mm.h >> @@ -431,6 +431,7 @@ change_huge_pud(struct mmu_gather *tlb, struct vm_area_struct *vma, >> __split_huge_pud(__vma, __pud, __address); \ >> } while (0) >> >> +int hugepage_set_vmflags(unsigned long *vm_flags, int advice); >> int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, >> int advice); >> int madvise_collapse(struct vm_area_struct *vma, >> diff --git a/mm/khugepaged.c b/mm/khugepaged.c >> index b04b6a770afe..ab3427c87422 100644 >> --- a/mm/khugepaged.c >> +++ b/mm/khugepaged.c >> @@ -346,8 +346,7 @@ struct attribute_group khugepaged_attr_group = { >> }; >> #endif /* CONFIG_SYSFS */ >> >> -int hugepage_madvise(struct vm_area_struct *vma, >> - unsigned long *vm_flags, int advice) >> +int hugepage_set_vmflags(unsigned long *vm_flags, int advice) > > >> { >> switch (advice) { >> case MADV_HUGEPAGE: >> @@ -358,16 +357,10 @@ int hugepage_madvise(struct vm_area_struct *vma, >> * ignore the madvise to prevent qemu from causing a SIGSEGV. >> */ >> if (mm_has_pgste(vma->vm_mm)) > > This is broken, you refer to vma which doesn't exist. > > As the kernel bots are telling you... > >> - return 0; >> + return -EPERM; > > Why are you now returning an error? > > This seems like a super broken way of making the caller return 0. Just make this > whole thing a bool return if you're going to treat it like a boolean function. > >> #endif >> *vm_flags &= ~VM_NOHUGEPAGE; >> *vm_flags |= VM_HUGEPAGE; >> - /* >> - * If the vma become good for khugepaged to scan, >> - * register it here without waiting a page fault that >> - * may not happen any time soon. >> - */ >> - khugepaged_enter_vma(vma, *vm_flags); >> break; >> case MADV_NOHUGEPAGE: >> *vm_flags &= ~VM_HUGEPAGE; >> @@ -383,6 +376,21 @@ int hugepage_madvise(struct vm_area_struct *vma, >> return 0; >> } >> >> +int hugepage_madvise(struct vm_area_struct *vma, >> + unsigned long *vm_flags, int advice) >> +{ >> + if (advice == MADV_HUGEPAGE && !hugepage_set_vmflags(vm_flags, advice)) { > > So now you've completely broken MADV_NOHUGEPAGE haven't you? > Yeah order needs to be reversed. >> + /* >> + * If the vma become good for khugepaged to scan, >> + * register it here without waiting a page fault that >> + * may not happen any time soon. >> + */ >> + khugepaged_enter_vma(vma, *vm_flags); >> + } >> + >> + return 0; >> +} >> + >> int __init khugepaged_init(void) >> { >> mm_slot_cache = KMEM_CACHE(khugepaged_mm_slot, 0); >> -- >> 2.47.1 >>