From: Yang Shi
Date: Thu, 1 Feb 2024 10:56:19 -0800
Subject: Re: [PATCH 1/1] mm/khugepaged: bypassing unnecessary scans with MMF_DISABLE_THP check
To: Lance Yang
Cc: akpm@linux-foundation.org, mhocko@suse.com, zokeefe@google.com, david@redhat.com, songmuchun@bytedance.com, peterx@redhat.com, minchan@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240129054551.57728-1-ioworker0@gmail.com>

On Wed, Jan 31, 2024 at 5:13 PM Lance Yang wrote:
>
> Hey Yang,
>
> Thank you for the clarification.
>
> You're correct. If the daemon calls prctl with
> MMF_DISABLE_THP before fork, the child
> mm won't be on the hash list.
>
> What I meant is that the daemon mm might
> already be on the hash list before fork.
> Therefore, khugepaged might still scan the
> address space for the daemon.

OK, I thought you didn't care about the daemon, since you mentioned
that the daemon would call prctl to disable or enable THP for
different children, so the daemon's own THP preference may change
constantly or may not matter at all.

So the actual cost is traversing the maple tree for the daemon. Does
the daemon have an excessive number of vmas? I'm not sure whether the
improvement is noticeable.

>
> Thanks,
> Lance
>
> On Thu, Feb 1, 2024 at 4:06 AM Yang Shi wrote:
> >
> > On Wed, Jan 31, 2024 at 1:30 AM Lance Yang wrote:
> > >
> > > Updating the change log.
> > >
> > > khugepaged scans the entire address space in the
> > > background for each given mm, looking for
> > > opportunities to merge sequences of basic pages
> > > into huge pages. However, when an mm is inserted
> > > into the mm_slots list, and the MMF_DISABLE_THP
> > > flag is set later, this scanning process becomes
> > > unnecessary for that mm and can be skipped to
> > > avoid redundant operations, especially in scenarios
> > > with a large address space.
> > >
> > > This commit introduces a check before each scanning
> > > process to test the MMF_DISABLE_THP flag for the
> > > given mm; if the flag is set, the scanning process is
> > > bypassed, thereby improving the efficiency of khugepaged.
> > >
> > > This optimization is not a correctness issue but rather an
> > > enhancement to save expensive checks on each VMA
> > > when userspace cannot prctl itself before spawning
> > > into the new process.
> >
> > If this is an optimization, you'd better show some real numbers to help justify it.
> >
> > >
> > > On some servers within our company, we deploy a
> > > daemon responsible for monitoring and updating local
> > > applications. Some applications prefer not to use THP,
> > > so the daemon calls prctl to disable THP before fork/exec.
> > > Conversely, for other applications, the daemon calls prctl
> > > to enable THP before fork/exec.
> >
> > If your daemon calls prctl with MMF_DISABLE_THP before fork and you
> > still end up with the child mm on the hash list in the first place,
> > I think that is a bug in khugepaged_fork() IIUC. khugepaged_fork()
> > should check this flag and bail out if it is set. Did I miss
> > something?
> >
> > >
> > > Ideally, the daemon should invoke prctl after the fork,
> > > but its current implementation follows the described
> > > approach. In the Go standard library, there is no direct
> > > encapsulation of the fork system call; instead, fork and
> > > execve are combined into one through syscall.ForkExec.
> > >
> > > Thanks,
> > > Lance
> > >
> > > On Mon, Jan 29, 2024 at 1:46 PM Lance Yang wrote:
> > > >
> > > > khugepaged scans the entire address space in the
> > > > background for each given mm, looking for
> > > > opportunities to merge sequences of basic pages
> > > > into huge pages. However, when an mm is inserted
> > > > into the mm_slots list, and the MMF_DISABLE_THP flag
> > > > is set later, this scanning process becomes
> > > > unnecessary for that mm and can be skipped to avoid
> > > > redundant operations, especially in scenarios with
> > > > a large address space.
> > > >
> > > > This commit introduces a check before each scanning
> > > > process to test the MMF_DISABLE_THP flag for the
> > > > given mm; if the flag is set, the scanning process
> > > > is bypassed, thereby improving the efficiency of
> > > > khugepaged.
> > > >
> > > > Signed-off-by: Lance Yang
> > > > ---
> > > >  mm/khugepaged.c | 18 ++++++++++++------
> > > >  1 file changed, 12 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > > index 2b219acb528e..d6a700834edc 100644
> > > > --- a/mm/khugepaged.c
> > > > +++ b/mm/khugepaged.c
> > > > @@ -410,6 +410,12 @@ static inline int hpage_collapse_test_exit(struct mm_struct *mm)
> > > >  	return atomic_read(&mm->mm_users) == 0;
> > > >  }
> > > >  
> > > > +static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm)
> > > > +{
> > > > +	return hpage_collapse_test_exit(mm) ||
> > > > +	       test_bit(MMF_DISABLE_THP, &mm->flags);
> > > > +}
> > > > +
> > > >  void __khugepaged_enter(struct mm_struct *mm)
> > > >  {
> > > >  	struct khugepaged_mm_slot *mm_slot;
> > > > @@ -1422,7 +1428,7 @@ static void collect_mm_slot(struct khugepaged_mm_slot *mm_slot)
> > > >  
> > > >  	lockdep_assert_held(&khugepaged_mm_lock);
> > > >  
> > > > -	if (hpage_collapse_test_exit(mm)) {
> > > > +	if (hpage_collapse_test_exit_or_disable(mm)) {
> > > >  		/* free mm_slot */
> > > >  		hash_del(&slot->hash);
> > > >  		list_del(&slot->mm_node);
> > > > @@ -2360,7 +2366,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
> > > >  		goto breakouterloop_mmap_lock;
> > > >  
> > > >  	progress++;
> > > > -	if (unlikely(hpage_collapse_test_exit(mm)))
> > > > +	if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
> > > >  		goto breakouterloop;
> > > >  
> > > >  	vma_iter_init(&vmi, mm, khugepaged_scan.address);
> > > > @@ -2368,7 +2374,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
> > > >  		unsigned long hstart, hend;
> > > >  
> > > >  		cond_resched();
> > > > -		if (unlikely(hpage_collapse_test_exit(mm))) {
> > > > +		if (unlikely(hpage_collapse_test_exit_or_disable(mm))) {
> > > >  			progress++;
> > > >  			break;
> > > >  		}
> > > > @@ -2390,7 +2396,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
> > > >  			bool mmap_locked = true;
> > > >  
> > > >  			cond_resched();
> > > > -			if (unlikely(hpage_collapse_test_exit(mm)))
> > > > +			if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
> > > >  				goto breakouterloop;
> > > >  
> > > >  			VM_BUG_ON(khugepaged_scan.address < hstart ||
> > > > @@ -2408,7 +2414,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
> > > >  				fput(file);
> > > >  				if (*result == SCAN_PTE_MAPPED_HUGEPAGE) {
> > > >  					mmap_read_lock(mm);
> > > > -					if (hpage_collapse_test_exit(mm))
> > > > +					if (hpage_collapse_test_exit_or_disable(mm))
> > > >  						goto breakouterloop;
> > > >  					*result = collapse_pte_mapped_thp(mm,
> > > >  						khugepaged_scan.address, false);
> > > > @@ -2450,7 +2456,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, int *result,
> > > >  	 * Release the current mm_slot if this mm is about to die, or
> > > >  	 * if we scanned all vmas of this mm.
> > > >  	 */
> > > > -	if (hpage_collapse_test_exit(mm) || !vma) {
> > > > +	if (hpage_collapse_test_exit_or_disable(mm) || !vma) {
> > > >  		/*
> > > >  		 * Make sure that if mm_users is reaching zero while
> > > >  		 * khugepaged runs here, khugepaged_exit will find
> > > > -- 
> > > > 2.33.1
> > > >
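Whether a child forked after the daemon's prctl call inherits the THP-disable setting is central to this exchange. The userspace C sketch below (not part of the patch) checks it: it sets the flag with prctl(PR_SET_THP_DISABLE) before fork(), reads it back in the child with PR_GET_THP_DISABLE, and restores the parent's original value afterwards. The helper name child_thp_disable_after_fork() is hypothetical, and the fallback constants assume the numbering used by the kernel's uapi <linux/prctl.h>.

```c
#include <assert.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallbacks, assuming the uapi values from <linux/prctl.h>. */
#ifndef PR_SET_THP_DISABLE
#define PR_SET_THP_DISABLE 41
#endif
#ifndef PR_GET_THP_DISABLE
#define PR_GET_THP_DISABLE 42
#endif

/*
 * Hypothetical helper (not from the patch): mimic the daemon by setting
 * the "THP disable" flag *before* fork(), then report what the child sees.
 * The parent's original setting is restored before returning.
 * Returns 0 or 1 (the child's flag) on success, -1 on any error.
 */
int child_thp_disable_after_fork(int disable)
{
    assert(disable == 0 || disable == 1);

    int saved = prctl(PR_GET_THP_DISABLE, 0UL, 0UL, 0UL, 0UL);
    if (saved < 0)
        return -1;

    /* Change the flag in the parent's mm, as the daemon does. */
    if (prctl(PR_SET_THP_DISABLE, (unsigned long)disable, 0UL, 0UL, 0UL) != 0)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: pass the inherited flag back through the exit status. */
        int v = prctl(PR_GET_THP_DISABLE, 0UL, 0UL, 0UL, 0UL);
        _exit(v < 0 ? 255 : v);
    }

    int result = -1;
    if (pid > 0) {
        int status;
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status) &&
            WEXITSTATUS(status) != 255)
            result = WEXITSTATUS(status);
    }

    /* Restore the parent's original preference. */
    prctl(PR_SET_THP_DISABLE, (unsigned long)saved, 0UL, 0UL, 0UL);
    return result;
}
```

On Linux 3.15 or later, where these prctl options exist, child_thp_disable_after_fork(1) would be expected to return 1, consistent with the prctl(2) man page stating that the flag is inherited by children created via fork(2). The child reports its value through its exit status, so no pipe or shared memory is needed.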
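Lance notes that the daemon would ideally call prctl after the fork, which Go's syscall.ForkExec does not allow because it combines fork and execve. In C, where the two steps are separate, that ordering can be sketched as below: prctl runs in the child between fork() and execv(), so only the child's mm gets the flag and the daemon's own setting is never touched. Per the prctl(2) man page, the flag is preserved across execve(2). The helper spawn_without_thp() and its 126/127 exit-code convention are hypothetical choices for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fallbacks, assuming the uapi values from <linux/prctl.h>. */
#ifndef PR_SET_THP_DISABLE
#define PR_SET_THP_DISABLE 41
#endif
#ifndef PR_GET_THP_DISABLE
#define PR_GET_THP_DISABLE 42
#endif

/*
 * Hypothetical helper: run `path` with THP disabled for the child only.
 * prctl() runs between fork() and execv(), so the caller's mm is unchanged.
 * Returns the child's exit status, or -1 on error.
 * Exit status 126 means prctl() failed, 127 means execv() failed.
 */
int spawn_without_thp(const char *path, char *const argv[])
{
    assert(path != NULL && argv != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: disable THP for this mm only, then exec the target. */
        if (prctl(PR_SET_THP_DISABLE, 1UL, 0UL, 0UL, 0UL) != 0)
            _exit(126);
        execv(path, argv);
        _exit(127); /* execv() returns only on failure */
    }

    /* Parent: wait for the child, retrying if interrupted by a signal. */
    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Compared with setting the flag before fork, this keeps the daemon's own preference stable, so the daemon's mm is never toggled back and forth between children.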