From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A5244C369AB for ; Tue, 15 Apr 2025 11:47:31 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 618A22801D8; Tue, 15 Apr 2025 07:47:29 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5C9A72800F0; Tue, 15 Apr 2025 07:47:29 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 46BA42801D8; Tue, 15 Apr 2025 07:47:29 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 255E82800F0 for ; Tue, 15 Apr 2025 07:47:29 -0400 (EDT) Received: from smtpin23.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 94177BC68F for ; Tue, 15 Apr 2025 11:47:30 +0000 (UTC) X-FDA: 83336103060.23.F5FC166 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf28.hostedemail.com (Postfix) with ESMTP id B7F11C0006 for ; Tue, 15 Apr 2025 11:47:28 +0000 (UTC) Authentication-Results: imf28.hostedemail.com; dkim=none; spf=pass (imf28.hostedemail.com: domain of dev.jain@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=dev.jain@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1744717649; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LI3YbCMIv8+j3UVp5ieKRENdp6nJIxzrvjyjLJWAKwk=; b=k7kSfHr7QGwQEPzI/rxt3nYQryye7jEY142y3g+e00kvj+6G53Ly1WCFb1PZcXCjNvmNyt 9BBQ22js7hYUQaQTi+fk4Y9uuMzCTKdV9ji8N1LSaOxjSI4CUSNKUj84xnYCaTGebn8kTL Pgfa13byZqr2ezncd5SYN9gksbyRBqc= ARC-Authentication-Results: i=1; imf28.hostedemail.com; dkim=none; spf=pass (imf28.hostedemail.com: domain of dev.jain@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=dev.jain@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1744717649; a=rsa-sha256; cv=none; b=r6PBizCfUBJZ3INlMep7rk5+ioGVz74SXXKCaM16uyVInGIAmkD3ENvYi9b9mWnbQgyLgN erKPzcZxdM+AWAKGr1jo+sPYh6mZRYvFTGIdClTqG0CPBGGc/CgMw5aXRQYHgMpvxv83oF Nst+UVZScs0otSgbrNzFJw8JDcnYFQo= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id F1E7115A1; Tue, 15 Apr 2025 04:47:25 -0700 (PDT) Received: from [10.163.73.130] (unknown [10.163.73.130]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id CD8F63F66E; Tue, 15 Apr 2025 04:47:23 -0700 (PDT) Message-ID: <9ed4c113-37eb-4e3d-98a1-f46f786aaea9@arm.com> Date: Tue, 15 Apr 2025 17:17:19 +0530 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] mempolicy: Optimize queue_folios_pte_range by PTE batching To: David Hildenbrand , akpm@linux-foundation.org Cc: ryan.roberts@arm.com, willy@infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, hughd@google.com, vishal.moola@gmail.com, yang@os.amperecomputing.com, ziy@nvidia.com References: <20250411081301.8533-1-dev.jain@arm.com> <09c77ab5-65fc-4bca-a7e5-2b11bba9330d@redhat.com> Content-Language: en-US From: Dev Jain In-Reply-To: <09c77ab5-65fc-4bca-a7e5-2b11bba9330d@redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspamd-Server: rspam11 X-Rspamd-Queue-Id: B7F11C0006 X-Stat-Signature: q4r8jtt1f3ihqfsz5kbw1mj8bjdyhr6t X-Rspam-User: X-HE-Tag: 1744717648-922803 X-HE-Meta: U2FsdGVkX1+zGEfBpCwhNxaCJTB6TLru22yoOTEaU1juG8kKrr6gzVhrIwDq3pt1g9ZOl1FXB2jZBZEhpMs1aSwu4yXS4eJlG/pI84n1gCXJ1wZhfaB/+gbXADyT8LefmoBz1fvAchDmEj/86OIq705qtasnSbx3lALymgFoy5CuafqXWiAuHEsVx0vaos5bePpzagFzFdfsd7oBqmexYH7rvsAnBlS0ADOTDU++hBry4Ec+lDB4AvufKsyYvinPiYt8/ivvFwspUlQBsP9wCg8O0li3yTlipOuHGTENFwsOsOH7ZokRD0TsaAVIseK01RkUgV+ciQf4ZqY/uG7aomuEjmr7IfSxxlQMaoKC4znhme27yrDvLCgjLw001TAJgDg2kFGXzsab9qly1ETMWNC1JySmb1ECc4UgGYwyAtRQ0++K5U5Y3f1QNYnlVvDQ2dzMLBEASSUw7xuAhQgSEbMYP25qwqYMgav4KXCzvzJxwGSHZJn75zXydRFrMNT7J6Ec6eXuaUIEwMumYP49FxiAv+WKRk7jNOy0LWBw57AwWxpo4LFXn4zZaotUxBry2x+oE6ReI73+VEHngKE2H+CuTBiMBQokgpm2z2WGvIC5dnC8VpM79YHCwWuLMz+JNTYEz/KNoYMLgWzy1R7LcLHF31BdmYcMb0ZL0TJ/9h4QIY+rU4oVcSnF9CI9yCPunnAgD9PF88IZj13BL0yCdriY4LrHUysPAX3V9NqNW9PKtJ1dy/kHMG53Vev7d5lYydS7dYxkL2but3pA9Pn+NBk4Vx6SaCKHf2foTAeuouLQ5J8xmwyUP0fjVoyyKB2/nvX4lxn7P6fRbdESsgodin7psR2C4/uFiV2G/SVuxbfZb6Z1/iDAGVihtYC+xYg7qLNfi7nGx1PXXZEidRAO0XZYA2n0YNuJOsfILZGD8yRodFpiua1f8aSpubagrYjukd1iFJoFoicmqA7dxWJ K3g== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 15/04/25 3:47 pm, David Hildenbrand wrote: > On 11.04.25 10:13, Dev Jain wrote: >> After the check for queue_folio_required(), the code only cares about the >> folio in the for loop, i.e the PTEs are redundant. Therefore, optimize >> this >> loop by skipping over a PTE batch mapping the same folio. >> >> Signed-off-by: Dev Jain >> --- >> Unfortunately I have only build tested this since my test environment is >> broken. >> >>   mm/mempolicy.c | 12 +++++++++++- >>   1 file changed, 11 insertions(+), 1 deletion(-) >> >> diff --git a/mm/mempolicy.c b/mm/mempolicy.c >> index b28a1e6ae096..b019524da8a2 100644 >> --- a/mm/mempolicy.c >> +++ b/mm/mempolicy.c >> @@ -573,6 +573,9 @@ static int queue_folios_pte_range(pmd_t *pmd, >> unsigned long addr, >>       pte_t *pte, *mapped_pte; >>       pte_t ptent; >>       spinlock_t *ptl; >> +    int max_nr; >> +    const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY; >> +    int nr = 1; > > Try sticking to reverse xmas tree, please. (not completely the case > here, but fpb_flags can easily be moved all he way to the top) I thought that the initializations were to be kept at the bottom. Asking for future patches, should I put all declarations in reverse-xmas fashion (even those which I don't intend to touch w.r.t the patch logic), or do I do that for only my additions? > > Also, why are you initializing nr to 1 here if you reinitialize it below? Yup no need, I thought pte += nr will blow up due to nr not being initialized, but it won't because it gets executed just before the start of the second iteration. > > >       ptl = pmd_trans_huge_lock(pmd, vma);>       if (ptl) { >> @@ -586,7 +589,8 @@ static int queue_folios_pte_range(pmd_t *pmd, >> unsigned long addr, >>           walk->action = ACTION_AGAIN; >>           return 0; >>       } > > -    for (; addr != end; pte++, addr += PAGE_SIZE) {> +    for (; > addr != end; pte += nr, addr += nr * PAGE_SIZE) { >> +        nr = 1; >>           ptent = ptep_get(pte); >>           if (pte_none(ptent)) >>               continue; >> @@ -607,6 +611,11 @@ static int queue_folios_pte_range(pmd_t *pmd, >> unsigned long addr, >>           if (!queue_folio_required(folio, qp)) >>               continue; >>           if (folio_test_large(folio)) { >> +            max_nr = (end - addr) >> PAGE_SHIFT; >> +            if (max_nr != 1) >> +                nr = folio_pte_batch(folio, addr, pte, ptent, >> +                             max_nr, fpb_flags, >> +                             NULL, NULL, NULL); > > We should probably do that immediately after we verified that > vm_normal_folio() have us something reasonable. But shouldn't we keep the small folio case separate to avoid the overhead of folio_pte_batch()? > >>               /* >>                * A large folio can only be isolated from LRU once, >>                * but may be mapped by many PTEs (and Copy-On-Write may >> @@ -633,6 +642,7 @@ static int queue_folios_pte_range(pmd_t *pmd, >> unsigned long addr, >>               qp->nr_failed++; >>               if (strictly_unmovable(flags)) >>                   break; >> +            qp->nr_failed += nr - 1; > > Can't we do qp->nr_failed += nr; above? I did not dive deep into the significance of nr_failed, but I did that to keep the code, before and after the change, equivalent: Claim: if we reach qp->nr_failed++ for a single pte, we will reach here for all ptes belonging to the same batch. Proof: We reach here => the if condition is true. Now, !(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) and !vma_migratable(vma) do not depend on the ptes. So the other case is that !migrate_folio_add() is true => !folio_isolate_lru() is true, which depends on the folio and not the PTEs; if isolation fails for one PTE, it will definitely fail for the PTE batch. So, before the change, if we iterate on a pte mapping a large folio, and strictly_unmovable(flags) is true, then nr_failed += 1 only. If not, then nr_failed++ will happen nr times for sure (because of the claim) and we can safely do qp->nr_failed += nr - 1. > > Weird enough, queue_folios_pmd() also only does qp->nr_failed++, but > queue_pages_range() documents it that way. > >>           } >>       } >>       pte_unmap_unlock(mapped_pte, ptl); > >