From: Baolin Wang <baolin.wang@linux.alibaba.com>
Date: Wed, 7 May 2025 18:03:37 +0800
Subject: Re: [PATCH 2/2] mm: mincore: use folio_pte_batch() to batch process large folios
To: David Hildenbrand, Dev Jain, akpm@linux-foundation.org, hughd@google.com
Cc: willy@infradead.org, 21cnbao@gmail.com, ryan.roberts@arm.com, ziy@nvidia.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <7ad05bc9299de5d954fb21a2da57f46dd6ec59d0.1742960003.git.baolin.wang@linux.alibaba.com>
 <17289428-894a-4397-9d61-c8500d032b28@arm.com>
 <6a8418ba-dbd1-489f-929b-e31831bea0cf@linux.alibaba.com>

On 2025/5/7 17:54, David Hildenbrand wrote:
> On 07.05.25 11:48, Baolin Wang wrote:
>>
>>
>> On 2025/5/7 13:12, Dev Jain wrote:
>>>
>>>
>>> On 26/03/25 9:08 am, Baolin Wang wrote:
>>>> When I tested the mincore() syscall, I observed that it takes longer
>>>> with 64K mTHP enabled on my Arm64 server. The reason is that
>>>> mincore_pte_range() still checks each PTE individually, even when the
>>>> PTEs are contiguous, which is not efficient.
>>>>
>>>> Thus we can use folio_pte_batch() to get the number of present
>>>> contiguous PTEs, which can improve the performance. I tested the
>>>> mincore() syscall with 1G of anonymous memory populated with 64K mTHP,
>>>> and observed an obvious performance improvement:
>>>>
>>>> w/o patch        w/ patch        changes
>>>> 6022us           1115us          +81%
>>>>
>>>> Moreover, I also tested mincore() with mTHP/THP disabled, and did not
>>>> see any obvious regression.
>>>>
>>>> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>> ---
>>>>  mm/mincore.c | 27 ++++++++++++++++++++++-----
>>>>  1 file changed, 22 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/mm/mincore.c b/mm/mincore.c
>>>> index 832f29f46767..88be180b5550 100644
>>>> --- a/mm/mincore.c
>>>> +++ b/mm/mincore.c
>>>> @@ -21,6 +21,7 @@
>>>>  #include
>>>>  #include "swap.h"
>>>> +#include "internal.h"
>>>>
>>>>  static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
>>>>  			unsigned long end, struct mm_walk *walk)
>>>> @@ -105,6 +106,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>>>  	pte_t *ptep;
>>>>  	unsigned char *vec = walk->private;
>>>>  	int nr = (end - addr) >> PAGE_SHIFT;
>>>> +	int step, i;
>>>>
>>>>  	ptl = pmd_trans_huge_lock(pmd, vma);
>>>>  	if (ptl) {
>>>> @@ -118,16 +120,31 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>>>  		walk->action = ACTION_AGAIN;
>>>>  		return 0;
>>>>  	}
>>>> -	for (; addr != end; ptep++, addr += PAGE_SIZE) {
>>>> +	for (; addr != end; ptep += step, addr += step * PAGE_SIZE) {
>>>>  		pte_t pte = ptep_get(ptep);
>>>> +		step = 1;
>>>>  		/* We need to do cache lookup too for pte markers */
>>>>  		if (pte_none_mostly(pte))
>>>>  			__mincore_unmapped_range(addr, addr + PAGE_SIZE,
>>>>  						 vma, vec);
>>>> -		else if (pte_present(pte))
>>>> -			*vec = 1;
>>>> -		else { /* pte is a swap entry */
>>>> +		else if (pte_present(pte)) {
>>>> +			if (pte_batch_hint(ptep, pte) > 1) {
>>>> +				struct folio *folio = vm_normal_folio(vma, addr, pte);
>>>> +
>>>> +				if (folio && folio_test_large(folio)) {
>>>> +					const fpb_t fpb_flags = FPB_IGNORE_DIRTY |
>>>> +								FPB_IGNORE_SOFT_DIRTY;
>>>> +					int max_nr = (end - addr) / PAGE_SIZE;
>>>> +
>>>> +					step = folio_pte_batch(folio, addr, ptep, pte,
>>>> +							max_nr, fpb_flags, NULL, NULL, NULL);
>>>> +				}
>>>> +			}
>>>
>>> Can we go ahead with this along with [1], which will help us generalize
>>> for all arches.
>>>
>>> [1] https://lore.kernel.org/all/20250506050056.59250-3-dev.jain@arm.com/
>>> (Please replace PAGE_SIZE with 1)
>>
>> As discussed with Ryan, we don't need to call folio_pte_batch()
>> (something like the code below), so your patch seems unnecessarily
>> complicated. However, David is unhappy about the open-coded
>> pte_batch_hint().
>
> I can live with the below :)
>
> Having something more universal maybe does not make sense here. Any form
> of batching contiguous PTEs (contiguous PFNs) -- whether with folios or
> not -- is not required here, as we really only want to
>
> (a) identify pte_present() PTEs, and
> (b) avoid the cost of repeated ptep_get() with cont-PTEs.

Good.
I will change the patch and resend it. Thanks.