From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by smtp.lore.kernel.org (Postfix) with ESMTPS id D04A2D61010 for ; Thu, 29 Jan 2026 12:28:54 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C2E8D6B0089; Thu, 29 Jan 2026 07:28:53 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id C06746B008A; Thu, 29 Jan 2026 07:28:53 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B08326B008C; Thu, 29 Jan 2026 07:28:53 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id 9DCCA6B0089 for ; Thu, 29 Jan 2026 07:28:53 -0500 (EST) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 1D8C913BF39 for ; Thu, 29 Jan 2026 12:28:53 +0000 (UTC) X-FDA: 84384930546.09.7600F54 Received: from mail-lj1-f179.google.com (mail-lj1-f179.google.com [209.85.208.179]) by imf24.hostedemail.com (Postfix) with ESMTP id 253C8180006 for ; Thu, 29 Jan 2026 12:28:50 +0000 (UTC) Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=NjKZLrx3; spf=pass (imf24.hostedemail.com: domain of vernon2gm@gmail.com designates 209.85.208.179 as permitted sender) smtp.mailfrom=vernon2gm@gmail.com; arc=pass ("google.com:s=arc-20240605:i=1"); dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=2; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1769689731; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=dHqjURYN4atUfE0BElSDPuu2blTOLD3T/7U5tuCcJy0=; b=aGj4/FNmDIsz+UlltbNm5X8AT8isSHxddj5aaDzysIIYNO/EB7SfZDylwg3EGzTSTEnKFK Xnm28vKP7twCCsZaOHZNGyBkBlNJ9MZB/6JTS0V+sAjJN9pNA3zLgVtigs4e2qxzW4exAt L2gNOMmfipUM03QIeOu9DCEwMeX1Vx4= ARC-Authentication-Results: i=2; imf24.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=NjKZLrx3; spf=pass (imf24.hostedemail.com: domain of vernon2gm@gmail.com designates 209.85.208.179 as permitted sender) smtp.mailfrom=vernon2gm@gmail.com; arc=pass ("google.com:s=arc-20240605:i=1"); dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=2; s=arc-20220608; d=hostedemail.com; t=1769689731; a=rsa-sha256; cv=pass; b=khsVxW8WPTjWkqYk1msEoUT5WZYVsM+DG7Q8+UfEHcJUCQvgp+7MaYeRUyURNNu2/5ibSc Q46kYTO2qZMC1hnUhvb/SK3sVkRXEY5ILnJH5gZ1FXTUzhu1TrsHtDunNKDvPEZJ4zpWe5 s8oZo9mIvh+dhP8PBn63+9iNvPJn/Jc= Received: by mail-lj1-f179.google.com with SMTP id 38308e7fff4ca-37fd6e91990so9759191fa.3 for ; Thu, 29 Jan 2026 04:28:50 -0800 (PST) ARC-Seal: i=1; a=rsa-sha256; t=1769689729; cv=none; d=google.com; s=arc-20240605; b=Fbz8mvqCLmpo6Xgzyb0q+FfKDkdJ6CWu/DKfRiKyr76peHQM1+caoYeKWoVfH0tXnL puXZ1sjRGdUIz2ILFh1F01Tb168+h5fzPRslEHZrk2FNgdaE96q6y1iOdunD9U8cd3qb fBxf5Aol4c53iCPK3GSUcZiRPE8gUIM+ndy/YgIf3Et9Ykq9SxHh1juKUI7fBdpXF7gn BUb/MCYqbKmWVKEZGQ1NPAyGAjOTFdp4jwhQU2C5jp+9ttBr56JXXhiazwItAIgQ2GD1 ekAzQ/BS49gX8+sZkfxKo7ugcaB8lhN6GOFu4JmYMgcg1+xROYDAmDPt9llQNFYlHD0N LY9w== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20240605; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:dkim-signature; bh=dHqjURYN4atUfE0BElSDPuu2blTOLD3T/7U5tuCcJy0=; fh=y5DM3VEPAwm4CRiroXjj0l3H9MEi99XhRsPpRQeAXtY=; b=jNHyig/7CfZDvVz6YlMF5sumt2km1mRSoRLuXgdcN85N6gORXoyH06R82d8m1SqChj OfypOZCzpjU2ON0YTCL/cbHxTavLJeRWuQx8CAtnbdveEDHEAoABvSXKI6mj/zAZvKkf FsOCErmLdGp97vG0WR711JR8dI6TfbzLV6uY/LGUaN0rG6o3+PtLGFbrzY9tGTuI60Jo uI6oPNJbRBYOVgAtqFaX6Elxhacc8xPxNt5gnPPkdGedBcU8AIXeQ9hTbTvye9y4KE1a xpUt50s5h/0MymCQTYm3xgbD9LI18J4bN6qq0pxhIkk3SSoi0PHA7DIx8IdZiy52GWTE AZqg==; darn=kvack.org ARC-Authentication-Results: i=1; mx.google.com; arc=none DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1769689729; x=1770294529; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=dHqjURYN4atUfE0BElSDPuu2blTOLD3T/7U5tuCcJy0=; b=NjKZLrx3If14jFRdqv+Cn0YTtEWoldW1sHDUs1Ifw1Y+pz8x5y27XaRy7Mdxx1vz7h Z+8992wsd+MDrj+RBuYkMpd0ZflSafAdmcPiirwcRdAlARTamkrLtQZkKdUAi7y/PEvN BbpQ8cpx1oyGUM1MLNYgOPGbxtHJM9xnqpgdc3Ryr5dW8R6n+ly5Nqi5H0I2EmNryuc5 dJRwiOzlRq0E8U3XRPbd329ThHHW0SI5RNiEICrpfR8vT55Aeir/MvnFBBqUabodFOnN N1WDHmTHA+cB9jZc4o8Zyx8FYgBE3jJssOn7hEG9mBhJEAUb+0mi1roRlUOHSbigeZOl sfLA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1769689729; x=1770294529; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-gg:x-gm-message-state:from :to:cc:subject:date:message-id:reply-to; bh=dHqjURYN4atUfE0BElSDPuu2blTOLD3T/7U5tuCcJy0=; b=iTHorwOgPbsstHEeoLTTs4TKFgdvY2luvapm30whW9qalFCyErMWX4nc2V2D+CFXdM EOaze9RdeUtX8PNV+zVZ0LjbRszKQRSzFcO/oA2/YTuwA1HxM/aqAfjE0zyG0RVYGHkR TbfLuHmFrrYzaJZ8lwFoxHW0dYN1EhE5B8mbDfkpROUYARUFikxjK2DH1jvTMbekgZnb 4uqrUuqPMEoQS9AGdx8xl4bZvSS96IhntHCAQdAOz+8aw2/jYzKgyORbmPefWbkjlVFO 05uwIOJtr8sy7Sv1FO90O7F0wRNFd/9+LEgEu9RaRO66EoU7y/sp7KohqTFRxmIdPSnA qt0A== X-Forwarded-Encrypted: i=1; AJvYcCXdstlqt3HitKKJ0kzxvnWhSYuRT6m/2B2ceJD7Uc48proZ+FdiaHraK8EZWcrvyKbuxGznlMRFQA==@kvack.org X-Gm-Message-State: AOJu0YyfSuKq346QF9cQRWcKNKz/VRx2Q0t1qS3M78rwLZFuZGBL+sXL 7C9Ce1Cpu2ZFl0T1Y8GLNQNHOwQqYqluuEPPwS+X1N5ZvXmZoVbAmG/x8tmQ4qY9YfLNn95JRG9 RDUaeIThPCsf96+UIb9J1xsr++AAIEPw= X-Gm-Gg: AZuq6aL8KC8Sn0YxLVTE8iXB8JVIPvaULi58nvdT+B22Xm7Q8s/VpkcNBFKThqJgxB+ y69bh07si5+qSlxm3TdVBPfC6H2m10GFJp7Ql/v5iIqffjJYGqQOAyl62sSZ/4XX0g0hCpbh3pD ww8hwekea4NqvuPB7lcQQFbskKSsy+wTylTq85KKcqWwbSPMAf5tXQcQ0JqoqJ1HrYpq71utZmv suO+HPkxVPsA6kTdgIdiuASeqDY0YstWy1huICnizs5pbWaclYtWvqFrIfOBxECJnc+AUz5u9t3 fHvPWoY= X-Received: by 2002:a2e:b895:0:b0:382:fccd:f999 with SMTP id 38308e7fff4ca-3861c90df4emr38830551fa.25.1769689729230; Thu, 29 Jan 2026 04:28:49 -0800 (PST) MIME-Version: 1.0 References: <20260123082232.16413-1-vernon2gm@gmail.com> <20260123082232.16413-3-vernon2gm@gmail.com> <72dfd11f-dca4-447e-89c5-7a87cb675bda@arm.com> <4exllw7pf4yfhfzpyg3ictsnznf5lepv5tjd7zvycldjwmh6jm@j6rara3ogrux> <7c4b5933-7bbd-4ad7-baef-830304a09485@arm.com> In-Reply-To: From: Vernon Yang Date: Thu, 29 Jan 2026 20:28:37 +0800 X-Gm-Features: AZwV_QgPLE28dBC2wSWpclQwzNa1tJ8emQ_klM22EQIeDU8pM5A_ljYeGghyBzU Message-ID: Subject: Re: [PATCH mm-new v5 2/5] mm: khugepaged: refine scan progress number To: Lance Yang Cc: akpm@linux-foundation.org, david@kernel.org, lorenzo.stoakes@oracle.com, ziy@nvidia.com, baohua@kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Dev Jain , Vernon Yang Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 253C8180006 X-Stat-Signature: c8htkytdt7dos9htg88wq1xupd1iar5i X-Rspam-User: X-HE-Tag: 1769689730-631558 X-HE-Meta: U2FsdGVkX18ao8/GCVqNzJ9403NiV7AUjMUWV225A58Qpd61C0ppYL3KFPkdCwSeU1JAo0yqicfa5dJHixKxDjoaEpnIm7mLdV5t0PVlu2QjEnwTx23FYM1ZQUCjuGa+XhI/R4lTpZRqftKhabF1+0mZeCi5I1Ghx4Z20k7IJjVbGUOK32aP3l7bVfRjgY/AoM1Nd27qQy43MyBWyzHKVdfxL8w3buQUzwa6jHfKkNeWZeZ4ZFINPFJCwF43i7KJoriTrk/dsgeypqlFdV96ppoUOkXzbUe4VV6SXQub4DqrcoJ0OXEiBAB5Bx0Pv3/fRjj/G3XO39pbE10CB9UeAUm1z5M4noO8b5kOLg9l3E4meWwqL0OFP3g9iLih2R5ZBwX23KAev6nCduWIoMEgJT3oRZH5VtXwnMOORZPncWe4o+u5v40ctip3fxAk730OBqvprJHLx1srOY3wBvWdpthFidpo+s57ei/NKpHMdkmD0vmF3NuV3Eyd0lsykasvzBSBsakOK+BDkGoF2G5+qQV5RGJkWf+XOHkTYJMgugnyIWzaWTksehNmSD2EtzdEsCNyuTWffxWziNjMVpNhGd0iuz/5HoEHftekN3xR+khAFln4BrFgzhG2Qs0aFusrcG7uIU5FD6qKHnpLajFwZP5c1jr+l9fZ8sUGkoKuvP0AfjdqutHRCOe8HLquKjC3QRUTREyu+euH2ARIN1gN6atA8NaUEeuCuEBnMx3u+LKXUoEczNHNecXmk/KOkKgg0ZJOnG9hBQlXHj42xNKFEzAPd44baBfoo+L2zVL1+JM5565OcrOyncW3BDMSeOJsWgWGxPihHDUBUM7y8x8YVrojNdgbyDkS8qs8ArcIbwLsxwHyQhS0C/XRNs6NxEzjNw3nAXm1yGA9KqX2qPZPFyUCCcrgxOItocFTl2QyqHquiEC81KKMXWUYaLkVZAk+9lWXsrug/Mr4YEKEWtQ oOM9FW0n tGTEjoPX1+UhW2bEiuW5iB4u7/MHAzFFAy9UsAtqqb30LUy0h0C6dkzoPlA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Jan 29, 2026 at 5:18=E2=80=AFPM Lance Yang w= rote: > > On 2026/1/29 13:35, Dev Jain wrote: > > > > On 28/01/26 8:04 pm, Vernon Yang wrote: > >> On Wed, Jan 28, 2026 at 01:59:33PM +0530, Dev Jain wrote: > >>> On 23/01/26 1:52 pm, Vernon Yang wrote: > >>>> From: Vernon Yang > >>>> > >>>> Currently, each scan always increases "progress" by HPAGE_PMD_NR, > >>>> even if only scanning a single PTE/PMD entry. > >>>> > >>>> - When only scanning a sigle PTE entry, let me provide a detailed > >>>> example: > >>>> > >>>> static int hpage_collapse_scan_pmd() > >>>> { > >>>> for (addr =3D start_addr, _pte =3D pte; _pte < pte + HPAGE_PMD_NR= ; > >>>> _pte++, addr +=3D PAGE_SIZE) { > >>>> pte_t pteval =3D ptep_get(_pte); > >>>> ... > >>>> if (pte_uffd_wp(pteval)) { <-- first scan hit > >>>> result =3D SCAN_PTE_UFFD_WP; > >>>> goto out_unmap; > >>>> } > >>>> } > >>>> } > >>>> > >>>> During the first scan, if pte_uffd_wp(pteval) is true, the loop exit= s > >>>> directly. In practice, only one PTE is scanned before termination. > >>>> Here, "progress +=3D 1" reflects the actual number of PTEs scanned, = but > >>>> previously "progress +=3D HPAGE_PMD_NR" always. > >>>> > >>>> - When the memory has been collapsed to PMD, let me provide a detail= ed > >>>> example: > >>>> > >>>> The following data is traced by bpftrace on a desktop system. After > >>>> the system has been left idle for 10 minutes upon booting, a lot of > >>>> SCAN_PMD_MAPPED or SCAN_NO_PTE_TABLE are observed during a full scan > >>>> by khugepaged. > >>>> > >>>> @scan_pmd_status[1]: 1 ## SCAN_SUCCEED > >>>> @scan_pmd_status[6]: 2 ## SCAN_EXCEED_SHARED_PTE > >>>> @scan_pmd_status[3]: 142 ## SCAN_PMD_MAPPED > >>>> @scan_pmd_status[2]: 178 ## SCAN_NO_PTE_TABLE > >>> Could you elaborate what is [1], [6] etc and 1,2,142, etc? > >> These 1,6 are value of "enum scan_result", you can directly refer to t= he > >> notes on the right. > >> > >> These 1,2,142,178 are number of different "enum scan_result" from > >> trace_mm_khugepaged_scan_pmd and trace_mm_khugepaged_scan_file. > >> > >> as example, SCAN_PMD_MAPPED has 142 times during a full scan by > >> khugepaged. > > > > Thanks. Can you please mention this in the patch description. You can s= imply > > right it like this: > > > > "From trace_mm_khugepaged_scan_pmd and trace_mm_khugepaged_scan_file, t= he > > following statuses were observed, with frequency mentioned next to them= : > > > > SCAN_SUCCEED: 1 > > SCAN_PMD_MAPPED: 142 > > ....." > > > > and so on. > > > >> > >>>> total progress size: 674 MB > >>>> Total time : 419 seconds ## include khugepaged_scan_sleep_mi= llisecs > >>>> > >>>> The khugepaged_scan list save all task that support collapse into hu= gepage, > >>>> as long as the task is not destroyed, khugepaged will not remove it = from > >>>> the khugepaged_scan list. This exist a phenomenon where task has alr= eady > >>>> collapsed all memory regions into hugepage, but khugepaged continues= to > >>>> scan it, which wastes CPU time and invalid, and due to > >>>> khugepaged_scan_sleep_millisecs (default 10s) causes a long wait for > >>>> scanning a large number of invalid task, so scanning really valid ta= sk > >>>> is later. > >>>> > >>>> After applying this patch, when the memory is either SCAN_PMD_MAPPED= or > >>>> SCAN_NO_PTE_TABLE, just skip it, as follow: > >>>> > >>>> @scan_pmd_status[6]: 2 > >>>> @scan_pmd_status[3]: 147 > >>>> @scan_pmd_status[2]: 173 > >>>> total progress size: 45 MB > >>>> Total time : 20 seconds > >>>> > >>>> Signed-off-by: Vernon Yang > >>>> --- > >>>> include/linux/xarray.h | 9 ++++++++ > >>>> mm/khugepaged.c | 47 ++++++++++++++++++++++++++++++++++----= ---- > >>>> 2 files changed, 47 insertions(+), 9 deletions(-) > >>>> > >>>> diff --git a/include/linux/xarray.h b/include/linux/xarray.h > >>>> index be850174e802..f77d97d7b957 100644 > >>>> --- a/include/linux/xarray.h > >>>> +++ b/include/linux/xarray.h > >>>> @@ -1646,6 +1646,15 @@ static inline void xas_set(struct xa_state *x= as, unsigned long index) > >>>> xas->xa_node =3D XAS_RESTART; > >>>> } > >>>> > >>>> +/** > >>>> + * xas_get_index() - Get XArray operation state for a different ind= ex. > >>>> + * @xas: XArray operation state. > >>>> + */ > >>>> +static inline unsigned long xas_get_index(struct xa_state *xas) > >>>> +{ > >>>> + return xas->xa_index; > >>>> +} > >>>> + > >>>> /** > >>>> * xas_advance() - Skip over sibling entries. > >>>> * @xas: XArray operation state. > >>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c > >>>> index 6f0f05148765..de95029e3763 100644 > >>>> --- a/mm/khugepaged.c > >>>> +++ b/mm/khugepaged.c > >>>> @@ -68,7 +68,10 @@ enum scan_result { > >>>> static struct task_struct *khugepaged_thread __read_mostly; > >>>> static DEFINE_MUTEX(khugepaged_mutex); > >>>> > >>>> -/* default scan 8*HPAGE_PMD_NR ptes (or vmas) every 10 second */ > >>>> +/* > >>>> + * default scan 8*HPAGE_PMD_NR ptes, pmd_mapped, no_pte_table or vm= as > >>>> + * every 10 second. > >>>> + */ > >>>> static unsigned int khugepaged_pages_to_scan __read_mostly; > >>>> static unsigned int khugepaged_pages_collapsed; > >>>> static unsigned int khugepaged_full_scans; > >>>> @@ -1240,7 +1243,8 @@ static enum scan_result collapse_huge_page(str= uct mm_struct *mm, unsigned long a > >>>> } > >>>> > >>>> static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *= mm, > >>>> - struct vm_area_struct *vma, unsigned long start_addr, boo= l *mmap_locked, > >>>> + struct vm_area_struct *vma, unsigned long start_addr, > >>>> + bool *mmap_locked, unsigned int *cur_progress, > >>>> struct collapse_control *cc) > >>>> { > >>>> pmd_t *pmd; > >>>> @@ -1255,6 +1259,9 @@ static enum scan_result hpage_collapse_scan_pm= d(struct mm_struct *mm, > >>>> > >>>> VM_BUG_ON(start_addr & ~HPAGE_PMD_MASK); > >>>> > >>>> + if (cur_progress) > >>>> + *cur_progress +=3D 1; > >>> Why not be a little more explicit, and do this addition if find_pmd_o= r_thp_or_none fails, > >>> or pte_offset_map_lock fails? The way you do it right now is not read= able - it gives no > >>> idea as to why on function entry we do a +1 right away. Please do add= some comments too. > >> If this way is not clear enough, we can directly add 1 in > >> find_pmd_or_thp_or_none() etc, BUT it's a bit redundant. > >> Please take a look at which one is better. > >> > >> case 1: > >> as the V4 PATCH #2 [1] and #3 [2], only hpage_collapse_scan_pmd(). > >> [1] https://lore.kernel.org/linux-mm/20260111121909.8410-3-yanglinchen= g@kylinos.cn > >> [2] https://lore.kernel.org/linux-mm/20260111121909.8410-4-yanglinchen= g@kylinos.cn > >> > >> static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm, > >> struct vm_area_struct *vma, unsigned long start_addr, > >> bool *mmap_locked, unsigned int *cur_progress, > >> struct collapse_control *cc) > >> { > >> ... > >> result =3D find_pmd_or_thp_or_none(mm, start_addr, &pmd); > >> if (result !=3D SCAN_SUCCEED) { > >> if (cur_progress) > >> *cur_progress +=3D 1; // here > >> goto out; > >> } > >> ... > >> pte =3D pte_offset_map_lock(mm, pmd, start_addr, &ptl); > >> if (!pte) { > >> if (cur_progress) > >> *cur_progress +=3D 1; // here > >> result =3D SCAN_NO_PTE_TABLE; > >> goto out; > >> } > >> > >> for (addr =3D start_addr, _pte =3D pte; _pte < pte + HPAGE_PMD_NR= ; > >> _pte++, addr +=3D PAGE_SIZE) { > >> if (cur_progress) > >> *cur_progress +=3D 1; // here > >> ... > >> } > >> } > >> > >> case 2: > >> > >> static enum scan_result hpage_collapse_scan_pmd(struct mm_struct *mm, > >> struct vm_area_struct *vma, unsigned long start_addr, > >> bool *mmap_locked, unsigned int *cur_progress, > >> struct collapse_control *cc) > >> { > >> ... > >> result =3D find_pmd_or_thp_or_none(mm, start_addr, &pmd); > >> if (result !=3D SCAN_SUCCEED) { > >> if (cur_progress) > >> *cur_progress +=3D 1; // here > > > > Let us be more explicit and set this equal to 1, instead of adding 1. > > > >> goto out; > >> } > >> ... > >> pte =3D pte_offset_map_lock(mm, pmd, start_addr, &ptl); > >> if (!pte) { > >> if (cur_progress) > >> *cur_progress +=3D 1; // here > > > > Same comment as above. > > > >> result =3D SCAN_NO_PTE_TABLE; > >> goto out; > >> } > >> > >> for (addr =3D start_addr, _pte =3D pte; _pte < pte + HPAGE_PMD_NR= ; > >> _pte++, addr +=3D PAGE_SIZE) { > >> ... > >> } > >> ... > >> out_unmap: > >> if (cur_progress) { > >> if (_pte >=3D pte + HPAGE_PMD_NR) > >> *cur_progress +=3D HPAGE_PMD_NR; // here > >> else > >> *cur_progress +=3D _pte - pte + 1; // here > >> } > >> } > > > > I will vote case 2. In case 1 I don't like the fact that the if (cur_pr= ogress) > > branch will be checked each iteration - and I don't think the compiler = can > > optimize this since the body of the loop is complex, so this check cann= ot > > be hoisted out of the loop. > > > > > >> > >> case 3: > >> current patch, and add more comments to clearer. > >> > >>>> + > >>>> result =3D find_pmd_or_thp_or_none(mm, start_addr, &pmd); > >>>> if (result !=3D SCAN_SUCCEED) > >>>> goto out; > >>>> @@ -1396,6 +1403,12 @@ static enum scan_result hpage_collapse_scan_p= md(struct mm_struct *mm, > >>>> result =3D SCAN_SUCCEED; > >>>> } > >>>> out_unmap: > >>>> + if (cur_progress) { > >>>> + if (_pte >=3D pte + HPAGE_PMD_NR) > >>>> + *cur_progress +=3D HPAGE_PMD_NR - 1; > >>>> + else > >>>> + *cur_progress +=3D _pte - pte; > >>>> + } > >>>> pte_unmap_unlock(pte, ptl); > >>>> if (result =3D=3D SCAN_SUCCEED) { > >>>> result =3D collapse_huge_page(mm, start_addr, referenced, > >>>> @@ -2286,8 +2299,9 @@ static enum scan_result collapse_file(struct m= m_struct *mm, unsigned long addr, > >>>> return result; > >>>> } > >>>> > >>>> -static enum scan_result hpage_collapse_scan_file(struct mm_struct *= mm, unsigned long addr, > >>>> - struct file *file, pgoff_t start, struct collapse_control= *cc) > >>>> +static enum scan_result hpage_collapse_scan_file(struct mm_struct *= mm, > >>>> + unsigned long addr, struct file *file, pgoff_t start, > >>>> + unsigned int *cur_progress, struct collapse_control *cc) > >>>> { > >>>> struct folio *folio =3D NULL; > >>>> struct address_space *mapping =3D file->f_mapping; > >>>> @@ -2376,6 +2390,18 @@ static enum scan_result hpage_collapse_scan_f= ile(struct mm_struct *mm, unsigned > >>>> cond_resched_rcu(); > >>>> } > >>>> } > >>>> + if (cur_progress) { > >>>> + unsigned long idx =3D xas_get_index(&xas) - start; > >>>> + > >>>> + if (folio =3D=3D NULL) > >>>> + *cur_progress +=3D HPAGE_PMD_NR; > >>> I think this whole block needs some comments. Can you explain, why yo= u > >>> do a particular increment in each case? > >>> > >>>> + else if (xa_is_value(folio)) > >>>> + *cur_progress +=3D idx + (1 << xas_get_order(&xas= )); > >>>> + else if (folio_order(folio) =3D=3D HPAGE_PMD_ORDER) > >>>> + *cur_progress +=3D idx + 1; > >>>> + else > >>>> + *cur_progress +=3D idx + folio_nr_pages(folio); > >>>> + } > >> The "idx" represent PTEs number already scanned when exiting > >> xas_for_each(). > >> > >> However, the last valid folio size was not counted in "idx" (except > >> folio =3D=3D NULL, "idx" equal to HPAGE_PMD_NR), which can be further > >> divided into three cases: > > > > But, the number of PTEs you account in these three cases, are *not* > > scanned, right? So we can simply drop these 3 cases. > > > >> > >> 1. shmem swap entries (xa_is_value), add folio size. > >> 2. the folio is HPAGE_PMD_ORDER, the memory has been collapsed > >> to PMD, so add 1 only. > >> 3. Normal folio, add folio size. > >> > >> Is the version below more readable? > >> > >> if (cur_progress) { > >> *cur_progress +=3D xas.xa_index - start; > >> > >> if (folio) { > >> if (xa_is_value(folio)) > >> *cur_progress +=3D 1 << xas_get_order(&xa= s); > >> else if (folio_order(folio) =3D=3D HPAGE_PMD_ORDE= R) > >> *cur_progress +=3D 1; > >> else > >> *cur_progress +=3D folio_nr_pages(folio); > >> } > >> } > > > > Yep, this is unneeded complexity. This looks really ugly and the benefi= ts of > > this are not clear. You can simply do > > > > if (cur_progress) > > *cur_progress =3D xas.xa_index - start; > > > > I agree with Dev here. The extra complexity in hpage_collapse_scan_file() > doesn't seem worth it. > > Suggest: > > if (cur_progress) > *cur_progress =3D max(xas.xa_index - start, 1UL); > > Just keeps it simple, and handles the idx=3D0 case you mentioned as well. Thank you for your suggestion, but two scenarios are still missing. For a detailed explanation, please refer to the link below. https://lore.kernel.org/linux-mm/20260123082232.16413-1-vernon2gm@gmail.com= /T/#mc3969cb0d52e28ffea2fb96260f0880a5f005848