From: Nico Pache
Date: Thu, 1 May 2025 16:29:38 -0600
Subject: Re: [PATCH v5 07/12] khugepaged: add mTHP support
To: Jann Horn
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org,
 linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
 akpm@linux-foundation.org, corbet@lwn.net, rostedt@goodmis.org,
 mhiramat@kernel.org, mathieu.desnoyers@efficios.com, david@redhat.com,
 baohua@kernel.org, baolin.wang@linux.alibaba.com, ryan.roberts@arm.com,
 willy@infradead.org, peterx@redhat.com, ziy@nvidia.com,
 wangkefeng.wang@huawei.com, usamaarif642@gmail.com, sunnanyong@huawei.com,
 vishal.moola@gmail.com, thomas.hellstrom@linux.intel.com,
 yang@os.amperecomputing.com, kirill.shutemov@linux.intel.com,
 aarcange@redhat.com, raquini@redhat.com, dev.jain@arm.com,
 anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de,
 will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org,
 jglisse@google.com, surenb@google.com, zokeefe@google.com,
 hannes@cmpxchg.org, rientjes@google.com, mhocko@suse.com,
 rdunlap@infradead.org, lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com
References: <20250428181218.85925-1-npache@redhat.com>
 <20250428181218.85925-8-npache@redhat.com>

On Wed, Apr 30, 2025 at 2:53 PM Jann Horn wrote:
>
> On Mon, Apr 28, 2025 at 8:12 PM Nico Pache wrote:
> > Introduce the ability for khugepaged to collapse to different mTHP sizes.
> > While scanning PMD ranges for potential collapse candidates, keep track
> > of pages in KHUGEPAGED_MIN_MTHP_ORDER chunks via a bitmap. Each bit
> > represents a utilized region of order KHUGEPAGED_MIN_MTHP_ORDER ptes. If
> > mTHPs are enabled we remove the restriction of max_ptes_none during the
> > scan phase so we dont bailout early and miss potential mTHP candidates.
> >
> > After the scan is complete we will perform binary recursion on the
> > bitmap to determine which mTHP size would be most efficient to collapse
> > to.
> > max_ptes_none will be scaled by the attempted collapse order to
> > determine how full a THP must be to be eligible.
> >
> > If a mTHP collapse is attempted, but contains swapped out, or shared
> > pages, we dont perform the collapse.
> [...]
> > @@ -1208,11 +1211,12 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
> >         vma_start_write(vma);
> >         anon_vma_lock_write(vma->anon_vma);
> >
> > -       mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, address,
> > -                               address + HPAGE_PMD_SIZE);
> > +       mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm, _address,
> > +                               _address + (PAGE_SIZE << order));
> >         mmu_notifier_invalidate_range_start(&range);
> >
> >         pmd_ptl = pmd_lock(mm, pmd); /* probably unnecessary */
> > +
> >         /*
> >          * This removes any huge TLB entry from the CPU so we won't allow
> >          * huge and small TLB entries for the same virtual address to
>
> It's not visible in this diff, but we're about to do a
> pmdp_collapse_flush() here. pmdp_collapse_flush() tears down the
> entire page table, meaning it tears down 2MiB of address space; and it
> assumes that the entire page table exclusively corresponds to the
> current VMA.
>
> I think you'll need to ensure that the pmdp_collapse_flush() only
> happens for full-size THP, and that mTHP only tears down individual
> PTEs in the relevant range. (That code might get a bit messy, since
> the existing THP code tears down PTEs in a detached page table, while
> mTHP would have to do it in a still-attached page table.)

Hi Jann!

I was under the impression that this is needed to prevent GUP-fast
races (and potentially others).
As you state here, conceptually the PMD case is: detach the PMD, do
the collapse, then reinstall the PMD (similar to how the system
recovers from a failed PMD collapse). I tried to keep the current
locking behavior, as it seemed the easiest way to get it right (and not
break anything). So I keep the PMD detaching and reinstalling for the
mTHP case too.
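
To make the size mismatch Jann describes concrete, here is a minimal
userspace sketch of the two ranges involved. It assumes 4 KiB base pages
and a PMD order of 9 (2 MiB), and that the collapse address is already
aligned to the attempted order. All EX_/ex_ names are hypothetical and
are not kernel symbols.

```c
#include <assert.h>

/* Assumed configuration: 4 KiB base pages, PMD order 9 (2 MiB), as on x86-64. */
#define EX_PAGE_SIZE       4096UL
#define EX_HPAGE_PMD_ORDER 9
#define EX_HPAGE_PMD_SIZE  (EX_PAGE_SIZE << EX_HPAGE_PMD_ORDER)

static_assert(EX_HPAGE_PMD_SIZE == 2UL * 1024 * 1024, "PMD range is 2 MiB");

struct ex_range {
	unsigned long start;
	unsigned long end;	/* exclusive */
};

/*
 * Hypothetical helper: the range handed to the notifier for an
 * order-`order` collapse, mirroring `_address + (PAGE_SIZE << order)`.
 * Assumes addr is aligned to the collapse size.
 */
static struct ex_range ex_notifier_range(unsigned long addr, unsigned int order)
{
	struct ex_range r = { addr, addr + (EX_PAGE_SIZE << order) };
	return r;
}

/* Hypothetical helper: the range mapped by the page table under addr's PMD entry. */
static struct ex_range ex_pmd_table_range(unsigned long addr)
{
	unsigned long start = addr & ~(EX_HPAGE_PMD_SIZE - 1);
	struct ex_range r = { start, start + EX_HPAGE_PMD_SIZE };
	return r;
}

/*
 * Bytes of the PMD's page table that lie outside the notified range.
 * Assumes order <= EX_HPAGE_PMD_ORDER, so the notified range sits
 * entirely inside the PMD range.
 */
static unsigned long ex_uncovered_bytes(unsigned long addr, unsigned int order)
{
	struct ex_range n = ex_notifier_range(addr, order);
	struct ex_range p = ex_pmd_table_range(addr);
	return (p.end - p.start) - (n.end - n.start);
}
```

Under these assumptions an order-4 collapse notifies for 64 KiB, while
the page table detached at the PMD level still spans the full 2 MiB, so
ex_uncovered_bytes() is nonzero for every order below the PMD order.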

As Hugh points out, I am releasing the anon_vma lock too early. I will
comment further in reply to his response.

As I familiarize myself with the code more, I do see potential code
improvements/cleanups and locking improvements, but I plan to leave
those for a later series.

Thanks
-- Nico
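
For readers following the quoted commit message, the statement that
max_ptes_none is "scaled by the attempted collapse order" could be read
as in the sketch below. The exact formula is not shown in this message;
the right shift by the distance from the PMD order is an assumption, as
are the PMD order of 9 and the hypothetical EX_/ex_ names.

```c
#include <assert.h>

#define EX_HPAGE_PMD_ORDER 9	/* assumed: 512 PTEs per PMD */

static_assert(EX_HPAGE_PMD_ORDER < 8 * sizeof(unsigned int), "shift stays in range");

/*
 * Hypothetical helper: scale a PMD-level max_ptes_none down to a
 * smaller collapse order. Assumes order <= EX_HPAGE_PMD_ORDER.
 */
static unsigned int ex_scaled_max_ptes_none(unsigned int max_ptes_none,
					    unsigned int order)
{
	return max_ptes_none >> (EX_HPAGE_PMD_ORDER - order);
}

/*
 * Hypothetical helper: is an order-`order` region with nr_present
 * populated PTEs full enough to be collapsed? Assumes
 * nr_present <= (1 << order).
 */
static int ex_region_eligible(unsigned int nr_present, unsigned int order,
			      unsigned int max_ptes_none)
{
	unsigned int nr_ptes = 1U << order;
	unsigned int nr_none = nr_ptes - nr_present;

	return nr_none <= ex_scaled_max_ptes_none(max_ptes_none, order);
}
```

With this reading, a PMD-level setting of 511 allows up to 15 empty PTEs
in an order-4 (16-PTE) region, and a setting of 64 allows up to 2.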