From: Nico Pache
Date: Thu, 9 Jan 2025 18:28:02 -0700
Subject: Re: [RFC 00/11] khugepaged: mTHP support
To: Dev Jain
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
 ryan.roberts@arm.com, anshuman.khandual@arm.com,
 catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz,
 mhocko@suse.com, apopple@nvidia.com, dave.hansen@linux.intel.com,
 will@kernel.org, baohua@kernel.org, jack@suse.cz,
 srivatsa@csail.mit.edu, haowenchao22@gmail.com, hughd@google.com,
 aneesh.kumar@kernel.org, yang@os.amperecomputing.com,
 peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com,
 ziy@nvidia.com, jglisse@google.com, surenb@google.com,
 vishal.moola@gmail.com, zokeefe@google.com,
 zhengqi.arch@bytedance.com, jhubbard@nvidia.com, 21cnbao@gmail.com,
 willy@infradead.org, kirill.shutemov@linux.intel.com,
 david@redhat.com, aarcange@redhat.com, raquini@redhat.com,
 sunnanyong@huawei.com, usamaarif642@gmail.com, audra@redhat.com,
 akpm@linux-foundation.org
References: <20250108233128.14484-1-npache@redhat.com>

On Wed, Jan 8, 2025 at 11:27 PM Dev Jain wrote:
>
>
> On 09/01/25 5:01 am, Nico Pache wrote:
> > The following series provides khugepaged and madvise collapse with the
> > capability to collapse regions to mTHPs.
> >
> > To achieve this we generalize the khugepaged functions to no longer depend
> > on PMD_ORDER.
> > Then during the PMD scan, we keep track of chunks of pages
> > (defined by MTHP_MIN_ORDER) that are fully utilized. This info is tracked
> > using a bitmap. After the PMD scan is done, we do binary recursion on the
> > bitmap to find the optimal mTHP sizes for the PMD range. The restriction
> > on max_ptes_none is removed during the scan, to make sure we account for
> > the whole PMD range. max_ptes_none is mapped to a 0-100 range to
> > determine how full a mTHP order needs to be before collapsing it.
> >
> > Some design choices to note:
> >  - bitmap structures are allocated dynamically because on some arches
> >    (like PowerPC) the value of MTHP_BITMAP_SIZE cannot be computed at
> >    compile time, leading to warnings.
> >  - The recursion is masked through a stack structure.
> >  - A MTHP_MIN_ORDER was added to compress the bitmap, and ensure it was
> >    64bit on x86. This provides some optimization on the bitmap operations.
> >    If other arches/configs that have larger than 512 PTEs per PMD want to
> >    compress their bitmap further we can change this value per arch.
> >
> > Patch 1-2: Some refactoring to combine madvise_collapse and khugepaged
> > Patch 3: A minor "fix"/optimization
> > Patch 4: Refactor/rename hpage_collapse
> > Patch 5-7: Generalize khugepaged functions for arbitrary orders
> > Patch 8-11: The mTHP patches
> >
> > This series acts as an alternative to Dev Jain's approach [1]. The two
> > series differ in a few ways:
> >   - My approach uses a bitmap to store the state of the linear scan_pmd to
> >     then determine potential mTHP batches. Dev incorporates his directly
> >     into the scan, and will try each available order.
> >   - Dev is attempting to optimize the locking, while my approach keeps the
> >     locking changes to a minimum. I believe his changes are not safe for
> >     uffd.
> >   - Dev's changes only work for khugepaged not madvise_collapse (although
> >     I think that was by choice and it could easily support madvise)
> >   - Dev scales all khugepaged sysfs tunables by order, while I'm removing
> >     the restriction of max_ptes_none and converting it to a scale to
> >     determine a (m)THP threshold.
> >   - Dev turns on khugepaged if any order is available while mine still
> >     only runs if PMDs are enabled. I like Dev's approach and will most
> >     likely do the same in my PATCH posting.
> >   - mTHPs need their ref count updated to 1<
>
> Well, I did not miss it :)

Sorry! I missed that in my initial review of your code. Seeing that
would have saved me a few hours of debugging xD

>
> int nr_pages = folio_nr_pages(folio);
> folio_ref_add(folio, nr_pages - 1);

Once I found the fix I forgot to cross-reference with your series.
Missing this ref update was causing the issue I alluded to in your RFC
thread. When you said you ran into some issues on the debug configs I
figured it was the same one.

>
> >
> > Patch 11 was inspired by one of Dev's changes.
> >
> > [1] https://lore.kernel.org/lkml/20241216165105.56185-1-dev.jain@arm.com/
> >
> > Nico Pache (11):
> >   introduce khugepaged_collapse_single_pmd to collapse a single pmd
> >   khugepaged: refactor madvise_collapse and khugepaged_scan_mm_slot
> >   khugepaged: Don't allocate khugepaged mm_slot early
> >   khugepaged: rename hpage_collapse_* to khugepaged_*
> >   khugepaged: generalize hugepage_vma_revalidate for mTHP support
> >   khugepaged: generalize alloc_charge_folio for mTHP support
> >   khugepaged: generalize __collapse_huge_page_* for mTHP support
> >   khugepaged: introduce khugepaged_scan_bitmap for mTHP support
> >   khugepaged: add mTHP support
> >   khugepaged: remove max_ptes_none restriction on the pmd scan
> >   khugepaged: skip collapsing mTHP to smaller orders
> >
> >  include/linux/khugepaged.h |   4 +-
> >  mm/huge_memory.c           |   3 +-
> >  mm/khugepaged.c            | 436 +++++++++++++++++++++++++------------
> >  3 files changed, 306 insertions(+), 137 deletions(-)
> >
>
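
For readers following the thread, the bitmap walk described in the cover
letter can be illustrated with a small user-space sketch. This is not
the code from the series. It assumes 512 PTEs per PMD, one bitmap bit
per 8-page chunk (so a 64-bit bitmap), and a simple linear mapping of
max_ptes_none onto a 0-100 "how full" percentage. Every sketch_* name
and SKETCH_* constant is hypothetical and chosen only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical values (x86-64, 4K pages); not taken from the patches. */
#define SKETCH_PMD_ORDER   9   /* 512 PTEs per PMD */
#define SKETCH_MIN_ORDER   3   /* one bitmap bit covers 8 PTEs */
#define SKETCH_BITMAP_BITS (1u << (SKETCH_PMD_ORDER - SKETCH_MIN_ORDER))
#define SKETCH_MAX_PTES    ((1u << SKETCH_PMD_ORDER) - 1)

struct sketch_range {
	unsigned int order;  /* candidate collapse order, in pages */
	unsigned int offset; /* first PTE index of the range in the PMD */
};

/* Count set bits in bitmap[first_bit, first_bit + nbits). */
static unsigned int sketch_count_bits(uint64_t bitmap, unsigned int first_bit,
				      unsigned int nbits)
{
	unsigned int count = 0;

	for (unsigned int i = 0; i < nbits; i++)
		count += (unsigned int)((bitmap >> (first_bit + i)) & 1u);
	return count;
}

/*
 * Top-down binary split of the PMD range, using an explicit stack in
 * place of recursion. A range is accepted when it has at least one
 * utilized chunk and is at least required_pct percent full; otherwise
 * it is halved until SKETCH_MIN_ORDER is reached.
 * out must have room for SKETCH_BITMAP_BITS entries.
 * Returns the number of accepted ranges.
 */
static size_t sketch_pick_orders(uint64_t bitmap, unsigned int max_ptes_none,
				 struct sketch_range *out)
{
	struct sketch_range stack[16]; /* depth never exceeds 7 here */
	size_t top = 0, n = 0;
	unsigned int required_pct;

	if (max_ptes_none > SKETCH_MAX_PTES)
		max_ptes_none = SKETCH_MAX_PTES;
	/* Assumed mapping: max_ptes_none 0 -> 100% full, 511 -> 0%. */
	required_pct = 100u * (SKETCH_MAX_PTES - max_ptes_none) / SKETCH_MAX_PTES;

	stack[top++] = (struct sketch_range){ SKETCH_PMD_ORDER, 0 };
	while (top > 0) {
		struct sketch_range cur = stack[--top];
		unsigned int nbits = 1u << (cur.order - SKETCH_MIN_ORDER);
		unsigned int first = cur.offset >> SKETCH_MIN_ORDER;
		unsigned int set = sketch_count_bits(bitmap, first, nbits);

		if (set == 0)
			continue; /* nothing utilized in this range */
		if (set * 100u >= nbits * required_pct) {
			out[n++] = cur; /* full enough: take this order */
			continue;
		}
		if (cur.order == SKETCH_MIN_ORDER)
			continue;

		/* Push the right half first so the left half is popped first. */
		unsigned int half = 1u << (cur.order - 1);
		stack[top++] = (struct sketch_range){ cur.order - 1, cur.offset + half };
		stack[top++] = (struct sketch_range){ cur.order - 1, cur.offset };
	}
	return n;
}
```

With a fully set bitmap and max_ptes_none of 0, the sketch accepts the
whole PMD at order 9. With only the low 32 bits set it falls back to a
single order-8 range at offset 0. Pushing the right half before the left
keeps the accepted ranges in ascending offset order.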