Message-ID: <56bf9df5-febf-4bef-966f-d4d71365a18d@gmail.com>
Date: Mon, 6 Jan 2025 10:04:45 +0000
Subject: Re: [RFC PATCH 07/12] khugepaged: Scan PTEs order-wise
To: Dev Jain, akpm@linux-foundation.org, david@redhat.com, willy@infradead.org, kirill.shutemov@linux.intel.com
Cc: ryan.roberts@arm.com, anshuman.khandual@arm.com, catalin.marinas@arm.com, cl@gentwo.org, vbabka@suse.cz, mhocko@suse.com, apopple@nvidia.com, dave.hansen@linux.intel.com, will@kernel.org, baohua@kernel.org, jack@suse.cz, srivatsa@csail.mit.edu, haowenchao22@gmail.com, hughd@google.com, aneesh.kumar@kernel.org, yang@os.amperecomputing.com, peterx@redhat.com, ioworker0@gmail.com, wangkefeng.wang@huawei.com, ziy@nvidia.com, jglisse@google.com, surenb@google.com, vishal.moola@gmail.com, zokeefe@google.com, zhengqi.arch@bytedance.com, jhubbard@nvidia.com, 21cnbao@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Johannes Weiner
References: <20241216165105.56185-1-dev.jain@arm.com> <20241216165105.56185-8-dev.jain@arm.com>
From: Usama Arif
In-Reply-To: <20241216165105.56185-8-dev.jain@arm.com>

On 16/12/2024 16:51, Dev Jain wrote:
> Scan the PTEs order-wise, using the mask of suitable orders for this VMA
> derived in conjunction with sysfs THP settings.
> Scale down the tunables; in
> case of collapse failure, we drop down to the next order. Otherwise, we try to
> jump to the highest possible order and then start a fresh scan. Note that
> madvise(MADV_COLLAPSE) has not been generalized.
>
> Signed-off-by: Dev Jain
> ---
>  mm/khugepaged.c | 84 ++++++++++++++++++++++++++++++++++++++++---------
>  1 file changed, 69 insertions(+), 15 deletions(-)
>
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 886c76816963..078794aa3335 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -20,6 +20,7 @@
>  #include
>  #include
>  #include
> +#include
>
>  #include
>  #include
> @@ -1111,7 +1112,7 @@ static int alloc_charge_folio(struct folio **foliop, struct mm_struct *mm,
>  }
>
>  static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
> -			      int referenced, int unmapped,
> +			      int referenced, int unmapped, int order,
>  			      struct collapse_control *cc)
>  {
>  	LIST_HEAD(compound_pagelist);
> @@ -1278,38 +1279,59 @@ static int hpage_collapse_scan_ptes(struct mm_struct *mm,
>  				   unsigned long address, bool *mmap_locked,
>  				   struct collapse_control *cc)
>  {
> -	pmd_t *pmd;
> -	pte_t *pte, *_pte;
> -	int result = SCAN_FAIL, referenced = 0;
> -	int none_or_zero = 0, shared = 0;
> -	struct page *page = NULL;
> +	unsigned int max_ptes_shared, max_ptes_none, max_ptes_swap;
> +	int referenced, shared, none_or_zero, unmapped;
> +	unsigned long _address, org_address = address;
>  	struct folio *folio = NULL;
> -	unsigned long _address;
> -	spinlock_t *ptl;
> -	int node = NUMA_NO_NODE, unmapped = 0;
> +	struct page *page = NULL;
> +	int node = NUMA_NO_NODE;
> +	int result = SCAN_FAIL;
>  	bool writable = false;
> +	unsigned long orders;
> +	pte_t *pte, *_pte;
> +	spinlock_t *ptl;
> +	pmd_t *pmd;
> +	int order;
>
>  	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
>
> +	orders = thp_vma_allowable_orders(vma, vma->vm_flags,
> +			TVA_IN_PF | TVA_ENFORCE_SYSFS, BIT(PMD_ORDER + 1) - 1);
> +	orders = thp_vma_suitable_orders(vma, address, orders);
> +	order = highest_order(orders);
> +
> +	/* MADV_COLLAPSE needs to work irrespective of sysfs setting */
> +	if (!cc->is_khugepaged)
> +		order = HPAGE_PMD_ORDER;
> +
> +scan_pte_range:
> +
> +	max_ptes_shared = khugepaged_max_ptes_shared >> (HPAGE_PMD_ORDER - order);
> +	max_ptes_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
> +	max_ptes_swap = khugepaged_max_ptes_swap >> (HPAGE_PMD_ORDER - order);
> +	referenced = 0, shared = 0, none_or_zero = 0, unmapped = 0;
> +

Hi Dev,

Thanks for the patches.

Looking at the above code, I imagine you are planning to use the max_ptes_none,
max_ptes_shared and max_ptes_swap that are used for PMD THPs for all mTHP sizes?

I think this can be a bit confusing for users who aren't familiar with kernel
code, as the default values are for PMD THPs, e.g. max_ptes_none is 511, and
the user might not know that it is going to be scaled down for lower-order
THPs.

Another thing is, what if these parameters have different optimal values than
the scaled-down versions for mTHP?

The other option is to introduce these parameters as new sysfs entries per mTHP
size. These parameters can be very difficult to tune (and are usually left at
their default values), so I don't think it's a good idea to introduce new sysfs
parameters, but it is just something to think about.
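
To make the scaling concrete, here is a minimal userspace sketch (not kernel
code) of the shift used in the hunk above. scale_tunable() is a hypothetical
helper name, and HPAGE_PMD_ORDER is hard-coded to 9 on the assumption of 4K
base pages; other page sizes would give different numbers.

```c
#include <assert.h>

/* Assumption: 4K base pages, so a PMD-sized THP is order 9 (512 PTEs). */
#define HPAGE_PMD_ORDER 9

/*
 * Hypothetical helper mirroring the expression in the quoted hunk:
 * scale a PMD-level tunable down to the given collapse order.
 */
static unsigned int scale_tunable(unsigned int pmd_value, int order)
{
	/* Only orders up to the PMD order make sense for this shift. */
	assert(order >= 0 && order <= HPAGE_PMD_ORDER);
	return pmd_value >> (HPAGE_PMD_ORDER - order);
}
```

With max_ptes_none at 511, this gives 15 (out of 16 PTEs) at order 4 and 3
(out of 4 PTEs) at order 2. A PMD-level value of 64 would scale to 2 at
order 4 and to 0 at order 2, which is the kind of result a user may not expect
from reading the sysfs value alone.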
Regards,
Usama

> +	/* Check pmd after taking mmap lock */
>  	result = find_pmd_or_thp_or_none(mm, address, &pmd);
>  	if (result != SCAN_SUCCEED)
>  		goto out;
>
>  	memset(cc->node_load, 0, sizeof(cc->node_load));
>  	nodes_clear(cc->alloc_nmask);
> +
>  	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
>  	if (!pte) {
>  		result = SCAN_PMD_NULL;
>  		goto out;
>  	}
>
> -	for (_address = address, _pte = pte; _pte < pte + HPAGE_PMD_NR;
> +	for (_address = address, _pte = pte; _pte < pte + (1UL << order);
>  	     _pte++, _address += PAGE_SIZE) {
>  		pte_t pteval = ptep_get(_pte);
>  		if (is_swap_pte(pteval)) {
>  			++unmapped;
>  			if (!cc->is_khugepaged ||
> -			    unmapped <= khugepaged_max_ptes_swap) {
> +			    unmapped <= max_ptes_swap) {
>  				/*
>  				 * Always be strict with uffd-wp
>  				 * enabled swap entries. Please see
> @@ -1330,7 +1352,7 @@ static int hpage_collapse_scan_ptes(struct mm_struct *mm,
>  			++none_or_zero;
>  			if (!userfaultfd_armed(vma) &&
>  			    (!cc->is_khugepaged ||
> -			     none_or_zero <= khugepaged_max_ptes_none)) {
> +			     none_or_zero <= max_ptes_none)) {
>  				continue;
>  			} else {
>  				result = SCAN_EXCEED_NONE_PTE;
> @@ -1375,7 +1397,7 @@ static int hpage_collapse_scan_ptes(struct mm_struct *mm,
>  		if (folio_likely_mapped_shared(folio)) {
>  			++shared;
>  			if (cc->is_khugepaged &&
> -			    shared > khugepaged_max_ptes_shared) {
> +			    shared > max_ptes_shared) {
>  				result = SCAN_EXCEED_SHARED_PTE;
>  				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
>  				goto out_unmap;
> @@ -1432,7 +1454,7 @@ static int hpage_collapse_scan_ptes(struct mm_struct *mm,
>  		result = SCAN_PAGE_RO;
>  	} else if (cc->is_khugepaged &&
>  		   (!referenced ||
> -		    (unmapped && referenced < HPAGE_PMD_NR / 2))) {
> +		    (unmapped && referenced < (1UL << order) / 2))) {
>  		result = SCAN_LACK_REFERENCED_PAGE;
>  	} else {
>  		result = SCAN_SUCCEED;
> @@ -1441,9 +1463,41 @@ static int hpage_collapse_scan_ptes(struct mm_struct *mm,
>  	pte_unmap_unlock(pte, ptl);
>  	if (result == SCAN_SUCCEED) {
>  		result = collapse_huge_page(mm, address, referenced,
> -					    unmapped, cc);
> +					    unmapped, order, cc);
>  		/* collapse_huge_page will return with the mmap_lock released */
>  		*mmap_locked = false;
> +
> +		/* Immediately exit on exhaustion of range */
> +		if (_address == org_address + (PAGE_SIZE << HPAGE_PMD_ORDER))
> +			goto out;
> +	}
> +	if (result != SCAN_SUCCEED) {
> +
> +		/* Go to the next order. */
> +		order = next_order(&orders, order);
> +		if (order < 2)
> +			goto out;
> +		goto maybe_mmap_lock;
> +	} else {
> +		address = _address;
> +		pte = _pte;
> +
> +
> +		/* Get highest order possible starting from address */
> +		order = count_trailing_zeros(address >> PAGE_SHIFT);
> +
> +		/* This needs to be present in the mask too */
> +		if (!(orders & (1UL << order)))
> +			order = next_order(&orders, order);
> +		if (order < 2)
> +			goto out;
> +
> +maybe_mmap_lock:
> +		if (!(*mmap_locked)) {
> +			mmap_read_lock(mm);
> +			*mmap_locked = true;
> +		}
> +		goto scan_pte_range;
>  	}
>  out:
>  	trace_mm_khugepaged_scan_pmd(mm, &folio->page, writable, referenced,
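
For readers following the retry path in the hunk above: the comments there say
the next scan should use the highest order that the new address allows and
that is also present in the orders mask. Below is a minimal userspace sketch of
that stated intent only. It is not the patch's code: pick_order() is a
hypothetical helper, __builtin_ctzl() (a GCC/Clang builtin) stands in for
count_trailing_zeros(), and 4K pages, a PMD order of 9 and a minimum order of 2
are assumptions.

```c
#include <assert.h>

#define PAGE_SHIFT      12	/* assumption: 4K base pages */
#define HPAGE_PMD_ORDER 9	/* assumption: 512 PTEs per PMD */
#define MIN_ORDER       2	/* assumption: smallest order considered */

/*
 * Hypothetical helper: return the largest order that is both enabled in
 * the 'orders' bitmask and naturally aligned at 'address', or -1 if no
 * such order exists.
 */
static int pick_order(unsigned long address, unsigned long orders)
{
	unsigned long pfn = address >> PAGE_SHIFT;
	/* Alignment of the page frame number bounds the usable order. */
	int max_order = pfn ? __builtin_ctzl(pfn) : HPAGE_PMD_ORDER;

	assert((orders >> (HPAGE_PMD_ORDER + 1)) == 0);
	if (max_order > HPAGE_PMD_ORDER)
		max_order = HPAGE_PMD_ORDER;

	/* Walk down until an enabled order is found. */
	for (int order = max_order; order >= MIN_ORDER; order--)
		if (orders & (1UL << order))
			return order;
	return -1;
}
```

For example, with orders 2, 4 and 9 enabled, an address 16 pages past a
PMD-aligned base can use order 4, while an address 8 pages past it falls back
to order 2 because order 3 is not in the mask.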