From: Nico Pache <npache@redhat.com>
To: linux-mm@kvack.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org
Cc: david@redhat.com, ziy@nvidia.com, baolin.wang@linux.alibaba.com,
	lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com, ryan.roberts@arm.com,
	dev.jain@arm.com, corbet@lwn.net, rostedt@goodmis.org, mhiramat@kernel.org,
	mathieu.desnoyers@efficios.com, akpm@linux-foundation.org, baohua@kernel.org,
	willy@infradead.org, peterx@redhat.com, wangkefeng.wang@huawei.com,
	usamaarif642@gmail.com, sunnanyong@huawei.com, vishal.moola@gmail.com,
	thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com,
	kirill.shutemov@linux.intel.com, aarcange@redhat.com, raquini@redhat.com,
	anshuman.khandual@arm.com, catalin.marinas@arm.com, tiwai@suse.de,
	will@kernel.org, dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org,
	jglisse@google.com, surenb@google.com, zokeefe@google.com, hannes@cmpxchg.org,
	rientjes@google.com, mhocko@suse.com, rdunlap@infradead.org
Subject: [PATCH v6 05/12] khugepaged: generalize __collapse_huge_page_* for mTHP support
Date: Wed, 14 May 2025 21:03:05 -0600
Message-ID: <20250515030312.125567-6-npache@redhat.com>
In-Reply-To: <20250515030312.125567-5-npache@redhat.com>
References: <20250515030312.125567-1-npache@redhat.com>
	<20250515030312.125567-2-npache@redhat.com>
	<20250515030312.125567-3-npache@redhat.com>
	<20250515030312.125567-4-npache@redhat.com>
	<20250515030312.125567-5-npache@redhat.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Generalize the __collapse_huge_page_* functions to take an explicit
order so that future mTHP collapse can reuse them. mTHP collapse can
suffer from inconsistent behavior and memory-waste "creep", so disable
swapin and shared support for mTHP collapse.

No functional changes in this patch.

Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Co-developed-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Dev Jain <dev.jain@arm.com>
Signed-off-by: Nico Pache <npache@redhat.com>
---
 mm/khugepaged.c | 48 ++++++++++++++++++++++++++++++------------------
 1 file changed, 30 insertions(+), 18 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index cf94ccdfe751..2af8f50855d4 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -565,15 +565,17 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 					unsigned long address,
 					pte_t *pte,
 					struct collapse_control *cc,
-					struct list_head *compound_pagelist)
+					struct list_head *compound_pagelist,
+					u8 order)
 {
 	struct page *page = NULL;
 	struct folio *folio = NULL;
 	pte_t *_pte;
 	int none_or_zero = 0, shared = 0, result = SCAN_FAIL, referenced = 0;
 	bool writable = false;
+	int scaled_none = khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
 
-	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
+	for (_pte = pte; _pte < pte + (1 << order);
 	     _pte++, address += PAGE_SIZE) {
 		pte_t pteval = ptep_get(_pte);
 		if (pte_none(pteval) || (pte_present(pteval) &&
@@ -581,7 +583,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			++none_or_zero;
 			if (!userfaultfd_armed(vma) &&
 			    (!cc->is_khugepaged ||
-			     none_or_zero <= khugepaged_max_ptes_none)) {
+			     none_or_zero <= scaled_none)) {
 				continue;
 			} else {
 				result = SCAN_EXCEED_NONE_PTE;
@@ -609,8 +611,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		/* See hpage_collapse_scan_pmd(). */
 		if (folio_maybe_mapped_shared(folio)) {
 			++shared;
-			if (cc->is_khugepaged &&
-			    shared > khugepaged_max_ptes_shared) {
+			if (order != HPAGE_PMD_ORDER || (cc->is_khugepaged &&
+			    shared > khugepaged_max_ptes_shared)) {
 				result = SCAN_EXCEED_SHARED_PTE;
 				count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
 				goto out;
@@ -711,13 +713,14 @@ static void __collapse_huge_page_copy_succeeded(pte_t *pte,
 						struct vm_area_struct *vma,
 						unsigned long address,
 						spinlock_t *ptl,
-						struct list_head *compound_pagelist)
+						struct list_head *compound_pagelist,
+						u8 order)
 {
 	struct folio *src, *tmp;
 	pte_t *_pte;
 	pte_t pteval;
 
-	for (_pte = pte; _pte < pte + HPAGE_PMD_NR;
+	for (_pte = pte; _pte < pte + (1 << order);
 	     _pte++, address += PAGE_SIZE) {
 		pteval = ptep_get(_pte);
 		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
@@ -764,7 +767,8 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
 					     pmd_t *pmd,
 					     pmd_t orig_pmd,
 					     struct vm_area_struct *vma,
-					     struct list_head *compound_pagelist)
+					     struct list_head *compound_pagelist,
+					     u8 order)
 {
 	spinlock_t *pmd_ptl;
 
@@ -781,7 +785,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
 	 * Release both raw and compound pages isolated
 	 * in __collapse_huge_page_isolate.
 	 */
-	release_pte_pages(pte, pte + HPAGE_PMD_NR, compound_pagelist);
+	release_pte_pages(pte, pte + (1 << order), compound_pagelist);
 }
 
 /*
@@ -802,7 +806,7 @@ static void __collapse_huge_page_copy_failed(pte_t *pte,
 static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
 		pmd_t *pmd, pmd_t orig_pmd, struct vm_area_struct *vma,
 		unsigned long address, spinlock_t *ptl,
-		struct list_head *compound_pagelist)
+		struct list_head *compound_pagelist, u8 order)
 {
 	unsigned int i;
 	int result = SCAN_SUCCEED;
@@ -810,7 +814,7 @@ static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
 	/*
 	 * Copying pages' contents is subject to memory poison at any iteration.
 	 */
-	for (i = 0; i < HPAGE_PMD_NR; i++) {
+	for (i = 0; i < (1 << order); i++) {
 		pte_t pteval = ptep_get(pte + i);
 		struct page *page = folio_page(folio, i);
 		unsigned long src_addr = address + i * PAGE_SIZE;
@@ -829,10 +833,10 @@ static int __collapse_huge_page_copy(pte_t *pte, struct folio *folio,
 
 	if (likely(result == SCAN_SUCCEED))
 		__collapse_huge_page_copy_succeeded(pte, vma, address, ptl,
-						    compound_pagelist);
+						    compound_pagelist, order);
 	else
 		__collapse_huge_page_copy_failed(pte, pmd, orig_pmd, vma,
-						 compound_pagelist);
+						 compound_pagelist, order);
 
 	return result;
 }
@@ -1000,11 +1004,11 @@ static int check_pmd_still_valid(struct mm_struct *mm,
 static int __collapse_huge_page_swapin(struct mm_struct *mm,
 				       struct vm_area_struct *vma,
 				       unsigned long haddr, pmd_t *pmd,
-				       int referenced)
+				       int referenced, u8 order)
 {
 	int swapped_in = 0;
 	vm_fault_t ret = 0;
-	unsigned long address, end = haddr + (HPAGE_PMD_NR * PAGE_SIZE);
+	unsigned long address, end = haddr + (PAGE_SIZE << order);
 	int result;
 	pte_t *pte = NULL;
 	spinlock_t *ptl;
@@ -1035,6 +1039,14 @@ static int __collapse_huge_page_swapin(struct mm_struct *mm,
 		if (!is_swap_pte(vmf.orig_pte))
 			continue;
 
+		/* Dont swapin for mTHP collapse */
+		if (order != HPAGE_PMD_ORDER) {
+			pte_unmap(pte);
+			mmap_read_unlock(mm);
+			result = SCAN_EXCEED_SWAP_PTE;
+			goto out;
+		}
+
 		vmf.pte = pte;
 		vmf.ptl = ptl;
 		ret = do_swap_page(&vmf);
@@ -1154,7 +1166,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 		 * that case. Continuing to collapse causes inconsistency.
 		 */
 		result = __collapse_huge_page_swapin(mm, vma, address, pmd,
-						     referenced);
+						     referenced, HPAGE_PMD_ORDER);
 		if (result != SCAN_SUCCEED)
 			goto out_nolock;
 	}
@@ -1201,7 +1213,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	pte = pte_offset_map_lock(mm, &_pmd, address, &pte_ptl);
 	if (pte) {
 		result = __collapse_huge_page_isolate(vma, address, pte, cc,
-						      &compound_pagelist);
+						      &compound_pagelist, HPAGE_PMD_ORDER);
 		spin_unlock(pte_ptl);
 	} else {
 		result = SCAN_PMD_NULL;
@@ -1231,7 +1243,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 
 	result = __collapse_huge_page_copy(pte, folio, pmd, _pmd,
 					   vma, address, pte_ptl,
-					   &compound_pagelist);
+					   &compound_pagelist, HPAGE_PMD_ORDER);
 	pte_unmap(pte);
 	if (unlikely(result != SCAN_SUCCEED))
 		goto out_up_write;
-- 
2.49.0
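
[Not part of the patch above; an illustrative note.] The scaled_none line is
the heart of the generalization: the PMD-sized khugepaged_max_ptes_none
threshold is shifted down so a smaller collapse order tolerates a
proportionally smaller number of none/zero PTEs. A minimal standalone
userspace sketch of that arithmetic is below; HPAGE_PMD_ORDER = 9 and the
default max_ptes_none = 511 are assumptions for a 4K-page configuration, and
the surrounding harness is hypothetical.

/* Sketch of the max_ptes_none scaling used by __collapse_huge_page_isolate(). */
#include <stdio.h>

#define HPAGE_PMD_ORDER	9			/* assumed: 2MB PMD on 4K pages */
#define HPAGE_PMD_NR	(1 << HPAGE_PMD_ORDER)	/* 512 PTEs per PMD */

/* Assumed default: all but one PTE of a PMD range may be none/zero. */
static unsigned int khugepaged_max_ptes_none = HPAGE_PMD_NR - 1;

/* Scale the PMD-sized threshold down to a smaller (mTHP) order. */
static unsigned int scaled_max_ptes_none(unsigned int order)
{
	return khugepaged_max_ptes_none >> (HPAGE_PMD_ORDER - order);
}

int main(void)
{
	/* e.g. order 4 (64K): 511 >> (9 - 4) = 15 of 16 PTEs may be none/zero */
	for (unsigned int order = 2; order <= HPAGE_PMD_ORDER; order++)
		printf("order %u: up to %u of %u PTEs may be none/zero\n",
		       order, scaled_max_ptes_none(order), 1u << order);
	return 0;
}

With these assumed defaults, an order-4 candidate range may contain at most 15
none/zero PTEs out of 16 before the scan bails out with SCAN_EXCEED_NONE_PTE,
mirroring the 511-of-512 ratio at PMD order.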