From: Lance Yang
To: ioworker0@gmail.com
Cc: 21cnbao@gmail.com, akpm@linux-foundation.org, baolin.wang@linux.alibaba.com,
	chrisl@kernel.org, david@redhat.com, kasong@tencent.com,
	lance.yang@linux.dev, linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	linux-riscv@lists.infradead.org, lorenzo.stoakes@oracle.com,
	ryan.roberts@arm.com, v-songbaohua@oppo.com, x86@kernel.org,
	ying.huang@intel.com, zhengtangquan@oppo.com
Subject: Re: [PATCH v4 3/4] mm: Support batched unmap for lazyfree large folios during reclamation
Date: Thu, 26 Jun 2025 20:44:45 +0800
Message-ID: <20250626124445.77865-1-ioworker0@gmail.com>
In-Reply-To: <20250626092905.31305-1-ioworker0@gmail.com>
References: <20250626092905.31305-1-ioworker0@gmail.com>
MIME-Version: 1.0
On 2025/6/26 17:29, Lance Yang wrote:
> Before I send out the real patch, I'd like to get some quick feedback to
> ensure I've understood the discussion correctly ;)
>
> Does this look like the right direction?
>
> diff --git a/mm/rmap.c b/mm/rmap.c
> index fb63d9256f09..5ebffe2137e4 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1845,23 +1845,37 @@ void folio_remove_rmap_pud(struct folio *folio, struct page *page,
>  #endif
>  }
>
> -/* We support batch unmapping of PTEs for lazyfree large folios */
> -static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
> -		struct folio *folio, pte_t *ptep)
> +static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
> +		struct page_vma_mapped_walk *pvmw,
> +		enum ttu_flags flags, pte_t pte)
>  {
>  	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
> -	int max_nr = folio_nr_pages(folio);
> -	pte_t pte = ptep_get(ptep);
> +	unsigned long end_addr, addr = pvmw->address;
> +	struct vm_area_struct *vma = pvmw->vma;
> +	unsigned int max_nr;
> +
> +	if (flags & TTU_HWPOISON)
> +		return 1;
> +	if (!folio_test_large(folio))
> +		return 1;
>
> +	/* We may only batch within a single VMA and a single page table. */
> +	end_addr = pmd_addr_end(addr, vma->vm_end);
> +	max_nr = (end_addr - addr) >> PAGE_SHIFT;
> +
> +	/* We only support lazyfree batching for now ... */
>  	if (!folio_test_anon(folio) || folio_test_swapbacked(folio))
> -		return false;
> +		return 1;
>  	if (pte_unused(pte))
> -		return false;
> -	if (pte_pfn(pte) != folio_pfn(folio))
> -		return false;
> +		return 1;
> +
> +	/* ... where we must be able to batch the whole folio. */
> +	if (pte_pfn(pte) != folio_pfn(folio) || max_nr != folio_nr_pages(folio))
> +		return 1;
> +	max_nr = folio_pte_batch(folio, addr, pvmw->pte, pte, max_nr, fpb_flags,
> +				 NULL, NULL, NULL);
>
> -	return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL,
> -			       NULL, NULL) == max_nr;
> +	return (max_nr != folio_nr_pages(folio)) ? 1 : max_nr;
>  }
>
>  /*
> @@ -2024,9 +2038,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  			if (pte_dirty(pteval))
>  				folio_mark_dirty(folio);
>  		} else if (likely(pte_present(pteval))) {
> -			if (folio_test_large(folio) && !(flags & TTU_HWPOISON) &&
> -			    can_batch_unmap_folio_ptes(address, folio, pvmw.pte))
> -				nr_pages = folio_nr_pages(folio);
> +			nr_pages = folio_unmap_pte_batch(folio, &pvmw, flags, pteval);
>  			end_addr = address + nr_pages * PAGE_SIZE;
>  			flush_cache_range(vma, address, end_addr);
>
> @@ -2206,13 +2218,16 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
>  			hugetlb_remove_rmap(folio);
>  		} else {
>  			folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
> -			folio_ref_sub(folio, nr_pages - 1);
>  		}
>  		if (vma->vm_flags & VM_LOCKED)
>  			mlock_drain_local();
> -		folio_put(folio);
> -		/* We have already batched the entire folio */
> -		if (nr_pages > 1)
> +		folio_put_refs(folio, nr_pages);
> +
> +		/*
> +		 * If we are sure that we batched the entire folio and cleared
> +		 * all PTEs, we can just optimize and stop right here.
> +		 */
> +		if (nr_pages == folio_nr_pages(folio))
>  			goto walk_done;
>  		continue;
>  	walk_abort:
> --

Oops ...
Through testing on my machine, I found that the logic doesn't behave as
expected, because I mixed up max_nr (the available scan room in the page
table) with folio_nr_pages(folio) :(

With the following change:

diff --git a/mm/rmap.c b/mm/rmap.c
index 5ebffe2137e4..b1407348e14e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1850,9 +1850,9 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
 		enum ttu_flags flags, pte_t pte)
 {
 	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
+	unsigned int max_nr, nr_pages = folio_nr_pages(folio);
 	unsigned long end_addr, addr = pvmw->address;
 	struct vm_area_struct *vma = pvmw->vma;
-	unsigned int max_nr;
 
 	if (flags & TTU_HWPOISON)
 		return 1;
@@ -1870,12 +1870,13 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
 		return 1;
 
 	/* ... where we must be able to batch the whole folio. */
-	if (pte_pfn(pte) != folio_pfn(folio) || max_nr != folio_nr_pages(folio))
+	if (pte_pfn(pte) != folio_pfn(folio) || max_nr < nr_pages)
 		return 1;
-	max_nr = folio_pte_batch(folio, addr, pvmw->pte, pte, max_nr, fpb_flags,
-				 NULL, NULL, NULL);
-	return (max_nr != folio_nr_pages(folio)) ? 1 : max_nr;
+	max_nr = folio_pte_batch(folio, addr, pvmw->pte, pte, nr_pages,
+				 fpb_flags, NULL, NULL, NULL);
+
+	return (max_nr != nr_pages) ? 1 : max_nr;
 }
 
 /*
--

... then things work as expected for the lazyfree case, without any
splitting.