Date: Wed, 03 Apr 2024 00:47:47 +0000
From: "Lance Yang" <lance.yang@linux.dev>
Subject: Re: [PATCH v5 2/6] mm: swap: free_swap_and_cache_nr() as batched free_swap_and_cache()
To: "Zi Yan", "Ryan Roberts"
Cc: "Andrew Morton", "David Hildenbrand", "Matthew Wilcox", "Huang Ying", "Gao Xiang", "Yu Zhao", "Yang Shi", "Michal Hocko", "Kefeng Wang", "Barry Song" <21cnbao@gmail.com>, "Chris Li", "Lance Yang", linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20240327144537.4165578-1-ryan.roberts@arm.com> <20240327144537.4165578-3-ryan.roberts@arm.com>

April 3, 2024 at 8:30 AM, "Zi Yan" wrote:

> On 27 Mar 2024, at 10:45, Ryan Roberts wrote:
>
>> Now that we no longer have a convenient flag in the cluster to determine
>> if a folio is large, free_swap_and_cache() will take a reference and
>> lock a large folio much more often, which could lead to contention and
>> (e.g.) failure to split large folios, etc.
>>
>> Let's solve that problem by batch freeing swap and cache with a new
>> function, free_swap_and_cache_nr(), to free a contiguous range of swap
>> entries together. This allows us to first drop a reference to each swap
>> slot before we try to release the cache folio. This means we only try to
>> release the folio once, only taking the reference and lock once - much
>> better than the previous 512 times for the 2M THP case.
>>
>> Contiguous swap entries are gathered in zap_pte_range() and
>> madvise_free_pte_range() in a similar way to how present ptes are
>> already gathered in zap_pte_range().
>>
>> While we are at it, let's simplify by converting the return type of both
>> functions to void. The return value was used only by zap_pte_range() to
>> print a bad pte, and was ignored by everyone else, so the extra
>> reporting wasn't exactly guaranteed. We will still get the warning with
>> most of the information from get_swap_device(). With the batch version,
>> we wouldn't know which pte was bad anyway so could print the wrong one.
>>
>> Signed-off-by: Ryan Roberts
>> ---
>>  include/linux/pgtable.h | 28 +++++++++++++++
>>  include/linux/swap.h    | 12 +++++--
>>  mm/internal.h           | 48 +++++++++++++++++++++++++
>>  mm/madvise.c            | 12 ++++---
>>  mm/memory.c             | 13 +++----
>>  mm/swapfile.c           | 78 ++++++++++++++++++++++++++++++-----------
>>  6 files changed, 157 insertions(+), 34 deletions(-)
>>
>> diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
>> index 09c85c7bf9c2..8185939df1e8 100644
>> --- a/include/linux/pgtable.h
>> +++ b/include/linux/pgtable.h
>> @@ -708,6 +708,34 @@ static inline void pte_clear_not_present_full(struct mm_struct *mm,
>>  }
>>  #endif
>>
>> +#ifndef clear_not_present_full_ptes
>> +/**
>> + * clear_not_present_full_ptes - Clear consecutive not present PTEs.
>> + * @mm: Address space the ptes represent.
>> + * @addr: Address of the first pte.
>> + * @ptep: Page table pointer for the first entry.
>> + * @nr: Number of entries to clear.
>> + * @full: Whether we are clearing a full mm.
>> + *
>> + * May be overridden by the architecture; otherwise, implemented as a simple
>> + * loop over pte_clear_not_present_full().
>> + *
>> + * Context: The caller holds the page table lock.  The PTEs are all not present.
>> + * The PTEs are all in the same PMD.
>> + */
>> +static inline void clear_not_present_full_ptes(struct mm_struct *mm,
>> +		unsigned long addr, pte_t *ptep, unsigned int nr, int full)
>> +{
>> +	for (;;) {
>> +		pte_clear_not_present_full(mm, addr, ptep, full);
>> +		if (--nr == 0)
>> +			break;
>> +		ptep++;
>> +		addr += PAGE_SIZE;
>> +	}
>> +}
>> +#endif
>> +
>
> Would the code below be better?
>
> for (i = 0; i < nr; i++, ptep++, addr += PAGE_SIZE)
> 	pte_clear_not_present_full(mm, addr, ptep, full);

FWIW,

	for (; nr-- > 0; ptep++, addr += PAGE_SIZE)
		pte_clear_not_present_full(mm, addr, ptep, full);

Thanks,
Lance

>
> --
> Best Regards,
> Yan, Zi