Message-ID: <992ba52a-fcbc-4a6a-9bd1-416b4ae60cc8@arm.com>
Date: Fri, 8 Dec 2023 14:00:34 +0000
Subject: Re: [PATCH] arm64: mm: support THP_SWAP on hardware with MTE
From: Ryan Roberts
To: Matthew Wilcox, Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, catalin.marinas@arm.com, david@redhat.com,
 linux-mm@kvack.org, steven.price@arm.com, will@kernel.org,
 linux-arm-kernel@lists.infradead.org, mhocko@suse.com, shy828301@gmail.com,
 v-songbaohua@oppo.com, wangkefeng.wang@huawei.com, xiang@kernel.org,
 ying.huang@intel.com, yuzhao@google.com, linux-kernel@vger.kernel.org
References: <20231208073401.70153-1-v-songbaohua@oppo.com>

On 08/12/2023 13:22, Matthew Wilcox wrote:
> On Fri, Dec 08, 2023 at 08:34:01PM +1300, Barry Song wrote:
>> arch_prepare_to_swap() should take folio rather than page as parameter
>> because we support THP swap-out as a whole. It saves tags for all
>> pages in a large folio.
>>
>> Meanwhile, arch_swap_restore() now moves to use page parameter rather
>> than folio because swap-in, refault and MTE tags are working at the
>> granularity of base pages rather than folio:
>> 1. a large folio in swapcache can be partially unmapped, thus, MTE
>> tags for the unmapped pages will be invalidated;
>> 2. users might use mprotect() to set MTEs on a part of a large folio.
>
> I would argue that using mprotect() to set MTEs on part of a large folio
> should cause that folio to be split. Could the user give us any
> stronger signal that this memory is being used for different purposes,
> and therefore should not be managed as a single entity?

I agree this probably makes sense here. But splitting is best effort as I
understand it? It can fail due to long-term GUP, right? In which case we
still have to handle the MTE on partial large folio case safely, even if
not performantly.

As an aside, I don't think it's clear cut that we would always prefer to
split based on user space mprotect/madvise/etc calls. IIUC, there are
garbage collectors that temporarily mark pages RO then switch back to RW.
I wouldn't want to split here and lose the benefits of contpte forever.
I'm handwaving because I haven't looked into the exact mechanisms yet. But
I think we need to understand these users better before deciding on an
"always split based on user hints" policy.

>
>> Thus, it won't be easy to manage MTE tags at the granularity of folios
>> since we do have some cases in which a part of pages in a large folios
>> have valid tags, while the other part of pages haven't. Furthermore,
>> trying to restore MTE tags for a whole folio can lead to many loops and
>> early exiting even if the large folio in swapcache are still entirely
>> mapped since do_swap_page() only sets PTE and frees swap for the base
>> page where PF is happening.
>>
>> But we still have a chance to restore tags for a whole large folio
>> once we support swap-in large folio. So this job is deferred till we
>> can do refault and swap-in as a large folio.
>
> I strongly disagree with changing the interface to arch_swap_restore()
> from folio to page.