From: Barry Song <21cnbao@gmail.com>
Date: Fri, 8 Dec 2023 13:00:15 +1300
Subject: Re: [RFC V3 PATCH] arm64: mm: swap: save and restore mte tags for large folios
To: David Hildenbrand
Cc: Ryan Roberts, Steven Price, akpm@linux-foundation.org, catalin.marinas@arm.com, will@kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, mhocko@suse.com, shy828301@gmail.com, v-songbaohua@oppo.com, wangkefeng.wang@huawei.com, willy@infradead.org, xiang@kernel.org, ying.huang@intel.com, yuzhao@google.com
On Thu, Dec 7, 2023 at 11:04 PM David Hildenbrand wrote:
>
> >>
> >>> not per-folio? I'm also not sure what it buys us - instead of reading a per-page
> >>> flag we now have to read 128 bytes of tag for each page and check it's zero.
> >>
> >> My point is, if that is the corner case, we might not care about that.
> >
> > Hi David,
>
> Hi!
>
> > my understanding is that this is NOT a corner case.
> > Alternatively, it is really a common case.
>
> If it happens with < 1% of all large folios on swapout/swapin, it's not
> the common case. Even if some scenarios you point out below can and will
> happen.
>

Fair enough. If we define "corner case" by the percentage of folios which can
get partial MTE tags set or partially invalidated, I agree this is a corner
case. I had thought a corner case was a case which could rarely happen.

> >
> > 1. a large folio can be partially unmapped while it is in the swapcache
> > and after it is swapped out; in both cases, its tags can be partially
> > invalidated. I don't think this is a corner case: as long as userspace
> > still works at the granularity of base pages, this is always going to
> > happen. For example, a userspace libc such as jemalloc can identify
> > PAGESIZE and use madvise(DONTNEED) to return memory to the kernel. Heap
> > management still works at the granularity of the base page.
> >
> > 2. mprotect on a part of a large folio, as Steven pointed out.
> >
> > 3. long term, we are working to swap in large folios as a whole[1], just
> > like we swap them out as a whole. For those PTEs which are still
> > contiguous swap entries - i.e. which were not unmapped by userspace after
> > the large folio was swapped out to the swap device - we have a chance to
> > swap in a whole large folio and restore its tags without the early exit.
> > But we still have a good chance of falling back to a base page if we fail
> > to allocate a large folio; in that case, do_swap_page() still works at
> > the granularity of the base page, and do_swap_page() will call
> > swap_free(entry), so the tags of this particular page can be invalidated
> > as a result.
>
> I don't immediately see how that relates. You get a fresh small folio
> and simply load that tag from the internal datastructure. No messing
No messing > with large folios required, because you don't have a large folio. So no > considerations about large folio batch MTE tag restore apply. right. I was thinking the original large folio was partially swapped-in and forgot the new allocated page was actually one folio with only one page :-) Indeed, in that case, it is still restoring the MTE tag for the whole folio with one page. > > > > > 4. too many early-exit might be negative to performance. > > > > > > So I am thinking that in the future, we need two helpers, > > 1. void __arch_swap_restore(swp_entry_t entry, struct page *page); > > this is always needed to support page-level tag restore. > > > > 2. void arch_swap_restore(swp_entry_t entry, struct folio *folio); > > this can be a helper when we are able to swap in a whole folio. two > > conditions must be met > > (a). PTEs entries are still contiguous swap entries just as when large > > folios were swapped > > out. > > (b). we succeed in the allocation of a large folio in do_swap_page. > > > > For this moment, we only need 1; we will add 2 in swap-in large folio s= eries. > > > > What do you think? > > I agree that it's better to keep it simple for now. > > -- > Cheers, > > David / dhildenb > Thanks Barry