From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4816CC7EE25 for ; Wed, 17 May 2023 01:37:38 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id D6087900006; Tue, 16 May 2023 21:37:37 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id D104C900003; Tue, 16 May 2023 21:37:37 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BD885900006; Tue, 16 May 2023 21:37:37 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id AE640900003 for ; Tue, 16 May 2023 21:37:37 -0400 (EDT) Received: from smtpin11.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 774091A042E for ; Wed, 17 May 2023 01:37:37 +0000 (UTC) X-FDA: 80798034954.11.72B244D Received: from mail-il1-f179.google.com (mail-il1-f179.google.com [209.85.166.179]) by imf29.hostedemail.com (Postfix) with ESMTP id A3BBA12000C for ; Wed, 17 May 2023 01:37:34 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=bVTAd5SO; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf29.hostedemail.com: domain of pcc@google.com designates 209.85.166.179 as permitted sender) smtp.mailfrom=pcc@google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1684287454; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=QLugHt0XGPwYTpMjCAQObsffP9JpeGqJnJJt+CEzfkw=; b=lTLwrx8L9dT/k+BaeWsnWj99p8pIghD9Jyy1feA0ds5KvH4/Zqxy45om8R2WfWTbk8toOM lwTXreQBqeTOz2pYw659kJllmLFQUfPhlqp922CnctqL0uZXlIVGinJGxgVqe3I4ztHAnI D1AzNa7Pkj/fbFUCa33quhBybRs8JUk= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=bVTAd5SO; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf29.hostedemail.com: domain of pcc@google.com designates 209.85.166.179 as permitted sender) smtp.mailfrom=pcc@google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1684287454; a=rsa-sha256; cv=none; b=Irp+Fq6NFE9Mad7KNsDs8g7c5Bc8qF1IT9UkSQjcdZnOekr3l+3R91FArUdoD/q5tG0OM5 rSay7KQBgjEdTWyhYzLNHb+BhBoNQB1KHBXgbgM9Md5MjlupeZIXaonNq16byn5VktfuFE 3Bk7E63kfYJEOuckv60iwAjYV1oqbh4= Received: by mail-il1-f179.google.com with SMTP id e9e14a558f8ab-33164ec77ccso57305ab.0 for ; Tue, 16 May 2023 18:37:34 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1684287454; x=1686879454; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=QLugHt0XGPwYTpMjCAQObsffP9JpeGqJnJJt+CEzfkw=; b=bVTAd5SO8NoTEqY8WvVpici8dg2AJF+dp7I0LnvvkPfdcwDNGaZNyd9AJHKI6atf4y I9ui03BXIdH2uqHnitcgCBq151W6TYHraYtG2uM/cA42aUOG7SDWjT46iPhVHOLfQhpd cLb4pgJrYFz7Z5x95Vtw17j5uuh9OE9rEnwI+PUr5ySb9Ls5pnIYhqFZhb0HxHid9mhm +U6Es6zGE5ho/3P3faKVgUhZ6BhfRl6P4WvBQIV5PP0GUfkZwrDFuiCs1HQrGfB+L9bl pxBmE0eDGdapX3w/SzQd370Pn2582BdELI3PnyXY8L5KR22eCiB0cF07R8UJxHQgFxV7 6erg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684287454; x=1686879454; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=QLugHt0XGPwYTpMjCAQObsffP9JpeGqJnJJt+CEzfkw=; b=gpeGBc+ECklhIvKkSe+evH+k0CUS+XXFH1BgZmx9mHL1b9mlP1ZGO4RI04B7VNQJtd 7bktrgkUte+GAeGZZA58ILbiA3kGHc/bE+HTxi6tx95tJs9iUAIqIIXFVoTB3DxI/sPT jQmlJ0K/ciPNJeFNM1aGmQHYSE8MWXpC1It7pZ4KB2BFE3s5OLC44gmbuL14VZcY/w24 pw6Z0oa6HNtN5dIO7GMxQZpFfvIea9ReiHEZKbHWmJrH9lDnb3ZOy9cG+MSBrSr2s0Pv AVffXBLn7BdD2TGxTMGtVRI3meXjAkgeOXfRsY9FSg44AXlb/tH2zKOi+ZskW5XMyXwm YWVA== X-Gm-Message-State: AC+VfDxsZLai3T6AApC/h8N7UP6uQtdF9wtwqJjqScYFLRRVB/aUmY3O kFQAFpa3zfFsEHH2DpU0TLWrWZflJrPj6V1yHSJN8Q== X-Google-Smtp-Source: ACHHUZ6dZfD1Ix5kE87yhR1ityHzz0e/73f67Hdq1xOZtlUzUQebA+DIVW/8Pry1HP7JugJAoQpPKt/YljP3mLUFfFU= X-Received: by 2002:a05:6e02:1a4a:b0:331:948c:86f3 with SMTP id u10-20020a056e021a4a00b00331948c86f3mr64152ilv.19.1684287453632; Tue, 16 May 2023 18:37:33 -0700 (PDT) MIME-Version: 1.0 References: <20230512235755.1589034-1-pcc@google.com> <20230512235755.1589034-2-pcc@google.com> <7471013e-4afb-e445-5985-2441155fc82c@redhat.com> <91246137-a3d2-689f-8ff6-eccc0e61c8fe@redhat.com> In-Reply-To: <91246137-a3d2-689f-8ff6-eccc0e61c8fe@redhat.com> From: Peter Collingbourne Date: Tue, 16 May 2023 18:37:22 -0700 Message-ID: Subject: Re: [PATCH 1/3] mm: Move arch_do_swap_page() call to before swap_free() To: David Hildenbrand Cc: Catalin Marinas , =?UTF-8?B?UXVuLXdlaSBMaW4gKOael+e+pOW0tCk=?= , linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "surenb@google.com" , =?UTF-8?B?Q2hpbndlbiBDaGFuZyAo5by16Yym5paHKQ==?= , "kasan-dev@googlegroups.com" , =?UTF-8?B?S3Vhbi1ZaW5nIExlZSAo5p2O5Yag56mOKQ==?= , =?UTF-8?B?Q2FzcGVyIExpICjmnY7kuK3mpq4p?= , "gregkh@linuxfoundation.org" , vincenzo.frascino@arm.com, Alexandru Elisei , will@kernel.org, eugenis@google.com, Steven Price , stable@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: n19jrjr6eou3fztfwowfbed3ugzgtqkr X-Rspam-User: X-Rspamd-Queue-Id: A3BBA12000C X-Rspamd-Server: rspam07 X-HE-Tag: 1684287454-41570 X-HE-Meta: U2FsdGVkX19ky1tWvYaLd/RR0ovMm4eToXo6+Aymcb46ISjO3+JHtc5DHZf8IP7nGK3GR8vJfbqBQzaJvC8qkrR9pyOYtAZf5Qx39ZXB7i03IgKIUeNjYADv7l/W+66Kqk/ygh1sCpj+RV3kgaqORbyQCQ5/5U5wnvRpTxzSmMeB+VU4TsSlmUCsB2aErqOBLRTC1/iZMDKGQwUbJ4iJ6w1Mq4bLw4rHsfFd9SKsrgM5O2gnfUHakWSEzlTyVxuX2xAw/Wt6NsVcdyrVcZQYcjBoor/2v8thuXkgVK6WzocqWsBXjs1oZXijnGnnNzh1pablOLrWjt3IWxhT7ntBRawNbBDB9lxVOzHHJghDIOzTjxtigBJZ1oi+1BUPDwqywNi8Nsf5lC417YhtRws/+brqOt08aVtYcjCs7BzebWVo4CD5vA77grwGw1O/0AK3SjPNW/lmKAkVXf+I8PMon1VmqjVb7/jjaGkVGfJnYILG4ofOtFTmhP0GRqwOv2yg0rgTUa1tuSaYdhqx/yhMJtdtRwenGVHTQtNb0zNkkDlB3cCkyvaPm75UDL1x5SS/r+syn4S2HriIh+lF2rRTaKw+bKUo5gqZeJddJ/h21KD1CYxmRcoY9uyXHOY6Q1D4gJBwTOA69AHxSkySOBi98dgMnCToAUh8iBYTD9CZvZ4GQXqVMlJ0e8z6Oi8PAfcUSJaWtzmc96yuukXffXkBuYDLqGBqSK39O4mBQCjESLF+MAcpR2xN4MM77j03fJ0evbUXNwdsQXNPxKLkT9dGaIi9O5u6imsxZuouX/iRYY+hjvgT4C3d+6G/g6ldvd39zxpW74CLhhYM81gkemQaEOdRtOgnOT3od/wPDz94FbwOMwphBQciAVb8pMasbDIG1uQx6g4DkBqz7HS7Hia6AiI6isfdjJZ2mm0W+N2zV42XDSnIZ/k50zWyuGGJfMYFRe/HV+5dcvP5kot86jL XGsrztGa wcH3Akn1S4uqg78xkY67bexMvDR2ixcNwn6biSEABd+T7pXcap/H+ScYAfqLmLyF1e+aUe89xSB+yHBkiOBqF95Epkl8bOsP9fQFBEaNts/WKgKJqbjjI8eEirAxFAcdrlcOIQAXd1AYd4Q/sIwTCCJl7RwjW9B62TyoGRz0TiR+rOtCRpM5jmgf2QmqKw7Iv/79ypNoAPQJLvDb9qE14jwm5WJqgl+cF9xXS4YOak5j+5XWBWM0KdNfj3stBvuU2gX+tW5b8YZ8NTF/VtaAMy8p4zNqQ6VWmB2PT/Cf9or3pQmSGI2WAkt2KwlJf1bgKqisAQN+2+xP+9/kqJ9gmfAMgyyCJmDbBol7cAHQdymyUBMz0KjyP1Z3vBsoSax6zzf3zZnMCo7NvRAHB97p/IKgGmrHjQvuPacrkxyE6Y7TWwqZRm+24Mfc98+CV23+cv/1W9wA9x8MdfS4= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Tue, May 16, 2023 at 5:31=E2=80=AFAM David Hildenbrand wrote: > > On 15.05.23 19:34, Catalin Marinas wrote: > > On Sat, May 13, 2023 at 05:29:53AM +0200, David Hildenbrand wrote: > >> On 13.05.23 01:57, Peter Collingbourne wrote: > >>> diff --git a/mm/memory.c b/mm/memory.c > >>> index 01a23ad48a04..83268d287ff1 100644 > >>> --- a/mm/memory.c > >>> +++ b/mm/memory.c > >>> @@ -3914,19 +3914,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) > >>> } > >>> } > >>> - /* > >>> - * Remove the swap entry and conditionally try to free up the swa= pcache. > >>> - * We're already holding a reference on the page but haven't mapp= ed it > >>> - * yet. > >>> - */ > >>> - swap_free(entry); > >>> - if (should_try_to_free_swap(folio, vma, vmf->flags)) > >>> - folio_free_swap(folio); > >>> - > >>> - inc_mm_counter(vma->vm_mm, MM_ANONPAGES); > >>> - dec_mm_counter(vma->vm_mm, MM_SWAPENTS); > >>> pte =3D mk_pte(page, vma->vm_page_prot); > >>> - > >>> /* > >>> * Same logic as in do_wp_page(); however, optimize for pages tha= t are > >>> * certainly not shared either because we just allocated them wit= hout > >>> @@ -3946,8 +3934,21 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) > >>> pte =3D pte_mksoft_dirty(pte); > >>> if (pte_swp_uffd_wp(vmf->orig_pte)) > >>> pte =3D pte_mkuffd_wp(pte); > >>> + arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_p= te); > >>> vmf->orig_pte =3D pte; > >>> + /* > >>> + * Remove the swap entry and conditionally try to free up the swa= pcache. > >>> + * We're already holding a reference on the page but haven't mapp= ed it > >>> + * yet. > >>> + */ > >>> + swap_free(entry); > >>> + if (should_try_to_free_swap(folio, vma, vmf->flags)) > >>> + folio_free_swap(folio); > >>> + > >>> + inc_mm_counter(vma->vm_mm, MM_ANONPAGES); > >>> + dec_mm_counter(vma->vm_mm, MM_SWAPENTS); > >>> + > >>> /* ksm created a completely new copy */ > >>> if (unlikely(folio !=3D swapcache && swapcache)) { > >>> page_add_new_anon_rmap(page, vma, vmf->address); > >>> @@ -3959,7 +3960,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) > >>> VM_BUG_ON(!folio_test_anon(folio) || > >>> (pte_write(pte) && !PageAnonExclusive(page))); > >>> set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte); > >>> - arch_do_swap_page(vma->vm_mm, vma, vmf->address, pte, vmf->orig_p= te); > >>> folio_unlock(folio); > >>> if (folio !=3D swapcache && swapcache) { > >> > >> > >> You are moving the folio_free_swap() call after the folio_ref_count(fo= lio) > >> =3D=3D 1 check, which means that such (previously) swapped pages that = are > >> exclusive cannot be detected as exclusive. > >> > >> There must be a better way to handle MTE here. > >> > >> Where are the tags stored, how is the location identified, and when ar= e they > >> effectively restored right now? > > > > I haven't gone through Peter's patches yet but a pretty good descriptio= n > > of the problem is here: > > https://lore.kernel.org/all/5050805753ac469e8d727c797c2218a9d780d434.ca= mel@mediatek.com/. > > I couldn't reproduce it with my swap setup but both Qun-wei and Peter > > triggered it. > > > > When a tagged page is swapped out, the arm64 code stores the metadata > > (tags) in a local xarray indexed by the swap pte. When restoring from > > swap, the arm64 set_pte_at() checks this xarray using the old swap pte > > and spills the tags onto the new page. Apparently something changed in > > the kernel recently that causes swap_range_free() to be called before > > set_pte_at(). The arm64 arch_swap_invalidate_page() frees the metadata > > from the xarray and the subsequent set_pte_at() won't find it. > > > > If we have the page, the metadata can be restored before set_pte_at() > > and I guess that's what Peter is trying to do (again, I haven't looked > > at the details yet; leaving it for tomorrow). > > Thanks for the details! I was missing that we also have a hook in > swap_range_free(). > > > > > Is there any other way of handling this? E.g. not release the metadata > > in arch_swap_invalidate_page() but later in set_pte_at() once it was > > restored. But then we may leak this metadata if there's no set_pte_at() > > (the process mapping the swap entry died). > > That was my immediate thought: do we really have to hook into > swap_range_free() at all? As I alluded to in another reply, without the hook in swap_range_free() I think we would either end up with a race or an effective memory leak in the arch code that maintains the metadata for swapped out pages, as there would be no way for the arch-specific code to know when it is safe to free it after swapin. > And I also wondered why we have to do this > from set_pte_at() and not do this explicitly (maybe that's the other > arch_* callback on the swapin path). I don't think it's necessary, as the set_pte_at() call sites for swapped in pages are known. I'd much rather do this via an explicit hook at those call sites, as the existing approach of implicit restoring seems too subtle and easy to be overlooked when refactoring, as we have seen with this bug. In the end we only have 3 call sites for the hook and hopefully the comments that I'm adding are sufficient to ensure that any new swapin code should end up with a call to the hook in the right place. Peter