Date: Mon, 7 Apr 2025 11:43:48 +0300
From: Fedor Pchelkin
To: David Hildenbrand, peterx@redhat.com
Cc: mawupeng, linux-kernel@vger.kernel.org, linux-mm@kvack.org, x86@kernel.org, xrivendell7@gmail.com, wang1315768607@163.com, fleischermarius@gmail.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, hpa@zytor.com, akpm@linux-foundation.org
Subject: Re: [PATCH v1] x86/mm/pat: fix VM_PAT handling when fork() fails in copy_page_range()
In-Reply-To: <262aa19c-59fe-420a-aeae-0b1866a3e36b@redhat.com>

Hi, David, Peter

Sorry for reviving an old thread. I've tried to keep the context as-is.
Here is the original link in the archives:
https://lore.kernel.org/lkml/20241029210331.1339581-1-david@redhat.com/T/#u

Please see below.

On 07.11.24 10:08, David Hildenbrand wrote:
> On 07.11.24 09:43, mawupeng wrote:
> > On 2024/10/31 17:47, David Hildenbrand wrote:
> >> On 30.10.24 22:32, Peter Xu wrote:
> >>> On Tue, Oct 29, 2024 at 10:03:31PM +0100, David Hildenbrand wrote:
> >>>> If track_pfn_copy() fails, we already added the dst VMA to the maple
> >>>> tree. As fork() fails, we'll cleanup the maple tree, and stumble over
> >>>> the dst VMA for which we neither performed any reservation nor copied
> >>>> any page tables.
> >>>>
> >>>> Consequently untrack_pfn() will see VM_PAT and try obtaining the
> >>>> PAT information from the page table -- which fails because the page
> >>>> table was not copied.
> >>>>
> >>>> The easiest fix would be to simply clear the VM_PAT flag of the dst VMA
> >>>> if track_pfn_copy() fails. However, the whole thing is about "simply"
> >>>> clearing the VM_PAT flag is shaky as well: if we passed track_pfn_copy()
> >>>> and performed a reservation, but copying the page tables fails, we'll
> >>>> simply clear the VM_PAT flag, not properly undoing the reservation ...
> >>>> which is also wrong.
> >>>
> >>> David,
> >>>
> >>
> >> Hi Peter,
> >>
> >>> Sorry to not have chance yet reply to your other email..
> >>>
> >>> The only concern I have with the current fix to fork() is.. we started to
> >>> have device drivers providing fault() on PFNMAPs as vfio-pci does, then I
> >>> think it means we could potentially start to hit the same issue even
> >>> without fork(), but as long as the 1st pgtable entry of the PFNMAP range is
> >>> not mapped when the process with VM_PAT vma exit()s, or munmap() the vma.
> >>
> >> As these drivers are not using remap_pfn_range, there is no way they could currently get VM_PAT set.
> >>
> >> So what you describe is independent of the current state we are fixing here, and this fix should sort out the issues with current VM_PAT handling.
> >>
> >> It indeed is an interesting question how to handle reservations when *not* using remap_pfn_range() to cover the whole area.
> >>
> >> remap_pfn_range() handles VM_PAT automatically because it can do it: it knows that the whole range will map consecutive PFNs with the same protection, and we expect not parts of the range suddenly getting unmapped (and any driver that does that is buggy).
> >>
> >> This behavior is, however, not guaranteed to be the case when remap_pfn_range() is *not* called on the whole range.
> >>
> >> For that case (i.e., vfio-pci) I still wonder if the driver shouldn't do the reservation and leave VM_PAT alone.
> >>
> >> In the driver, we'd do the reservation once and not worry about fork() etc ... and we'd undo the reservation once the last relevant VM_PFNMAP VMA is gone or the driver let's go of the device. I assume there are already mechanisms in place to deal with that to some degree, because the driver cannot go away while any VMA still has the VM_PFNMAP mapping -- otherwise something would be seriously messed up.
> >>
> >> Long story short: let's look into not using VM_PAT for that use case.
> >>
> >> Looking at the VM_PAT issues we had over time, not making it more complicated sounds like a very reasonable thing to me :)
> >
> > Hi David,
> >
> > The VM_PAT reservation do seems complicated. It can trigger the same warning in get_pat_info if remap_p4d_range fails:
> >
> > remap_pfn_range
> >   remap_pfn_range_notrack
> >     remap_pfn_range_internal
> >       remap_p4d_range	// page allocation can failed here
> >     zap_page_range_single
> >       unmap_single_vma
> >         untrack_pfn
> >           get_pat_info
> >             WARN_ON_ONCE(1);
> >
> > Any idea on this problem?
>
> In remap_pfn_range(), if remap_pfn_range_notrack() fails, we call
> untrack_pfn(), to undo the tracking.
>
> The problem is that zap_page_range_single() shouldn't do that
> untrack_pfn() call.
>
> That should be fixed by Peter's patch:
>
> https://lore.kernel.org/all/20240712144244.3090089-1-peterx@redhat.com/T/#u

The fix seemingly has not been applied, so the issue in question still
persists. There is a long thread on that patch without an explicit
conclusion. Did the patch cause any problems, or has its status changed?

Thanks for your time!

>
> --
> Cheers,
>
> David / dhildenb
>
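
For readers following the call trace quoted above, the failure path can be modeled in user space roughly as below. This is only a toy sketch, not kernel code: `struct toy_vma`, `toy_untrack()` and `toy_remap()` are hypothetical names, and the model assumes that no page-table entry gets populated before the allocation fails. The `zap_untracks` switch is likewise hypothetical. It models David's point that the zap path should not perform the untrack call, and is not a description of what Peter's patch actually changes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for a VMA; not the kernel's struct. */
struct toy_vma {
	bool pat_flag;      /* models VM_PAT being set on the VMA */
	bool reserved;      /* models an active PAT reservation */
	bool first_pte_set; /* models the first page-table entry being mapped */
	int warnings;       /* counts simulated WARN_ON_ONCE() hits */
};

/*
 * Models untrack_pfn(). When the caller does not know the range, the
 * range has to be looked up through the page table, which fails if the
 * first entry was never mapped.
 */
static void toy_untrack(struct toy_vma *vma, bool range_known)
{
	if (!vma->pat_flag)
		return;
	if (!range_known && !vma->first_pte_set) {
		vma->warnings++; /* models the lookup in get_pat_info() failing */
		return;          /* the reservation is left in place */
	}
	vma->reserved = false;
	vma->pat_flag = false;
}

/* Models remap_pfn_range() with an optional page-table allocation failure. */
static int toy_remap(struct toy_vma *vma, bool populate_fails, bool zap_untracks)
{
	assert(vma != NULL);

	/* Tracking succeeded: the range is reserved and the flag is set. */
	vma->reserved = true;
	vma->pat_flag = true;

	if (!populate_fails) {
		vma->first_pte_set = true;
		return 0;
	}

	/* Error path: the zap step may attempt an untrack of its own. */
	if (zap_untracks)
		toy_untrack(vma, false);

	/* The caller then undoes the tracking with the range it knows. */
	toy_untrack(vma, true);
	return -1;
}
```

In this model, calling `toy_remap()` with `populate_fails` and `zap_untracks` both true records one simulated warning, although the reservation is still undone by the second call. With `zap_untracks` false, the same failure is cleaned up without a warning.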