Date: Mon, 15 Jul 2024 11:03:06 -0400
From: Peter Xu
To: David Hildenbrand
Cc: David Wang <00107082@163.com>, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, Andrew Morton, Alex Williamson, Jason Gunthorpe,
	Al Viro, Dave Hansen, Andy Lutomirski, Peter Zijlstra,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov, "Kirill A. Shutemov",
	x86@kernel.org, Yan Zhao, Kevin Tian, Pei Li, Bert Karwatzki,
	Sergey Senozhatsky
Subject: Re: [PATCH] mm/x86/pat: Only untrack the pfn range if unmap region
References: <20240712144244.3090089-1-peterx@redhat.com>
	<1182a459.1e35.190b0e61754.Coremail.00107082@163.com>
	<8da2b3bf-b9bf-44e3-88ff-750dc91c2388@redhat.com>
In-Reply-To: <8da2b3bf-b9bf-44e3-88ff-750dc91c2388@redhat.com>

On Sun, Jul 14, 2024 at 08:27:25PM +0200, David Hildenbrand wrote:
> On 14.07.24 12:59, David Wang wrote:
> >
> > At 2024-07-12 22:42:44, "Peter Xu" wrote:
> > > NOTE: I massaged the commit message comparing to the rfc post [1], the
> > > patch itself is untouched. Also removed rfc tag, and added more people
> > > into the loop. Please kindly help test this patch if you have a reproducer,
> > > as I can't reproduce it myself even with the syzbot reproducer on top of
> > > mm-unstable. Instead of further check on the reproducer, I decided to send
> > > this out first as we have a bunch of reproducers on the list now..
> > > ---
> > >  mm/memory.c | 5 ++---
> > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/mm/memory.c b/mm/memory.c
> > > index 4bcd79619574..f57cc304b318 100644
> > > --- a/mm/memory.c
> > > +++ b/mm/memory.c
> > > @@ -1827,9 +1827,6 @@ static void unmap_single_vma(struct mmu_gather *tlb,
> > >  	if (vma->vm_file)
> > >  		uprobe_munmap(vma, start, end);
> > >
> > > -	if (unlikely(vma->vm_flags & VM_PFNMAP))
> > > -		untrack_pfn(vma, 0, 0, mm_wr_locked);
> > > -
> > >  	if (start != end) {
> > >  		if (unlikely(is_vm_hugetlb_page(vma))) {
> > >  			/*
> > > @@ -1894,6 +1891,8 @@ void unmap_vmas(struct mmu_gather *tlb, struct ma_state *mas,
> > >  		unsigned long start = start_addr;
> > >  		unsigned long end = end_addr;
> > >  		hugetlb_zap_begin(vma, &start, &end);
> > > +		if (unlikely(vma->vm_flags & VM_PFNMAP))
> > > +			untrack_pfn(vma, 0, 0, mm_wr_locked);
> > >  		unmap_single_vma(tlb, vma, start, end, &details,
> > >  				 mm_wr_locked);
> > >  		hugetlb_zap_end(vma, &details);
> > > --
> > > 2.45.0
> >
> > Hi,
> >
> > Today, I notice a kernel warning with this patch.
> >
> >
> > [Sun Jul 14 16:51:38 2024] OOM killer enabled.
> > [Sun Jul 14 16:51:38 2024] Restarting tasks ... done.
> > [Sun Jul 14 16:51:38 2024] random: crng reseeded on system resumption
> > [Sun Jul 14 16:51:38 2024] PM: suspend exit
> > [Sun Jul 14 16:51:38 2024] ------------[ cut here ]------------
> > [Sun Jul 14 16:51:38 2024] WARNING: CPU: 1 PID: 2484 at arch/x86/mm/pat/memtype.c:1002 untrack_pfn+0x10c/0x120
>
> We fail to find what we need in the page tables, indicating that the page
> tables might have been modified / torn down in the meantime.
>
> Likely we have a previous call to unmap_single_vma() that modifies the page
> tables, and unmaps present PFNs.
>
> PAT is incompatible to that, it relies on information from the page tables
> to know what it has to undo during munmap(), or what it has to do during
> fork().
>
> The splat from the previous discussion [1]:
>
> follow_phys arch/x86/mm/pat/memtype.c:957 [inline]
> get_pat_info+0xf2/0x510 arch/x86/mm/pat/memtype.c:991
> untrack_pfn+0xf7/0x4d0 arch/x86/mm/pat/memtype.c:1104
> unmap_single_vma+0x1bd/0x2b0 mm/memory.c:1819
> zap_page_range_single+0x326/0x560 mm/memory.c:1920
> unmap_mapping_range_vma mm/memory.c:3684 [inline]
> unmap_mapping_range_tree mm/memory.c:3701 [inline]
> unmap_mapping_pages mm/memory.c:3767 [inline]
> unmap_mapping_range+0x1ee/0x280 mm/memory.c:3804
> truncate_pagecache+0x53/0x90 mm/truncate.c:731
> simple_setattr+0xf2/0x120 fs/libfs.c:886
> notify_change+0xec6/0x11f0 fs/attr.c:499
> do_truncate+0x15c/0x220 fs/open.c:65
> handle_truncate fs/namei.c:3308 [inline]
>
> indicates that file truncation seems to end up messing with a PFNMAP mapping
> that has PAT set. That is ... weird. I would have thought that PFNMAP would
> never really happen with file truncation.
>
> Does this only happen with an OOT driver, that seems to do weird truncate
> stuff on files that have a PFNMAP mapping?
>
> [1]
> https://lore.kernel.org/all/3879ee72-84de-4d2a-93a8-c0b3dc3f0a4c@redhat.com/

Ohhh.. I guess this will also stop working in VFIO, but I think it's fine
for now because, as Yan pointed out, VFIO PCI doesn't register those
regions now, so VM_PAT is not yet set.

And one thing I got wrong in the previous reply to Yan: obviously
memtype_check_insert() can work with >1 owners as long as the memtype
matches, and that's how fork() works, where VM_PAT needs to be duplicated.
But this whole thing is a bit confusing to me, as I think it also means
that on fork() track_pfn_copy() will call memtype_kernel_map_sync() one
more time even if we're 100% sure the pgprot will be the same for the
kernel mappings.

I wonder whether there's some way for the untrack pfn framework to not
rely on the pgtable to fetch the pfn, because VFIO MMIO region protection
will also do that in the near future, AFAICT.

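To make the point about memtype_check_insert() and multiple owners concrete, below is a toy user-space model in C. It is only a sketch under stated assumptions, not kernel code: every name in it (toy_reserve(), toy_free(), toy_find(), struct toy_range) is made up for illustration, and the per-range owner counter is a simplification of whatever bookkeeping the real memtype code does. It shows just the rule stated above: a second reservation of the same range succeeds when the memtype matches (the fork() case) and is refused when it conflicts.

```c
#include <assert.h>
#include <stddef.h>

/* Toy cache attributes; stand-ins for the real PAT memtypes. */
enum toy_memtype { TOY_WB, TOY_WC, TOY_UC_MINUS };

/* One tracked physical range plus a count of its owners (hypothetical). */
struct toy_range {
	unsigned long start, end;	/* half-open: [start, end) */
	enum toy_memtype type;
	int owners;			/* 0 means the slot is unused */
};

#define TOY_MAX_RANGES 8
static struct toy_range toy_ranges[TOY_MAX_RANGES];

/* Find the live entry that covers exactly [start, end), or NULL. */
struct toy_range *toy_find(unsigned long start, unsigned long end)
{
	for (size_t i = 0; i < TOY_MAX_RANGES; i++) {
		struct toy_range *r = &toy_ranges[i];

		if (r->owners && r->start == start && r->end == end)
			return r;
	}
	return NULL;
}

/*
 * Reserve [start, end) with the given type.  Any overlap with a live
 * range of a different type is refused.  An exact match with the same
 * type only gains one more owner, which is the fork() case.
 * Returns 0 on success, -1 on conflict or when the table is full.
 */
int toy_reserve(unsigned long start, unsigned long end, enum toy_memtype type)
{
	struct toy_range *r;

	assert(start < end);
	for (size_t i = 0; i < TOY_MAX_RANGES; i++) {
		r = &toy_ranges[i];
		if (r->owners && start < r->end && r->start < end &&
		    r->type != type)
			return -1;
	}

	r = toy_find(start, end);
	if (r) {
		r->owners++;
		return 0;
	}
	for (size_t i = 0; i < TOY_MAX_RANGES; i++) {
		r = &toy_ranges[i];
		if (!r->owners) {
			*r = (struct toy_range){ start, end, type, 1 };
			return 0;
		}
	}
	return -1;
}

/* Drop one owner of [start, end); -1 if nobody holds that exact range. */
int toy_free(unsigned long start, unsigned long end)
{
	struct toy_range *r = toy_find(start, end);

	if (!r)
		return -1;
	r->owners--;
	return 0;
}
```

In this model a parent and a child after fork() would each call toy_reserve() on the same range with the same type and each call toy_free() once at teardown. Note that toy_free() needs the range itself, which is the piece of information the untrack side currently recovers from the page tables.
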
The pgprot part should be easy to fetch there: get_pat_info() should fall
back to the vma's pgprot if no mapping is found; the only outlier should
be CoW pages in reality. The pfn is the real issue so far: either
track_pfn_copy() or untrack_pfn() may need to know the pfn to untrack,
even if it only has the vma information.

Thanks,

--
Peter Xu
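
The fallback described above for get_pat_info() can be sketched as a small user-space model in C. This is an illustration under assumptions, not the kernel implementation: struct toy_vma, struct toy_pat_info, toy_lookup_pte() and toy_get_pat_info() are hypothetical names, and toy_lookup_pte() merely stands in for a page-table walk such as the follow_phys() call seen in the splat. The sketch shows why the pgprot is the easy half (the vma still carries it after the page tables are zapped) while the pfn remains unknown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for a PFNMAP vma. */
struct toy_vma {
	unsigned long page_prot;	/* pgprot the mapping was set up with */
	bool pte_present;		/* false once the range has been zapped */
	unsigned long pte_pfn;		/* valid only while pte_present */
	unsigned long pte_prot;		/* valid only while pte_present */
};

/* What the lookup managed to recover. */
struct toy_pat_info {
	bool have_prot;
	bool have_pfn;
	unsigned long prot;
	unsigned long pfn;
};

/* Stand-in for a page-table walk; fails when nothing is mapped. */
bool toy_lookup_pte(const struct toy_vma *vma, unsigned long *pfn,
		    unsigned long *prot)
{
	if (!vma->pte_present)
		return false;
	*pfn = vma->pte_pfn;
	*prot = vma->pte_prot;
	return true;
}

/*
 * Prefer the page tables.  If no mapping is found, fall back to the
 * vma's pgprot.  The pfn has no such fallback in this model, so the
 * caller must cope with have_pfn == false.
 */
struct toy_pat_info toy_get_pat_info(const struct toy_vma *vma)
{
	struct toy_pat_info info = { 0 };
	unsigned long pfn, prot;

	assert(vma != NULL);
	if (toy_lookup_pte(vma, &pfn, &prot)) {
		info.have_prot = true;
		info.have_pfn = true;
		info.prot = prot;
		info.pfn = pfn;
		return info;
	}

	info.have_prot = true;
	info.prot = vma->page_prot;
	return info;
}
```

With a present mapping the model returns both values; after the mapping is zapped it still returns the pgprot but has no pfn, which matches the remaining problem described above for track_pfn_copy() and untrack_pfn().
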