linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Peter Xu <peterx@redhat.com>
Cc: David Wang <00107082@163.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Alex Williamson <alex.williamson@redhat.com>,
	Jason Gunthorpe <jgg@nvidia.com>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Andy Lutomirski <luto@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"Kirill A . Shutemov" <kirill@shutemov.name>,
	x86@kernel.org, Yan Zhao <yan.y.zhao@intel.com>,
	Kevin Tian <kevin.tian@intel.com>, Pei Li <peili.dev@gmail.com>,
	Bert Karwatzki <spasswolf@web.de>,
	Sergey Senozhatsky <senozhatsky@chromium.org>
Subject: Re: [PATCH] mm/x86/pat: Only untrack the pfn range if unmap region
Date: Wed, 17 Jul 2024 16:14:15 +0200
Message-ID: <2c6ec60e-1eff-417e-aed2-4554ea9a86eb@redhat.com>
In-Reply-To: <ZpU6KsKuhzPqUpFF@x1n>

[catching up on mails]

>> indicates that file truncation seems to end up messing with a PFNMAP mapping
>> that has PAT set. That is ... weird. I would have thought that PFNMAP would
>> never really happen with file truncation.
>>
>> Does this only happen with an OOT driver, that seems to do weird truncate
>> stuff on files that have a PFNMAP mapping?
>>
>> [1]
>> https://lore.kernel.org/all/3879ee72-84de-4d2a-93a8-c0b3dc3f0a4c@redhat.com/
> 
> Ohhh.. I guess this will also stop working in VFIO, but I think it's fine
> for now because as Yan pointed out VFIO PCI doesn't register those regions
> now so VM_PAT is not yet set..

Interesting, I was assuming that VFIO might be relying on that.

> 
> And one thing I got wrong in the previous reply to Yan: obviously
> memtype_check_insert() can work with >1 owners as long as the memtype
> matches, and that's how fork() works, where VM_PAT needs to be duplicated.
> But this whole thing is a bit confusing to me, as I think it also means
> that on fork, track_pfn_copy() will call memtype_kernel_map_sync() one more
> time even if we're 100% sure the pgprot will be the same for the kernel
> mappings.

I consider the VM_PAT code quite ugly and I wish we could just get rid
of it (especially the automatic "entire VMA covered" handling thingy).
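To make the quoted point about memtype_check_insert() concrete, here is a
toy user-space model (not kernel code; every toy_* name is hypothetical, and
the simple owner count is an assumption that may not match how the kernel
organizes its memtype tracking). A PFN range may have several owners as long
as all of them request the same memtype, which is what lets fork() duplicate
VM_PAT; a conflicting memtype on an overlapping range is refused.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum toy_memtype { TOY_WB, TOY_UC_MINUS, TOY_WC };

struct toy_resv {
	uint64_t start_pfn, end_pfn;	/* half-open [start_pfn, end_pfn) */
	enum toy_memtype type;
	unsigned int owners;		/* e.g. parent and child after fork() */
	bool used;
};

#define TOY_MAX_RESV 16
static struct toy_resv toy_table[TOY_MAX_RESV];

static bool toy_overlaps(const struct toy_resv *r, uint64_t s, uint64_t e)
{
	return s < r->end_pfn && r->start_pfn < e;
}

/* Find a reservation covering exactly [s, e). */
static struct toy_resv *toy_find(uint64_t s, uint64_t e)
{
	for (size_t i = 0; i < TOY_MAX_RESV; i++) {
		struct toy_resv *r = &toy_table[i];

		if (r->used && r->start_pfn == s && r->end_pfn == e)
			return r;
	}
	return NULL;
}

/* Returns 0 on success, -1 on a memtype conflict or a full table. */
int toy_reserve(uint64_t s, uint64_t e, enum toy_memtype type)
{
	struct toy_resv *r;

	assert(s < e);
	/* Every overlapping reservation must use the same memtype. */
	for (size_t i = 0; i < TOY_MAX_RESV; i++) {
		r = &toy_table[i];
		if (r->used && toy_overlaps(r, s, e) && r->type != type)
			return -1;
	}
	r = toy_find(s, e);
	if (r) {
		r->owners++;	/* same range, same memtype: one more owner */
		return 0;
	}
	for (size_t i = 0; i < TOY_MAX_RESV; i++) {
		r = &toy_table[i];
		if (!r->used) {
			*r = (struct toy_resv){ .start_pfn = s, .end_pfn = e,
						.type = type, .owners = 1,
						.used = true };
			return 0;
		}
	}
	return -1;	/* table full */
}

/* Drop one owner; the entry goes away with its last owner. */
int toy_free(uint64_t s, uint64_t e)
{
	struct toy_resv *r = toy_find(s, e);

	if (!r)
		return -1;	/* nothing to untrack */
	if (--r->owners == 0)
		r->used = false;
	return 0;
}

unsigned int toy_owners(uint64_t s, uint64_t e)
{
	struct toy_resv *r = toy_find(s, e);

	return r ? r->owners : 0;
}
```

In this model, the second toy_reserve() of an identical WC range (the
"fork()" case) only bumps the owner count, while an overlapping UC- request
fails until the last owner has called toy_free().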

> 
> I wonder whether there's some way for the pfn untrack framework not to
> rely on the pgtable to fetch the pfn, because VFIO MMIO region
> protection will also do that in the near future, AFAICT.  The pgprot part
> should be easy to fetch there: get_pat_info() should fall back to the vma's
> pgprot if no mapping is found; the only outlier should be CoW pages in
> reality.  The pfn is the real issue so far, because either track_pfn_copy()
> or untrack_pfn() may need to know the pfn to untrack, even if it only has
> the vma information.

I had a prototype that stored that information per VMA to avoid the page
table lookup. VMA splitting added a bit of complication, but I got it
to work (maybe I can still find it if there is demand).
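For illustration only, a minimal sketch of the per-VMA idea (this is not the
prototype mentioned here; toy_vma, pat_pfn and the helpers are hypothetical
names in a user-space model). If a VMA remembers the PFN mapped at vm_start,
the PFN for any address follows by arithmetic, with no page table walk, and
a split only has to shift the base PFN of the tail part.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12

/* Hypothetical toy VMA that remembers the first PFN of its reservation. */
struct toy_vma {
	uint64_t vm_start, vm_end;	/* [vm_start, vm_end), page aligned */
	uint64_t pat_pfn;		/* PFN mapped at vm_start */
};

/* PFN backing @addr, derived from the stored base PFN alone. */
uint64_t toy_vma_pfn(const struct toy_vma *vma, uint64_t addr)
{
	assert(addr >= vma->vm_start && addr < vma->vm_end);
	return vma->pat_pfn + ((addr - vma->vm_start) >> TOY_PAGE_SHIFT);
}

/*
 * Split @vma at @addr: @vma keeps [vm_start, addr), @tail gets
 * [addr, vm_end). The extra step compared to a plain split is that the
 * tail's base PFN is shifted by the number of pages left in the head.
 */
void toy_vma_split(struct toy_vma *vma, struct toy_vma *tail, uint64_t addr)
{
	assert(addr > vma->vm_start && addr < vma->vm_end);
	assert((addr & ((1ULL << TOY_PAGE_SHIFT) - 1)) == 0);

	tail->vm_start = addr;
	tail->vm_end = vma->vm_end;
	tail->pat_pfn = toy_vma_pfn(vma, addr);	/* computed before shrinking */
	vma->vm_end = addr;
}
```

The pat_pfn field is the kind of per-VMA cost that every VMA would pay, even
though only a handful are ever VM_PAT.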

The downside was that all VMAs in the system had to consume more memory
(even if only 8 bytes each), simply because a handful of VMAs in the
system could be VM_PAT. I decided that's not what we want. I managed to
avoid the extra memory in some configurations, but not in all, so I
discarded that approach.

I did not explore storing that information in some auxiliary data structure.

IMHO the whole VM_PAT model is weird:

1) mmap()
2) remap_pfn_range(): if it covers the whole VMA, apply some magic
    reservation.
3) munmap(): we unmap *all* PFNs and, therefore, clean up VM_PAT

(VMA splitting makes the whole model weirder, but it works, because we
never merge these VMAs)

This model cannot work properly if we get partial page table zapping via
truncation, MADV_DONTNEED or similar operations after 2). And we likely
shouldn't be doing it that way in the first place. We should forbid any
partial unmapping in that model, just like we already disallow
MADV_DONTNEED, as you note.
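A toy model of steps 1) to 3) and of the failure mode (hypothetical toy_*
names, not kernel code). As a simplification, it assumes that untracking
recovers the PFN from the page table entry of the VMA's first page. Once a
partial zap hits that page, munmap() can no longer find the PFN, and in this
model the reservation leaks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_NPAGES 4
#define TOY_NO_PFN 0	/* toy stand-in for an empty PTE */

struct toy_pat_vma {
	uint64_t pte[TOY_NPAGES];	/* PFN per page, TOY_NO_PFN if not mapped */
	bool vm_pat;			/* whole-VMA reservation taken */
};

static unsigned int toy_reservations;	/* outstanding reservations */

/* Step 2): map PFNs; only a mapping of the whole VMA takes a reservation. */
void toy_remap_pfn_range(struct toy_pat_vma *vma, uint64_t pfn,
			 unsigned int first, unsigned int npages)
{
	assert(first + npages <= TOY_NPAGES);
	for (unsigned int i = 0; i < npages; i++)
		vma->pte[first + i] = pfn + i;
	if (first == 0 && npages == TOY_NPAGES && !vma->vm_pat) {
		vma->vm_pat = true;
		toy_reservations++;
	}
}

/* Partial zap, e.g. via truncation or MADV_DONTNEED. */
void toy_zap(struct toy_pat_vma *vma, unsigned int first, unsigned int npages)
{
	assert(first + npages <= TOY_NPAGES);
	for (unsigned int i = 0; i < npages; i++)
		vma->pte[first + i] = TOY_NO_PFN;
}

/*
 * Step 3): untrack, then unmap everything. The PFN to untrack is looked
 * up in the first page's entry (the assumption of this model). Returns
 * false if that lookup fails and the reservation is leaked.
 */
bool toy_munmap(struct toy_pat_vma *vma)
{
	bool ok = true;

	if (vma->vm_pat) {
		if (vma->pte[0] == TOY_NO_PFN)
			ok = false;		/* PFN unknown: cannot untrack */
		else
			toy_reservations--;
		vma->vm_pat = false;
	}
	toy_zap(vma, 0, TOY_NPAGES);
	return ok;
}
```

Forbidding partial unmapping, or keeping the PFN somewhere other than the
page table, would both close the gap that toy_zap() opens in this model.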

As you mention in your other comment, maybe the relevant callers (all of
them?) should just manage the PAT side independently. So maybe we can move
to a different model.


-- 
Cheers,

David / dhildenb




Thread overview: 39+ messages
2024-07-12 14:42 Peter Xu
2024-07-13  1:18 ` David Hildenbrand
2024-07-13  3:36 ` David Wang
2024-07-14 10:59 ` David Wang
2024-07-14 18:27   ` [PATCH] " David Hildenbrand
2024-07-15 15:03     ` Peter Xu
2024-07-17 14:14       ` David Hildenbrand [this message]
2024-07-17 16:27         ` Peter Xu
2024-07-15  7:08 ` Yan Zhao
2024-07-15 14:29   ` Peter Xu
2024-07-16  9:13     ` Yan Zhao
2024-07-16 19:01       ` Peter Xu
2024-07-17  1:38         ` Yan Zhao
2024-07-17 14:15           ` Peter Xu
2024-07-18  1:50             ` Yan Zhao
2024-07-18 14:03               ` Peter Xu
2024-07-18 23:18                 ` Yan Zhao
2024-07-19  8:28                   ` David Hildenbrand
2024-07-19 14:13                     ` Peter Xu
2024-07-22  6:49                       ` Yan Zhao
2024-07-22 13:52                         ` Peter Xu
2024-07-22  6:43                     ` Yan Zhao
2024-07-22  9:17                       ` David Hildenbrand
2024-07-23 20:27                         ` Peter Xu
2024-07-23 21:36                           ` David Hildenbrand
2024-07-23 21:44                             ` Jason Gunthorpe
2024-07-24  8:53                               ` David Hildenbrand
2024-07-17 14:17         ` David Hildenbrand
2024-07-17 16:30           ` Peter Xu
2024-07-17 16:31             ` Jason Gunthorpe
2024-07-17 18:10               ` Peter Xu
2024-07-17 16:32             ` David Hildenbrand
2024-07-17 18:12               ` Peter Xu
2024-07-20  2:18 ` Liam R. Howlett
2024-07-22 15:15   ` Peter Xu
2024-07-22 20:22     ` Liam R. Howlett
2024-07-22 21:17       ` Peter Xu
2024-07-23 10:12         ` David Hildenbrand
2024-07-23 17:58           ` Liam R. Howlett
