From: David Hildenbrand <david@redhat.com>
To: Chih-En Lin <shiyn.lin@gmail.com>, Nadav Amit <namit@vmware.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Qi Zheng <zhengqi.arch@bytedance.com>,
	Matthew Wilcox <willy@infradead.org>,
	Christophe Leroy <christophe.leroy@csgroup.eu>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Luis Chamberlain <mcgrof@kernel.org>,
	Kees Cook <keescook@chromium.org>,
	Iurii Zaikin <yzaikin@google.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	William Kucharski <william.kucharski@oracle.com>,
	"Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>,
	Peter Xu <peterx@redhat.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Tong Tiangen <tongtiangen@huawei.com>,
	Pasha Tatashin <pasha.tatashin@soleen.com>,
	Li kunyu <kunyu@nfschina.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Minchan Kim <minchan@kernel.org>, Yang Shi <shy828301@gmail.com>,
	Song Liu <song@kernel.org>, Miaohe Lin <linmiaohe@huawei.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	Andy Lutomirski <luto@kernel.org>,
	Fenghua Yu <fenghua.yu@intel.com>,
	Dinglan Peng <peng301@purdue.edu>,
	Pedro Fonseca <pfonseca@purdue.edu>,
	Jim Huang <jserv@ccns.ncku.edu.tw>,
	Huichun Feng <foxhoundsk.tw@gmail.com>
Subject: Re: [RFC PATCH v2 9/9] mm: Introduce Copy-On-Write PTE table
Date: Wed, 28 Sep 2022 16:03:19 +0200
Message-ID: <c12f848d-cb54-2998-8650-2c2a5707932d@redhat.com>
In-Reply-To: <YzNUwxU44mq+KnCm@strix-laptop>

On 27.09.22 21:53, Chih-En Lin wrote:
> On Tue, Sep 27, 2022 at 06:38:05PM +0000, Nadav Amit wrote:
>> I only skimmed the patches that you sent. The last couple of patches seem a
>> bit rough and dirty, so I am sorry to say that I skipped them (too many
>> “TODO” and “XXX” for my taste).
>>
>> I am sure others will have better feedback than me. I understand there is a
>> tradeoff and that this mechanism is mostly for high performance
>> snapshotting/forking. It would be beneficial to see whether this mechanism
>> can somehow be combined with existing ones (mshare?).
> 
> Still thanks for your feedback. :)
> I'm looking at the PTE refcount and mshare patches. And, maybe it can
> combine with them in the future.
> 
>> The code itself can be improved. I found the reasoning about synchronization
>> and TLB flushes to be lacking, and the code to seem potentially incorrect.
>> Better comments would help, even if the code is correct.
>>
>> There are additional general questions. For instance, when sharing a
>> page-table, do you properly update the refcount/mapcount of the mapped
>> pages? And are there any possible interactions with THP?
> 
> Since access to those mapped pages will cost a lot of time, and this
> will make fork() even have more overhead. It will not update the
> refcount/mapcount of the mapped pages.

Oh no.

So we'd have pages logically mapped into two processes (two page table 
structures), but the refcount/mapcount/PageAnonExclusive would not 
reflect that?

Honestly, I don't think it is upstream material in that hacky form. No, 
we don't need more COW CVEs or more COW over-complications that 
destabilize the whole system.

IMHO, a relaxed form that focuses only on reducing memory consumption
could *possibly* be accepted upstream if it's not too invasive or
complex. During fork(), we'd do exactly what we used to do to PTEs
(increment mapcount, refcount, trying to clear PageAnonExclusive, map
the page R/O, duplicate swap entries; all while holding the page table
lock), but then share the prepared page table with the child process
using COW instead of copying it.

Any modification attempt (or most of them, once we want to *optimize*
rmap handling) requires breaking COW -- copying the page table for the
faulting process. But at that point, the PTEs are already write-protected
and properly accounted (refcount/mapcount/PageAnonExclusive).

Doing it that way might not require any questionable GUP hacks, and
swapping, MMU notifiers etc. "might just work as expected" because the
accounting remains unchanged -- we simply de-duplicate the page table
itself that we'd have after fork, and any modification attempt simply
replaces the mapped copy.
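
To make the idea above a bit more concrete, here is a rough user-space toy
model in C. It is only a sketch and not kernel code: all names (toy_fork,
toy_break_cow_table, struct pte_table, share_count, ...) are made up for
illustration, there is no locking or TLB flushing, swap entries are left
out, and clearing the "exclusive" marker is assumed to always succeed,
which is not true for real anonymous pages.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define PTRS_PER_TABLE 4                /* tiny table, illustration only */

struct page {                           /* hypothetical stand-in for struct page */
	int refcount;
	int mapcount;
	bool anon_exclusive;
};

struct pte {
	struct page *page;              /* NULL means "not present" */
	bool writable;
};

struct pte_table {
	int share_count;                /* number of mms using this table */
	struct pte ptes[PTRS_PER_TABLE];
};

struct mm {
	struct pte_table *table;
};

/*
 * fork(): account every present PTE exactly as if it had been copied
 * into the child, then share the prepared table instead of copying it.
 */
static void toy_fork(struct mm *parent, struct mm *child)
{
	struct pte_table *t = parent->table;

	for (int i = 0; i < PTRS_PER_TABLE; i++) {
		struct pte *pte = &t->ptes[i];

		if (!pte->page)
			continue;
		pte->page->refcount++;
		pte->page->mapcount++;
		pte->page->anon_exclusive = false; /* assumed to always succeed */
		pte->writable = false;             /* map the page R/O */
	}
	t->share_count++;
	child->table = t;
}

/*
 * Break COW on the table for the faulting mm: give it a private copy.
 * Page counters are not touched, because fork already accounted for
 * both users. Returns 0 on success, -1 on allocation failure.
 */
static int toy_break_cow_table(struct mm *mm)
{
	struct pte_table *old = mm->table;
	struct pte_table *copy;

	assert(old->share_count > 0);
	if (old->share_count == 1)
		return 0;                       /* already the sole user */

	copy = malloc(sizeof(*copy));
	if (!copy)
		return -1;
	memcpy(copy, old, sizeof(*copy));
	copy->share_count = 1;
	old->share_count--;
	mm->table = copy;
	return 0;
}
```

The point of the sketch is that toy_break_cow_table() only has to duplicate
the table itself: the per-page accounting was already done in toy_fork(),
so it is identical before and after the table copy.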

But the devil is in the details (page table lock, TLB flushing).

"will make fork() even have more overhead" is not a good excuse for such 
complexity/hacks -- sure, it will make your benchmark results look 
better in comparison ;)

-- 
Thanks,

David / dhildenb



