linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Hugh Dickins <hughd@google.com>
To: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>, Hugh Dickins <hughd@google.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	 Vasily Gorbik <gor@linux.ibm.com>,
	Mike Kravetz <mike.kravetz@oracle.com>,
	 Mike Rapoport <rppt@kernel.org>,
	 "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
	 Matthew Wilcox <willy@infradead.org>,
	David Hildenbrand <david@redhat.com>,
	 Suren Baghdasaryan <surenb@google.com>,
	 Qi Zheng <zhengqi.arch@bytedance.com>,
	Yang Shi <shy828301@gmail.com>,
	 Mel Gorman <mgorman@techsingularity.net>,
	Peter Xu <peterx@redhat.com>,
	 Peter Zijlstra <peterz@infradead.org>,
	Will Deacon <will@kernel.org>,  Yu Zhao <yuzhao@google.com>,
	Alistair Popple <apopple@nvidia.com>,
	 Ralph Campbell <rcampbell@nvidia.com>,
	Ira Weiny <ira.weiny@intel.com>,
	 Steven Price <steven.price@arm.com>,
	SeongJae Park <sj@kernel.org>,
	 Lorenzo Stoakes <lstoakes@gmail.com>,
	Huang Ying <ying.huang@intel.com>,
	 Naoya Horiguchi <naoya.horiguchi@nec.com>,
	 Christophe Leroy <christophe.leroy@csgroup.eu>,
	 Zack Rusin <zackr@vmware.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	 Anshuman Khandual <anshuman.khandual@arm.com>,
	 Pasha Tatashin <pasha.tatashin@soleen.com>,
	 Miaohe Lin <linmiaohe@huawei.com>,
	Minchan Kim <minchan@kernel.org>,
	 Christoph Hellwig <hch@infradead.org>,
	Song Liu <song@kernel.org>,
	 Thomas Hellstrom <thomas.hellstrom@linux.intel.com>,
	 Russell King <linux@armlinux.org.uk>,
	 "David S. Miller" <davem@davemloft.net>,
	 Michael Ellerman <mpe@ellerman.id.au>,
	 "Aneesh Kumar K.V" <aneesh.kumar@linux.ibm.com>,
	 Heiko Carstens <hca@linux.ibm.com>,
	 Christian Borntraeger <borntraeger@linux.ibm.com>,
	 Claudio Imbrenda <imbrenda@linux.ibm.com>,
	 Alexander Gordeev <agordeev@linux.ibm.com>,
	Jann Horn <jannh@google.com>,
	 Vishal Moola <vishal.moola@gmail.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	 linux-arm-kernel@lists.infradead.org,
	sparclinux@vger.kernel.org,  linuxppc-dev@lists.ozlabs.org,
	linux-s390@vger.kernel.org,  linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page
Date: Thu, 29 Jun 2023 23:00:07 -0700 (PDT)	[thread overview]
Message-ID: <edaa96f-80c1-1252-acbb-71c4f045b035@google.com> (raw)
In-Reply-To: <20230629175645.7654d0a8@thinkpad-T15>

On Thu, 29 Jun 2023, Gerald Schaefer wrote:
> On Thu, 29 Jun 2023 12:22:24 -0300
> Jason Gunthorpe <jgg@ziepe.ca> wrote:
> > On Wed, Jun 28, 2023 at 10:08:08PM -0700, Hugh Dickins wrote:
> > > On Wed, 28 Jun 2023, Gerald Schaefer wrote:  
> > > > 
> > > > As discussed in the other thread, we would rather go with less complexity,
> > > > possibly switching to an approach w/o the list and fragment re-use in the
> > > > future. For now, as a first step in that direction, we can try with not
> > > > adding fragments back only for pte_free_defer(). Here is an adjusted
> > > > version of your patch, copying most of your pte_free_defer() logic and
> > > > also description, tested with LTP and all three of your patch series applied:  
> > > 
> > > Thanks, Gerald: I don't mind abandoning my 13/12 SLAB_TYPESAFE_BY_RCU
> > > patch (posted with fewer Cc's to the s390 list last week), and switching
> > > to your simpler who-cares-if-we-sometimes-don't-make-maximal-use-of-page
> > > patch.
> > > 
> > > But I didn't get deep enough into it today to confirm it - and disappointed
> > > that you've found it necessary to play with pt_frag_refcount in addition to
> > > _refcount and HH bits.  No real problem with that, but my instinct says it
> > > should be simpler.  
> 
> Yes, I also found it a bit awkward, but it seemed "good and simple enough",
> to have something to go forward with, while my instinct was in line with yours.
> 
> > 
> > Is there any reason it should be any different at all from what PPC is
> > doing?
> > 
> > I still think the right thing to do here is make the PPC code common
> > (with Hugh's proposed RCU modification) and just use it in both
> > arches....
> 
> With the current approach, we would not add back fragments _only_ for
> the new pte_free_defer() path, while keeping our cleverness for the other
> paths. Not having a good overview of the negative impact wrt potential
> memory waste, I would rather take small steps, if possible.
> 
> If we later switch to never adding back fragments, of course we should
> try to be in line with PPC implementation.

I find myself half-agreeing with everyone.

I agree with Gerald that s390 should keep close to what it is already
doing (except for adding pte_free_defer()): that changing its strategy
and implementation to be much more like powerpc, is a job for some other
occasion (and would depend on gathering data about how well each does).

But I agree with Jason that the powerpc solution we ended up with cut
out a lot of unnecessary complication: it shifts the RCU delay from
when pte_free_defer() is called, to when the shared page comes to be
freed; which may be a lot later, and might not be welcome in a common
path, but is quite okay for the uncommon pte_free_defer().

And I agree with Alexander that pte_free_lower() and pte_free_upper()
are better names than pte_free_now0() and pte_free_now1(): I was going
to make that change, except all those functions disappear if we follow
Jason's advice and switch the call_rcu() to when freeing the page.

(Lower and upper seem unambiguous to me: Gerald, does your confusion
come just from the way they are shown the wrong way round in the PP AA
diagram?  I corrected that in my patch, but you reverted it in yours.)

I've grown to dislike the (ab)use of pt_frag_refcount even more, to the
extent that I've not even tried to verify it; but I think I do get the
point now, that we need further info than just PPHHAA to know whether
the page is on the list or not.  But I think that if we move where the
call_rcu() is done, then the page can stay on or off the list by same
rules as before (but need to check HH bits along with PP when deciding
whether to allocate, and whether to list_add_tail() when freeing).

So, starting from Gerald's but cutting it down, I was working on the
patch which follows those ideas.  But have run out of puff for tonight,
and would just waste all our time (again) if I sent anything out now.

Hugh


  reply	other threads:[~2023-06-30  6:00 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-06-20  7:35 [PATCH v2 00/12] mm: free retracted page table by RCU Hugh Dickins
2023-06-20  7:40 ` [PATCH v2 01/12] mm/pgtable: add rcu_read_lock() and rcu_read_unlock()s Hugh Dickins
2023-06-20  7:42 ` [PATCH v2 02/12] mm/pgtable: add PAE safety to __pte_offset_map() Hugh Dickins
2023-06-20  7:43 ` [PATCH v2 03/12] arm: adjust_pte() use pte_offset_map_nolock() Hugh Dickins
2023-06-20  7:45 ` [PATCH v2 04/12] powerpc: assert_pte_locked() " Hugh Dickins
2023-06-20  7:47 ` [PATCH v2 05/12] powerpc: add pte_free_defer() for pgtables sharing page Hugh Dickins
2023-06-20 11:45   ` Jason Gunthorpe
2023-06-20 19:54     ` Hugh Dickins
2023-06-20 23:52       ` Jason Gunthorpe
2023-06-22  2:36         ` Hugh Dickins
2023-06-27 17:01           ` Jason Gunthorpe
2023-06-27 20:53             ` Hugh Dickins
2023-06-20  7:49 ` [PATCH v2 06/12] sparc: add pte_free_defer() for pte_t *pgtable_t Hugh Dickins
2023-06-20  7:51 ` [PATCH v2 07/12] s390: add pte_free_defer() for pgtables sharing page Hugh Dickins
2023-06-28 19:16   ` Gerald Schaefer
2023-06-29  5:08     ` Hugh Dickins
2023-06-29 15:22       ` Jason Gunthorpe
2023-06-29 15:56         ` Gerald Schaefer
2023-06-30  6:00           ` Hugh Dickins [this message]
2023-07-02  4:32             ` Hugh Dickins
2023-07-04 13:40               ` Alexander Gordeev
2023-07-04 16:03                 ` Hugh Dickins
2023-07-04 15:19               ` Gerald Schaefer
2023-07-04 17:03                 ` Hugh Dickins
2023-07-05 12:55                   ` Gerald Schaefer
2023-07-06  1:20                     ` Hugh Dickins
2023-07-06 15:02                       ` Gerald Schaefer
2023-07-06 19:45                         ` Hugh Dickins
2023-07-10 17:21                     ` Jason Gunthorpe
2023-07-05  6:46               ` Alexander Gordeev
2023-07-06  0:52                 ` Hugh Dickins
2023-07-07 14:37                   ` Gerald Schaefer
2023-07-03 16:10             ` Gerald Schaefer
2023-06-29 13:59     ` Alexander Gordeev
2023-06-29 15:43       ` Gerald Schaefer
2023-06-30 13:38   ` Claudio Imbrenda
2023-06-30 15:28     ` Hugh Dickins
2023-06-30 16:25       ` Claudio Imbrenda
2023-06-30 19:22         ` Hugh Dickins
2023-07-03 11:00           ` Claudio Imbrenda
2023-07-03 21:29             ` Jason Gunthorpe
2023-06-20  7:53 ` [PATCH v2 08/12] mm/pgtable: add pte_free_defer() for pgtable as page Hugh Dickins
2023-06-20  7:54 ` [PATCH v2 09/12] mm/khugepaged: retract_page_tables() without mmap or vma lock Hugh Dickins
2023-06-20  7:56 ` [PATCH v2 10/12] mm/khugepaged: collapse_pte_mapped_thp() with mmap_read_lock() Hugh Dickins
2023-06-20  8:04   ` [PATCH mm " Hugh Dickins
2023-06-20  7:58 ` [PATCH v2 11/12] mm/khugepaged: delete khugepaged_collapse_pte_mapped_thps() Hugh Dickins
2023-06-20  7:59 ` [PATCH v2 12/12] mm: delete mmap_write_trylock() and vma_try_start_write() Hugh Dickins

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=edaa96f-80c1-1252-acbb-71c4f045b035@google.com \
    --to=hughd@google.com \
    --cc=agordeev@linux.ibm.com \
    --cc=akpm@linux-foundation.org \
    --cc=aneesh.kumar@linux.ibm.com \
    --cc=anshuman.khandual@arm.com \
    --cc=apopple@nvidia.com \
    --cc=axelrasmussen@google.com \
    --cc=borntraeger@linux.ibm.com \
    --cc=christophe.leroy@csgroup.eu \
    --cc=davem@davemloft.net \
    --cc=david@redhat.com \
    --cc=gerald.schaefer@linux.ibm.com \
    --cc=gor@linux.ibm.com \
    --cc=hca@linux.ibm.com \
    --cc=hch@infradead.org \
    --cc=imbrenda@linux.ibm.com \
    --cc=ira.weiny@intel.com \
    --cc=jannh@google.com \
    --cc=jgg@ziepe.ca \
    --cc=kirill.shutemov@linux.intel.com \
    --cc=linmiaohe@huawei.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-s390@vger.kernel.org \
    --cc=linux@armlinux.org.uk \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=lstoakes@gmail.com \
    --cc=mgorman@techsingularity.net \
    --cc=mike.kravetz@oracle.com \
    --cc=minchan@kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=naoya.horiguchi@nec.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rcampbell@nvidia.com \
    --cc=rppt@kernel.org \
    --cc=shy828301@gmail.com \
    --cc=sj@kernel.org \
    --cc=song@kernel.org \
    --cc=sparclinux@vger.kernel.org \
    --cc=steven.price@arm.com \
    --cc=surenb@google.com \
    --cc=thomas.hellstrom@linux.intel.com \
    --cc=vbabka@suse.cz \
    --cc=vishal.moola@gmail.com \
    --cc=will@kernel.org \
    --cc=willy@infradead.org \
    --cc=ying.huang@intel.com \
    --cc=yuzhao@google.com \
    --cc=zackr@vmware.com \
    --cc=zhengqi.arch@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox