Re: [PATCH 0/2] execve scalability issues, part 1


From: Dennis Zhou <dennis@kernel.org>
To: Jan Kara <jack@suse.cz>, Mateusz Guzik <mjguzik@gmail.com>
Cc: linux-kernel@vger.kernel.org, tj@kernel.org, cl@linux.com,
	akpm@linux-foundation.org, shakeelb@google.com,
	linux-mm@kvack.org
Subject: Re: [PATCH 0/2] execve scalability issues, part 1
Date: Wed, 23 Aug 2023 13:27:56 -0700
Message-ID: <ZOZrzG/MgL8vw+lI@snowbird>
In-Reply-To: <20230823094915.ggv3spzevgyoov6i@quack3>

On Wed, Aug 23, 2023 at 11:49:15AM +0200, Jan Kara wrote:
> On Tue 22-08-23 16:24:56, Mateusz Guzik wrote:
> > On 8/22/23, Jan Kara <jack@suse.cz> wrote:
> > > On Tue 22-08-23 00:29:49, Mateusz Guzik wrote:
> > >> On 8/21/23, Mateusz Guzik <mjguzik@gmail.com> wrote:
> > >> > True Fix(tm) is a longer story.
> > >> >
> > >> > Maybe let's sort out this patchset first, whichever way. :)
> > >> >
> > >>
> > >> So I found the discussion around the original patch with a perf
> > >> regression report.
> > >>
> > >> https://lore.kernel.org/linux-mm/20230608111408.s2minsenlcjow7q3@quack3/
> > >>
> > >> The reporter suggests dodging the problem by only allocating per-cpu
> > >> counters when the process is going multithreaded. Given that there is
> > >> still plenty of forever single-threaded procs out there I think that
> > >> does sound like a great plan regardless of what happens with this
> > >> patchset.
> > >>
> > >> Almost all access is already done using dedicated routines, so this
> > >> should be an afternoon churn to sort out, unless I missed a
> > >> showstopper. (maybe there is no good place to stuff a flag/whatever
> > >> other indicator about the state of counters?)
> > >>
> > >> That said I'll look into it some time this or next week.
> > >
> > > Good, just let me know how it went, I also wanted to start looking into
> > > this to come up with some concrete patches :). What I had in mind was that
> > > we could use 'counters == NULL' as an indication that the counter is still
> > > in 'single counter mode'.
> > >
> > 
> > In the current state there are only pointers to counters in mm_struct
> > and there is no storage for them in task_struct. So I don't think
> > merely null-checking the per-cpu stuff is going to cut it -- where
> > should the single-threaded counters land?
> 
> I think you misunderstood. What I wanted to do is to provide a new flavor
> of percpu_counter (sharing most of code and definitions) which would have
> an option to start as a simple counter (indicated by pcc->counters == NULL
> and using pcc->count for counting) and then be upgraded by a call to a real
> percpu thing. I think such counters would also be useful on other
> occasions than as rss counters.
> 
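
A minimal sketch of the idea quoted above, written as userspace C rather
than kernel code: the counter starts in single mode (counters == NULL,
only count is used) and a one-way upgrade call switches it to per-CPU
slots. All names here (lazy_counter, lazy_counter_upgrade,
NR_CPUS_SKETCH) are hypothetical and are not part of the percpu_counter
API. The sketch also ignores the synchronization a real upgrade would
need against concurrent updaters.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NR_CPUS_SKETCH 4	/* hypothetical fixed CPU count for this sketch */

struct lazy_counter {
	int64_t count;		/* the only storage while counters == NULL */
	int64_t *counters;	/* per-CPU deltas once upgraded, else NULL */
};

static void lazy_counter_init(struct lazy_counter *c, int64_t v)
{
	c->count = v;
	c->counters = NULL;	/* NULL means "single counter mode" */
}

/* One-way upgrade. Returns 0 on success, -1 if allocation fails. */
static int lazy_counter_upgrade(struct lazy_counter *c)
{
	if (c->counters)
		return 0;	/* already per-CPU, nothing to do */
	c->counters = calloc(NR_CPUS_SKETCH, sizeof(*c->counters));
	return c->counters ? 0 : -1;
}

static void lazy_counter_add(struct lazy_counter *c, int cpu, int64_t delta)
{
	assert(cpu >= 0 && cpu < NR_CPUS_SKETCH);
	if (!c->counters)
		c->count += delta;		/* single mode */
	else
		c->counters[cpu] += delta;	/* per-CPU mode */
}

/* Total is the base count plus every per-CPU delta (if any exist). */
static int64_t lazy_counter_sum(const struct lazy_counter *c)
{
	int64_t sum = c->count;

	if (c->counters)
		for (int i = 0; i < NR_CPUS_SKETCH; i++)
			sum += c->counters[i];
	return sum;
}

static void lazy_counter_destroy(struct lazy_counter *c)
{
	free(c->counters);
	c->counters = NULL;
}
```

Note that the value accumulated before the upgrade simply stays in count,
so the upgrade itself does not have to move any state around.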

Kent wrote something similar and sent it out last year [1]. However, the
case differs slightly from what we'd want here: we'd switch to percpu on
the 1 -> 2 threads transition, whereas there the switch is driven by
update rate, which a single thread might be able to trigger?

[1] https://lore.kernel.org/lkml/20230501165450.15352-8-surenb@google.com/

Thanks,
Dennis

> > Bonus problem, non-current can modify these counters and this needs to
> > be safe against current playing with them at the same time. (and it
> > would be a shame to require current to use atomic on them)
> 
> Hum, I didn't realize that. Indeed I can see that e.g. khugepaged can be
> modifying the counters for other processes. Thanks for pointing this out.
> 
> > That said, my initial proposal adds a union:
> > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > index 5e74ce4a28cd..ea70f0c08286 100644
> > --- a/include/linux/mm_types.h
> > +++ b/include/linux/mm_types.h
> > @@ -737,7 +737,11 @@ struct mm_struct {
> > 
> >                 unsigned long saved_auxv[AT_VECTOR_SIZE]; /* for /proc/PID/auxv */
> > 
> > -               struct percpu_counter rss_stat[NR_MM_COUNTERS];
> > +               union {
> > +                       struct percpu_counter rss_stat[NR_MM_COUNTERS];
> > +                       u64 *rss_stat_single;
> > +               };
> > +               bool    magic_flag_stuffed_elsewhere;
> > 
> >                 struct linux_binfmt *binfmt;
> > 
> > 
> > Then for the single-threaded case an area is allocated for
> > NR_MM_COUNTERS counters * 2 -- the first set is updated without any
> > synchronization by the current thread. The second set is only
> > modified by others and is protected with mm->arg_lock. The lock
> > protects remote access to the union to begin with.
> 
> arg_lock seems a bit like a hack. How is it related to rss_stat? The scheme
> with two counters is clever but I'm not 100% convinced the complexity is
> really worth it. I'm not sure the overhead of always using an atomic
> counter would really be measurable as atomic counter ops in local CPU cache
> tend to be cheap. Did you try to measure the difference?
> 
> If the second counter proves to be worth it, we could make just that one
> atomic to avoid the need for abusing some spinlock.
> 
> > Transition to per-CPU operation sets the magic flag (there is plenty
> > of spare space in mm_struct, I'll find a good home for it without
> > growing the struct). It would be a one-way street -- a process which
> > gets a bunch of threads and goes back to one stays with per-CPU.
> 
> Agreed on the switch being a one-way street.
> 
> > Then you get the true value of something by adding both counters.
> > 
> > arg_lock is sparingly used, so remote ops are not expected to contend
> > with anything. In fact their cost is going to go down since percpu
> > summation takes a spinlock which also disables interrupts.
> > 
> > Local ops should be about the same in cost as they are right now.
> > 
> > I might have missed some detail in the above description, but I think
> > the approach is decent.
> 
> 								Honza
> -- 
> Jan Kara <jack@suse.com>
> SUSE Labs, CR
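
For reference, a rough userspace sketch of the two-set scheme discussed
in the quoted text, using the variant Jan suggests where the remote set
is atomic instead of being protected by a spinlock. The names
(rss_single, rss_add_local, rss_add_remote, rss_read,
NR_COUNTERS_SKETCH) are made up for illustration. The one-way switch to
per-CPU mode is left out, and a kernel version would need something like
READ_ONCE() for remote readers of the local set.

```c
#include <stdatomic.h>
#include <stdint.h>

#define NR_COUNTERS_SKETCH 4	/* stand-in for NR_MM_COUNTERS */

struct rss_single {
	/* Updated only by the owning thread, without synchronization. */
	int64_t local[NR_COUNTERS_SKETCH];
	/* Updated by any other thread (e.g. khugepaged), atomically. */
	_Atomic(int64_t) remote[NR_COUNTERS_SKETCH];
};

static void rss_single_init(struct rss_single *s)
{
	for (int i = 0; i < NR_COUNTERS_SKETCH; i++) {
		s->local[i] = 0;
		atomic_init(&s->remote[i], 0);
	}
}

/* Fast path: plain add, valid only when called by the owning thread. */
static void rss_add_local(struct rss_single *s, int idx, int64_t delta)
{
	s->local[idx] += delta;
}

/* Slow path: any non-owner thread; relaxed ordering suffices for a count. */
static void rss_add_remote(struct rss_single *s, int idx, int64_t delta)
{
	atomic_fetch_add_explicit(&s->remote[idx], delta, memory_order_relaxed);
}

/* The true value is the sum of both sets. */
static int64_t rss_read(struct rss_single *s, int idx)
{
	return s->local[idx] +
	       atomic_load_explicit(&s->remote[idx], memory_order_relaxed);
}
```

Either set can go negative on its own (pages charged locally and
uncharged remotely, for example); only the sum is meaningful.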


