From: Marcelo Tosatti <mtosatti@redhat.com>
To: Christoph Lameter <cl@gentwo.de>
Cc: Frederic Weisbecker <frederic@kernel.org>,
atomlin@atomlin.com, tglx@linutronix.de, mingo@kernel.org,
peterz@infradead.org, pauld@redhat.com, neelx@redhat.com,
oleksandr@natalenko.name, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH v13 2/6] mm/vmstat: Use vmstat_dirty to track CPU-specific vmstat discrepancies
Date: Wed, 11 Jan 2023 14:07:47 -0300
Message-ID: <Y77s4x5yC4O1OxTQ@tpad>
In-Reply-To: <60183179-3a28-6bf9-a6ab-8a8976f283d@gentwo.de>
On Wed, Jan 11, 2023 at 09:42:18AM +0100, Christoph Lameter wrote:
> On Tue, 10 Jan 2023, Marcelo Tosatti wrote:
>
> > > The basic primitives add a lot of weight.
> >
> > Can't see any alternative given the necessity to avoid interruption
> > by the work to sync per-CPU vmstats to global vmstats.
>
> this_cpu operations are designed to operate on a *single* value (a counter) and can
> be run on an arbitrary cpu. There is no preemption or interrupt
> disable required since the counters of all cpus will be added up at the
> end.
>
> You want *two* values (the counter and the dirty flag) to be modified
> together and want to use the counters/flag to identify the cpu where
> these events occurred. this_cpu_xxx operations are not suitable for that
> purpose. You would need a way to ensure that both operations occur on the
> same cpu.
Which is either preempt_disable (CONFIG_HAVE_CMPXCHG_LOCAL case), or
local_irq_disable (!CONFIG_HAVE_CMPXCHG_LOCAL case).
> > > And the per cpu atomic update operations require the modification
> > > of multiple values. The operation
> > > cannot be "atomic" in that sense anymore and we need some other form of
> > > synchronization that can
> > > span multiple instructions.
> >
> > So use this_cpu_cmpxchg() to avoid the overhead. Since we can no longer
> > count on preemption being disabled we still have some minor issues.
> > The fetching of the counter thresholds is racy.
> > A threshold from another cpu may be applied if we happen to be
> > rescheduled on another cpu. However, the following vmstat operation
> > will then bring the counter again under the threshold limit.
> >
> > Those small issues are gone, OTOH.
>
> Well you could use this_cpu_cmpxchg128 to update a 64 bit counter and a
> flag at the same time.
But then you transform the "per-CPU vmstat is dirty" bit (bool) into a
number of flags that must be scanned when returning to userspace,
which increases the overhead of a fast path.
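To make the trade-off above concrete, here is a minimal userspace sketch of
the "counter and flag in one atomic update" idea. It is not
this_cpu_cmpxchg128 and not kernel code: it assumes C11 atomics as a
stand-in, packs a non-negative 63-bit counter and a 1-bit flag into a single
64-bit word, and uses hypothetical names (struct packed_stat,
packed_stat_add, packed_stat_fold, packed_stat_dirty).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: bit 0 is the dirty flag, bits 1..63 hold the counter. */
#define PACKED_DIRTY_BIT UINT64_C(1)

/* Userspace stand-in for one counter slot; not a kernel data structure. */
struct packed_stat {
	_Atomic uint64_t word;
};

/* Add delta to the counter and set the dirty flag in a single atomic update. */
static void packed_stat_add(struct packed_stat *s, int64_t delta)
{
	uint64_t old = atomic_load_explicit(&s->word, memory_order_relaxed);
	uint64_t next;

	do {
		/* The shifted delta is even, so it never disturbs the flag bit. */
		next = (old + ((uint64_t)delta << 1)) | PACKED_DIRTY_BIT;
	} while (!atomic_compare_exchange_weak_explicit(&s->word, &old, next,
							memory_order_relaxed,
							memory_order_relaxed));
}

/* Report whether this one counter has unsynced changes. */
static bool packed_stat_dirty(struct packed_stat *s)
{
	uint64_t w = atomic_load_explicit(&s->word, memory_order_relaxed);

	return (w & PACKED_DIRTY_BIT) != 0;
}

/* Take the counter value and clear both counter and flag in one step. */
static uint64_t packed_stat_fold(struct packed_stat *s)
{
	uint64_t old = atomic_exchange_explicit(&s->word, 0,
						memory_order_relaxed);

	assert((old >> 1) <= (UINT64_MAX >> 1)); /* counter fits in 63 bits */
	return old >> 1;
}
```

Note that the flag now lives inside each counter word, so a "is anything
dirty on this CPU?" check has to call packed_stat_dirty() once per counter.
That per-counter scan is the extra cost on the return-to-userspace path
described above.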
> Otherwise you will have to switch off preemption or
> interrupts when incrementing the counters and updating the dirty flag.
>
> Thus you do not really need the this_cpu operations anymore. It would be
> best to use a preempt_disable section and use C operators -- ++ for the
> counter and do regular assignment for the flag.
OK, I can replace the this_cpu operations with this_cpu_ptr + standard C operators
(and in fact can do that for the interrupt-disabled functions as well, that
is, when CONFIG_HAVE_CMPXCHG_LOCAL is not defined).
Is that it?
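The pattern being agreed on here (a preempt-disabled section, a pointer to
this CPU's data, and plain C operators for the counter and the flag) can be
sketched as a userspace model. Everything below is an assumption made for
illustration: the model_* helpers only imitate preempt_disable(),
preempt_enable() and this_cpu_ptr(), and struct cpu_vmstat_model is a
hypothetical stand-in, not the layout used by the patch.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_NR_CPUS  4
#define MODEL_NR_ITEMS 2

/* Userspace stand-ins for kernel state; NOT the real implementations. */
static int model_preempt_depth;	/* > 0 means "preemption disabled" */
static int model_current_cpu;	/* CPU the caller is pretending to run on */

static void model_preempt_disable(void)
{
	model_preempt_depth++;
}

static void model_preempt_enable(void)
{
	assert(model_preempt_depth > 0);
	model_preempt_depth--;
}

/* Hypothetical per-CPU record: counter deltas plus a single dirty flag. */
struct cpu_vmstat_model {
	long diff[MODEL_NR_ITEMS];
	bool dirty;
};

static struct cpu_vmstat_model model_stats[MODEL_NR_CPUS];

/* Imitates this_cpu_ptr(): only meaningful while the task cannot migrate. */
static struct cpu_vmstat_model *model_this_cpu_stats(void)
{
	assert(model_preempt_depth > 0);
	return &model_stats[model_current_cpu];
}

/* Update counter and flag on the same CPU with ordinary C operators. */
static void model_mod_state(int item, long delta)
{
	struct cpu_vmstat_model *s;

	model_preempt_disable();
	s = model_this_cpu_stats();
	s->diff[item] += delta;	/* plain C arithmetic on the counter */
	s->dirty = true;	/* regular assignment for the flag */
	model_preempt_enable();
}
```

Because both writes happen inside one non-preemptible section, the flag and
the counter are guaranteed to belong to the same CPU's record, and a single
bool per CPU is all the return-to-userspace path has to test.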