From: Johannes Weiner <hannes@cmpxchg.org>
To: Waiman Long <longman@redhat.com>
Cc: David Finkel <davidf@vimeo.com>,
Muchun Song <muchun.song@linux.dev>, Tejun Heo <tj@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
core-services@vimeo.com, Jonathan Corbet <corbet@lwn.net>,
Michal Hocko <mhocko@kernel.org>,
Shakeel Butt <shakeel.butt@linux.dev>,
Shuah Khan <shuah@kernel.org>, Zefan Li <lizefan.x@bytedance.com>,
cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
linux-mm@kvack.org, linux-kselftest@vger.kernel.org
Subject: Re: [PATCH 1/2] mm, memcg: cg2 memory{.swap,}.peak write handlers
Date: Wed, 24 Jul 2024 07:49:05 -0400 [thread overview]
Message-ID: <20240724114905.GB389003@cmpxchg.org> (raw)
In-Reply-To: <22a95c76-4e9e-482e-b95d-cb0f01971d98@redhat.com>
On Tue, Jul 23, 2024 at 09:55:19PM -0400, Waiman Long wrote:
> Could you use the "-v <n>" option of git-format-patch to add a version
> number to the patch title? Without that, it can be confusing as to
> whether the patch is new or a resend of the previous one.
+1
> > @@ -775,6 +775,11 @@ struct cgroup_subsys {
> >
> > extern struct percpu_rw_semaphore cgroup_threadgroup_rwsem;
> >
> > +struct cgroup_of_peak {
> > + long value;
> > + struct list_head list;
> > +};
> The name "cgroup_of_peak" is kind of confusing. Maybe local_peak?
It's the peak associated with an 'of' (which is a known concept in
cgroup code), and it pairs up nicely with of_peak(). I'd prefer
keeping that over local_peak.
> > @@ -26,6 +26,7 @@ struct page_counter {
> > atomic_long_t children_low_usage;
> >
> > unsigned long watermark;
> > + unsigned long local_watermark; /* track min of fd-local resets */
> track "min"? I thought it is used to track local maximum after a reset.
Yeah, the comment doesn't sound quite right.
However, I think we'd be hard-pressed to explain correctly and
comprehensively what this thing does in <40 characters.
I'd just remove the comment tbh.
> > @@ -78,7 +79,10 @@ int page_counter_memparse(const char *buf, const char *max,
> >
> > static inline void page_counter_reset_watermark(struct page_counter *counter)
> > {
> > - counter->watermark = page_counter_read(counter);
> > + unsigned long usage = page_counter_read(counter);
> > +
> > + counter->watermark = usage;
> > + counter->local_watermark = usage;
> > }
> >
>
> Could you set the local_watermark first before setting watermark? There
> is a very small time window that the invariant "local_watermark <=
> watermark" is not true.
Does it matter? Only cgroup1 supports global resets; only cgroup2
supports local peaks watching. This doesn't add anything to the race
that already exists between reset and global watermark update on cg1.
> > @@ -3950,12 +3955,90 @@ static u64 memory_current_read(struct cgroup_subsys_state *css,
> > return (u64)page_counter_read(&memcg->memory) * PAGE_SIZE;
> > }
> >
> > -static u64 memory_peak_read(struct cgroup_subsys_state *css,
> > - struct cftype *cft)
> > +static int peak_show(struct seq_file *sf, void *v, struct page_counter *pc)
> > {
> > - struct mem_cgroup *memcg = mem_cgroup_from_css(css);
> > + struct cgroup_of_peak *ofp = of_peak(sf->private);
> > + s64 fd_peak = ofp->value, peak;
> > +
> > + /* User wants global or local peak? */
> > + if (fd_peak == -1)
> > + peak = pc->watermark;
> > + else
> > + peak = max(fd_peak, (s64)pc->local_watermark);
> Should you save the local_watermark value into ofp->value if
> local_watermark is bigger? This will ensure that each successive read of
> the fd is monotonically increasing. Otherwise the value may go up or
> down if there are multiple resets in between.
The reset saves local_watermark into ofp->value if it's bigger..?
I do see another problem, though. The compiler might issue multiple
reads to ofp->value in arbitrary order. We could print max(-1, ...)
which is nonsense. Saving ofp->value into a local variable is the
right idea, but the compiler might still issue two reads anyway. It
needs a READ_ONCE() to force a single read.
I'd use unsigned long for fd_peak. This way the "specialness" is on
the -1UL comparison. The max() must be between two positive numbers,
so the (s64) there is confusing.
next prev parent reply other threads:[~2024-07-24 11:49 UTC|newest]
Thread overview: 11+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-23 23:31 David Finkel
2024-07-23 23:31 ` [PATCH 1/2] " David Finkel
2024-07-24 1:55 ` Waiman Long
2024-07-24 11:49 ` Johannes Weiner [this message]
2024-07-24 16:11 ` David Finkel
2024-07-26 20:22 ` Tejun Heo
2024-07-29 13:37 ` David Finkel
2024-07-23 23:31 ` [PATCH 2/2] mm, memcg: cg2 memory{.swap,}.peak write tests David Finkel
-- strict thread matches above, loose matches on Subject: below --
2024-07-22 23:55 mm, memcg: cg2 memory{.swap,}.peak write handlers David Finkel
2024-07-22 23:55 ` [PATCH 1/2] " David Finkel
2024-07-23 14:29 ` Johannes Weiner
2024-07-23 19:45 ` David Finkel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240724114905.GB389003@cmpxchg.org \
--to=hannes@cmpxchg.org \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=corbet@lwn.net \
--cc=core-services@vimeo.com \
--cc=davidf@vimeo.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kselftest@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lizefan.x@bytedance.com \
--cc=longman@redhat.com \
--cc=mhocko@kernel.org \
--cc=muchun.song@linux.dev \
--cc=roman.gushchin@linux.dev \
--cc=shakeel.butt@linux.dev \
--cc=shuah@kernel.org \
--cc=tj@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox