From: Yosry Ahmed <yosryahmed@google.com>
To: Michal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Shakeel Butt <shakeelb@google.com>,
Muchun Song <muchun.song@linux.dev>,
Andrew Morton <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH v3] mm: memcg: use rstat for non-hierarchical stats
Date: Wed, 2 Aug 2023 15:02:55 -0700 [thread overview]
Message-ID: <CAJD7tkb17x=qwoO37uxyYXLEUVp15BQKR+Xfh7Sg9Hx-wTQ_=w@mail.gmail.com> (raw)
In-Reply-To: <CAJD7tkY4hTTCfqSGa_XexbH=WSTJ4WXWeMXSU+6KW8qfr7agfQ@mail.gmail.com>
[-- Attachment #1: Type: text/plain, Size: 3388 bytes --]
On Wed, Aug 2, 2023 at 1:11 AM Yosry Ahmed <yosryahmed@google.com> wrote:
>
> On Wed, Aug 2, 2023 at 12:40 AM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Tue 01-08-23 10:29:39, Yosry Ahmed wrote:
> > > On Tue, Aug 1, 2023 at 9:39 AM Yosry Ahmed <yosryahmed@google.com> wrote:
> > [...]
> > > > > Have you measured any potential regression for cgroup v2 which collects
> > > > > all this data without ever using it (AFAICS)?
> > > >
> > > > I did not. I did not expect noticeable regressions given that all the
> > > > extra work is done during flushing, which should mostly be done by the
> > > > asynchronous worker, but can also happen in the stats reading context.
> > > > Let me run the same script on cgroup v2 just in case and report back.
> > >
> > > A few runs on mm-unstable with this patch:
> > >
> > > # time cat /sys/fs/cgroup/cg*/memory.stat > /dev/null
> >
> > Is this really representative test to make? I would have expected the
> > overhead would be mostly in mem_cgroup_css_rstat_flush (if it is visible
> > at all of course). This would be more likely visible in all cpus busy
> > situation (you can try heavy parallel kernel build from tmpfs for
> > example).
>
>
> I see. You are more worried about asynchronous flushing eating cpu
> time rather than the synchronous flushing being slower. In fact, my
> test is actually not representative at all because probably most of
> the cgroups either do not have updates or the asynchronous flusher got
> to them first.
>
> Let me try a workload that is more parallel & cpu intensive and report
> back. I am thinking of parallel reclaim/refault loops since both
> reclaim and refault paths invoke stat updates and stat flushing.
>
I am back with more data.
So I wrote a small reclaim/refault stress test that creates (NR_CPUS *
2) cgroups, assigns them limits, runs a worker process in each cgroup
that allocates tmpfs memory equal to quadruple the limit (to invoke
reclaim) continuously, and then reads back the entire file (to invoke
refaults). All workers are run in parallel, and zram is used as a
swapping backend. Both reclaim and refault have conditional stats
flushing. I ran this on a machine with 112 cpus, once on mm-unstable,
and once on mm-unstable with this patch reverted. The script is
attached.
(1) A few runs without this patch:
# time ./stress_reclaim_refault.sh
real 0m9.949s
user 0m0.496s
sys 14m44.974s
# time ./stress_reclaim_refault.sh
real 0m10.049s
user 0m0.486s
sys 14m55.791s
# time ./stress_reclaim_refault.sh
real 0m9.984s
user 0m0.481s
sys 14m53.841s
(2) A few runs with this patch:
# time ./stress_reclaim_refault.sh
real 0m9.885s
user 0m0.486s
sys 14m48.753s
# time ./stress_reclaim_refault.sh
real 0m9.903s
user 0m0.495s
sys 14m48.339s
# time ./stress_reclaim_refault.sh
real 0m9.861s
user 0m0.507s
sys 14m49.317s
I do not see any regressions from this patch. There is actually a very
slight improvement. If I have to guess, maybe it's because we avoid
the percpu loop in count_shadow_nodes() when calling
lruvec_page_state_local(), but I could not prove this using perf, it's
probably in the noise.
Let me know if the testing is satisfactory for you. I can send an
updated commit log accordingly with a summary of this conversation.
> > --
> > Michal Hocko
> > SUSE Labs
[-- Attachment #2: stress_reclaim_refault.sh --]
[-- Type: text/x-sh, Size: 1070 bytes --]
#!/bin/bash
NR_CPUS=$(getconf _NPROCESSORS_ONLN)
NR_CGROUPS=$(( NR_CPUS * 2 ))
TEST_MB=50
TOTAL_MB=$((TEST_MB * NR_CGROUPS))
TMPFS=$(mktemp -d)
ROOT="/sys/fs/cgroup/"
ZRAM_DEV="/mnt/devtmpfs/zram0"
cleanup() {
umount $TMPFS
rm -rf $TMPFS
for i in $(seq $NR_CGROUPS); do
cgroup="$ROOT/cg$i"
rmdir $cgroup
done
swapoff $ZRAM_DEV
echo 1 > "/sys/block/zram0/reset"
}
trap cleanup INT QUIT EXIT
# Setup zram
echo $((TOTAL_MB << 20)) > "/sys/block/zram0/disksize"
mkswap $ZRAM_DEV
swapon $ZRAM_DEV
echo "Setup zram done"
# Create cgroups, set limits
echo "+memory" > "$ROOT/cgroup.subtree_control"
for i in $(seq $NR_CGROUPS); do
cgroup="$ROOT/cg$i"
mkdir $cgroup
echo $(( (TEST_MB << 20) / 4)) > "$cgroup/memory.max"
done
echo "Setup cgroups done"
# Start workers to allocate tmpfs memory
mount -t tmpfs none $TMPFS
for i in $(seq $NR_CGROUPS); do
cgroup="$ROOT/cg$i"
f="$TMPFS/tmp$i"
(echo 0 > "$cgroup/cgroup.procs" &&
dd if=/dev/zero of=$f bs=1M count=$TEST_MB status=none &&
cat $f > /dev/null)&
done
# Wait for workers
wait
next prev parent reply other threads:[~2023-08-02 22:03 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-26 15:32 [PATCH] " Yosry Ahmed
2023-07-26 15:32 ` [PATCH v3] " Yosry Ahmed
2023-08-01 14:30 ` Michal Hocko
2023-08-01 16:39 ` Yosry Ahmed
2023-08-01 17:29 ` Yosry Ahmed
2023-08-02 7:40 ` Michal Hocko
2023-08-02 8:11 ` Yosry Ahmed
2023-08-02 22:02 ` Yosry Ahmed [this message]
2023-08-03 14:55 ` Michal Hocko
2023-08-03 18:52 ` Yosry Ahmed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAJD7tkb17x=qwoO37uxyYXLEUVp15BQKR+Xfh7Sg9Hx-wTQ_=w@mail.gmail.com' \
--to=yosryahmed@google.com \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=hannes@cmpxchg.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=muchun.song@linux.dev \
--cc=roman.gushchin@linux.dev \
--cc=shakeelb@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox