From: Yosry Ahmed <yosryahmed@google.com>
To: David Hildenbrand <david@redhat.com>
Cc: 贺中坤 <hezhongkun.hzk@bytedance.com>, "Yu Zhao" <yuzhao@google.com>,
minchan@kernel.org, senozhatsky@chromium.org, mhocko@suse.com,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
"Andrea Arcangeli" <aarcange@redhat.com>,
"Fabian Deutsch" <fdeutsch@redhat.com>
Subject: Re: [External] Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup
Date: Fri, 16 Jun 2023 01:39:53 -0700 [thread overview]
Message-ID: <CAJD7tkY7CMLFS7Kv-DYnPwO9cGVWZmQZzkgOfVJMkH2pO8Kt9Q@mail.gmail.com> (raw)
In-Reply-To: <dede2f5b-2ae5-6fa3-c0d5-3ce7fba11694@redhat.com>
On Fri, Jun 16, 2023 at 1:37 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 16.06.23 10:04, Yosry Ahmed wrote:
> > On Fri, Jun 16, 2023 at 12:57 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 16.06.23 09:37, Yosry Ahmed wrote:
> >>> On Thu, Jun 15, 2023 at 9:41 PM 贺中坤 <hezhongkun.hzk@bytedance.com> wrote:
> >>>>
> >>>>> Thanks Fabian for tagging me.
> >>>>>
> >>>>> I am not familiar with #1, so I will speak to #2. Zhongkun, There are
> >>>>> a few parts that I do not understand -- hopefully you can help me out
> >>>>> here:
> >>>>>
> >>>>> (1) If I understand correctly in this patch we set the active memcg
> >>>>> trying to charge any pages allocated in a zspage to the current memcg,
> >>>>> yet that zspage will contain multiple compressed object slots, not
> >>>>> just the one used by this memcg. Aren't we overcharging the memcg?
> >>>>> Basically the first memcg that happens to allocate the zspage will pay
> >>>>> for all the objects in this zspage, even after it stops using the
> >>>>> zspage completely?
> >>>>
> >>>> It will not overcharge. As you said below, we are not using
> >>>> __GFP_ACCOUNT and charging the compressed slots to the memcgs.
> >>>>
> >>>>>
> >>>>> (2) Patch 3 seems to be charging the compressed slots to the memcgs,
> >>>>> yet this patch is trying to charge the entire zspage. Aren't we double
> >>>>> charging the zspage? I am guessing this isn't happening because (as
> >>>>> Michal pointed out) we are not using __GFP_ACCOUNT here anyway, so
> >>>>> this patch may be NOP, and the actual charging is coming from patch 3
> >>>>> only.
> >>>>
> >>>> YES, the actual charging is coming from patch 3. This patch just
> >>>> delivers the BIO page's memcg to the current task which is not the
> >>>> consumer.
> >>>>
> >>>>>
> >>>>> (3) Zswap recently implemented per-memcg charging of compressed
> >>>>> objects in a much simpler way. If your main interest is #2 (which is
> >>>>> what I understand from the commit log), it seems like zswap might be
> >>>>> providing this already? Why can't you use zswap? Is it the fact that
> >>>>> zswap requires a backing swapfile?
> >>>>
> >>>> Thanks for your reply and review. Yes, the zswap requires a backing
> >>>> swapfile. The I/O path is very complex, sometimes it will throttle the
> >>>> whole system if some resources are short , so we hope to use zram.
> >>>
> >>> Is the only problem with zswap for you the requirement of a backing swapfile?
> >>>
> >>> If yes, I am in the early stages of developing a solution to make
> >>> zswap work without a backing swapfile. This was discussed in LSF/MM
> >>> [1]. Would this make zswap usable in for your use case?
> >>
> >> Out of curiosity, are there any other known pros/cons when using
> >> zswap-without-swap instead of zram?
> >>
> >> I know that zram requires sizing (size of the virtual block device) and
> >> consumes metadata, zswap doesn't.
> >
> > We don't use zram in our data centers so I am not an expert about
> > zram, but off the top of my head there are a few more advantages to
> > zswap:
>
> Thanks!
>
> > (1) Better memcg support (which this series is attempting to address
> > in zram, although in a much more complicated way).
>
> Right. I think this patch also misses to update apply the charging in the recompress
> case. (only triggered by user space IIUC)
>
> >
> > (2) We internally have incompressible memory handling on top of zswap,
> > which is something that we would like to upstream when
> > zswap-without-swap is supported. Basically if a page does not compress
> > well enough to save memory we reject it from zswap and make it
> > unevictable (if there is no backing swapfile). The existence of zswap
> > in the MM layer helps with this. Since zram is a block device from the
> > MM perspective, it's more difficult to do something like this.
> > Incompressible pages just sit in zram AFAICT.
>
> I see. With ZRAM_HUGE we still have to store the uncompressed page
> (because, it's a block device and has to hold that data).
Right.
>
> >
> > (3) Writeback support. If you're running out of memory to store
> > compressed pages you can add a swapfile in runtime and zswap will
> > start writing to it freeing up space to compress more pages. This
> > wouldn't be possible in the same way in zram. Zram supports writing to
> > a backing device but in a more manual way (userspace has to write to
> > an interface to tell zram to write some pages).
>
> Right, that zram backing device stuff is really sub-optimal and only useful
> in corner cases (most probably not datacenters).
>
> What one can do with zram is to add a second swap device with lower priority.
> Looking at my Fedora machine:
>
> $ cat /proc/swaps
> Filename Type Size Used Priority
> /dev/dm-2 partition 16588796 0 -2
> /dev/zram0 partition 8388604 0 100
>
>
> Guess the difference here is that you won't be writing out the compressed
> data to the disk, but anything the gets swapped out afterwards will
> end up on the disk. I can see how the zswap behavior might be better in that case
> (instead of swapping out some additional pages you relocate the
> already-swapped-out-to-zswap pages to the disk).
Yeah I am hoping we can enable the use of zswap without a backing
swapfile, and I keep seeing use cases that would benefit from that.
>
> --
> Cheers,
>
> David / dhildenb
>
next prev parent reply other threads:[~2023-06-16 8:40 UTC|newest]
Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-15 3:48 Zhongkun He
2023-06-15 4:59 ` Yu Zhao
2023-06-15 8:57 ` Fabian Deutsch
2023-06-15 10:00 ` [External] " 贺中坤
2023-06-15 12:14 ` Fabian Deutsch
2023-06-16 1:39 ` Yosry Ahmed
2023-06-16 4:40 ` [External] " 贺中坤
2023-06-16 7:37 ` Yosry Ahmed
2023-06-16 7:57 ` David Hildenbrand
2023-06-16 8:04 ` Yosry Ahmed
2023-06-16 8:37 ` David Hildenbrand
2023-06-16 8:39 ` Yosry Ahmed [this message]
2023-06-15 9:32 ` Fabian Deutsch
2023-06-15 9:41 ` [External] " 贺中坤
2023-06-15 9:27 ` David Hildenbrand
2023-06-15 11:15 ` [External] " 贺中坤
2023-06-15 11:19 ` David Hildenbrand
2023-06-15 12:19 ` 贺中坤
2023-06-15 12:56 ` David Hildenbrand
2023-06-15 13:40 ` 贺中坤
2023-06-15 14:46 ` David Hildenbrand
2023-06-16 3:44 ` 贺中坤
2023-06-15 9:35 ` Michal Hocko
2023-06-15 11:58 ` [External] " 贺中坤
2023-06-15 12:16 ` Michal Hocko
2023-06-15 13:09 ` 贺中坤
2023-06-15 13:27 ` Michal Hocko
2023-06-15 14:13 ` 贺中坤
2023-06-15 14:20 ` Michal Hocko
2023-06-16 3:31 ` 贺中坤
2023-06-16 6:40 ` Michal Hocko
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAJD7tkY7CMLFS7Kv-DYnPwO9cGVWZmQZzkgOfVJMkH2pO8Kt9Q@mail.gmail.com \
--to=yosryahmed@google.com \
--cc=aarcange@redhat.com \
--cc=david@redhat.com \
--cc=fdeutsch@redhat.com \
--cc=hezhongkun.hzk@bytedance.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mhocko@suse.com \
--cc=minchan@kernel.org \
--cc=senozhatsky@chromium.org \
--cc=yuzhao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox