From: Yosry Ahmed <yosryahmed@google.com>
Date: Fri, 16 Jun 2023 01:39:53 -0700
Subject: Re: [External] Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup
To: David Hildenbrand
Cc: 贺中坤, Yu Zhao, minchan@kernel.org, senozhatsky@chromium.org, mhocko@suse.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrea Arcangeli, Fabian Deutsch

On Fri, Jun 16, 2023 at 1:37 AM David Hildenbrand wrote:
>
> On 16.06.23 10:04, Yosry Ahmed wrote:
> > On Fri, Jun 16, 2023 at 12:57 AM David Hildenbrand wrote:
> >>
> >> On 16.06.23 09:37, Yosry Ahmed wrote:
> >>> On Thu, Jun 15, 2023 at 9:41 PM 贺中坤 <hezhongkun.hzk@bytedance.com> wrote:
> >>>>
> >>>>> Thanks Fabian for tagging me.
> >>>>>
> >>>>> I am not familiar with #1, so I will speak to #2. Zhongkun, there are
> >>>>> a few parts that I do not understand -- hopefully you can help me out
> >>>>> here:
> >>>>>
> >>>>> (1) If I understand correctly, in this patch we set the active memcg,
> >>>>> trying to charge any pages allocated in a zspage to the current memcg,
> >>>>> yet that zspage will contain multiple compressed object slots, not
> >>>>> just the one used by this memcg. Aren't we overcharging the memcg?
> >>>>> Basically the first memcg that happens to allocate the zspage will pay
> >>>>> for all the objects in this zspage, even after it stops using the
> >>>>> zspage completely?
> >>>>
> >>>> It will not overcharge. As you said below, we are not using
> >>>> __GFP_ACCOUNT and charging the compressed slots to the memcgs.
> >>>>
> >>>>> (2) Patch 3 seems to be charging the compressed slots to the memcgs,
> >>>>> yet this patch is trying to charge the entire zspage. Aren't we double
> >>>>> charging the zspage? I am guessing this isn't happening because (as
> >>>>> Michal pointed out) we are not using __GFP_ACCOUNT here anyway, so
> >>>>> this patch may be a NOP, and the actual charging is coming from patch 3
> >>>>> only.
> >>>>
> >>>> YES, the actual charging is coming from patch 3. This patch just
> >>>> delivers the BIO page's memcg to the current task, which is not the
> >>>> consumer.
> >>>>
> >>>>> (3) Zswap recently implemented per-memcg charging of compressed
> >>>>> objects in a much simpler way. If your main interest is #2 (which is
> >>>>> what I understand from the commit log), it seems like zswap might be
> >>>>> providing this already? Why can't you use zswap? Is it the fact that
> >>>>> zswap requires a backing swapfile?
> >>>>
> >>>> Thanks for your reply and review. Yes, zswap requires a backing
> >>>> swapfile. The I/O path is very complex, and sometimes it will throttle
> >>>> the whole system if some resources are short, so we hope to use zram.
> >>>
> >>> Is the only problem with zswap for you the requirement of a backing swapfile?
> >>>
> >>> If yes, I am in the early stages of developing a solution to make
> >>> zswap work without a backing swapfile. This was discussed in LSF/MM
> >>> [1]. Would this make zswap usable for your use case?
> >>
> >> Out of curiosity, are there any other known pros/cons when using
> >> zswap-without-swap instead of zram?
> >>
> >> I know that zram requires sizing (size of the virtual block device) and
> >> consumes metadata, zswap doesn't.
> >
> > We don't use zram in our data centers, so I am not an expert on
> > zram, but off the top of my head there are a few more advantages to
> > zswap:
>
> Thanks!
>
> > (1) Better memcg support (which this series is attempting to address
> > in zram, although in a much more complicated way).
>
> Right. I think this patch also misses to apply the charging in the
> recompress case (only triggered by user space, IIUC).
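For readers following the thread, the pattern being discussed is
roughly the one below -- a minimal sketch of set_active_memcg() plus
__GFP_ACCOUNT charging, with a made-up helper name, not the actual
patch:

#include <linux/memcontrol.h>
#include <linux/sched/mm.h>
#include <linux/zsmalloc.h>

/* Sketch only: charge the zsmalloc allocation to the memcg that owns
 * the page being compressed, instead of to the task doing the store.
 * zram_store_page() is a hypothetical stand-in. */
static unsigned long zram_store_page(struct zs_pool *pool,
                                     struct page *page, size_t comp_len)
{
        struct mem_cgroup *memcg = page_memcg(page);
        struct mem_cgroup *old_memcg;
        unsigned long handle;

        old_memcg = set_active_memcg(memcg);
        /* With __GFP_ACCOUNT, a newly allocated zspage is charged to
         * 'memcg' rather than to current's memcg. */
        handle = zs_malloc(pool, comp_len, GFP_NOIO | __GFP_ACCOUNT);
        set_active_memcg(old_memcg);
        return handle;
}

The recompress path would need the same wrapping, which is the gap
pointed out above.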
> > (2) We internally have incompressible memory handling on top of zswap,
> > which is something that we would like to upstream when
> > zswap-without-swap is supported. Basically if a page does not compress
> > well enough to save memory, we reject it from zswap and make it
> > unevictable (if there is no backing swapfile). The existence of zswap
> > in the MM layer helps with this. Since zram is a block device from the
> > MM perspective, it's more difficult to do something like this.
> > Incompressible pages just sit in zram AFAICT.
>
> I see. With ZRAM_HUGE we still have to store the uncompressed page
> (because it's a block device and has to hold that data).

Right.
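To make the rejection idea concrete: conceptually the store-time check
is something like the sketch below. The helpers and the PAGE_SIZE
threshold are illustrative only; the real zswap store path is more
involved:

#include <linux/mm.h>

/* Conceptual sketch of rejecting incompressible pages at store time.
 * compress_page() and store_compressed() are hypothetical helpers. */
static bool zswap_store_sketch(struct page *page)
{
        unsigned int comp_len = compress_page(page);

        if (comp_len >= PAGE_SIZE) {
                /* No memory saved: refuse the store. With no backing
                 * swapfile the page then stays resident (effectively
                 * unevictable) instead of occupying pool space. */
                return false;
        }
        return store_compressed(page, comp_len);
}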
>
> > (3) Writeback support. If you're running out of memory to store
> > compressed pages, you can add a swapfile at runtime and zswap will
> > start writing to it, freeing up space to compress more pages. This
> > wouldn't be possible in the same way in zram. Zram supports writing to
> > a backing device, but in a more manual way (userspace has to write to
> > an interface to tell zram to write some pages).
>
> Right, that zram backing device stuff is really sub-optimal and only useful
> in corner cases (most probably not datacenters).
>
> What one can do with zram is to add a second swap device with lower priority.
> Looking at my Fedora machine:
>
> $ cat /proc/swaps
> Filename        Type            Size            Used    Priority
> /dev/dm-2       partition       16588796        0       -2
> /dev/zram0      partition       8388604         0       100
>
> Guess the difference here is that you won't be writing out the compressed
> data to the disk, but anything that gets swapped out afterwards will
> end up on the disk. I can see how the zswap behavior might be better in
> that case (instead of swapping out some additional pages you relocate the
> already-swapped-out-to-zswap pages to the disk).

Yeah, I am hoping we can enable the use of zswap without a backing
swapfile, and I keep seeing use cases that would benefit from that.

>
> --
> Cheers,
>
> David / dhildenb
>
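P.S. For completeness, the priority arrangement in that /proc/swaps
output can also be set up programmatically via swapon(2). A minimal
userspace sketch (the device path is an example):

#include <stdio.h>
#include <sys/swap.h>

/* Enable zram swap at a higher priority than the disk partition, so
 * pages are swapped to zram first. Requires root. */
int main(void)
{
        int flags = SWAP_FLAG_PREFER |
                    ((100 << SWAP_FLAG_PRIO_SHIFT) & SWAP_FLAG_PRIO_MASK);

        if (swapon("/dev/zram0", flags) != 0) {
                perror("swapon /dev/zram0");
                return 1;
        }
        /* The disk partition keeps its lower default priority and is
         * only used once zram fills up. */
        return 0;
}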