From: David Hildenbrand
Organization: Red Hat
Date: Fri, 16 Jun 2023 10:37:09 +0200
To: Yosry Ahmed
Cc: 贺中坤, Yu Zhao, minchan@kernel.org, senozhatsky@chromium.org,
    mhocko@suse.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Andrea Arcangeli, Fabian Deutsch
Subject: Re: [External] Re: [RFC PATCH 1/3] zram: charge the compressed RAM to the page's memcgroup
References: <20230615034830.1361853-1-hezhongkun.hzk@bytedance.com>
    <576b7ba6-4dcd-48c9-3917-4e2a25aaa823@redhat.com>

On 16.06.23 10:04, Yosry Ahmed wrote:
> On Fri, Jun 16, 2023 at 12:57 AM David Hildenbrand wrote:
>>
>> On 16.06.23 09:37, Yosry Ahmed wrote:
>>> On Thu, Jun 15, 2023 at 9:41 PM 贺中坤 wrote:
>>>>
>>>>> Thanks Fabian for tagging me.
>>>>>
>>>>> I am not familiar with #1, so I will speak to #2.
>>>>> Zhongkun, There are
>>>>> a few parts that I do not understand -- hopefully you can help me out
>>>>> here:
>>>>>
>>>>> (1) If I understand correctly in this patch we set the active memcg
>>>>> trying to charge any pages allocated in a zspage to the current memcg,
>>>>> yet that zspage will contain multiple compressed object slots, not
>>>>> just the one used by this memcg. Aren't we overcharging the memcg?
>>>>> Basically the first memcg that happens to allocate the zspage will pay
>>>>> for all the objects in this zspage, even after it stops using the
>>>>> zspage completely?
>>>>
>>>> It will not overcharge. As you said below, we are not using
>>>> __GFP_ACCOUNT and charging the compressed slots to the memcgs.
>>>>
>>>>>
>>>>> (2) Patch 3 seems to be charging the compressed slots to the memcgs,
>>>>> yet this patch is trying to charge the entire zspage. Aren't we double
>>>>> charging the zspage? I am guessing this isn't happening because (as
>>>>> Michal pointed out) we are not using __GFP_ACCOUNT here anyway, so
>>>>> this patch may be NOP, and the actual charging is coming from patch 3
>>>>> only.
>>>>
>>>> YES, the actual charging is coming from patch 3. This patch just
>>>> delivers the BIO page's memcg to the current task which is not the
>>>> consumer.
>>>>
>>>>>
>>>>> (3) Zswap recently implemented per-memcg charging of compressed
>>>>> objects in a much simpler way. If your main interest is #2 (which is
>>>>> what I understand from the commit log), it seems like zswap might be
>>>>> providing this already? Why can't you use zswap? Is it the fact that
>>>>> zswap requires a backing swapfile?
>>>>
>>>> Thanks for your reply and review. Yes, the zswap requires a backing
>>>> swapfile. The I/O path is very complex, sometimes it will throttle the
>>>> whole system if some resources are short, so we hope to use zram.
>>>
>>> Is the only problem with zswap for you the requirement of a backing swapfile?
>>>
>>> If yes, I am in the early stages of developing a solution to make
>>> zswap work without a backing swapfile. This was discussed in LSF/MM
>>> [1]. Would this make zswap usable for your use case?
>>
>> Out of curiosity, are there any other known pros/cons when using
>> zswap-without-swap instead of zram?
>>
>> I know that zram requires sizing (size of the virtual block device) and
>> consumes metadata, zswap doesn't.
>
> We don't use zram in our data centers so I am not an expert about
> zram, but off the top of my head there are a few more advantages to
> zswap:

Thanks!

> (1) Better memcg support (which this series is attempting to address
> in zram, although in a much more complicated way).

Right. I think this patch also fails to apply the charging in the
recompress case (only triggered by user space, IIUC).

>
> (2) We internally have incompressible memory handling on top of zswap,
> which is something that we would like to upstream when
> zswap-without-swap is supported. Basically if a page does not compress
> well enough to save memory we reject it from zswap and make it
> unevictable (if there is no backing swapfile). The existence of zswap
> in the MM layer helps with this. Since zram is a block device from the
> MM perspective, it's more difficult to do something like this.
> Incompressible pages just sit in zram AFAICT.

I see. With ZRAM_HUGE we still have to store the uncompressed page
(because it's a block device and has to hold that data).

>
> (3) Writeback support. If you're running out of memory to store
> compressed pages you can add a swapfile in runtime and zswap will
> start writing to it freeing up space to compress more pages. This
> wouldn't be possible in the same way in zram. Zram supports writing to
> a backing device but in a more manual way (userspace has to write to
> an interface to tell zram to write some pages).

Right, that zram backing device stuff is really sub-optimal and only
useful in corner cases (most probably not datacenters).

What one can do with zram is to add a second swap device with lower
priority. Looking at my Fedora machine:

$ cat /proc/swaps
Filename        Type            Size            Used    Priority
/dev/dm-2       partition       16588796        0       -2
/dev/zram0      partition       8388604         0       100

Guess the difference here is that you won't be writing out the
compressed data to the disk, but anything that gets swapped out
afterwards will end up on the disk. I can see how the zswap behavior
might be better in that case (instead of swapping out some additional
pages you relocate the already-swapped-out-to-zswap pages to the disk).

-- 
Cheers,

David / dhildenb