From: Kairui Song
Date: Wed, 20 Dec 2023 18:21:28 +0800
Subject: Re: [PATCH v6] zswap: memcontrol: implement zswap writeback disabling
To: Chris Li
Cc: Nhat Pham, akpm@linux-foundation.org, tj@kernel.org,
 lizefan.x@bytedance.com, hannes@cmpxchg.org, cerasuolodomenico@gmail.com,
 yosryahmed@google.com, sjenning@redhat.com, ddstreet@ieee.org,
 vitaly.wool@konsulko.com, mhocko@kernel.org, roman.gushchin@linux.dev,
 shakeelb@google.com, muchun.song@linux.dev, hughd@google.com,
 corbet@lwn.net, konrad.wilk@oracle.com, senozhatsky@chromium.org,
 rppt@kernel.org, linux-mm@kvack.org, kernel-team@meta.com,
 linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, david@ixit.cz,
 Minchan Kim, Zhongkun He
References: <20231207192406.3809579-1-nphamcs@gmail.com>

On Wed, Dec 13, 2023 at 07:39, Chris Li wrote:
>
> Hi Kairui,
>
> Thanks for sharing the information on how you use swap.

Hi Chris,

>
> On Mon, Dec 11, 2023 at 1:31 AM Kairui Song wrote:
> > > 2) As indicated by this discussion, Tencent has a usage case for SSD
> > > and hard disk swap as overflow.
> > > https://lore.kernel.org/linux-mm/20231119194740.94101-9-ryncsn@gmail.com/
> > > +Kairui
> >
> > Yes, we are not using zswap. We are using ZRAM for swap since we have
> > many different varieties of workload instances, with a very flexible
> > storage setup. Some of them don't have the ability to set up a
> > swapfile. So we built a pack of kernel infrastructures based on ZRAM,
> > which so far worked pretty well.
>
> This is great. The usage case is actually much more than I expected.
> For example, I never thought of zram as a swap tier. Now you mention
> it. I am considering whether it makes sense to add zram to the
> memory.swap.tiers as well as zswap.
>
> >
> > The concern from some teams is that ZRAM (or zswap) can't always free
> > up memory so they may lead to higher risk of OOM compared to a
> > physical swap device, and they do have suitable devices for doing swap
> > on some of their machines. So a secondary swap support is very helpful
> > in case of memory usage peak.
> >
> > Besides this, another requirement is that different containers may
> > have different priority, some containers can tolerate high swap
> > overhead while some cannot, so swap tiering is useful for us in many
> > ways.
> >
> > And thanks to cloud infrastructure the disk setup could change from
> > time to time depending on workload requirements, so our requirement is
> > to support ZRAM (always) + SSD (optional) + HDD (also optional) as
> > swap backends, while not making things too complex to maintain.
>
> Just curious, do you use ZRAM + SSD + HDD all enabled? Do you ever
> consider moving data from ZRAM to SSD, or from SSD to HDD? If you do,
> I do see the possibility of having more general swap tiers support and
> sharing the shrinking code between tiers somehow. Granted there are
> many unanswered questions and a lot of infrastructure is lacking.
> Gathering requirements and weighing the priority of each requirement
> is the first step towards a possible solution.

Sorry for the late response.

Yes, it's our plan to use ZRAM + SSD + HDD all enabled when possible,
although currently only ZRAM + SSD is expected.

I see this discussion is still going on, so I'll just add some info
here...

We have some test environments which have a kernel worker enabled to
move data from ZRAM to SSD, and from SSD to HDD too, to free up space
for higher tier swap devices. The kworker is simple: it maintains a
swap entry LRU for every swap device (maybe worth noting here, there
is currently no LRU-based writeback for ZRAM, and ZRAM writeback
requires a fixed block device on init, and a swap device level LRU is
also helpful for migrating entries from SSD to HDD). It walks the page
table to swap in the coldest swap entry, then swaps it out immediately
to a lower tier, doing this page by page periodically. Overhead and
memory footprint are minimal with a limited moving rate, but the
efficiency for large-scale data moving is terrible, so it only has
very limited usage. I was trying to come up with a better design but
am currently not working on it.
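
For illustration only, the demotion loop described above can be modeled
in userspace roughly as follows. This is a toy sketch, not the actual
kworker: SwapTier, demote_once, low_watermark and max_pages are all
hypothetical names, the watermark trigger is an assumption, and the real
worker walks page tables and does real swap-in/swap-out rather than
moving dictionary entries.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class SwapTier:
    """Toy model of one swap device with its own swap entry LRU."""
    name: str
    capacity: int  # maximum number of pages the device can hold
    # Insertion order doubles as LRU order: the first item is the coldest.
    lru: OrderedDict = field(default_factory=OrderedDict)

    def free_slots(self) -> int:
        return self.capacity - len(self.lru)

    def swap_out(self, page_id, data) -> None:
        if self.free_slots() <= 0:
            raise MemoryError(f"{self.name} is full")
        self.lru[page_id] = data  # newest entry lands at the hot end

    def swap_in_coldest(self):
        # Remove and return (page_id, data) for the coldest entry.
        return self.lru.popitem(last=False)


def demote_once(tiers, low_watermark, max_pages):
    """One periodic pass over adjacent tier pairs (fastest tier first).

    While an upper tier has fewer than low_watermark free slots, its
    coldest page is swapped in and immediately swapped out to the next
    lower tier, one page at a time. max_pages caps the total number of
    pages moved per pass, modeling the limited moving rate.
    """
    moved = 0
    for upper, lower in zip(tiers, tiers[1:]):
        while (moved < max_pages
               and upper.lru
               and upper.free_slots() < low_watermark
               and lower.free_slots() > 0):
            page_id, data = upper.swap_in_coldest()
            lower.swap_out(page_id, data)
            moved += 1
    return moved


if __name__ == "__main__":
    zram = SwapTier("zram", capacity=4)
    ssd = SwapTier("ssd", capacity=8)
    hdd = SwapTier("hdd", capacity=16)
    for i in range(4):
        zram.swap_out(i, b"page")
    print(demote_once([zram, ssd, hdd], low_watermark=2, max_pages=8))
    print(list(zram.lru), list(ssd.lru), list(hdd.lru))
```

The page-by-page inner loop is what keeps the footprint small, and it is
also why throughput is poor when a large amount of data has to move.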