From: Chris Li
Date: Tue, 12 Dec 2023 15:39:33 -0800
Subject: Re: [PATCH v6] zswap: memcontrol: implement zswap writeback disabling
To: Kairui Song
Cc: Nhat Pham, akpm@linux-foundation.org, tj@kernel.org,
    lizefan.x@bytedance.com, hannes@cmpxchg.org, cerasuolodomenico@gmail.com,
    yosryahmed@google.com, sjenning@redhat.com, ddstreet@ieee.org,
    vitaly.wool@konsulko.com, mhocko@kernel.org, roman.gushchin@linux.dev,
    shakeelb@google.com, muchun.song@linux.dev, hughd@google.com,
    corbet@lwn.net, konrad.wilk@oracle.com, senozhatsky@chromium.org,
    rppt@kernel.org, linux-mm@kvack.org, kernel-team@meta.com,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, david@ixit.cz,
    Minchan Kim, Zhongkun He
References: <20231207192406.3809579-1-nphamcs@gmail.com>
Hi Kairui,

Thanks for sharing the information on how you use swap.

On Mon, Dec 11, 2023 at 1:31 AM Kairui Song wrote:
> > 2) As indicated by this discussion, Tencent has a usage case for SSD
> > and hard disk swap as overflow.
> > https://lore.kernel.org/linux-mm/20231119194740.94101-9-ryncsn@gmail.com/
> > +Kairui
>
> Yes, we are not using zswap. We are using ZRAM for swap since we have
> many different varieties of workload instances, with a very flexible
> storage setup.
> Some of them don't have the ability to set up a
> swapfile. So we built a pack of kernel infrastructures based on ZRAM,
> which so far worked pretty well.

This is great. The use case is actually much broader than I expected.
For example, I had never thought of zram as a swap tier until you
mentioned it. I am now considering whether it makes sense to add zram,
as well as zswap, to memory.swap.tiers.

> The concern from some teams is that ZRAM (or zswap) can't always free
> up memory so they may lead to higher risk of OOM compared to a
> physical swap device, and they do have suitable devices for doing swap
> on some of their machines. So a secondary swap support is very helpful
> in case of memory usage peak.
>
> Besides this, another requirement is that different containers may
> have different priority, some containers can tolerate high swap
> overhead while some cannot, so swap tiering is useful for us in many
> ways.
>
> And thanks to cloud infrastructure the disk setup could change from
> time to time depending on workload requirements, so our requirement is
> to support ZRAM (always) + SSD (optional) + HDD (also optional) as
> swap backends, while not making things too complex to maintain.

Just curious, do you run ZRAM + SSD + HDD all enabled at the same time?
Have you ever considered moving data from ZRAM to SSD, or from SSD to
HDD? If you do, I can see the possibility of more general swap tier
support, with the shrinking code shared between tiers somehow. Granted,
there are many unanswered questions and a lot of infrastructure is
still lacking. Gathering requirements and weighing their priorities is
the first step towards a possible solution.
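To make the tiering idea above concrete, here is a minimal C sketch, assuming a hypothetical priority-ordered tier list with ZRAM always present and SSD/HDD optional. tier_store_page() overflows to the next enabled tier when a faster one is full, and tier_demote() is a single routine reused for every tier pair (ZRAM to SSD, SSD to HDD), illustrating what sharing the shrinking code between tiers could look like. All names (struct swap_tier, struct swap_tiers, tier_store_page, tier_demote) are hypothetical and are not kernel interfaces; the sketch only tracks slot counts and does not decide which entries are cold, which a real demotion worker would also have to do.

```c
/*
 * Hypothetical sketch of priority-ordered swap tiers. None of these
 * names are kernel APIs; they only illustrate overflow and demotion.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_TIERS 3

struct swap_tier {
	const char *name;	/* e.g. "zram", "ssd", "hdd" */
	bool enabled;		/* SSD and HDD tiers are optional */
	size_t used;		/* slots currently holding swapped-out pages */
	size_t capacity;	/* total slots in this tier */
};

/* Tiers kept in priority order: index 0 is the fastest (ZRAM). */
struct swap_tiers {
	struct swap_tier tier[MAX_TIERS];
	size_t ntiers;
};

/*
 * Store one page in the highest-priority enabled tier that has room,
 * overflowing to slower tiers when faster ones are full.
 * Returns the tier index used, or -1 if every tier is full.
 */
static int tier_store_page(struct swap_tiers *st)
{
	for (size_t i = 0; i < st->ntiers; i++) {
		struct swap_tier *t = &st->tier[i];

		if (t->enabled && t->used < t->capacity) {
			t->used++;
			return (int)i;
		}
	}
	return -1;
}

/*
 * Move up to nr slots from tier 'from' into lower-priority enabled
 * tiers, nearest first. The same code serves every tier pair.
 * Returns the number of slots moved.
 */
static size_t tier_demote(struct swap_tiers *st, size_t from, size_t nr)
{
	assert(from < st->ntiers);
	struct swap_tier *src = &st->tier[from];
	size_t moved = 0;

	for (size_t to = from + 1; to < st->ntiers && moved < nr; to++) {
		struct swap_tier *dst = &st->tier[to];

		if (!dst->enabled)
			continue;
		while (moved < nr && src->used > 0 && dst->used < dst->capacity) {
			src->used--;
			dst->used++;
			moved++;
		}
	}
	return moved;
}
```

With the SSD tier disabled, a page that does not fit in ZRAM goes straight to HDD, and demotion from ZRAM likewise skips the missing SSD tier.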
> Currently we have implemented a cgroup based ZRAM compression
> algorithm control, per-cgroup ZRAM accounting and limit, and an
> experimental kernel worker to migrate cold swap entries from a high
> priority device to a low priority device at very small scale (lack of
> basic mechanics to do this at large scale, however due to the low IOPS
> of slow device and cold pages are rarely accessed, this wasn't too
> much of a problem so far but kind of ugly). The rest of swapping (eg.
> secondary swap when ZRAM is full) will depend on the kernel's native
> ability.

Thanks for confirming the need for per-cgroup ZRAM enabling and for
flushing between swap devices. I was hoping swap.tiers could support
something like that.

> So far it works, not in the best form, need more patches to make it
> work better (eg. the swapin/readahead patch I sent previously). Some
> of our design may also need to change in the long term, and we also
> want a well built interface and kernel mechanics to manage multi tier
> swaps, I'm very willing to talk and collaborate on this.
>

Great. Let's continue this discussion in a new thread and start
gathering requirements and priorities from everyone. The output of that
discussion should be a one-page document listing the swap tier
requirements and ranking their priorities. Once we have that nailed
down, we can discuss the incremental milestones to get there.

I am very interested in this topic and willing to spend time on it as
well.

Chris