From: Nhat Pham <nphamcs@gmail.com>
Date: Thu, 12 Oct 2023 11:08:29 -0700
Subject: Re: [External] Re: [RFC PATCH] zswap: add writeback_time_threshold interface to shrink zswap pool
To: Zhongkun He <hezhongkun.hzk@bytedance.com>
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, yosryahmed@google.com, sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
On Thu, Oct 12, 2023 at 7:13 AM Zhongkun He wrote:
>
> Hi Nhat, thanks for your detailed reply.
>
> > We're currently trying to solve this exact problem. Our approach is to
> > add a shrinker that automatically shrinks the size of the zswap pool:
> >
> > https://lore.kernel.org/lkml/20230919171447.2712746-1-nphamcs@gmail.com/
> >
> > It is triggered on memory pressure, and can perform reclaim in a
> > workload-specific manner.
> >
> > I'm currently working on v3 of this patch series, but in the meantime,
> > could you take a look and see if it will address your issues as well?
> >
> > Comments and suggestions are always welcome, of course :)
>
> Thanks, I've seen both patches. But we hope to be able to reclaim memory
> in advance, regardless of memory pressure, like memory.reclaim in memcg,
> so we can offload memory in different tiers.

As Johannes pointed out, with a zswap shrinker we can just push on the
memory.reclaim knob, and the request automatically gets pushed down the
pipeline:

memory -> swap -> zswap

That seems more natural and user-friendly to me than making users
manually decide when to push zswap entries out to swap.

My ideal vision is that users provide an abstract declaration of their
requirements, and the specific decision of what to do is left to the
kernel, as transparently to the user as possible. This philosophy
extends to multi-tier memory management in general, not just the
3-tier model above.
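
To make that knob-pushing concrete, here is a minimal userspace sketch of
proactive reclaim through the cgroup v2 memory.reclaim interface. It is
only an illustration - the cgroup path, the amount, and the (absent) retry
policy are assumptions on my part, not something from either patch series:

/*
 * Rough sketch, not from any patch in this thread: a userspace agent
 * asking for proactive reclaim via the cgroup v2 memory.reclaim file.
 * With a zswap shrinker in place, the same request can end up shrinking
 * the zswap pool too (memory -> swap -> zswap).
 */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	/* Hypothetical cgroup; adjust to your hierarchy. */
	const char *path = "/sys/fs/cgroup/workload/memory.reclaim";
	const char *amount = "512M";	/* illustrative target */
	int fd = open(path, O_WRONLY);

	if (fd < 0) {
		perror("open");
		return 1;
	}
	/*
	 * The write can fail (e.g. with EAGAIN) if the kernel could not
	 * reclaim the full amount; a real agent would retry or back off.
	 */
	if (write(fd, amount, strlen(amount)) < 0)
		perror("write");
	close(fd);
	return 0;
}

The point is just that the user says "reclaim this much from this cgroup";
how far down the tiers that reclaim goes is the kernel's call.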

> >
> > My concern with this approach is that this value seems rather arbitrary.
> > I imagine that it is workload- and memory-access-pattern-dependent,
> > and will have to be tuned. Other than a couple of big users, no one
> > will have the resources to do this.
> >
> > And since this is a one-off knob, there's another parameter users
> > will have to decide - frequency, i.e. how often should the userspace
> > agent trigger this reclaim action. This is again very hard to determine
> > a priori, and most likely has to be tuned as well.
>
> I totally agree with you, this is the key point of this approach. It depends
> on how we define cold pages, which are usually measured in time,
> such as not being accessed for 600 seconds, etc. So the frequency
> should be greater than 600 seconds.

I guess my main concern here is: how do you determine the value of
600 seconds in the first place? And yes, the frequency should be greater
than the oldness cutoff, but how much greater?

We can run experiments to decide which cutoff hurts performance the
least (or improves it the most), but that value will be specific to our
workload and memory access patterns. Other users might need a different
value entirely, and they might not have the resources to find out.

If it were just a binary decision (on or off), it could at least be a
single A/B experiment (per workload/service). But the range here could
vary wildly. Is there at least a default value that works decently well
across workloads and services, in your experience?

> > I think there might be some issues with just storing the store time
> > here as well. IIUC, there might be cases where the zswap entry is
> > accessed and brought into memory, but that entry (with the associated
> > compressed memory) still hangs around. For example, and for more
> > context, see this patch that enables exclusive loads:
> >
> > https://lore.kernel.org/lkml/20230607195143.1473802-1-yosryahmed@google.com/
> >
> > If that happens, this sto_time field does not tell the full story, right?
> > For instance, if an object was stored a long time ago but has been
> > accessed since, it shouldn't be considered a cold object and a candidate
> > for reclaim. But the old sto_time would indicate otherwise.
>
> Thanks for your review, we should update the store time when the entry
> is loaded. But it confused me: there are two copies of the same page in
> memory (compressed and uncompressed) after faulting in a page from zswap
> if 'zswap_exclusive_loads_enabled' was disabled. I didn't notice any
> difference when turning that option on or off, because frontswap_ops has
> been removed and there is no frontswap_map anymore. Sorry, am I missing
> something?

I believe Johannes has explained the case where this could happen.
But yeah, this should be fixable by updating the stored time field on
access (and maybe renaming it to something a bit more fitting -
last_accessed_time?).

Regardless, it is incredibly validating to see that other parties share
the same problems as us :) It's not a super invasive change either. I
just don't think it solves the issue that well for every zswap user.
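
To spell out the "update on access" idea above, here's a rough,
self-contained sketch. All names (entry, entry_touch, entry_is_cold) are
made up for illustration; this is not the actual zswap code or the RFC
patch:

/*
 * Simplified model: each compressed entry remembers when it was last
 * touched, and only entries older than a cutoff are considered cold
 * candidates for writeback.
 */
#include <stdbool.h>
#include <time.h>

struct entry {
	time_t last_accessed;	/* updated on store *and* on load */
	/* ... compressed object, length, etc. ... */
};

/* Call on store, and on any load that keeps the compressed copy around
 * (i.e. when exclusive loads are disabled). */
void entry_touch(struct entry *e)
{
	e->last_accessed = time(NULL);
}

/* A writeback pass (shrinker or a time-threshold knob) would only pick
 * entries for which this returns true. */
bool entry_is_cold(const struct entry *e, time_t cutoff_secs)
{
	return time(NULL) - e->last_accessed >= cutoff_secs;
}

With that, an entry that was stored long ago but loaded recently would no
longer look cold, which addresses the sto_time concern above.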