From: Nhat Pham <nphamcs@gmail.com>
Date: Thu, 4 Jan 2024 11:42:58 -0800
Subject: Re: [External] Re: [PATCH] mm: zswap: fix the lack of page lru flag in zswap_writeback_entry
To: Zhongkun He
Cc: akpm@linux-foundation.org, hannes@cmpxchg.org, yosryahmed@google.com,
    sjenning@redhat.com, ddstreet@ieee.org, vitaly.wool@konsulko.com,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Chris Li
References: <20231024142706.195517-1-hezhongkun.hzk@bytedance.com>

On Wed, Jan 3, 2024 at 6:12 AM Zhongkun He wrote:
>
> > That's around 2.7% increase in real time, no? Admittedly, this
> > micro-benchmark is too small to conclude either way, but the data
> > doesn't seem to be in your favor.
> >
> > I'm a bit concerned about the overhead here, given that with this
> > patch we will drain the per-cpu batch on every written-back entry.
> > That's quite a high frequency, especially since we're moving towards
> > more writeback (either with the new zswap shrinker, or your time
> > threshold-based writeback mechanism). For instance, there seems to be
> > some (local/per-cpu) locking involved, no? Could there be some form of
> > lock contention there (especially since with the new shrinker, you
> > can have a lot of concurrent writeback contexts)?
> >
> > Furthermore, note that a writeback from zswap to swap saves less
> > memory than a writeback from memory to swap, so the effect of the
> > extra overhead will be even more pronounced. That is, the amount of
> > extra work done (from this change) to save one unit of memory would be
> > even larger than if we called lru_add_drain() every time we swapped
> > out a page (from memory -> swap). And I'm pretty sure we don't call
> > lru_add_drain() every time we swap out a page - I believe we call
> > lru_add_drain() every time we perform a shrink action, e.g. in
> > shrink_inactive_list(). That's much coarser in granularity than here.
> >
> > Also, IIUC, the more often we perform lru_add_drain(), the less
> > batching effect we will obtain. IOW, the overhead of maintaining the
> > batch will become higher relative to the performance gains from
> > batching.
> >
> > Maybe I'm missing something - could you walk me through how
> > lru_add_drain() is fine here, from this POV? Thanks!
> >
> > >
> > > After writeback, we perform the following steps to release the
> > > memory again:
> > > echo 1g > memory.reclaim
> > >
> > > Base:
> > >          total   used   reclaim   total   used
> > > Mem:      38Gi   2.5Gi   ---->     38Gi   1.5Gi
> > > Swap:    5.0Gi   1.0Gi   ---->      5Gi   1.5Gi
> > >
> > > used memory -1G, swap +0.5G
> > > It means that half of the pages failed to move to the tail of the
> > > lru list, so we need to release an additional 0.5Gi of anon pages
> > > to swap space.
> > >
> > > With this patch:
> > >          total   used   reclaim   total   used
> > > Mem:      38Gi   2.6Gi   ---->     38Gi   1.6Gi
> > > Swap:    5.0Gi   1.0Gi   ---->      5Gi     1Gi
> > >
> > > used memory -1Gi, swap +0Gi
> > > It means that we release all the pages which have been added to the
> > > tail of the lru list in zswap_writeback_entry() and
> > > folio_rotate_reclaimable().
> > >
> >
> > OTOH, this suggests that we're onto something. Swap usage seems to
> > decrease quite a bit. Sounds like a real problem that this patch is
> > tackling. (Please add this benchmark result to future changelogs.
> > It'll help demonstrate the problem.)
>
> Yes.
>
> > I'm inclined to ack this patch, but it'd be nice if you could assuage
> > my concerns above (with some justification and/or a larger benchmark).
>
> OK, thanks.
>
> > (Or perhaps, we have to drain, but less frequently/higher up the stack?)
>
> I've reviewed the code again and have no idea. It would be better if
> you have any suggestions.

Hmm, originally I was thinking of doing an (unconditional)
lru_add_drain() outside of zswap_writeback_entry() - once in
shrink_worker() and/or zswap_shrinker_scan(), before we write back any
of the entries. Not sure if it would work/help here though - I haven't
tested that idea yet.
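Very roughly, something like the following is what I had in mind - a
hand-wavy, untested sketch, where shrink_worker() just stands in for
whichever writeback entry point we end up draining in, and the existing
reclaim loop is elided:

	static void shrink_worker(struct work_struct *w)
	{
		/*
		 * One drain up front, per writeback batch: folios still
		 * sitting in the per-CPU lru_add batches get moved onto
		 * the LRU now, so folio_rotate_reclaimable() can rotate
		 * them to the tail when their writeback completes -
		 * without paying the drain cost on every single
		 * written-back entry.
		 */
		lru_add_drain();

		/*
		 * ... existing reclaim loop, writing back entries one
		 * by one, with no per-entry lru_add_drain() ...
		 */
	}
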
>
> New test:
> This patch adds the execution of folio_rotate_reclaimable() (not
> executed without this patch) and lru_add_drain(), including the percpu
> lock competition. I bound a new task that allocates memory to the same
> CPU, so it uses the same batch lock and competes with the target
> process.
> Context:
> 1: stress --vm 1 --vm-bytes 1g (bound to cpu0)
> 2: stress --vm 1 --vm-bytes 5g --vm-hang 0 (bound to cpu0)
> 3: reclaim pages, and write back 5G of zswap entries on cpu0 and node 0.
>
> Average time of five tests:
>
>    Base     patch     patch + compete
>    4.947    5.0676    5.1336
>             +2.4%     +3.7%
>
> "compete" means: a new stress run on cpu0 to compete with the
> writeback process.
>
>   PID  USER  %CPU  %MEM  TIME+    COMMAND                    P
>  1367  root  49.5   0.0  1:09.17  bash (writeback)           0
>  1737  root  49.5   2.2  0:27.46  stress (uses percpu lock)  0
>
> Around 2.4% increase in real time, including the execution of
> folio_rotate_reclaimable() (not executed without this patch) and
> lru_add_drain(), but with no lock contention.

Hmm, looks like the regression is still there, no?

> Around 1.3% additional increase in real time with lock contention on
> the same cpu.
>
> There is another option here: do not move the page to the tail of the
> inactive list after end_writeback, and delete the following code in
> zswap_writeback_entry(), which did not work properly. But then the
> pages will not be released first.
>
> /* move it to the tail of the inactive list after end_writeback */
> SetPageReclaim(page);

Or only SetPageReclaim() on pages that are already on the LRU? (Rough,
untested sketch at the end of this mail.)

> Thanks,
> Zhongkun
>
> > Thanks,
> > Nhat
>
> Thanks for your time Nhat and Andrew. Happy New Year!
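For concreteness, what I mean is something like this in
zswap_writeback_entry() - untested, and the unlocked PageLRU() check is
only a heuristic, but it would avoid setting the reclaim flag on pages
that cannot be rotated anyway:

	/*
	 * Move the page to the tail of the inactive list after
	 * end_writeback, but only if it is actually on the LRU: a page
	 * still sitting in a per-CPU lru_add batch cannot be rotated
	 * by folio_rotate_reclaimable() anyway.
	 */
	if (PageLRU(page))
		SetPageReclaim(page);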