From: Barry Song <21cnbao@gmail.com>
Date: Sat, 2 Nov 2024 20:59:46 +0800
Subject: Re: [PATCH v2] mm: count zeromap read and set for swapout and swapin
To: Usama Arif
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    Barry Song, Chengming Zhou, Yosry Ahmed, Nhat Pham, Johannes Weiner,
    David Hildenbrand, Hugh Dickins, Matthew Wilcox, Shakeel Butt, Andi Kleen,
    Baolin Wang, Chris Li, "Huang, Ying", Kairui Song, Ryan Roberts
References: <20241102101240.35072-1-21cnbao@gmail.com> <6c14ab2c-7917-489b-b51e-401d208067f3@gmail.com>
In-Reply-To: <6c14ab2c-7917-489b-b51e-401d208067f3@gmail.com>

On Sat, Nov 2, 2024 at 8:32 PM Usama Arif wrote:
>
>
>
> On 02/11/2024 10:12, Barry Song wrote:
> > From: Barry Song
> >
> > When the proportion of folios from the zero map is small, missing their
> > accounting may not significantly impact profiling. However, it's easy
> > to construct a scenario where this becomes an issue: for example,
> > allocating 1 GB of memory, writing zeros from userspace, followed by
> > MADV_PAGEOUT, and then swapping it back in. In this case, the swap-out
> > and swap-in counts seem to vanish into a black hole, potentially
> > causing semantic ambiguity.
> >
> > We have two ways to address this:
> >
> > 1. Add a separate counter specifically for the zero map.
> > 2. Continue using the current accounting, treating the zero map like
> > a normal backend. (This aligns with the current behavior of zRAM
> > when supporting same-page fills at the device level.)
> >
> > This patch adopts option 1 because the pswpin/pswpout counters apply
> > only to I/O done directly to the backend device (as noted by
> > Nhat Pham).
> >
> > We can find these counters from /proc/vmstat (counters for the whole
> > system) and memcg's memory.stat (counters for the memcg of interest).
> >
> > For example:
> >
> > $ grep -E 'swpin_zero|swpout_zero' /proc/vmstat
> > swpin_zero 1648
> > swpout_zero 33536
> >
> > $ grep -E 'swpin_zero|swpout_zero' /sys/fs/cgroup/system.slice/memory.stat
> > swpin_zero 3905
> > swpout_zero 3985
> >
> > Fixes: 0ca0c24e3211 ("mm: store zero pages to be swapped out in a bitmap")
> I don't think it's a hotfix (or even a fix). It was discussed in the initial
> series to add these as a follow up and Joshua was going to do this soon.
> It's not fixing any bug in the initial series.

I would prefer that all kernel versions with zeromap include this
counter; otherwise, it could be confusing to determine where swap-in
and swap-out have occurred, as shown by the small program below:

    p = malloc(1g);
    write p to zero
    madvise_pageout
    read p;

Previously, there was 1GB of swap-in and swap-out activity reported,
but now nothing is shown.

I don't mean to suggest that there's a bug in the zeromap code; rather,
having this counter would help clear up any confusion.

I didn't realize Joshua was handling it. Is he still planning to? If
so, I can leave it with Joshua if that was the plan :-)

>
> > Cc: Usama Arif
> > Cc: Chengming Zhou
> > Cc: Yosry Ahmed
> > Cc: Nhat Pham
> > Cc: Johannes Weiner
> > Cc: David Hildenbrand
> > Cc: Hugh Dickins
> > Cc: Matthew Wilcox (Oracle)
> > Cc: Shakeel Butt
> > Cc: Andi Kleen
> > Cc: Baolin Wang
> > Cc: Chris Li
> > Cc: "Huang, Ying"
> > Cc: Kairui Song
> > Cc: Ryan Roberts
> > Signed-off-by: Barry Song
> > ---
> > -v2:
> >  * add separate counters rather than using pswpin/out; thanks
> >    for the comments from Usama, David, Yosry and Nhat;
> >  * Usama also suggested a new counter like swapped_zero, I
> >    prefer that one be separated as an enhancement patch not
> >    a hotfix. will probably handle it later on.
> >
> I don't think either of them would be a hotfix.

As mentioned above, this isn't about fixing a bug; it's simply to ensure
that swap-related metrics don't disappear.

>
> >  Documentation/admin-guide/cgroup-v2.rst | 10 ++++++++++
> >  include/linux/vm_event_item.h           |  2 ++
> >  mm/memcontrol.c                         |  4 ++++
> >  mm/page_io.c                            | 16 ++++++++++++++++
> >  mm/vmstat.c                             |  2 ++
> >  5 files changed, 34 insertions(+)
> >
> > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > index db3799f1483e..984eb3c9d05b 100644
> > --- a/Documentation/admin-guide/cgroup-v2.rst
> > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > @@ -1599,6 +1599,16 @@ The following nested keys are defined.
> >            pglazyfreed (npn)
> >                  Amount of reclaimed lazyfree pages
> >
> > +          swpin_zero
> > +                Number of pages moved into memory with zero content, meaning no
> > +                copy exists in the backend swapfile, allowing swap-in to avoid
> > +                I/O read overhead.
> > +
> > +          swpout_zero
> > +                Number of pages moved out of memory with zero content, meaning no
> > +                copy is needed in the backend swapfile, allowing swap-out to avoid
> > +                I/O write overhead.
> > +
>
> Maybe zero-filled pages might be a better term in both.

Do you mean dropping "with zero content" and replacing it with
"Number of zero-filled pages moved out of memory"? I'm fine with
the change.

>
> >            zswpin
> >                  Number of pages moved in to memory from zswap.
> >
> > diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> > index aed952d04132..f70d0958095c 100644
> > --- a/include/linux/vm_event_item.h
> > +++ b/include/linux/vm_event_item.h
> > @@ -134,6 +134,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> >  #ifdef CONFIG_SWAP
> >                  SWAP_RA,
> >                  SWAP_RA_HIT,
> > +                SWPIN_ZERO,
> > +                SWPOUT_ZERO,
> >  #ifdef CONFIG_KSM
> >                  KSM_SWPIN_COPY,
> >  #endif
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index 5e44d6e7591e..7b3503d12aaf 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -441,6 +441,10 @@ static const unsigned int memcg_vm_event_stat[] = {
> >          PGDEACTIVATE,
> >          PGLAZYFREE,
> >          PGLAZYFREED,
> > +#ifdef CONFIG_SWAP
> > +        SWPIN_ZERO,
> > +        SWPOUT_ZERO,
> > +#endif
> >  #ifdef CONFIG_ZSWAP
> >          ZSWPIN,
> >          ZSWPOUT,
> > diff --git a/mm/page_io.c b/mm/page_io.c
> > index 5d9b6e6cf96c..4b4ea8e49cf6 100644
> > --- a/mm/page_io.c
> > +++ b/mm/page_io.c
> > @@ -204,7 +204,9 @@ static bool is_folio_zero_filled(struct folio *folio)
> >
> >  static void swap_zeromap_folio_set(struct folio *folio)
> >  {
> > +        struct obj_cgroup *objcg = get_obj_cgroup_from_folio(folio);
> >          struct swap_info_struct *sis = swp_swap_info(folio->swap);
> > +        int nr_pages = folio_nr_pages(folio);
> >          swp_entry_t entry;
> >          unsigned int i;
> >
> > @@ -212,6 +214,12 @@ static void swap_zeromap_folio_set(struct folio *folio)
> >                  entry = page_swap_entry(folio_page(folio, i));
> >                  set_bit(swp_offset(entry), sis->zeromap);
> >          }
> > +
> > +        count_vm_events(SWPOUT_ZERO, nr_pages);
> > +        if (objcg) {
> > +                count_objcg_events(objcg, SWPOUT_ZERO, nr_pages);
> > +                obj_cgroup_put(objcg);
> > +        }
> >  }
> >
> >  static void swap_zeromap_folio_clear(struct folio *folio)
> > @@ -507,6 +515,7 @@ static void sio_read_complete(struct kiocb *iocb, long ret)
> >  static bool swap_read_folio_zeromap(struct folio *folio)
> >  {
> >          int nr_pages = folio_nr_pages(folio);
> > +        struct obj_cgroup *objcg;
> >          bool is_zeromap;
> >
> >          /*
> > @@ -521,6 +530,13 @@ static bool swap_read_folio_zeromap(struct folio *folio)
> >          if (!is_zeromap)
> >                  return false;
> >
> > +        objcg = get_obj_cgroup_from_folio(folio);
> > +        count_vm_events(SWPIN_ZERO, nr_pages);
> > +        if (objcg) {
> > +                count_objcg_events(objcg, SWPIN_ZERO, nr_pages);
> > +                obj_cgroup_put(objcg);
> > +        }
> > +
> >          folio_zero_range(folio, 0, folio_size(folio));
> >          folio_mark_uptodate(folio);
> >          return true;
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index 22a294556b58..c8ef7352f9ed 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -1418,6 +1418,8 @@ const char * const vmstat_text[] = {
> >  #ifdef CONFIG_SWAP
> >          "swap_ra",
> >          "swap_ra_hit",
> > +        "swpin_zero",
> > +        "swpout_zero",
> >  #ifdef CONFIG_KSM
> >          "ksm_swpin_copy",
> >  #endif
>

Thanks
Barry
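
For reference, the scenario discussed in this thread (allocate memory, write
zeros from userspace, MADV_PAGEOUT, read it back) and the counter lookup in
/proc/vmstat or memory.stat could be sketched in userspace C roughly as below.
This is only a minimal sketch and not part of the patch: the helper names
vmstat_lookup(), stat_file_lookup() and zero_pageout_roundtrip() are
hypothetical, and the fallback value for MADV_PAGEOUT is assumed to be the
generic Linux uapi one.

```c
/* Userspace sketch (not part of the patch); helper names are hypothetical. */
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21 /* assumed generic Linux uapi value if libc lacks it */
#endif

/*
 * Scan "name value" lines (the layout of /proc/vmstat and memory.stat) and
 * return the value for @name, or -1 if the counter is not present, e.g. on
 * a kernel without swpin_zero/swpout_zero.
 */
long long vmstat_lookup(FILE *fp, const char *name)
{
        char key[64];
        long long val;

        assert(fp && name);
        while (fscanf(fp, "%63s %lld", key, &val) == 2) {
                if (strcmp(key, name) == 0)
                        return val;
        }
        return -1;
}

/* Convenience wrapper: look up @name in the file at @path; -1 on failure. */
long long stat_file_lookup(const char *path, const char *name)
{
        FILE *fp = fopen(path, "r");
        long long val;

        if (!fp)
                return -1;
        val = vmstat_lookup(fp, name);
        fclose(fp);
        return val;
}

/*
 * Map @len bytes of anonymous memory, write zeros, ask the kernel to page
 * the range out, then read it back. Returns 0 if every sampled byte reads
 * back as zero, -1 otherwise (including mmap or madvise failure).
 */
int zero_pageout_roundtrip(size_t len)
{
        long page = sysconf(_SC_PAGESIZE);
        volatile unsigned char *p;
        unsigned long sum = 0;
        size_t i;
        void *map;

        map = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (map == MAP_FAILED)
                return -1;

        memset(map, 0, len);                    /* write zeros from userspace */
        if (madvise(map, len, MADV_PAGEOUT)) {  /* needs Linux 5.4 or later */
                munmap(map, len);
                return -1;
        }

        p = map;
        for (i = 0; i < len; i += (size_t)page) /* touch one byte per page */
                sum += p[i];

        munmap(map, len);
        return sum == 0 ? 0 : -1;
}
```

On a kernel carrying this patch, with swap enabled, sampling
stat_file_lookup("/proc/vmstat", "swpout_zero") and the matching "swpin_zero"
value before and after zero_pageout_roundtrip(1UL << 30) would be expected to
show both counters growing. How many pages are actually reclaimed by
MADV_PAGEOUT depends on the system, so the exact deltas are not predictable.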