Message-ID: <2d73b4cc-47a1-44a2-b50a-0f67d25b3e22@gmail.com>
Date: Sat, 2 Nov 2024 14:43:07 +0000
Subject: Re: [PATCH v2] mm: count zeromap read and set for swapout and swapin
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Barry Song, Chengming Zhou, Yosry Ahmed,
 Nhat Pham, Johannes Weiner, David Hildenbrand, Hugh Dickins,
 Matthew Wilcox, Shakeel Butt, Andi Kleen, Baolin Wang, Chris Li,
 "Huang, Ying", Kairui Song, Ryan Roberts
References: <20241102101240.35072-1-21cnbao@gmail.com>
 <6c14ab2c-7917-489b-b51e-401d208067f3@gmail.com>
From: Usama Arif

On 02/11/2024 12:59, Barry Song wrote:
> On Sat, Nov 2, 2024 at 8:32 PM Usama Arif wrote:
>>
>>
>>
>> On 02/11/2024 10:12, Barry Song wrote:
>>> From: Barry Song
>>>
>>> When the proportion of folios from the zero map is small, missing their
>>> accounting may not significantly impact profiling. However, it's easy
>>> to construct a scenario where this becomes an issue: for example,
>>> allocating 1 GB of memory, writing zeros from userspace, followed by
>>> MADV_PAGEOUT, and then swapping it back in. In this case, the swap-out
>>> and swap-in counts seem to vanish into a black hole, potentially
>>> causing semantic ambiguity.
>>>
>>> We have two ways to address this:
>>>
>>> 1. Add a separate counter specifically for the zero map.
>>> 2. Continue using the current accounting, treating the zero map like
>>>    a normal backend. (This aligns with the current behavior of zRAM
>>>    when supporting same-page fills at the device level.)
>>>
>>> This patch adopts option 1 because the pswpin/pswpout counters apply
>>> only to I/O done directly to the backend device (as noted by
>>> Nhat Pham).
>>>
>>> We can find these counters in /proc/vmstat (counters for the whole
>>> system) and in memcg's memory.stat (counters for the memcg of interest).
>>>
>>> For example:
>>>
>>> $ grep -E 'swpin_zero|swpout_zero' /proc/vmstat
>>> swpin_zero 1648
>>> swpout_zero 33536
>>>
>>> $ grep -E 'swpin_zero|swpout_zero' /sys/fs/cgroup/system.slice/memory.stat
>>> swpin_zero 3905
>>> swpout_zero 3985
>>>
>>> Fixes: 0ca0c24e3211 ("mm: store zero pages to be swapped out in a bitmap")
>> I don't think it's a hotfix (or even a fix). It was discussed in the initial
>> series to add these as a follow-up and Joshua was going to do this soon.
>> It's not fixing any bug in the initial series.
>
> I would prefer that all kernel versions with zeromap include this
> counter; otherwise, it could be confusing to determine where swap-in
> and swap-out have occurred, as shown by the small program below:
>
> p = malloc(1g);
> write p to zero
> madvise_pageout
> read p;
>
> Previously, there was 1GB of swap-in and swap-out activity reported, but
> now nothing is shown.
>
> I don't mean to suggest that there's a bug in the zeromap code; rather,
> having this counter would help clear up any confusion.
>
> I didn't realize Joshua was handling it. Is he still planning to? If
> so, I can leave it with Joshua if that was the plan :-)
>

Please do continue with this patch; I think he was going to look at the
swapped_zero version that we discussed earlier anyway. I will let Joshua
comment on it.

>>
>>> Cc: Usama Arif
>>> Cc: Chengming Zhou
>>> Cc: Yosry Ahmed
>>> Cc: Nhat Pham
>>> Cc: Johannes Weiner
>>> Cc: David Hildenbrand
>>> Cc: Hugh Dickins
>>> Cc: Matthew Wilcox (Oracle)
>>> Cc: Shakeel Butt
>>> Cc: Andi Kleen
>>> Cc: Baolin Wang
>>> Cc: Chris Li
>>> Cc: "Huang, Ying"
>>> Cc: Kairui Song
>>> Cc: Ryan Roberts
>>> Signed-off-by: Barry Song
>>> ---
>>> -v2:
>>> * add separate counters rather than using pswpin/out; thanks
>>>   for the comments from Usama, David, Yosry and Nhat;
>>> * Usama also suggested a new counter like swapped_zero, I
>>>   prefer that one be separated as an enhancement patch not
>>>   a hotfix. will probably handle it later on.
>>>
>> I don't think either of them would be a hotfix.
>
> As mentioned above, this isn't about fixing a bug; it's simply to ensure
> that swap-related metrics don't disappear.
>
>>
>>> Documentation/admin-guide/cgroup-v2.rst | 10 ++++++++++
>>> include/linux/vm_event_item.h           |  2 ++
>>> mm/memcontrol.c                         |  4 ++++
>>> mm/page_io.c                            | 16 ++++++++++++++++
>>> mm/vmstat.c                             |  2 ++
>>> 5 files changed, 34 insertions(+)
>>>
>>> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
>>> index db3799f1483e..984eb3c9d05b 100644
>>> --- a/Documentation/admin-guide/cgroup-v2.rst
>>> +++ b/Documentation/admin-guide/cgroup-v2.rst
>>> @@ -1599,6 +1599,16 @@ The following nested keys are defined.
>>>  	  pglazyfreed (npn)
>>>  		Amount of reclaimed lazyfree pages
>>>
>>> +	  swpin_zero
>>> +		Number of pages moved into memory with zero content, meaning no
>>> +		copy exists in the backend swapfile, allowing swap-in to avoid
>>> +		I/O read overhead.
>>> +
>>> +	  swpout_zero
>>> +		Number of pages moved out of memory with zero content, meaning no
>>> +		copy is needed in the backend swapfile, allowing swap-out to avoid
>>> +		I/O write overhead.
>>> +
>>
>> Maybe zero-filled pages might be a better term in both.
>
> Do you mean dropping "with zero content" and replacing it by
> Number of zero-filled pages moved out of memory ? I'm fine
> with the change.

Yes, mainly because if you swap out memory that was memset to 0, it's
still content, just zero-filled.

Thanks,
Usama

>
>>
>>>  	  zswpin
>>>  		Number of pages moved in to memory from zswap.
>>>
>>> diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
>>> index aed952d04132..f70d0958095c 100644
>>> --- a/include/linux/vm_event_item.h
>>> +++ b/include/linux/vm_event_item.h
>>> @@ -134,6 +134,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
>>>  #ifdef CONFIG_SWAP
>>>  		SWAP_RA,
>>>  		SWAP_RA_HIT,
>>> +		SWPIN_ZERO,
>>> +		SWPOUT_ZERO,
>>>  #ifdef CONFIG_KSM
>>>  		KSM_SWPIN_COPY,
>>>  #endif
>>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>>> index 5e44d6e7591e..7b3503d12aaf 100644
>>> --- a/mm/memcontrol.c
>>> +++ b/mm/memcontrol.c
>>> @@ -441,6 +441,10 @@ static const unsigned int memcg_vm_event_stat[] = {
>>>  	PGDEACTIVATE,
>>>  	PGLAZYFREE,
>>>  	PGLAZYFREED,
>>> +#ifdef CONFIG_SWAP
>>> +	SWPIN_ZERO,
>>> +	SWPOUT_ZERO,
>>> +#endif
>>>  #ifdef CONFIG_ZSWAP
>>>  	ZSWPIN,
>>>  	ZSWPOUT,
>>> diff --git a/mm/page_io.c b/mm/page_io.c
>>> index 5d9b6e6cf96c..4b4ea8e49cf6 100644
>>> --- a/mm/page_io.c
>>> +++ b/mm/page_io.c
>>> @@ -204,7 +204,9 @@ static bool is_folio_zero_filled(struct folio *folio)
>>>
>>>  static void swap_zeromap_folio_set(struct folio *folio)
>>>  {
>>> +	struct obj_cgroup *objcg = get_obj_cgroup_from_folio(folio);
>>>  	struct swap_info_struct *sis = swp_swap_info(folio->swap);
>>> +	int nr_pages = folio_nr_pages(folio);
>>>  	swp_entry_t entry;
>>>  	unsigned int i;
>>>
>>> @@ -212,6 +214,12 @@ static void swap_zeromap_folio_set(struct folio *folio)
>>>  		entry = page_swap_entry(folio_page(folio, i));
>>>  		set_bit(swp_offset(entry), sis->zeromap);
>>>  	}
>>> +
>>> +	count_vm_events(SWPOUT_ZERO, nr_pages);
>>> +	if (objcg) {
>>> +		count_objcg_events(objcg, SWPOUT_ZERO, nr_pages);
>>> +		obj_cgroup_put(objcg);
>>> +	}
>>>  }
>>>
>>>  static void swap_zeromap_folio_clear(struct folio *folio)
>>> @@ -507,6 +515,7 @@ static void sio_read_complete(struct kiocb *iocb, long ret)
>>>  static bool swap_read_folio_zeromap(struct folio *folio)
>>>  {
>>>  	int nr_pages = folio_nr_pages(folio);
>>> +	struct obj_cgroup *objcg;
>>>  	bool is_zeromap;
>>>
>>>  	/*
>>> @@ -521,6 +530,13 @@ static bool swap_read_folio_zeromap(struct folio *folio)
>>>  	if (!is_zeromap)
>>>  		return false;
>>>
>>> +	objcg = get_obj_cgroup_from_folio(folio);
>>> +	count_vm_events(SWPIN_ZERO, nr_pages);
>>> +	if (objcg) {
>>> +		count_objcg_events(objcg, SWPIN_ZERO, nr_pages);
>>> +		obj_cgroup_put(objcg);
>>> +	}
>>> +
>>>  	folio_zero_range(folio, 0, folio_size(folio));
>>>  	folio_mark_uptodate(folio);
>>>  	return true;
>>> diff --git a/mm/vmstat.c b/mm/vmstat.c
>>> index 22a294556b58..c8ef7352f9ed 100644
>>> --- a/mm/vmstat.c
>>> +++ b/mm/vmstat.c
>>> @@ -1418,6 +1418,8 @@ const char * const vmstat_text[] = {
>>>  #ifdef CONFIG_SWAP
>>>  	"swap_ra",
>>>  	"swap_ra_hit",
>>> +	"swpin_zero",
>>> +	"swpout_zero",
>>>  #ifdef CONFIG_KSM
>>>  	"ksm_swpin_copy",
>>>  #endif
>>
>
> Thanks
> Barry
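
For anyone who wants to try the scenario discussed in this thread (allocate memory, write zeros, MADV_PAGEOUT, read it back) and watch the proposed counters, here is a minimal userspace sketch. It is not part of the patch. The helper names parse_counter(), read_counter() and zero_pageout_readback() are made up for illustration; the only things taken from the thread are the counter names swpin_zero/swpout_zero and the /proc/vmstat and memory.stat files. It assumes a Linux system with swap enabled; the counters will only be present on a kernel that carries this patch.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21 /* Linux UAPI value, for older libc headers */
#endif

/*
 * Hypothetical helper: find a "<name> <value>" line in vmstat-style text.
 * Returns 0 and stores the value on success, -1 if the key is absent.
 */
int parse_counter(const char *text, const char *name, unsigned long long *value)
{
	const char *line = text;
	size_t len;

	assert(text != NULL && name != NULL && value != NULL);
	len = strlen(name);
	while (line != NULL && *line != '\0') {
		if (strncmp(line, name, len) == 0 && line[len] == ' ') {
			*value = strtoull(line + len + 1, NULL, 10);
			return 0;
		}
		line = strchr(line, '\n');
		if (line != NULL)
			line++; /* step past the newline to the next line */
	}
	return -1;
}

/* Hypothetical helper: read one counter from /proc/vmstat or memory.stat. */
int read_counter(const char *path, const char *name, unsigned long long *value)
{
	static char buf[1 << 16]; /* assumed large enough for these files */
	size_t n;
	FILE *f = fopen(path, "r");

	if (f == NULL)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return parse_counter(buf, name, value);
}

/*
 * The scenario from the thread: allocate, write zeros from userspace,
 * ask the kernel to page the range out, then touch every page again.
 * Returns 0 if all pages read back as zero, -1 on any failure.
 */
int zero_pageout_readback(size_t bytes)
{
	long page = sysconf(_SC_PAGESIZE);
	volatile unsigned char *p;
	unsigned long sum = 0;
	size_t i;
	void *map;

	if (page <= 0)
		return -1;
	map = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return -1;
	memset(map, 0, bytes);                      /* "write p to zero" */
	if (madvise(map, bytes, MADV_PAGEOUT) != 0) { /* "madvise_pageout" */
		munmap(map, bytes);
		return -1;
	}
	p = map;
	for (i = 0; i < bytes; i += (size_t)page)   /* "read p" */
		sum += p[i];
	munmap(map, bytes);
	return sum == 0 ? 0 : -1;
}
```

A caller would read swpout_zero and swpin_zero with read_counter("/proc/vmstat", ...) before and after zero_pageout_readback(1UL << 30) and compare the deltas. Whether they move at all depends on swap being configured and on the kernel actually paging the range out, so a zero delta alone does not prove anything about the accounting.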