From: Usama Arif
Date: Mon, 28 Oct 2024 21:24:37 +0000
Subject: Re: [PATCH RFC] mm: count zeromap read and set for swapout and swapin
To: Barry Song <21cnbao@gmail.com>
Cc: Yosry Ahmed, Nhat Pham, akpm@linux-foundation.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Barry Song, Chengming Zhou, Johannes Weiner,
 David Hildenbrand, Hugh Dickins, Matthew Wilcox, Shakeel Butt, Andi Kleen,
 Baolin Wang, Chris Li, "Huang, Ying", Kairui Song, Ryan Roberts,
 joshua.hahnjy@gmail.com
Message-ID: <228c428d-d116-4be1-9d0d-0591667b7ccb@gmail.com>
References: <20241027011959.9226-1-21cnbao@gmail.com>
 <678a1e30-4962-48de-b5cb-03a1b4b9db1b@gmail.com>
 <6303e3c9-85d5-40f5-b265-70ecdb02d5ba@gmail.com>
 <64f12abd-dde3-41a4-b694-cc42784217fb@gmail.com>
 <882008b6-13e0-41d8-91fa-f26c585120d8@gmail.com>

On 28/10/2024 21:15, Barry Song wrote:
> On Tue, Oct 29, 2024 at 4:51 AM Usama Arif wrote:
>>
>> On 28/10/2024 20:42, Barry Song wrote:
>>> On Tue, Oct 29, 2024 at 4:00 AM Usama Arif wrote:
>>>>
>>>> On 28/10/2024 19:54, Barry Song wrote:
>>>>> On Tue, Oct 29, 2024 at 1:20 AM Usama Arif wrote:
>>>>>>
>>>>>> On 28/10/2024 17:08, Yosry Ahmed wrote:
>>>>>>> On Mon, Oct 28, 2024 at 10:00 AM Usama Arif wrote:
>>>>>>>>
>>>>>>>> On 28/10/2024 16:33, Nhat Pham wrote:
>>>>>>>>> On Mon, Oct 28, 2024 at 5:23 AM Usama Arif wrote:
>>>>>>>>>>
>>>>>>>>>> I wonder if instead of having counters, it might be better to keep track
>>>>>>>>>> of the number of zeropages currently stored in zeromap, similar to how
>>>>>>>>>> zswap_same_filled_pages did it. It will be more complicated than this
>>>>>>>>>> patch, but would give more insight into the current state of the system.
>>>>>>>>>>
>>>>>>>>>> Joshua (in CC) was going to have a look at that.
>>>>>>>>>
>>>>>>>>> I don't think one can substitute for the other.
>>>>>>>>
>>>>>>>> Yes agreed, they have separate uses and provide different information, but
>>>>>>>> maybe wasteful to have both types of counters? They are counters so maybe
>>>>>>>> don't consume too much resources but I think we should still think about
>>>>>>>> it..
>>>>>>>
>>>>>>> Not for or against here, but I would say that statement is debatable
>>>>>>> at best for memcg stats :)
>>>>>>>
>>>>>>> Each new counter consumes 2 longs per-memcg per-CPU (see
>>>>>>> memcg_vmstats_percpu), about 16 bytes, which is not a lot but it can
>>>>>>> quickly add up with a large number of CPUs/memcgs/stats.
>>>>>>>
>>>>>>> Also, when flushing the stats we iterate all of them to propagate
>>>>>>> updates from per-CPU counters.
>>>>>>> This is already a slowpath so adding
>>>>>>> one stat is not a big deal, but again because we iterate all stats on
>>>>>>> multiple CPUs (and sometimes on each node as well), the overall flush
>>>>>>> latency becomes a concern sometimes.
>>>>>>>
>>>>>>> All of that is not to say we shouldn't add more memcg stats, but we
>>>>>>> have to be mindful of the resources.
>>>>>>
>>>>>> Yes agreed! Plus the cost of incrementing similar counters (which of course is
>>>>>> also not much).
>>>>>>
>>>>>> Not trying to block this patch in any way. Just think it's a good point
>>>>>> to discuss here if we are ok with both types of counters. If it's too wasteful
>>>>>> then which one we should have.
>>>>>
>>>>> Hi Usama,
>>>>> my point is that with all the below three counters:
>>>>> 1. PSWPIN/PSWPOUT
>>>>> 2. ZSWPIN/ZSWPOUT
>>>>> 3. SWAPIN_SKIP/SWAPOUT_SKIP or (ZEROSWPIN, ZEROSWPOUT whatever)
>>>>>
>>>>> Shouldn't we have been able to determine the portion of zeromap
>>>>> swap indirectly?
>>>>>
>>>>
>>>> Hmm, I might be wrong, but I would have thought no?
>>>>
>>>> What if you swapout a zero folio, but then discard it?
>>>> zeromap_swpout would be incremented, but zeromap_swapin would not.
>>>
>>> I understand. It looks like we have two issues to tackle:
>>> 1. We shouldn't let zeromap swap in or out anything that vanishes into
>>> a black hole
>>> 2. We want to find out how much I/O/memory has been saved due to zeromap so far
>>>
>>> From my perspective, issue 1 requires a "fix", while issue 2 is more
>>> of an optimization.
>>
>> Hmm I don't understand why point 1 would be an issue.
>>
>> If it's discarded that's fine as far as I can see.
>
> it is fine to you and probably me who knows zeromap as well :-) but
> any userspace code as below might be entirely confused:
>
> p = malloc(1G);
> write p to 0; or write part of p to 0
> madv_pageout(p, 1g)
> read p to swapin.
>
> The entire procedure used to involve 1GB of swap out and 1GB of swap in by any means.
> Now, it has recorded 0 swaps counted.
>
> I don't expect userspace is as smart as you :-)
>

Ah, I completely agree, we need to account for it in some metric. I probably
misread your statement "We shouldn't let zeromap swap in or out anything that
vanishes into a black hole" as meaning that we should not have the zeromap
optimization for those cases. What I guess you meant is that we need to
account for it in some metric.

>>
>> As a reference, memory.stat.zswapped != memory.stat.zswapout - memory.stat.zswapin.
>> Because zswapped would take into account swapped out anon memory freed, MADV_FREE,
>> shmem truncate, etc as Yosry said about zeromap, but zswapout and zswapin don't.
>
> I understand. However, I believe what we really need to focus on is
> this: if we’ve swapped out, for instance, 100GB in the past hour, how much
> of that 100GB is zero? This information can help us assess the proportion
> of zero data in the workload, along with the potential benefits that
> zeromap can provide for memory, I/O space, or read/write operations.
> Additionally, having the second count can enhance accuracy when
> considering MADV_DONTNEED, FREE, TRUNCATE, and so on.
>

Yes, completely agree! I think we can look into adding all three metrics:
zeromap_swapped, zeromap_swpout, zeromap_swpin (or whatever name works).

>>
>>>
>>> I consider issue 1 to be more critical because, after observing a phone
>>> running for some time, I've been able to roughly estimate the portion
>>> zeromap can help save using only PSWPOUT, ZSWPOUT, and SWAPOUT_SKIP,
>>> even without a SWPIN counter. However, I agree that issue 2 still holds
>>> significant value as a separate patch.
>>>
>
> Thanks
> Barry
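
As a rough illustration of the two pieces of arithmetic discussed in this thread, here is a standalone userspace C sketch (not kernel code). memcg_stat_overhead_bytes() applies Yosry's figure of 2 longs per memcg per CPU for each new counter, assuming 8-byte longs. zero_swapout_share() estimates the zero-filled share of swap-out from three counters, along the lines Barry describes for PSWPOUT, ZSWPOUT and SWAPOUT_SKIP. Both function names and all parameter names are hypothetical, and the share calculation assumes the three counters count disjoint sets of pages, which the thread does not confirm.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: sizeof(long) == 8, as on 64-bit Linux. */
#define BYTES_PER_LONG 8ULL
/* From the thread: each new memcg counter costs 2 longs per memcg per CPU. */
#define LONGS_PER_COUNTER 2ULL

/*
 * Hypothetical helper: per-CPU stat memory consumed by `new_counters`
 * additional memcg counters on a system with `cpus` CPUs and `memcgs` memcgs.
 */
uint64_t memcg_stat_overhead_bytes(uint64_t cpus, uint64_t memcgs,
                                   uint64_t new_counters)
{
    /* A running system has at least one CPU and the root memcg. */
    assert(cpus > 0 && memcgs > 0);
    return LONGS_PER_COUNTER * BYTES_PER_LONG * cpus * memcgs * new_counters;
}

/*
 * Hypothetical helper: fraction of swapped-out pages that were zero-filled.
 *   pswpout     - pages written to the swap device
 *   zswpout     - pages stored in zswap
 *   zero_swpout - pages whose write was skipped because they were zero-filled
 * Assumes the three counters do not overlap. Returns 0.0 when nothing has
 * been swapped out, to avoid dividing by zero.
 */
double zero_swapout_share(uint64_t pswpout, uint64_t zswpout,
                          uint64_t zero_swpout)
{
    uint64_t total = pswpout + zswpout + zero_swpout;

    if (total == 0)
        return 0.0;
    return (double)zero_swpout / (double)total;
}
```

For example, under these assumptions three new counters on a machine with 256 CPUs and 1000 memcgs come to 3 * 16 * 256 * 1000 = 12,288,000 bytes, roughly 12 MB. Note that the share is computed from cumulative swap-out events only, so it says nothing about how many zero-filled pages are currently held in zeromap, which is the separate "zeromap_swapped" style metric discussed above.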