From: Barry Song <21cnbao@gmail.com>
Date: Tue, 29 Oct 2024 05:15:36 +0800
Subject: Re: [PATCH RFC] mm: count zeromap read and set for swapout and swapin
To: Usama Arif
Cc: Yosry Ahmed, Nhat Pham, akpm@linux-foundation.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Barry Song, Chengming Zhou, Johannes Weiner,
    David Hildenbrand, Hugh Dickins, Matthew Wilcox, Shakeel Butt, Andi Kleen,
    Baolin Wang, Chris Li, "Huang, Ying", Kairui Song, Ryan Roberts,
    joshua.hahnjy@gmail.com
References: <20241027011959.9226-1-21cnbao@gmail.com>
    <678a1e30-4962-48de-b5cb-03a1b4b9db1b@gmail.com>
    <6303e3c9-85d5-40f5-b265-70ecdb02d5ba@gmail.com>
    <64f12abd-dde3-41a4-b694-cc42784217fb@gmail.com>
    <882008b6-13e0-41d8-91fa-f26c585120d8@gmail.com>

On Tue, Oct 29, 2024 at 4:51 AM Usama Arif wrote:
>
> On 28/10/2024 20:42, Barry Song wrote:
> > On Tue, Oct 29, 2024 at 4:00 AM Usama Arif wrote:
> >>
> >> On 28/10/2024 19:54, Barry Song wrote:
> >>> On Tue, Oct 29, 2024 at 1:20 AM Usama Arif wrote:
> >>>>
> >>>> On 28/10/2024 17:08, Yosry Ahmed wrote:
> >>>>> On Mon, Oct 28, 2024 at 10:00 AM Usama Arif wrote:
> >>>>>>
> >>>>>> On 28/10/2024 16:33, Nhat Pham wrote:
> >>>>>>> On Mon, Oct 28, 2024 at 5:23 AM Usama Arif wrote:
> >>>>>>>>
> >>>>>>>> I wonder if instead of having counters, it might be better to keep track
> >>>>>>>> of the number of zeropages currently stored in zeromap, similar to how
> >>>>>>>> zswap_same_filled_pages did it. It will be more complicated then this
> >>>>>>>> patch, but would give more insight of the current state of the system.
> >>>>>>>>
> >>>>>>>> Joshua (in CC) was going to have a look at that.
> >>>>>>>
> >>>>>>> I don't think one can substitute for the other.
> >>>>>>
> >>>>>> Yes agreed, they have separate uses and provide different information, but
> >>>>>> maybe wasteful to have both types of counters? They are counters so maybe
> >>>>>> dont consume too much resources but I think we should still think about
> >>>>>> it..
> >>>>>
> >>>>> Not for or against here, but I would say that statement is debatable
> >>>>> at best for memcg stats :)
> >>>>>
> >>>>> Each new counter consumes 2 longs per-memcg per-CPU (see
> >>>>> memcg_vmstats_percpu), about 16 bytes, which is not a lot but it can
> >>>>> quickly add up with a large number of CPUs/memcgs/stats.
> >>>>>
> >>>>> Also, when flushing the stats we iterate all of them to propagate
> >>>>> updates from per-CPU counters. This is already a slowpath so adding
> >>>>> one stat is not a big deal, but again because we iterate all stats on
> >>>>> multiple CPUs (and sometimes on each node as well), the overall flush
> >>>>> latency becomes a concern sometimes.
> >>>>>
> >>>>> All of that is not to say we shouldn't add more memcg stats, but we
> >>>>> have to be mindful of the resources.
> >>>>
> >>>> Yes agreed! Plus the cost of incrementing similar counters (which of course is
> >>>> also not much).
> >>>>
> >>>> Not trying to block this patch in anyway. Just think its a good point
> >>>> to discuss here if we are ok with both types of counters. If its too wasteful
> >>>> then which one we should have.
> >>>
> >>> Hi Usama,
> >>> my point is that with all the below three counters:
> >>> 1. PSWPIN/PSWPOUT
> >>> 2. ZSWPIN/ZSWPOUT
> >>> 3. SWAPIN_SKIP/SWAPOUT_SKIP or (ZEROSWPIN, ZEROSWPOUT what ever)
> >>>
> >>> Shouldn't we have been able to determine the portion of zeromap
> >>> swap indirectly?
> >>>
> >>
> >> Hmm, I might be wrong, but I would have thought no?
> >>
> >> What if you swapout a zero folio, but then discard it?
> >> zeromap_swpout would be incremented, but zeromap_swapin would not.
> >
> > I understand. It looks like we have two issues to tackle:
> > 1. We shouldn't let zeromap swap in or out anything that vanishes into
> > a black hole
> > 2. We want to find out how much I/O/memory has been saved due to zeromap so far
> >
> > From my perspective, issue 1 requires a "fix", while issue 2 is more
> > of an optimization.
>
> Hmm I dont understand why point 1 would be an issue.
>
> If its discarded thats fine as far as I can see.

It is fine to you, and probably to me, since we both know zeromap
well :-) But userspace code like the sequence below might be entirely
confused:

    p = malloc(1G);
    write zeros to all of p, or to part of p;
    madvise(p, 1G, MADV_PAGEOUT);
    read p to swap it back in;

By any measure, this entire procedure used to involve 1GB of swap-out
and 1GB of swap-in. Now it records 0 swaps.

I don't expect userspace to be as smart as you :-)

>
> As a reference, memory.stat.zswapped != memory.stat.zswapout - memory.stat.zswapin.
> Because zswapped would take into account swapped out anon memory freed, MADV_FREE,
> shmem truncate, etc as Yosry said about zeromap, But zswapout and zswapin dont.

I understand. However, I believe what we really need to focus on is
this: if we've swapped out, for instance, 100GB in the past hour, how
much of that 100GB is zero? This information can help us assess the
proportion of zero data in the workload, along with the potential
benefits that zeromap can provide for memory, I/O space, or read/write
operations. Additionally, having the second count can improve accuracy
when accounting for MADV_DONTNEED, MADV_FREE, truncation, and so on.

> >
> > I consider issue 1 to be more critical because, after observing a phone
> > running for some time, I've been able to roughly estimate the portion
> > zeromap can help save using only PSWPOUT, ZSWPOUT, and SWAPOUT_SKIP, even
> > without a SWPIN counter. However, I agree that issue 2 still holds
> > significant value as a separate patch.
> >

Thanks
Barry