From: Barry Song <21cnbao@gmail.com>
Date: Tue, 29 Oct 2024 05:40:32 +0800
Subject: Re: [PATCH RFC] mm: count zeromap read and set for swapout and swapin
To: Usama Arif
Cc: Yosry Ahmed, Nhat Pham, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Barry Song, Chengming Zhou, Johannes Weiner, David Hildenbrand, Hugh Dickins, Matthew Wilcox, Shakeel Butt, Andi Kleen, Baolin Wang, Chris Li, "Huang, Ying", Kairui Song, Ryan Roberts, joshua.hahnjy@gmail.com
In-Reply-To: <228c428d-d116-4be1-9d0d-0591667b7ccb@gmail.com>

On Tue, Oct 29, 2024 at 5:24 AM Usama Arif wrote:
>
>
>
> On 28/10/2024 21:15, Barry Song wrote:
> > On Tue, Oct 29, 2024 at 4:51 AM Usama Arif wrote:
> >>
> >>
> >>
> >> On 28/10/2024 20:42, Barry Song wrote:
> >>> On Tue, Oct 29, 2024 at 4:00 AM Usama Arif wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 28/10/2024 19:54, Barry Song wrote:
> >>>>> On Tue, Oct 29, 2024 at 1:20 AM Usama Arif wrote:
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 28/10/2024 17:08, Yosry Ahmed wrote:
> >>>>>>> On Mon, Oct 28, 2024 at 10:00 AM Usama Arif wrote:
> >>>>>>>>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> On 28/10/2024 16:33, Nhat Pham wrote:
> >>>>>>>>> On Mon, Oct 28, 2024 at 5:23 AM Usama Arif wrote:
> >>>>>>>>>>
> >>>>>>>>>> I wonder if instead of having counters, it might be better to keep track
> >>>>>>>>>> of the number of zeropages currently stored in zeromap, similar to how
> >>>>>>>>>> zswap_same_filled_pages did it. It will be more complicated than this
> >>>>>>>>>> patch, but would give more insight into the current state of the system.
> >>>>>>>>>>
> >>>>>>>>>> Joshua (in CC) was going to have a look at that.
> >>>>>>>>>
> >>>>>>>>> I don't think one can substitute for the other.
> >>>>>>>>
> >>>>>>>> Yes agreed, they have separate uses and provide different information, but
> >>>>>>>> maybe wasteful to have both types of counters? They are counters so maybe
> >>>>>>>> don't consume too much resources but I think we should still think about
> >>>>>>>> it..
> >>>>>>>
> >>>>>>> Not for or against here, but I would say that statement is debatable
> >>>>>>> at best for memcg stats :)
> >>>>>>>
> >>>>>>> Each new counter consumes 2 longs per-memcg per-CPU (see
> >>>>>>> memcg_vmstats_percpu), about 16 bytes, which is not a lot but it can
> >>>>>>> quickly add up with a large number of CPUs/memcgs/stats.
> >>>>>>>
> >>>>>>> Also, when flushing the stats we iterate all of them to propagate
> >>>>>>> updates from per-CPU counters. This is already a slowpath so adding
> >>>>>>> one stat is not a big deal, but again because we iterate all stats on
> >>>>>>> multiple CPUs (and sometimes on each node as well), the overall flush
> >>>>>>> latency becomes a concern sometimes.
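To put the per-counter cost quoted above in numbers, here is a rough
back-of-envelope sketch in Python. It assumes 8-byte longs (a 64-bit
kernel) and takes the "2 longs per-memcg per-CPU" figure at face value.
The function name and the CPU/memcg counts are made up for illustration,
not measurements from any real system.

```python
BYTES_PER_LONG = 8     # assumption: 64-bit kernel
LONGS_PER_COUNTER = 2  # figure taken from the estimate quoted above


def percpu_counter_bytes(nr_cpus: int, nr_memcgs: int, nr_counters: int = 1) -> int:
    """Per-CPU memory used by nr_counters additional memcg stats, in bytes."""
    per_counter = BYTES_PER_LONG * LONGS_PER_COUNTER  # 16 bytes per memcg per CPU
    return per_counter * nr_cpus * nr_memcgs * nr_counters


# Hypothetical machine: 256 CPUs, 1000 memcgs, three new counters.
total = percpu_counter_bytes(256, 1000, 3)
print(f"{total} bytes (~{total / 2**20:.1f} MiB)")
```

This only covers the storage side; the flush-latency concern depends on
how often stats are flushed and is not modelled here.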
> >>>>>>>
> >>>>>>> All of that is not to say we shouldn't add more memcg stats, but we
> >>>>>>> have to be mindful of the resources.
> >>>>>>
> >>>>>> Yes agreed! Plus the cost of incrementing similar counters (which of course is
> >>>>>> also not much).
> >>>>>>
> >>>>>> Not trying to block this patch in any way. Just think it's a good point
> >>>>>> to discuss here if we are ok with both types of counters. If it's too wasteful
> >>>>>> then which one we should have.
> >>>>>
> >>>>> Hi Usama,
> >>>>> my point is that with all the below three counters:
> >>>>> 1. PSWPIN/PSWPOUT
> >>>>> 2. ZSWPIN/ZSWPOUT
> >>>>> 3. SWAPIN_SKIP/SWAPOUT_SKIP or (ZEROSWPIN, ZEROSWPOUT whatever)
> >>>>>
> >>>>> Shouldn't we have been able to determine the portion of zeromap
> >>>>> swap indirectly?
> >>>>>
> >>>>
> >>>> Hmm, I might be wrong, but I would have thought no?
> >>>>
> >>>> What if you swapout a zero folio, but then discard it?
> >>>> zeromap_swpout would be incremented, but zeromap_swapin would not.
> >>>
> >>> I understand. It looks like we have two issues to tackle:
> >>> 1. We shouldn't let zeromap swap in or out anything that vanishes into
> >>> a black hole
> >>> 2. We want to find out how much I/O/memory has been saved due to zeromap so far
> >>>
> >>> From my perspective, issue 1 requires a "fix", while issue 2 is more
> >>> of an optimization.
> >>
> >> Hmm I don't understand why point 1 would be an issue.
> >>
> >> If it's discarded that's fine as far as I can see.
> >
> > it is fine to you and probably me who knows zeromap as well :-) but
> > any userspace code
> > as below might be entirely confused:
> >
> > p = malloc(1G);
> > write p to 0; or write part of p to 0
> > madv_pageout(p, 1g)
> > read p to swapin.
> >
> > The entire procedure used to involve 1GB of swap out and 1GB of swap in by any
> > means. Now, it has recorded 0 swaps counted.
> >
> > I don't expect userspace is as smart as you :-)
> >
> Ah I completely agree, we need to account for it in some metric. I probably
> misunderstood when you said "We shouldn't let zeromap swap in or out anything that
> vanishes into a black hole", by we should not have the zeromap optimization for those
> cases. What I guess you meant is we need to account for it in some metric.
>
> >>
> >> As a reference, memory.stat.zswapped != memory.stat.zswapout - memory.stat.zswapin.
> >> Because zswapped would take into account swapped out anon memory freed, MADV_FREE,
> >> shmem truncate, etc as Yosry said about zeromap, but zswapout and zswapin don't.
> >
> > I understand. However, I believe what we really need to focus on is
> > this: if we’ve
> > swapped out, for instance, 100GB in the past hour, how much of that 100GB is
> > zero? This information can help us assess the proportion of zero data in the
> > workload, along with the potential benefits that zeromap can provide for memory,
> > I/O space, or read/write operations. Additionally, having the second count
> > can enhance accuracy when considering MADV_DONTNEED, FREE, TRUNCATE,
> > and so on.
> >
> Yes completely agree!
>
> I think we can look into adding all three metrics, zeromap_swapped, zeromap_swpout,
> zeromap_swpin (or whatever name works).

It's great to reach an agreement. Let me work on some patches for it.

By the way, I recently had an idea: if we can conduct the zeromap check
earlier, for example before allocating swap slots and pageout(), could
we completely eliminate swap slot occupation and allocation/release
for zeromap data? For example, we could use a special swap
entry value in the PTE to indicate zero content and directly fill it with
zeros when swapping back. We've observed that swap slot allocation and
freeing can consume a lot of CPU and slow down functions like
zap_pte_range and swap-in.
If we can entirely skip these steps, it could improve performance.
However, I'm uncertain about the benefits we would gain if we only
have 1-2% zeromap data. I'm just putting this idea out there to see if
you're interested in moving forward with it. :-)

>
> >>
> >>
> >>>
> >>> I consider issue 1 to be more critical because, after observing a phone
> >>> running for some time, I've been able to roughly estimate the portion
> >>> zeromap can
> >>> help save using only PSWPOUT, ZSWPOUT, and SWAPOUT_SKIP, even without a
> >>> SWPIN counter. However, I agree that issue 2 still holds significant value
> >>> as a separate patch.
> >>>
> >

Thanks
Barry
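
To illustrate why all three metrics carry different information, here
is a toy userspace model in Python. It is only a sketch, not kernel code:
the counter names are the ones floated in this thread and may change, the
class and method names are hypothetical, and it assumes a swap-in drops
the zeromap entry immediately, which the real kernel does not always do.

```python
from dataclasses import dataclass


@dataclass
class ZeromapStats:
    """Toy model of the three counters discussed in this thread."""
    zeromap_swpout: int = 0   # cumulative: zero-filled pages swapped out
    zeromap_swpin: int = 0    # cumulative: zero-filled pages swapped back in
    zeromap_swapped: int = 0  # gauge: zero-filled pages currently in zeromap

    def swap_out(self, nr_pages: int) -> None:
        self.zeromap_swpout += nr_pages
        self.zeromap_swapped += nr_pages

    def swap_in(self, nr_pages: int) -> None:
        # Assumption: the zeromap entry is released as soon as it is read back.
        self.zeromap_swpin += nr_pages
        self.zeromap_swapped -= nr_pages

    def discard(self, nr_pages: int) -> None:
        # Freed while swapped out (MADV_DONTNEED, MADV_FREE, truncate, ...):
        # no swap-in event is counted, but the gauge still drops.
        self.zeromap_swapped -= nr_pages


stats = ZeromapStats()
stats.swap_out(100)
stats.discard(40)
stats.swap_in(10)
print(stats.zeromap_swapped)                       # 50
print(stats.zeromap_swpout - stats.zeromap_swpin)  # 90
```

In this model the gauge (50) differs from swpout minus swpin (90) by
exactly the number of discarded pages, which is the same reason zswapped
cannot be derived from the zswap in/out counters.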