From: Barry Song <21cnbao@gmail.com>
Date: Mon, 4 Mar 2024 18:42:58 +1300
Subject: Re: [PATCH v3 1/4] mm: swap: Remove CLUSTER_FLAG_HUGE from swap_cluster_info:flags
To: Ryan Roberts
Cc: Matthew Wilcox, David Hildenbrand, Andrew Morton, Huang Ying, Gao Xiang, Yu Zhao, Yang Shi, Michal Hocko, Kefeng Wang, linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Mon, Mar 4, 2024 at 5:52 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Sat, Mar 2, 2024 at 6:08 AM Ryan Roberts wrote:
> >
> > On 01/03/2024 16:44, Ryan Roberts wrote:
> > > On 01/03/2024 16:31, Matthew Wilcox wrote:
> > >> On Fri, Mar 01, 2024 at 04:27:32PM +0000, Ryan
> > >> Roberts wrote:
> > >>> I've implemented the batching as David suggested, and I'm pretty confident it's
> > >>> correct. The only problem is that during testing I can't provoke the code to
> > >>> take the path. I've been poring through the code but struggling to figure out
> > >>> under what situation you would expect the swap entry passed to
> > >>> free_swap_and_cache() to still have a cached folio? Does anyone have any idea?
> > >>>
> > >>> This is the original (unbatched) function, after my change, which caused David's
> > >>> concern that we would end up calling __try_to_reclaim_swap() far too much:
> > >>>
> > >>> int free_swap_and_cache(swp_entry_t entry)
> > >>> {
> > >>>         struct swap_info_struct *p;
> > >>>         unsigned char count;
> > >>>
> > >>>         if (non_swap_entry(entry))
> > >>>                 return 1;
> > >>>
> > >>>         p = _swap_info_get(entry);
> > >>>         if (p) {
> > >>>                 count = __swap_entry_free(p, entry);
> > >>>                 if (count == SWAP_HAS_CACHE)
> > >>>                         __try_to_reclaim_swap(p, swp_offset(entry),
> > >>>                                               TTRS_UNMAPPED | TTRS_FULL);
> > >>>         }
> > >>>         return p != NULL;
> > >>> }
> > >>>
> > >>> The trouble is, whenever it's called, count is always 0, so
> > >>> __try_to_reclaim_swap() never gets called.
> > >>>
> > >>> My test case is allocating 1G anon memory, then doing madvise(MADV_PAGEOUT) over
> > >>> it. Then doing either a munmap() or madvise(MADV_FREE), both of which cause this
> > >>> function to be called for every PTE, but count is always 0 after
> > >>> __swap_entry_free() so __try_to_reclaim_swap() is never called. I've tried for
> > >>> order-0 as well as PTE- and PMD-mapped 2M THP.
> > >>
> > >> I think you have to page it back in again, then it will have an entry in
> > >> the swap cache. Maybe. I know little about anon memory ;-)
> > >
> > > Ahh, I was under the impression that the original folio is put into the swap
> > > cache at swap out, then (I guess) it's removed once the IO is complete? I'm sure
> > > I'm miles out...
> > > what exactly is the lifecycle of a folio going through swap out?
> > >
> > > I guess I can try forking after swap out, then fault it back in in the child and
> > > exit. Then do the munmap in the parent. I guess that could force it? Thanks for
> > > the tip - I'll have a play.
> >
> > That has sort of solved it, the only problem now is that all the folios in the
> > swap cache are small (because I don't have Barry's large swap-in series). So
> > really I need to figure out how to avoid removing the folio from the cache in
> > the first place...
>
> I am quite sure we have a chance to hit a large swapcache even using zRAM -
> a sync swapfile and even during swap-out.
>
> I have a test case as below,
> 1. two threads to run MADV_PAGEOUT
> 2. two threads to read data being swapped out
>
> In do_swap_page, from time to time, I can get a large swapcache.
>
> We have a short time window after add_to_swap() and before
> __remove_mapping() in vmscan, during which a large folio is still in
> the swapcache.
>
> So Ryan, I guess you can trigger this by adding one more thread of
> MADV_DONTNEED to do zap_pte_range?

Ryan, I have modified my test case to have 4 threads:
1. MADV_PAGEOUT
2. MADV_DONTNEED
3. write data
4. read data

and pushed the code here so that you can get it:
https://github.com/BarrySong666/swaptest/blob/main/swptest.c

I can reproduce the issue in zap_pte_range() in just a couple of minutes.

> >
> > >
> > >>
> > >> If that doesn't work, perhaps use tmpfs, and use some memory pressure to
> > >> force that to swap?
> > >>
> > >>> I'm guessing the swapcache was already reclaimed as part of MADV_PAGEOUT? I'm
> > >>> using a block ram device as my backing store - I think this does synchronous IO
> > >>> so perhaps if I have a real block device with async IO I might have more luck?
> > >>> Just a guess...
> > >>>
> > >>> Or perhaps this code path is a corner case? In which case, perhaps it's not worth
> > >>> adding the batching optimization after all?
> > >>>
> > >>> Thanks,
> > >>> Ryan
> > >>>
> >
>

Thanks
Barry
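
For readers who cannot fetch the link, here is a minimal sketch of the four-thread arrangement listed above (MADV_PAGEOUT, MADV_DONTNEED, write data, read data). It is not the swptest.c at that URL. The names run_swap_race and race_ctx, the 4 KiB touch stride, and the bounded iteration count are all assumptions made for illustration. madvise() errors are deliberately ignored so the sketch still runs on a machine without swap or without MADV_PAGEOUT support.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <stddef.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21 /* Linux >= 5.4; fallback for older libc headers */
#endif

#define TOUCH_STRIDE 4096 /* assumed base page size; a smaller stride than the real page size is harmless */

/* Hypothetical context shared by the four threads. */
struct race_ctx {
	volatile unsigned char *buf; /* private anonymous mapping */
	size_t len;
	int iterations;              /* passes each thread makes over buf */
};

/* Thread 1: ask the kernel to swap the whole range out. */
static void *pageout_thread(void *arg)
{
	struct race_ctx *c = arg;

	for (int i = 0; i < c->iterations; i++)
		madvise((void *)c->buf, c->len, MADV_PAGEOUT); /* result ignored */
	return NULL;
}

/* Thread 2: zap the range, which is what drives zap_pte_range(). */
static void *dontneed_thread(void *arg)
{
	struct race_ctx *c = arg;

	for (int i = 0; i < c->iterations; i++)
		madvise((void *)c->buf, c->len, MADV_DONTNEED); /* result ignored */
	return NULL;
}

/* Thread 3: write one byte per stride so pages keep getting repopulated. */
static void *write_thread(void *arg)
{
	struct race_ctx *c = arg;

	for (int i = 0; i < c->iterations; i++)
		for (size_t off = 0; off < c->len; off += TOUCH_STRIDE)
			c->buf[off] = (unsigned char)i;
	return NULL;
}

/* Thread 4: read one byte per stride so swapped-out pages fault back in. */
static void *read_thread(void *arg)
{
	struct race_ctx *c = arg;
	unsigned char sink = 0;

	for (int i = 0; i < c->iterations; i++)
		for (size_t off = 0; off < c->len; off += TOUCH_STRIDE)
			sink ^= c->buf[off];
	(void)sink;
	return NULL;
}

/* Returns 0 on success, -1 if the mapping or a thread could not be created. */
int run_swap_race(size_t len, int iterations)
{
	void *(*fns[4])(void *) = {
		pageout_thread, dontneed_thread, write_thread, read_thread
	};
	pthread_t tids[4];
	struct race_ctx c;
	int started = 0, ret = 0;
	void *p;

	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	c.buf = p;
	c.len = len;
	c.iterations = iterations;

	for (; started < 4; started++) {
		if (pthread_create(&tids[started], NULL, fns[started], &c)) {
			ret = -1;
			break;
		}
	}
	for (int i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	munmap(p, len);
	return ret;
}
```

A caller would pick a region large enough to hold THP-sized folios and a high iteration count, for example run_swap_race(64UL << 20, 100000), with swap (zRAM or otherwise) configured. Whether a given run actually lands in the window between add_to_swap() and __remove_mapping() depends on timing and the swap setup, so treat this as a starting point rather than a guaranteed reproducer.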