From: Barry Song <21cnbao@gmail.com>
Date: Mon, 4 Mar 2024 17:52:15 +1300
Subject: Re: [PATCH v3 1/4] mm: swap: Remove CLUSTER_FLAG_HUGE from swap_cluster_info:flags
To: Ryan Roberts
Cc: Matthew Wilcox, David Hildenbrand, Andrew Morton, Huang Ying, Gao Xiang, Yu Zhao, Yang Shi, Michal Hocko, Kefeng Wang, linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Sat, Mar 2, 2024 at 6:08 AM Ryan Roberts wrote:
>
> On 01/03/2024 16:44, Ryan Roberts wrote:
> > On 01/03/2024 16:31, Matthew Wilcox wrote:
> >> On Fri, Mar 01, 2024 at 04:27:32PM +0000, Ryan Roberts wrote:
> >>> I've implemented the batching as David suggested, and I'm pretty confident it's
> >>> correct.
> >>> The only problem is that during testing I can't provoke the code to
> >>> take the path. I've been pouring through the code but struggling to figure out
> >>> under what situation you would expect the swap entry passed to
> >>> free_swap_and_cache() to still have a cached folio? Does anyone have any idea?
> >>>
> >>> This is the original (unbatched) function, after my change, which caused David's
> >>> concern that we would end up calling __try_to_reclaim_swap() far too much:
> >>>
> >>> int free_swap_and_cache(swp_entry_t entry)
> >>> {
> >>> 	struct swap_info_struct *p;
> >>> 	unsigned char count;
> >>>
> >>> 	if (non_swap_entry(entry))
> >>> 		return 1;
> >>>
> >>> 	p = _swap_info_get(entry);
> >>> 	if (p) {
> >>> 		count = __swap_entry_free(p, entry);
> >>> 		if (count == SWAP_HAS_CACHE)
> >>> 			__try_to_reclaim_swap(p, swp_offset(entry),
> >>> 					      TTRS_UNMAPPED | TTRS_FULL);
> >>> 	}
> >>> 	return p != NULL;
> >>> }
> >>>
> >>> The trouble is, whenever its called, count is always 0, so
> >>> __try_to_reclaim_swap() never gets called.
> >>>
> >>> My test case is allocating 1G anon memory, then doing madvise(MADV_PAGEOUT) over
> >>> it. Then doing either a munmap() or madvise(MADV_FREE), both of which cause this
> >>> function to be called for every PTE, but count is always 0 after
> >>> __swap_entry_free() so __try_to_reclaim_swap() is never called. I've tried for
> >>> order-0 as well as PTE- and PMD-mapped 2M THP.
> >>
> >> I think you have to page it back in again, then it will have an entry in
> >> the swap cache. Maybe. I know little about anon memory ;-)
> >
> > Ahh, I was under the impression that the original folio is put into the swap
> > cache at swap out, then (I guess) its removed once the IO is complete? I'm sure
> > I'm miles out... what exactly is the lifecycle of a folio going through swap out?
> >
> > I guess I can try forking after swap out, then fault it back in in the child and
> > exit. Then do the munmap in the parent.
> > I guess that could force it? Thanks for
> > the tip - I'll have a play.
>
> That has sort of solved it, the only problem now is that all the folios in the
> swap cache are small (because I don't have Barry's large swap-in series). So
> really I need to figure out how to avoid removing the folio from the cache in
> the first place...

I am quite sure we can hit a large swapcache folio even with zRAM, which is
a synchronous swap device, and even during swap-out.

I have a test case as below:
1. two threads running MADV_PAGEOUT
2. two threads reading the data that is being swapped out

In do_swap_page(), from time to time, I can get a large swapcache folio.

There is a short time window in vmscan, after add_to_swap() and before
__remove_mapping(), during which a large folio is still in the swapcache.

So Ryan, I guess you can trigger this by adding one more thread that calls
MADV_DONTNEED to do zap_pte_range()?

>
> >
> >>
> >> If that doesn't work, perhaps use tmpfs, and use some memory pressure to
> >> force that to swap?
> >>
> >>> I'm guessing the swapcache was already reclaimed as part of MADV_PAGEOUT? I'm
> >>> using a block ram device as my backing store - I think this does synchronous IO
> >>> so perhaps if I have a real block device with async IO I might have more luck?
> >>> Just a guess...
> >>>
> >>> Or perhaps this code path is a corner case? In which case, perhaps its not worth
> >>> adding the batching optimization after all?
> >>>
> >>> Thanks,
> >>> Ryan
> >>>
> >

Thanks
Barry
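
A minimal userspace sketch of the test case described above, not the actual
program used in this thread. It runs two MADV_PAGEOUT threads, two reader
threads, and one MADV_DONTNEED thread over a single anonymous mapping. The
names run_pageout_race() and struct race_ctx are hypothetical, and the refill
step in zap_fn() is an assumption added so that there is something to swap
out again after each zap. Whether the race window is actually hit depends on
the kernel version, the THP settings, and the swap device in use.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 21		/* uapi value; available since Linux 5.4 */
#endif

struct race_ctx {		/* hypothetical helper, not from the thread */
	void *map;		/* anonymous mapping under test */
	size_t len, page;
	atomic_int stop;
	atomic_ulong sink;	/* keeps the reads observable */
};

/* Write one byte per page so every page is backed by anonymous memory. */
static void touch_pages(struct race_ctx *c)
{
	atomic_uchar *p = c->map;

	for (size_t off = 0; off < c->len; off += c->page)
		atomic_store_explicit(&p[off], 0xa5, memory_order_relaxed);
}

static void *pageout_fn(void *arg)
{
	struct race_ctx *c = arg;

	/* Errors are ignored: the call has no effect without a swap device. */
	while (!atomic_load(&c->stop))
		madvise(c->map, c->len, MADV_PAGEOUT);
	return NULL;
}

static void *reader_fn(void *arg)
{
	struct race_ctx *c = arg;
	atomic_uchar *p = c->map;
	unsigned long sum = 0;

	/* Reading a swapped-out page faults it back in. */
	while (!atomic_load(&c->stop))
		for (size_t off = 0; off < c->len; off += c->page)
			sum += atomic_load_explicit(&p[off], memory_order_relaxed);
	atomic_fetch_add(&c->sink, sum);
	return NULL;
}

static void *zap_fn(void *arg)
{
	struct race_ctx *c = arg;

	while (!atomic_load(&c->stop)) {
		usleep(1000);
		madvise(c->map, c->len, MADV_DONTNEED);	/* zaps the PTEs */
		touch_pages(c);	/* assumption: refill so pageout has work */
	}
	return NULL;
}

/* Returns 0 if all five threads ran for `seconds`, -1 otherwise. */
int run_pageout_race(size_t len, unsigned int seconds)
{
	void *(*fn[5])(void *) = { pageout_fn, pageout_fn,
				   reader_fn, reader_fn, zap_fn };
	struct race_ctx c = { .len = len };
	long pg = sysconf(_SC_PAGESIZE);
	pthread_t t[5];
	int n;

	assert(pg > 0);
	c.page = (size_t)pg;
	if (len == 0)
		return -1;
	c.map = mmap(NULL, len, PROT_READ | PROT_WRITE,
		     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (c.map == MAP_FAILED)
		return -1;
	madvise(c.map, len, MADV_HUGEPAGE);	/* best effort: ask for THP */
	touch_pages(&c);

	for (n = 0; n < 5; n++)
		if (pthread_create(&t[n], NULL, fn[n], &c))
			break;
	sleep(seconds);
	atomic_store(&c.stop, 1);
	for (int i = 0; i < n; i++)
		pthread_join(t[i], NULL);
	munmap(c.map, len);
	return n == 5 ? 0 : -1;
}
```

Calling run_pageout_race(1UL << 30, 60) from a small main() would roughly
match the 1G mapping mentioned earlier in the thread. The sketch only
generates the concurrent load; observing whether free_swap_and_cache() sees
SWAP_HAS_CACHE still needs tracing or a debug print on the kernel side.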