Date: Fri, 7 Jun 2024 11:24:13 +0100
From: Usama Arif
Subject: Re: [PATCH v2 1/2] mm: clear pte for folios that are zero filled
To: David Hildenbrand, akpm@linux-foundation.org, shakeel.butt@linux.dev, yosryahmed@google.com, willy@infradead.org
Cc: hannes@cmpxchg.org, nphamcs@gmail.com, chengming.zhou@linux.dev, linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@meta.com
References: <20240604105950.1134192-1-usamaarif642@gmail.com> <20240604105950.1134192-2-usamaarif642@gmail.com> <6b1485b6-c2a1-45b8-8afe-7b211689070b@redhat.com>

On 04/06/2024 13:43, David Hildenbrand wrote:
> On 04.06.24 14:30, David Hildenbrand wrote:
>> On 04.06.24 12:58, Usama Arif wrote:
>>> Approximately 10-20% of pages to be swapped out are zero pages [1].
>>> Rather than reading/writing these pages to flash resulting
>>> in increased I/O and flash wear, the pte can be cleared for those
>>> addresses at unmap time while shrinking folio list. When this
>>> causes a page fault, do_pte_missing will take care of this page.
>>> With this patch, NVMe writes in Meta server fleet decreased
>>> by almost 10% with conventional swap setup (zswap disabled).
>>>
>>> [1]
>>> https://lore.kernel.org/all/20171018104832epcms5p1b2232e2236258de3d03d1344dde9fce0@epcms5p1/
>>>
>>> Signed-off-by: Usama Arif
>>> ---
>>>    include/linux/rmap.h |   1 +
>>>    mm/rmap.c            | 163
>>> ++++++++++++++++++++++--------------------
>>>    mm/vmscan.c          |  89 ++++++++++++++++-------
>>>    3 files changed, 150 insertions(+), 103 deletions(-)
>>>
>>> diff --git a/include/linux/rmap.h b/include/linux/rmap.h
>>> index bb53e5920b88..b36db1e886e4 100644
>>> --- a/include/linux/rmap.h
>>> +++ b/include/linux/rmap.h
>>> @@ -100,6 +100,7 @@ enum ttu_flags {
>>>                         * do a final flush if necessary */
>>>        TTU_RMAP_LOCKED        = 0x80,    /* do not grab rmap lock:
>>>                         * caller holds it */
>>> +    TTU_ZERO_FOLIO        = 0x100,/* zero folio */
>>>    };
>>>       #ifdef CONFIG_MMU
>>> diff --git a/mm/rmap.c b/mm/rmap.c
>>> index 52357d79917c..d98f70876327 100644
>>> --- a/mm/rmap.c
>>> +++ b/mm/rmap.c
>>> @@ -1819,96 +1819,101 @@ static bool try_to_unmap_one(struct folio
>>> *folio, struct vm_area_struct *vma,
>>>                 */
>>>                dec_mm_counter(mm, mm_counter(folio));
>>>            } else if (folio_test_anon(folio)) {
>>> -            swp_entry_t entry = page_swap_entry(subpage);
>>> -            pte_t swp_pte;
>>> -            /*
>>> -             * Store the swap location in the pte.
>>> -             * See handle_pte_fault() ...
>>> -             */
>>> -            if (unlikely(folio_test_swapbacked(folio) !=
>>> -                    folio_test_swapcache(folio))) {
>>> +            if (flags & TTU_ZERO_FOLIO) {
>>> +                pte_clear(mm, address, pvmw.pte);
>>> +                dec_mm_counter(mm, MM_ANONPAGES);
>>
>> Is there an easy way to reduce the code churn and highlight the added
>> code?
>>
>> Like
>>
>> } else if (folio_test_anon(folio) && (flags & TTU_ZERO_FOLIO)) {
>>
>> } else if (folio_test_anon(folio)) {
>>
>>
>>
>> Also two concerns that I want to spell out:
>>
>> (a) what stops the page from getting modified in the meantime? The CPU
>>       can write it until the TLB was flushed.
>>

Thanks for pointing this out, David and Shakeel. This is a big issue in
v2, and as Shakeel pointed out in [1], we would need to do a second rmap
walk. KSM deals with this in try_to_merge_one_page(), which calls
write_protect_page() for each VMA (i.e. basically an rmap walk). Doing
the same here would be much more CPU-expensive and complicated than
v1 [2], where the swap subsystem can handle all the complexities.

I will go back to my v1 solution for the next revision. It is a lot
simpler than what a valid v2 approach would look like, and its memory
usage is very low (0.003%), as pointed out by Johannes [3]. That usage
would likely go away with the memory savings of not having a
zswap_entry for zero-filled pages.

[1] https://lore.kernel.org/all/nes73bwc5p6yhwt5tw3upxcqrn5kenn6lvqb6exrf4yppmz6jx@ywhuevpkxlvh/
[2] https://lore.kernel.org/all/20240530102126.357438-1-usamaarif642@gmail.com/
[3] https://lore.kernel.org/all/20240530122715.GB1222079@cmpxchg.org/

>> (b) do you properly handle if the page is pinned (or just got pinned)
>>       and we must not discard it?
>
> Oh, and I forgot, are you handling userfaultfd as expected? IIRC there
> are some really nasty side-effects with userfaultfd even when
> userfaultfd is currently not registered for a VMA [1].
>
> [1]
> https://lore.kernel.org/linux-mm/3a4b1027-df6e-31b8-b0de-ff202828228d@redhat.com/
>
> What should work is replacing all-zero anonymous pages by the shared
> zeropage iff the anonymous page is not pinned and we synchronize
> against GUP fast. Well, and we handle possible concurrent writes
> accordingly.
>
> KSM does essentially that when told to de-duplicate the shared
> zeropage, and I was thinking a while ago if we would want a
> zeropage-only KSM version that doesn't need stable trees and all that,
> but only deduplicates zero-filled pages into the shared zeropage in a
> safe way.
>

Thanks for the pointer to KSM code.
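
For anyone following the thread, the zero-filled test that this
discussion revolves around can be sketched in userspace C roughly as
below. This is only an illustration under stated assumptions: a fixed
4 KiB page size, and the hypothetical names SKETCH_PAGE_SIZE and
sketch_page_is_zero_filled(), none of which come from the patch or from
the kernel. The result is only a snapshot of the buffer at the time of
the scan, which is why concern (a) above matters: the answer can be
trusted only if nothing is able to write to the page while or after it
is checked.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption for this sketch: 4 KiB pages. Real page sizes vary by architecture. */
#define SKETCH_PAGE_SIZE 4096u

/*
 * Hypothetical helper (not a kernel API): returns true if every byte of a
 * page-sized buffer is zero. It scans one unsigned long at a time and stops
 * at the first nonzero word, so pages with early nonzero data are rejected
 * quickly.
 */
static bool sketch_page_is_zero_filled(const void *page)
{
	const unsigned char *bytes = page;
	size_t off;

	assert(page != NULL);

	for (off = 0; off < SKETCH_PAGE_SIZE; off += sizeof(unsigned long)) {
		unsigned long word;

		/* memcpy sidesteps alignment and strict-aliasing problems. */
		memcpy(&word, bytes + off, sizeof(word));
		if (word != 0)
			return false;
	}
	return true;
}
```

A caller would pass a pointer to a full page-sized buffer; a buffer
shorter than SKETCH_PAGE_SIZE bytes would be read out of bounds.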