From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0F1E9C04A94 for ; Thu, 10 Aug 2023 11:35:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 748E36B0074; Thu, 10 Aug 2023 07:35:57 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6F93E6B0075; Thu, 10 Aug 2023 07:35:57 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5C13E6B0078; Thu, 10 Aug 2023 07:35:57 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 4D3426B0074 for ; Thu, 10 Aug 2023 07:35:57 -0400 (EDT) Received: from smtpin16.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay04.hostedemail.com (Postfix) with ESMTP id 177C31A0D38 for ; Thu, 10 Aug 2023 11:35:57 +0000 (UTC) X-FDA: 81107990754.16.30EAB2F Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf26.hostedemail.com (Postfix) with ESMTP id 4B431140017 for ; Thu, 10 Aug 2023 11:35:54 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=none; spf=pass (imf26.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1691667354; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=FDB5cDOTwE/xiqlOoXbz41aj9zmvLFqGiBFWMD8GZ/4=; b=YWma+HVMISXJ2p7cvKe0Bg4fxGBdIAC6LDMO/wLxid/XfbN00a6Bak3mtHNhtyAP0YkRG/ yurI4XQ3A1dErQOu+9FAz4XDa7J/2Hm25R63HMh0dP4tKOSaDfg1+jo1qdDlW3ye1DbU6J WNb07lfZpGYDEYr/qb5EUnaFx8Q6Yoo= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=none; spf=pass (imf26.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1691667354; a=rsa-sha256; cv=none; b=jg1hlc/oMOx0eqUTLi5Tum+40MsghYskItpK/1hTed+B9jznU1Heh1hQZ5lSZtbMnin10q 3jo/E4KlQNQk+iMmgeUEzW1oGUSlE4aKbyPlJonqlsN6sWZYkyODEMi/mZmbGY16YAnLCj 58n2TZ+F4eRhOuW2MuVMLIXQiEvX2X8= Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id A6B4DD75; Thu, 10 Aug 2023 04:36:35 -0700 (PDT) Received: from [10.1.27.169] (XHFQ2J9959.cambridge.arm.com [10.1.27.169]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id A63703F64C; Thu, 10 Aug 2023 04:35:51 -0700 (PDT) Message-ID: Date: Thu, 10 Aug 2023 12:35:50 +0100 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH mm-unstable v1] mm: add a total mapcount for large folios Content-Language: en-GB To: David Hildenbrand , linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org, Andrew Morton , Jonathan Corbet , Mike Kravetz , Hugh Dickins , "Matthew Wilcox (Oracle)" , Yin Fengwei , Yang Shi , Zi Yan References: <20230809083256.699513-1-david@redhat.com> <181fcc79-b1c6-412f-9ca1-d1f21ef33e32@arm.com> <60b5b2a2-1d1d-661c-d61e-855178fff44d@redhat.com> <08f066e9-f137-f96c-df63-d5ccc8138470@redhat.com> From: Ryan Roberts In-Reply-To: <08f066e9-f137-f96c-df63-d5ccc8138470@redhat.com> Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: 4B431140017 X-Rspam-User: X-Stat-Signature: zj3bnhazmm1i77zibz44nyz6ygxfmro1 X-Rspamd-Server: rspam01 X-HE-Tag: 1691667354-467489 X-HE-Meta: U2FsdGVkX1+8eAjIn5bH0VMd9hUJwK7hst5Q29JlqvmnmXmf6ynQlsVLKF519cRPflA5qPmdM8VO7MVs2Gk98KvdBEzVhLfc95thRpaLXO3aM4kNFU2VNphnh8mYeQ4Uc3vkZMhlXQVww6zjtNSkyFyaUrazqGXtsBMyNo/rPJLPx/LSceUhdDWtkhVISavP5PSEdaOzDW0lpmrhzytwmVbd15YaMqh1OA9EWoFqKiDsQOGSDnZ+/RfMD2rSzkFliBnkqfYFqHD0qK3YfMdsc1HsaRvDQ49rB6YLXeKO+0lQzUp8q8FShndF21QQc8Mw3XK00gcA/enm+0gjSfYjbi1uoP6rklnVGHqZ1OKYou7Sv9eas1/SIp+7V52PAdllWn9xXxX5KJQGDtuZc1VuqTf3RpScd2xhCk93EQNA/RDhQO86lhceN2saXfKjlSbTsvl1VJm3laXiqNxkH6yYaZ2Q97LtvaNuc2IMrfOU4LupeQuj5uiC85AvUKwojDMz/zImhp6A0a6AMGdFOjpfHKtoj6QCn6BOFVZbWWcI0Gmc08HUU0K5zDUSjSK8godDj9sA7/n2pa/mWWfAIajKxEuRFHcxzEioZ9+QbboRaPL+R2q/1seXeBljdIQnreaDg8/VfUfkn5bfPcawI9pZjCef5QYQvQlMRvHfw9Wdmh0y82pa199+r1nFrfQKtnjLBQF416W9ZrBiT9Jnb1rLVLDgcCjeq9Pu29bNjlPAZSiVpBgS3QujgJASGpJA0Ah33KBryyAYOlhm0HX0ZC8v42vMmxJ07ofgjg7YeNK3rF9EyA0hNYImfNPvw+LTF3vGhAXF5MjWLNOrUYfAlBqh6uBugBbW6HlJZUjAUldWjl0x1W2EIyAWkPomC3vRq3RFffYOZ8o0VxkLIrrwM9NQW2OMAhDYHaRKveS0cciwtxWtAChwQAFHD4uZgJUHF1VntJmuaR9upKahaLqOS0l 0skA2vqV 5ITWLzwnbLZgOCjkI3g4oahd5TiVZPvbbxM1QI/vtNbLjg0vPn+KCg2Bon+kH7CPjxKEG7DACZL5uUa1UHqRXOFgloRzEc0VG9jgZTYkXlcbjGAzxYJ3z/okHXfJ2wrqW7QjyOcfg79MnwNQ8d6JEDRhXTwg8J+TYlDsWlwBHyrrfTnZ3+hCafgL2MJJVQZ8vLPZf2q5upQTK11+PQTEEGqo2RM9HeIiCNhzJqOR81zD8xBf9xUn3BNp801eMrbukdfGbyKCYOUiu3ByL2VKu1X6eCCI0NSELxx5V8DHugKCj2bG1lu35x0ZKza2qWD9DXAEoMK4f29v5LcuUkL+XhncVFQo+fYWeV24JEijXEQIyszwBRUEbUR+k7aZIAd7dHrFlREpRjtUmApTElUiaNtNUwv3QBF8Ga2YvFm6Ouy5eZnWNfOCdBCWoKiNSK4+K12RHpfoBQd+LKv5ax1clNdU1VxuTZN1XgLwQPHJg7DvOFEQS1wqgAtn83jVkeR1EY7JmwnE/1F/wqcRYnPGPO2yJlA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 10/08/2023 12:32, David Hildenbrand wrote: > On 10.08.23 13:27, David Hildenbrand wrote: >> On 10.08.23 13:14, David Hildenbrand wrote: >>> On 09.08.23 21:17, David Hildenbrand wrote: >>>> On 09.08.23 21:07, Ryan Roberts wrote: >>>>> On 09/08/2023 09:32, David Hildenbrand wrote: >>>>>> Let's track the total mapcount for all large folios in the first subpage. >>>>>> >>>>>> The total mapcount is what we actually want to know in folio_mapcount() >>>>>> and it is also sufficient for implementing folio_mapped(). This also >>>>>> gets rid of any "raceiness" concerns as expressed in >>>>>> folio_total_mapcount(). >>>>>> >>>>>> With sub-PMD THP becoming more important and things looking promising >>>>>> that we will soon get support for such anon THP, we want to avoid looping >>>>>> over all pages of a folio just to calculate the total mapcount. Further, >>>>>> we may soon want to use the total mapcount in other context more >>>>>> frequently, so prepare for reading it efficiently and atomically. >>>>>> >>>>>> Make room for the total mapcount in page[1] of the folio by moving >>>>>> _nr_pages_mapped to page[2] of the folio: it is not applicable to hugetlb >>>>>> -- and with the total mapcount in place probably also not desirable even >>>>>> if PMD-mappable hugetlb pages could get PTE-mapped at some point -- so we >>>>>> can overlay the hugetlb fields. >>>>>> >>>>>> Note that we currently don't expect any order-1 compound pages / THP in >>>>>> rmap code, and that such support is not planned. If ever desired, we could >>>>>> move the compound mapcount to another page, because it only applies to >>>>>> PMD-mappable folios that are definitely larger. Let's avoid consuming >>>>>> more space elsewhere for now -- we might need more space for a different >>>>>> purpose in some subpages soon. >>>>>> >>>>>> Maintain the total mapcount also for hugetlb pages. Use the total mapcount >>>>>> to implement folio_mapcount(), total_mapcount(), folio_mapped() and >>>>>> page_mapped(). >>>>>> >>>>>> We can now get rid of folio_total_mapcount() and >>>>>> folio_large_is_mapped(), by just inlining reading of the total mapcount. >>>>>> >>>>>> _nr_pages_mapped is now only used in rmap code, so not accidentially >>>>>> externally where it might be used on arbitrary order-1 pages. The remaining >>>>>> usage is: >>>>>> >>>>>> (1) Detect how to adjust stats: NR_ANON_MAPPED and NR_FILE_MAPPED >>>>>>      -> If we would account the total folio as mapped when mapping a >>>>>>         page (based on the total mapcount), we could remove that usage. >>>>>> >>>>>> (2) Detect when to add a folio to the deferred split queue >>>>>>      -> If we would apply a different heuristic, or scan using the rmap on >>>>>>         the memory reclaim path for partially mapped anon folios to >>>>>>         split them, we could remove that usage as well. >>>>>> >>>>>> So maybe, we can simplify things in the future and remove >>>>>> _nr_pages_mapped. For now, leave these things as they are, they need more >>>>>> thought. Hugh really did a nice job implementing that precise tracking >>>>>> after all. >>>>>> >>>>>> Note: Not adding order-1 sanity checks to the file_rmap functions for >>>>>>           now. For anon pages, they are certainly not required right now. >>>>>> >>>>>> Cc: Andrew Morton >>>>>> Cc: Jonathan Corbet >>>>>> Cc: Mike Kravetz >>>>>> Cc: Hugh Dickins >>>>>> Cc: "Matthew Wilcox (Oracle)" >>>>>> Cc: Ryan Roberts >>>>>> Cc: Yin Fengwei >>>>>> Cc: Yang Shi >>>>>> Cc: Zi Yan >>>>>> Signed-off-by: David Hildenbrand >>>>> >>>>> Other than the nits and query on zeroing _total_mapcount below, LGTM. If >>>>> zeroing >>>>> is correct: >>>>> >>>>> Reviewed-by: Ryan Roberts >>>> >>>> Thanks for the review! >>>> >>>> [...] >>>> >>>>>>           static inline int total_mapcount(struct page *page) >>>>> >>>>> nit: couldn't total_mapcount() just be implemented as a wrapper around >>>>> folio_mapcount()? What's the benefit of PageCompound() check instead of just >>>>> getting the folio and checking if it's large? i.e: >>>> >>>> Good point, let me take a look tomorrow if the compiler can optimize in >>>> both cases equally well. >>> >>> I checked by adjusting total_mapcount(): >>> >>> Before: >>> >>>            if (PageTransHuge(page) && total_mapcount(page) > 1) >>> ffffffff81411931:       4c 89 e7                mov    %r12,%rdi >>> ffffffff81411934:       e8 f7 b1 ff ff          call   ffffffff8140cb30 >>> >>> ffffffff81411939:       85 c0                   test   %eax,%eax >>> ffffffff8141193b:       74 29                   je     ffffffff81411966 >>> >>> ffffffff8141193d:       49 8b 04 24             mov    (%r12),%rax >>>            return test_bit(PG_head, &page->flags) || >>> ffffffff81411941:       a9 00 00 01 00          test   $0x10000,%eax >>> ffffffff81411946:       0f 85 1f 01 00 00       jne    ffffffff81411a6b >>> >>>                   READ_ONCE(page->compound_head) & 1; >>> ffffffff8141194c:       49 8b 44 24 08          mov    0x8(%r12),%rax >>>            return test_bit(PG_head, &page->flags) || >>> ffffffff81411951:       a8 01                   test   $0x1,%al >>> ffffffff81411953:       0f 85 12 01 00 00       jne    ffffffff81411a6b >>> >>> ffffffff81411959:       41 8b 44 24 30          mov    0x30(%r12),%eax >>>                    return atomic_read(&page->_mapcount) + 1; >>> ffffffff8141195e:       83 c0 01                add    $0x1,%eax >>> ffffffff81411961:       83 f8 01                cmp    $0x1,%eax >>> ffffffff81411964:       7f 77                   jg     ffffffff814119dd >>> >>> >>> So a total of 10 instructions after handling the mov/call/test/je for >>> PageTransHuge(). >>> >>> After: >>> >>>            if (PageTransHuge(page) && total_mapcount(page) > 1) >>> ffffffff81411931:       4c 89 e7                mov    %r12,%rdi >>> ffffffff81411934:       e8 f7 b1 ff ff          call   ffffffff8140cb30 >>> >>> ffffffff81411939:       85 c0                   test   %eax,%eax >>> ffffffff8141193b:       74 2f                   je     ffffffff8141196c >>> >>>            unsigned long head = READ_ONCE(page->compound_head); >>> ffffffff8141193d:       49 8b 44 24 08          mov    0x8(%r12),%rax >>>            if (unlikely(head & 1)) >>> ffffffff81411942:       a8 01                   test   $0x1,%al >>> ffffffff81411944:       0f 85 fc 05 00 00       jne    ffffffff81411f46 >>> >>> ffffffff8141194a:       0f 1f 44 00 00          nopl   0x0(%rax,%rax,1) >>>                    return page; >>> ffffffff8141194f:       4c 89 e0                mov    %r12,%rax >>> ffffffff81411952:       48 8b 10                mov    (%rax),%rdx >>>            if (likely(!folio_test_large(folio))) >>> ffffffff81411955:       f7 c2 00 00 01 00       test   $0x10000,%edx >>> ffffffff8141195b:       0f 85 da 05 00 00       jne    ffffffff81411f3b >>> >>> ffffffff81411961:       8b 40 30                mov    0x30(%rax),%eax >>>                    return atomic_read(&folio->_mapcount) + 1; >>> ffffffff81411964:       83 c0 01                add    $0x1,%eax >>> ffffffff81411967:       83 f8 01                cmp    $0x1,%eax >>> ffffffff8141196a:       7f 77                   jg     ffffffff814119e3 >>> >>> >>> So a total of 11 (+ 1 NOP) instructions after handling the mov/call/test/je >>> for PageTransHuge(). >>> >>> Essentially one more MOV instruction. >>> >>> I guess nobody really cares :) >>> >> >> Also, let's simply do: >> >> static inline bool folio_mapped(struct folio *folio) >> { >>     folio_mapcount(folio) > 1; > >> 0, obviously :) > ;-) All sounds good to me!