Date: Fri, 11 Aug 2023 18:18:23 -0400
From: Peter Xu
To: Zi Yan
Cc: David Hildenbrand, Matthew Wilcox, Ryan Roberts, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-doc@vger.kernel.org, Andrew Morton, Jonathan Corbet, Mike Kravetz, Hugh Dickins, Yin Fengwei, Yang Shi
Subject: Re: [PATCH mm-unstable v1] mm: add a total mapcount for large folios
References: <8222bf8f-6b99-58f4-92cc-44113b151d14@redhat.com> <8aac858e-0f12-4b32-e9df-63c76bdf2377@redhat.com> <14C73423-C643-4B72-B3DD-573F5636B5E0@nvidia.com>
In-Reply-To: <14C73423-C643-4B72-B3DD-573F5636B5E0@nvidia.com>

On Fri, Aug 11, 2023 at 12:11:55PM -0400, Zi Yan wrote:
> On 11 Aug 2023, at 12:08, David Hildenbrand wrote:
>
> > On 11.08.23 17:58, Peter Xu wrote:
> >> On Fri, Aug 11, 2023 at 05:32:37PM +0200, David Hildenbrand wrote:
> >>> On 11.08.23 17:18, Peter Xu wrote:
> >>>> On Fri, Aug 11, 2023 at 12:27:13AM +0200, David Hildenbrand wrote:
> >>>>> On 10.08.23 23:48, Matthew Wilcox wrote:
> >>>>>> On Thu, Aug 10, 2023 at 04:57:11PM -0400, Peter Xu wrote:
> >>>>>>> AFAICS if that patch was all correct (while I'm not yet sure..), you can
> >>>>>>> actually fit your new total mapcount field into page 1 so even avoid the
> >>>>>>> extra cacheline access. You can have a look: the trick is refcount for
> >>>>>>> tail page 1 still seems to be free on 32 bits (if that was your worry
> >>>>>>> before). Then it'll be very nice to keep Hugh's counter all in tail 1.
> >>>>>>
> >>>>>> No, refcount must be 0 on all tail pages. We rely on this in many places
> >>>>>> in the MM.
> >>>>>
> >>>>> Very right.
> >>>>
> >>>> Obviously I could have missed this in the past.. can I ask for an example
> >>>> explaining why refcount will be referenced before knowing it's a head?
> >>>
> >>> I think the issue is, when coming from a PFN walker (or GUP-fast), you might
> >>> see "oh, this is a folio, let's lookup the head page". And you do that.
> >>>
> >>> Then, you try taking a reference on that head page. (see try_get_folio()).
> >>>
> >>> But as you didn't hold a reference on the folio yet, it can happily get
> >>> freed + repurposed in the meantime, so maybe it's not a head page anymore.
> >>>
> >>> So if the field would get reused for something else, grabbing a reference
> >>> would corrupt whatever is now stored in there.
> >>
> >> Not an issue before large folios, am I right? Because having a head page
> >> reused as tail cannot happen iiuc with current thps if only pmd-sized,
> >> because the head page is guaranteed to be pmd aligned physically.
> >
> > There are other users of compound pages, no? THP and hugetlb are just two examples I think. For example, I can spot __GFP_COMP in slab code.
> >
> > Most such compound pages would not be applicable to GUP, though, but PFN walkers could end up trying to grab them.
> >
> For FS supporting large folios, their page cache pages can be any order <= PMD_ORDER.
> See page_cache_ra_order() in mm/readahead.c

Ah yes..
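
To make the try_get_folio() point quoted above concrete, here is a minimal userspace sketch of the speculative-reference pattern, assuming C11 atomics. struct toy_page and the toy_* helpers are made up for illustration and are not kernel APIs or the kernel's struct page layout; the real code paths are try_get_folio() and friends.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; NOT the kernel's layout. */
struct toy_page {
	atomic_int refcount;	/* expected to stay 0 while the page is a tail */
	atomic_bool is_head;
};

/* Increment the count only if it is currently non-zero. */
static bool toy_get_unless_zero(struct toy_page *p)
{
	int old = atomic_load(&p->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop one reference that was taken earlier. */
static void toy_put(struct toy_page *p)
{
	int old = atomic_fetch_sub(&p->refcount, 1);

	assert(old > 0);
	(void)old;
}

/*
 * Speculative get: the caller holds no reference yet, so 'p' may have
 * been freed and reused (for example as a tail page) since it was
 * looked up. The head check is only meaningful after the pin succeeds.
 */
static bool toy_try_get_head(struct toy_page *p)
{
	if (!toy_get_unless_zero(p))
		return false;	/* count was 0: nothing was written */
	if (!atomic_load(&p->is_head)) {
		toy_put(p);	/* raced with reuse: undo the increment */
		return false;
	}
	return true;
}
```

With a tail whose field is 0, toy_try_get_head() fails without writing anything. If that same field were reused for an unrelated counter holding, say, 5, toy_get_unless_zero() would bump it to 6 before the head check gets a chance to back out, which is the corruption window described in the quoted text.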
>
> >>
> >> I don't really know, where a hugetlb 2M head can be reused by a 1G huge
> >> later right during the window of fast-gup walking. But obviously that's not
> >> common either if that could ever happen.
> >>
> >> Maybe Matthew was referring to something else (per "in many places")?
> >
> > There are some other cases where PFN walkers want to identify tail pages to skip over them. See the comment in has_unmovable_pages().

Indeed. Thanks!

-- 
Peter Xu