Message-ID: <2a7cac4c-a97e-92e2-56db-9429105d7a83@infradead.org>
Date: Mon, 14 Aug 2023 16:01:24 -0700
Subject: Re: [PATCH RFC v2 3/3] mm: Proper document tail pages fields for folio
To: Peter Xu, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Mike Kravetz, David Hildenbrand, Matthew Wilcox,
 Andrew Morton, Yu Zhao, Ryan Roberts, Yang Shi, Hugh Dickins,
 "Kirill A . Shutemov"
References: <20230814184411.330496-1-peterx@redhat.com>
 <20230814184411.330496-4-peterx@redhat.com>
From: Randy Dunlap
In-Reply-To: <20230814184411.330496-4-peterx@redhat.com>

Hi--

On 8/14/23 11:44, Peter Xu wrote:
> Tail page struct reuse is over-comlicated.
> Not only because we have implicit uses of tail page fields (mapcounts, or
> private for thp swap support, etc., that we _may_ still use in the page
> structs, but not obvious the relationship between that and the folio
> definitions), but also because we have 32/64 bits layouts for struct page
> so it's unclear what we can use and what we cannot when trying to find a
> new spot in folio struct.
>
> We also have tricks like page->mapping, where we can reuse only the tail
> page 1/2 but nothing more than tail page 2.  It is all mostly hidden, until
> someone starts to read into a VM_BUG_ON_PAGE() of __split_huge_page_tail().
>
> It's also unclear on how many fields we can reuse for a tail page.  The
> real answer is (after help from Matthew): we have 7 WORDs guaranteed on 64
> bits and 8 WORDs on 32 bits.  Nothing more than that is guaranteed to even
> exist.
>
> Let's document it clearly on what we can use and what we can't when
> extending folio on reusing tail page fields, with 100% explanations on each
> of them.  Hopefully after the doc update it will make it easier when:
>
>   (1) Any reader to know exactly what field is where and for what, the
>       relationships between folio tail pages and struct page definitions,
>
>   (2) Any potential new fields to be added to a large folio, so we're clear
>       which field one can still reuse.
>
> This is assuming WORD is defined as sizeof(void *) on any archs, just like
> the other comment in struct page we already have.
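
The 7-vs-8 WORD count quoted above can be sanity-checked in userspace.
What follows is only a minimal sketch under a simplified layout:
struct mock_page, mock_page_words() and word_index() are made-up names
for illustration, not kernel code, and the mock models just the fields
assumed to exist in every configuration (flags, a five-word union, then
the 32-bit mapcount and refcount).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in, NOT the kernel's struct page.  uintptr_t is
 * used as the WORD type on the assumption that it has the same size
 * as void *.
 */
struct mock_page {
	uintptr_t flags;	/* WORD 0 */
	uintptr_t body[5];	/* WORDs 1-5: head, mapping, ..., private */
	int32_t mapcount;	/* WORD 6 */
	int32_t refcount;	/* shares WORD 6 on 64 bits, WORD 7 on 32 bits */
};

/* Number of pointer-sized WORDs covered by the mock layout. */
size_t mock_page_words(void)
{
	return sizeof(struct mock_page) / sizeof(void *);
}

/* WORD index that a given byte offset falls into. */
size_t word_index(size_t byte_offset)
{
	return byte_offset / sizeof(void *);
}

/* Compile-time check of the 7 (64 bits) vs 8 (32 bits) WORD claim. */
static_assert(sizeof(struct mock_page) / sizeof(void *) ==
	      (sizeof(void *) == 8 ? 7 : 8),
	      "expected 7 WORDs on 64 bits, 8 WORDs on 32 bits");
```

With this mock, word_index(offsetof(struct mock_page, refcount)) is 6 on
a 64-bit build and 7 on a 32-bit build, which is the difference between
the two columns of the table in the patch.
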
>
> Signed-off-by: Peter Xu
> ---
>  include/linux/mm_types.h | 41 ++++++++++++++++++++++++++++++++++------
>  1 file changed, 35 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 829f5adfded1..9c744f70ae84 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -322,11 +322,40 @@ struct folio {
>  		};
>  		struct page page;
>  	};
> +	/*
> +	 * Some of the tail page fields may not be reused by the folio
> +	 * object because they're already been used by the page struct.  On

	                  they have

> +	 * 32bits there're at least 8 WORDs while on 64 bits there're at

preferably s/there're/there are/

> +	 * least 7 WORDs:
> +	 *
> +	 * |--------+-------------+-------------------|
> +	 * |  index | 32 bits     | 64 bits           |
> +	 * |--------+-------------+-------------------|
> +	 * |      0 | flags       | flags             |
> +	 * |      1 | head        | head              |
> +	 * |      2 | FREE        | FREE              |
> +	 * |      3 | FREE [1]    | FREE [1]          |
> +	 * |      4 | FREE        | FREE              |
> +	 * |      5 | FREE        | private [2]       |
> +	 * |      6 | mapcnt      | mapcnt+refcnt [3] |
> +	 * |      7 | refcnt [3]  |                   |
> +	 * |--------+-------------+-------------------|
> +	 *
> +	 * [1] "mapping" field.  It is free to use but needs to be with
> +	 *     some caution due to poisoning, see TAIL_MAPPING_REUSED_MAX.
> +	 *
> +	 * [2] "private" field, used when THP_SWAP is on (but disabled on
> +	 *     32 bits, so this index is FREE on 32bit or hugetlb folios).
> +	 *     May need to be fixed finally.
> +	 *
> +	 * [3] "refcount" field must be zero for all tail pages.  See e.g.
> +	 *     has_unmovable_pages() on page_ref_count() check and comment.
> +	 */
>  	union {
>  		struct {
>  			unsigned long _flags_1;
>  			unsigned long _head_1;
> -	/* public: */
> +	/* public: WORD 2 */
>  			unsigned char _folio_dtor;
>  			unsigned char _folio_order;
>  	/* private: 2 bytes can be reused later */
> @@ -335,7 +364,7 @@ struct folio {
>  	/* 4 bytes can be reused later (64 bits only) */
>  			unsigned char _free_1_1[4];
>  #endif
> -	/* public: */
> +	/* public: WORD 3 */
>  			atomic_t _entire_mapcount;
>  			atomic_t _nr_pages_mapped;
>  			atomic_t _pincount;
> @@ -350,20 +379,20 @@ struct folio {
>  			struct page __page_1;
>  	};
>  	union {
> -		struct {
> +		struct { /* hugetlb folios */
>  			unsigned long _flags_2;
>  			unsigned long _head_2;
> -	/* public: */
> +	/* public: WORD 2 */
>  			void *_hugetlb_subpool;
>  			void *_hugetlb_cgroup;
>  			void *_hugetlb_cgroup_rsvd;
>  			void *_hugetlb_hwpoison;
>  	/* private: the union with struct page is transitional */
>  		};
> -		struct {
> +		struct { /* non-hugetlb folios */
>  			unsigned long _flags_2a;
>  			unsigned long _head_2a;
> -	/* public: */
> +	/* public: WORD 2-3 */
>  			struct list_head _deferred_list;
>  	/* private: 8 more free bytes for either 32/64 bits */
>  			unsigned char _free_2_1[8];