From: Vishal Moola
Date: Fri, 16 Jun 2023 14:28:44 -0700
Subject: Re: [PATCH v4 04/34] pgtable: Create struct ptdesc
To: Hugh Dickins
Cc: Andrew Morton, Matthew Wilcox, Gerald Schaefer, Vasily Gorbik,
 Jason Gunthorpe, Heiko Carstens, Christian Borntraeger, Claudio Imbrenda,
 Alexander Gordeev, linux-mm@kvack.org, linux-arch@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-csky@vger.kernel.org,
 linux-hexagon@vger.kernel.org, loongarch@lists.linux.dev,
 linux-m68k@lists.linux-m68k.org, linux-mips@vger.kernel.org,
 linux-openrisc@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
 linux-riscv@lists.infradead.org, linux-s390@vger.kernel.org,
 linux-sh@vger.kernel.org, sparclinux@vger.kernel.org,
 linux-um@lists.infradead.org, xen-devel@lists.xenproject.org,
 kvm@vger.kernel.org
References: <20230612210423.18611-1-vishal.moola@gmail.com>
 <20230612210423.18611-5-vishal.moola@gmail.com>

On Thu, Jun 15, 2023 at 12:57 AM Hugh Dickins wrote:
>
> On Mon, 12 Jun 2023, Vishal Moola (Oracle) wrote:
>
> > Currently, page table information is stored within struct page. As part
> > of simplifying struct page, create struct ptdesc for page table
> > information.
> >
> > Signed-off-by: Vishal Moola (Oracle)
>
> Vishal, as I think you have already guessed, your ptdesc series and
> my pte_free_defer() "mm: free retracted page table by RCU" series are
> on a collision course.
>
> Probably just trivial collisions in most architectures, which either
> of us can easily adjust to the other; powerpc likely to be more awkward,
> but fairly easily resolved; s390 quite a problem.
>
> I've so far been unable to post a v2 of my series (and powerpc and s390
> were stupidly wrong in the v1), because a good s390 patch is not yet
> decided - Gerald Schaefer and I are currently working on that, on the
> s390 list (I took off most Ccs until we are settled and I can post v2).
>
> As you have no doubt found yourself, s390 has sophisticated handling of
> free half-pages already, and I need to add rcu_head usage in there too:
> it's tricky to squeeze it all in, and ptdesc does not appear to help us
> in any way (though mostly it's just changing some field names, okay).
>
> If ptdesc were actually allowing a flexible structure which architectures
> could add into, that would (in some future) be nice; but of course at
> present it's still fitting it all into one struct page, and mandating
> new restrictions which just make an architecture's job harder.

A goal of ptdescs is to make architectures' jobs simpler and standardized.
Unfortunately, ptdescs are nowhere near isolated from struct page yet.
This version of struct ptdesc contains exactly the fields architectures
need right now, just reorganized to be located next to each other. It
*probably* shouldn't make an architecture's job harder, aside from
discouraging the use of yet more members of struct page.

> Some notes on problematic fields below FYI.
>
> > ---
> >  include/linux/pgtable.h | 51 +++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 51 insertions(+)
> >
> > diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h
> > index c5a51481bbb9..330de96ebfd6 100644
> > --- a/include/linux/pgtable.h
> > +++ b/include/linux/pgtable.h
> > @@ -975,6 +975,57 @@ static inline void ptep_modify_prot_commit(struct vm_area_struct *vma,
> >  #endif /* __HAVE_ARCH_PTEP_MODIFY_PROT_TRANSACTION */
> >  #endif /* CONFIG_MMU */
> >
> > +
> > +/**
> > + * struct ptdesc - Memory descriptor for page tables.
> > + * @__page_flags: Same as page flags. Unused for page tables.
> > + * @pt_list: List of used page tables. Used for s390 and x86.
> > + * @_pt_pad_1: Padding that aliases with page's compound head.
> > + * @pmd_huge_pte: Protected by ptdesc->ptl, used for THPs.
> > + * @_pt_s390_gaddr: Aliases with page's mapping. Used for s390 gmap only.
> > + * @pt_mm: Used for x86 pgds.
> > + * @pt_frag_refcount: For fragmented page table tracking. Powerpc and s390 only.
> > + * @ptl: Lock for the page table.
> > + *
> > + * This struct overlays struct page for now. Do not modify without a good
> > + * understanding of the issues.
> > + */
> > +struct ptdesc {
> > +	unsigned long __page_flags;
> > +
> > +	union {
> > +		struct list_head pt_list;
>
> I shall be needing struct rcu_head rcu_head (or pt_rcu_head or whatever,
> if you prefer) in this union too. Sharing the lru or pt_list with rcu_head
> is what's difficult to get right and efficient on s390 - and if ptdesc gave
> us an independent rcu_head for each page table, that would be a blessing!
> but sadly not, it still has to squeeze into a struct page.

I can add a pt_rcu_head along with a comment to deter aliasing issues :)
Independent rcu_heads aren't coming any time soon though :(

> > +		struct {
> > +			unsigned long _pt_pad_1;
> > +			pgtable_t pmd_huge_pte;
> > +		};
> > +	};
> > +	unsigned long _pt_s390_gaddr;
> > +
> > +	union {
> > +		struct mm_struct *pt_mm;
> > +		atomic_t pt_frag_refcount;
>
> Whether s390 will want pt_mm is not yet decided: I want to use it,
> Gerald prefers to go without it; but if we do end up using it,
> then pt_frag_refcount is a luxury we would have to give up.

I don't like the use of pt_mm for s390 either. s390 uses space equivalent
to all five words allocated in the page table struct (albeit in various
places of struct page). Using extra space (especially allocated for
unrelated reasons) just because it exists makes things more complicated
and confusing, and s390 is already confusing enough as a result of that.
If having access to pt_mm is necessary I can drop the pt_frag_refcount
patch, but I'd rather avoid it.

> s390 does very well already with its _refcount tricks, and I'd expect
> powerpc's simpler but more wasteful implementation to work as well
> with _refcount too - I know that a few years back, powerpc did misuse
> _refcount (it did not allow for speculative accesses, thought it had
> sole ownership of that field); but s390 copes well with that, and I
> expect powerpc can do so too, without the luxury of pt_frag_refcount.
>
> But I've no desire to undo powerpc's use of pt_frag_refcount:
> just warning that we may want to undo any use of it in s390.
>
> I thought I had more issues to mention, probably Gerald will
> remind me of a whole new unexplored dimension! gmap perhaps.
>
> Hugh
>
> > +	};
> > +
> > +#if ALLOC_SPLIT_PTLOCKS
> > +	spinlock_t *ptl;
> > +#else
> > +	spinlock_t ptl;
> > +#endif
> > +};
> > +
> > +#define TABLE_MATCH(pg, pt)						\
> > +	static_assert(offsetof(struct page, pg) == offsetof(struct ptdesc, pt))
> > +TABLE_MATCH(flags, __page_flags);
> > +TABLE_MATCH(compound_head, pt_list);
> > +TABLE_MATCH(compound_head, _pt_pad_1);
> > +TABLE_MATCH(pmd_huge_pte, pmd_huge_pte);
> > +TABLE_MATCH(mapping, _pt_s390_gaddr);
> > +TABLE_MATCH(pt_mm, pt_mm);
> > +TABLE_MATCH(ptl, ptl);
> > +#undef TABLE_MATCH
> > +static_assert(sizeof(struct ptdesc) <= sizeof(struct page));
> > +
> >  /*
> >   * No-op macros that just return the current protection value. Defined here
> >   * because these macros can be used even if CONFIG_MMU is not defined.
> > --
> > 2.40.1
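
For anyone following the layout discussion, here is a rough userspace
sketch of the overlay-plus-static_assert technique the quoted patch uses,
with a pt_rcu_head member added to the first union as discussed in this
thread. It is only an illustration under stated assumptions: toy_page,
toy_ptdesc, the simplified list_head/rcu_head definitions, and the helper
toy_pt_union_words() are all hypothetical stand-ins, not the kernel's real
types, and the field set is trimmed down from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; both are two pointers wide. */
struct list_head { struct list_head *next, *prev; };
struct rcu_head { struct rcu_head *next; void (*func)(struct rcu_head *); };

/* Toy version of struct page, reduced to the fields this sketch checks. */
struct toy_page {
	unsigned long flags;
	union {
		struct list_head lru;
		unsigned long compound_head;
	};
	void *mapping;
	void *pt_mm;
};

/* Toy descriptor that must overlay struct toy_page field for field. */
struct toy_ptdesc {
	unsigned long __page_flags;
	union {
		struct list_head pt_list;
		struct rcu_head pt_rcu_head;	/* assumption: shares storage with pt_list */
		struct {
			unsigned long _pt_pad_1;
			void *pmd_huge_pte;
		};
	};
	unsigned long _pt_s390_gaddr;
	void *pt_mm;
};

/* Compile-time check that a descriptor field aliases the intended page field. */
#define TABLE_MATCH(pg, pt) \
	static_assert(offsetof(struct toy_page, pg) == \
		      offsetof(struct toy_ptdesc, pt), "layout mismatch")
TABLE_MATCH(flags, __page_flags);
TABLE_MATCH(compound_head, pt_list);
TABLE_MATCH(compound_head, pt_rcu_head);
TABLE_MATCH(compound_head, _pt_pad_1);
TABLE_MATCH(mapping, _pt_s390_gaddr);
TABLE_MATCH(pt_mm, pt_mm);
#undef TABLE_MATCH
static_assert(sizeof(struct toy_ptdesc) <= sizeof(struct toy_page),
	      "descriptor must fit inside the page struct");

/* Number of pointer-sized words occupied by the pt_list/pt_rcu_head slot. */
static size_t toy_pt_union_words(void)
{
	return sizeof(((struct toy_ptdesc *)0)->pt_list) / sizeof(void *);
}
```

Because pt_list and pt_rcu_head occupy the same two words in this sketch,
a table cannot be on a list and queued for RCU freeing at the same time,
which is the aliasing constraint raised above for s390. If a field is
moved or widened so that it no longer lines up, the build fails at the
corresponding TABLE_MATCH line rather than misbehaving at run time.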