Date: Tue, 6 Jun 2023 20:49:04 -0700 (PDT)
From: Hugh Dickins <hughd@google.com>
To: Jason Gunthorpe
cc: Peter Xu, Hugh Dickins, Matthew Wilcox, Andrew Morton, Mike Kravetz,
    Mike Rapoport, "Kirill A. Shutemov", David Hildenbrand,
    Suren Baghdasaryan, Qi Zheng, Yang Shi, Mel Gorman, Peter Zijlstra,
    Will Deacon, Yu Zhao, Alistair Popple, Ralph Campbell, Ira Weiny,
    Steven Price, SeongJae Park, Naoya Horiguchi, Christophe Leroy,
    Zack Rusin, Axel Rasmussen, Anshuman Khandual, Pasha Tatashin,
    Miaohe Lin, Minchan Kim, Christoph Hellwig, Song Liu,
    Thomas Hellstrom, Russell King, "David S. Miller", Michael Ellerman,
    "Aneesh Kumar K.V", Heiko Carstens, Christian Borntraeger,
    Claudio Imbrenda, Alexander Gordeev, Jann Horn,
    linux-arm-kernel@lists.infradead.org, sparclinux@vger.kernel.org,
    linuxppc-dev@lists.ozlabs.org, linux-s390@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH 05/12] powerpc: add pte_free_defer() for pgtables sharing page
In-Reply-To:
Message-ID: <9130acb-193-6fdd-f8df-75766e663978@google.com>
References: <35e983f5-7ed3-b310-d949-9ae8b130cdab@google.com>
    <28eb289f-ea2c-8eb9-63bb-9f7d7b9ccc11@google.com>
    <4df4909f-f5dd-6f94-9792-8f2949f542b3@google.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
On Tue, 6 Jun 2023, Jason Gunthorpe wrote:
> On Tue, Jun 06, 2023 at 03:03:31PM -0400, Peter Xu wrote:
> > On Tue, Jun 06, 2023 at 03:23:30PM -0300, Jason Gunthorpe wrote:
> > > On Mon, Jun 05, 2023 at 08:40:01PM -0700, Hugh Dickins wrote:
> > > >
> > > > diff --git a/arch/powerpc/mm/pgtable-frag.c b/arch/powerpc/mm/pgtable-frag.c
> > > > index 20652daa1d7e..e4f58c5fc2ac 100644
> > > > --- a/arch/powerpc/mm/pgtable-frag.c
> > > > +++ b/arch/powerpc/mm/pgtable-frag.c
> > > > @@ -120,3 +120,54 @@ void pte_fragment_free(unsigned long *table, int kernel)
> > > >  		__free_page(page);
> > > >  	}
> > > >  }
> > > > +
> > > > +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> > > > +#define PTE_FREE_DEFERRED 0x10000 /* beyond any PTE_FRAG_NR */
> > > > +
> > > > +static void pte_free_now(struct rcu_head *head)
> > > > +{
> > > > +	struct page *page;
> > > > +	int refcount;
> > > > +
> > > > +	page = container_of(head, struct page, rcu_head);
> > > > +	refcount = atomic_sub_return(PTE_FREE_DEFERRED - 1,
> > > > +				     &page->pt_frag_refcount);
> > > > +	if (refcount < PTE_FREE_DEFERRED) {
> > > > +		pte_fragment_free((unsigned long *)page_address(page), 0);
> > > > +		return;
> > > > +	}
> > >
> > > From what I can tell power doesn't recycle the sub fragment into any
> > > kind of free list. It just waits for the last fragment to be unused
> > > and then frees the whole page.
Yes, it's relatively simple in that way: not as sophisticated as s390.

> > > So why not simply go into pte_fragment_free() and do the call_rcu
> > > directly:
> > >
> > >	BUG_ON(atomic_read(&page->pt_frag_refcount) <= 0);
> > >	if (atomic_dec_and_test(&page->pt_frag_refcount)) {
> > >		if (!kernel)
> > >			pgtable_pte_page_dtor(page);
> > >		call_rcu(&page->rcu_head, free_page_rcu)
> >
> > We need to be careful on the lock being freed in pgtable_pte_page_dtor():
> > in Hugh's series IIUC we need the spinlock to be there for the rcu
> > section alongside the page itself. So even to do that we'll need to also
> > rcu-call pgtable_pte_page_dtor() when needed.

Thanks, Peter, yes that's right.

> Er yes, I botched that, the dtor and the free_page should be in the
> rcu callback function

But it was just a botched detail, and won't have answered Jason's doubt.
I had three (or perhaps it amounts to two) reasons for doing it this way,
none of which may seem good enough to you. Certainly I'd agree that the
way it's done seems... arcane.

One: as I've indicated before, I don't actually dare to go all the way
into RCU freeing of all page tables for powerpc (or any other). I should
think it's a good idea that everyone wants in the end, but I'm limited by
my time and competence - and by dread of losing my way in the mmu_gather
TLB #ifdef maze. It's work for someone else, not me.

(Could pte_free_defer() do as you suggest, without changing
pte_fragment_free() itself? No, that doesn't work out: when defer does,
say, the decrement of pt_frag_refcount from 2 to 1, and then
pte_fragment_free() does the decrement from 1 to 0, the page is freed
without deferral.)

Two: this was the code I'd worked out before, and it was used in
production, so I had confidence in it - it was just my mistake that I'd
forgotten the single rcu_head issue, and thought I could avoid it in the
initial posting. powerpc has changed around since then, but apparently
not in any way that affects this.
And it's too easy to agree in review that something can be simpler,
without bringing back to mind why the complications are there.

Three (just an explanation of why the old code was like this): powerpc
relies on THP's page table deposit+withdraw protocol, even for
shmem/file THPs. I've skirted that issue in this series by sticking with
retract_page_tables(), not attempting to insert a huge pmd immediately.
But if a huge pmd is inserted to replace the ptetable pmd, then the
ptetable must be deposited: pte_free_defer() as written protects the
deposited ptetable from then being freed without deferral (rather like
in the example above), but does not protect it from being withdrawn and
reused within that grace period. Jann has grave doubts whether that can
ever be allowed (or perhaps I should grant him certainty, and examples,
that it cannot). I did convince myself, back in the day, that it was
safe here: but I'll have to put in a lot more thought to re-justify it
now, and on the way may instead be completely persuaded by Jann.

Not very good reasons: good enough, or can you supply a better patch?

Thanks,
Hugh