From: Yu Zhao
Date: Tue, 27 Jun 2023 12:26:24 -0600
Subject: Re: [PATCH v1 01/10] mm: Expose clear_huge_page() unconditionally
To: Ryan Roberts
Cc: Andrew Morton, "Matthew Wilcox (Oracle)", "Kirill A. Shutemov", Yin Fengwei, David Hildenbrand, Catalin Marinas, Will Deacon, Geert Uytterhoeven, Christian Borntraeger, Sven Schnelle, Thomas Gleixner, Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin", linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-alpha@vger.kernel.org, linux-arm-kernel@lists.infradead.org, linux-ia64@vger.kernel.org, linux-m68k@lists.linux-m68k.org, linux-s390@vger.kernel.org
In-Reply-To: <91e3364f-1d1b-f959-636b-4f60bf5a577b@arm.com>

On Tue, Jun 27, 2023 at
3:41 AM Ryan Roberts wrote:
>
> On 27/06/2023 09:29, Yu Zhao wrote:
> > On Tue, Jun 27, 2023 at 1:21 AM Ryan Roberts wrote:
> >>
> >> On 27/06/2023 02:55, Yu Zhao wrote:
> >>> On Mon, Jun 26, 2023 at 11:14 AM Ryan Roberts wrote:
> >>>>
> >>>> In preparation for extending vma_alloc_zeroed_movable_folio() to
> >>>> allocate an arbitrary order folio, expose clear_huge_page()
> >>>> unconditionally, so that it can be used to zero the allocated folio in
> >>>> the generic implementation of vma_alloc_zeroed_movable_folio().
> >>>>
> >>>> Signed-off-by: Ryan Roberts
> >>>> ---
> >>>>  include/linux/mm.h | 3 ++-
> >>>>  mm/memory.c        | 2 +-
> >>>>  2 files changed, 3 insertions(+), 2 deletions(-)
> >>>>
> >>>> diff --git a/include/linux/mm.h b/include/linux/mm.h
> >>>> index 7f1741bd870a..7e3bf45e6491 100644
> >>>> --- a/include/linux/mm.h
> >>>> +++ b/include/linux/mm.h
> >>>> @@ -3684,10 +3684,11 @@ enum mf_action_page_type {
> >>>>   */
> >>>>  extern const struct attribute_group memory_failure_attr_group;
> >>>>
> >>>> -#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
> >>>>  extern void clear_huge_page(struct page *page,
> >>>>                              unsigned long addr_hint,
> >>>>                              unsigned int pages_per_huge_page);
> >>>> +
> >>>> +#if defined(CONFIG_TRANSPARENT_HUGEPAGE) || defined(CONFIG_HUGETLBFS)
> >>>
> >>> We might not want to depend on THP eventually. Right now, we still
> >>> have to, unless splitting is optional, which seems to contradict
> >>> 06/10. (deferred_split_folio() is a nop without THP.)
> >>
> >> Yes, I agree - for large anon folios to work, we depend on THP. But I don't
> >> think that helps us here.
> >>
> >> In the next patch, I give vma_alloc_zeroed_movable_folio() an extra `order`
> >> parameter. So the generic/default version of the function now needs a way to
> >> clear a compound page.
> >>
> >> I guess I could do something like:
> >>
> >> static inline
> >> struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> >>                                 unsigned long vaddr, gfp_t gfp, int order)
> >> {
> >>         struct folio *folio;
> >>
> >>         folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE | gfp,
> >>                                 order, vma, vaddr, false);
> >>         if (folio) {
> >> #ifdef CONFIG_LARGE_FOLIO
> >>                 clear_huge_page(&folio->page, vaddr, 1U << order);
> >> #else
> >>                 BUG_ON(order != 0);
> >>                 clear_user_highpage(&folio->page, vaddr);
> >> #endif
> >>         }
> >>
> >>         return folio;
> >> }
> >>
> >> But that's pretty messy, and there's no reason why other users might not
> >> come along that pass order != 0 and will be surprised by the BUG_ON.
> >
> > #ifdef CONFIG_LARGE_ANON_FOLIO // depends on CONFIG_TRANSPARENT_HUGEPAGE
> > struct folio *alloc_anon_folio(struct vm_area_struct *vma,
> >                                unsigned long vaddr, int order)
> > {
> >         // how do_huge_pmd_anonymous_page() allocs and clears
> >         vma_alloc_folio(..., *true*);
>
> This controls the mem allocation policy (see mempolicy.c::vma_alloc_folio()),
> not clearing. Clearing is done in __do_huge_pmd_anonymous_page():
>
>         clear_huge_page(page, vmf->address, HPAGE_PMD_NR);

Sorry for rushing this previously. This is what I meant. The #ifdef
makes it safe to use clear_huge_page() without 01/10. I highlighted the
last parameter to vma_alloc_folio() only because it's different from
what you chose (not implying it clears the folio).

> > }
> > #else
> > #define alloc_anon_folio(vma, addr, order) \
> >         vma_alloc_zeroed_movable_folio(vma, addr)
> > #endif
>
> Sorry, I don't get this at all... If you are suggesting we bypass
> vma_alloc_zeroed_movable_folio() entirely for the LARGE_ANON_FOLIO case

Correct.

> I don't think that works, because the arch code adds its own gfp flags
> there. For example, arm64 adds __GFP_ZEROTAGS for VM_MTE VMAs.

I think it's the opposite: it should be safer to reuse the THP code,
because:

1. It's an existing case that has been working for PMD_ORDER folios
   mapped by PTEs, and it's an arch-independent API, which would be
   easier to review.

2. Using vma_alloc_zeroed_movable_folio() for large folios is a *new*
   case. It's an arch-*dependent* API, and I have no idea what VM_MTE
   does (or should do) to large folios, and I don't plan to answer that
   for now.

> Perhaps we can do away with an arch-owned vma_alloc_zeroed_movable_folio()
> and replace it with a new arch_get_zeroed_movable_gfp_flags(), then have
> alloc_anon_folio() add in those flags?
>
> But I still think the cleanest, simplest change is just to unconditionally
> expose clear_huge_page() as I've done it.

The fundamental choice here, as I see it, is whether the first step of
large anon folios should lean toward the THP code base or the base-page
code base (I'm a big fan of the answer "Neither -- we should create
something entirely new instead"). My POV is that the THP code base would
allow us to move faster, since it's proven to work for a very similar
case (PMD_ORDER folios mapped by PTEs).