From: 黄杰
Date: Wed, 19 Oct 2022 17:33:17 +0800
Subject: Re: [PATCH] mm: hugetlb: support get/set_policy for hugetlb_vm_ops
To: Hugh Dickins
Cc: Mike Kravetz, Muchun Song, Andi Kleen, Andrew Morton, linux-mm@kvack.org, linux-kernel
References: <20221012081526.73067-1-huangjie.albert@bytedance.com> <5f7ef6ee-6241-9912-f434-962be53272c@google.com>
In-Reply-To: <5f7ef6ee-6241-9912-f434-962be53272c@google.com>

On Thu, Oct 13, 2022 at 03:45, Hugh Dickins wrote:
>
> On Wed, 12 Oct 2022, Albert Huang wrote:
>
> > From: "huangjie.albert"
> >
> > implement these two functions so that we can set the mempolicy to
> > the inode of the hugetlb file. This ensures that the mempolicy of
> > all processes sharing this huge page file is consistent.
> >
> > In some scenarios where huge pages are shared:
> > if we need to limit the memory usage of a VM to node0, I bind qemu's
> > mempolicy to node0. But if another process (such as virtiofsd) shares
> > memory with the VM and the page fault is triggered by virtiofsd, the
> > allocated memory may go to node1, depending on virtiofsd.
> >
> > Signed-off-by: huangjie.albert
>
> Aha!  Congratulations for noticing, after all this time.  hugetlbfs
> contains various little pieces of code that pretend to be supporting
> shared NUMA mempolicy, but in fact there was nothing connecting it up.
>
> It will be for Mike to decide, but personally I oppose adding
> shared NUMA mempolicy support to hugetlbfs, after eighteen years.
>
> The thing is, it will change the behaviour of NUMA on hugetlbfs:
> in ways that would have been sensible way back then, yes; but surely
> those who have invested in NUMA and hugetlbfs have developed other
> ways of administering it successfully, without shared NUMA mempolicy.
>
> At the least, I would expect some tests to break (I could easily be
> wrong), and there's a chance that some app or tool would break too.

Hi Hugh,

Can you share some of those issues here?

Thanks.

>
> I have carried the reverse of Albert's patch for a long time, stripping
> out the pretence of shared NUMA mempolicy support from hugetlbfs: I
> wanted that, so that I could work on modifying the tmpfs implementation,
> without having to worry about other users.
>
> Mike, if you would prefer to see my patch stripping out the pretence,
> let us know: it has never been a priority to send in, but I can update
> it to 6.1-rc1 if you'd like to see it.  (Once upon a time, it removed
> all need for struct hugetlbfs_inode_info, but nowadays that's still
> required for the memfd seals.)
>
> Whether Albert's patch is complete and correct, I haven't begun to think
> about: I am not saying it isn't, but shared NUMA mempolicy adds another
> dimension of complexity, and need for support, that I think hugetlbfs
> would be better off continuing to survive without.
>
> Hugh
>
> > ---
> >  mm/hugetlb.c | 22 ++++++++++++++++++++++
> >  1 file changed, 22 insertions(+)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 0ad53ad98e74..ed7599821655 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -4678,6 +4678,24 @@ static vm_fault_t hugetlb_vm_op_fault(struct vm_fault *vmf)
> >  	return 0;
> >  }
> >
> > +#ifdef CONFIG_NUMA
> > +int hugetlb_vm_op_set_policy(struct vm_area_struct *vma, struct mempolicy *mpol)
> > +{
> > +	struct inode *inode = file_inode(vma->vm_file);
> > +
> > +	return mpol_set_shared_policy(&HUGETLBFS_I(inode)->policy, vma, mpol);
> > +}
> > +
> > +struct mempolicy *hugetlb_vm_op_get_policy(struct vm_area_struct *vma, unsigned long addr)
> > +{
> > +	struct inode *inode = file_inode(vma->vm_file);
> > +	pgoff_t index;
> > +
> > +	index = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> > +	return mpol_shared_policy_lookup(&HUGETLBFS_I(inode)->policy, index);
> > +}
> > +#endif
> > +
> >  /*
> >   * When a new function is introduced to vm_operations_struct and added
> >   * to hugetlb_vm_ops, please consider adding the function to shm_vm_ops.
> > @@ -4691,6 +4709,10 @@ const struct vm_operations_struct hugetlb_vm_ops = {
> >  	.close = hugetlb_vm_op_close,
> >  	.may_split = hugetlb_vm_op_split,
> >  	.pagesize = hugetlb_vm_op_pagesize,
> > +#ifdef CONFIG_NUMA
> > +	.set_policy = hugetlb_vm_op_set_policy,
> > +	.get_policy = hugetlb_vm_op_get_policy,
> > +#endif
> >  };
> >
> >  static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
> > --
> > 2.31.1
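
For reference, the index arithmetic in hugetlb_vm_op_get_policy() in the
quoted patch can be modelled in user space. The sketch below is not kernel
code: struct sketch_vma, sketch_policy_index() and SKETCH_PAGE_SHIFT are
hypothetical names introduced only for illustration, and a 4 KiB base page
(shift of 12) is assumed. It only shows that two mappings of the same file
at different virtual addresses resolve to the same file index, which is why
a policy stored on the inode would be seen by every process sharing the file.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for this sketch: 4 KiB base pages. */
#define SKETCH_PAGE_SHIFT 12

/* Hypothetical stand-in for the two vm_area_struct fields the patch reads. */
struct sketch_vma {
	uint64_t vm_start;	/* first virtual address of the mapping */
	uint64_t vm_pgoff;	/* file offset of vm_start, in base pages */
};

/*
 * Same arithmetic as the quoted hugetlb_vm_op_get_policy(): convert a
 * faulting address into a base-page index within the backing file.
 */
static uint64_t sketch_policy_index(const struct sketch_vma *vma, uint64_t addr)
{
	assert(addr >= vma->vm_start);
	return ((addr - vma->vm_start) >> SKETCH_PAGE_SHIFT) + vma->vm_pgoff;
}
```

For example, a mapping that starts at file offset 0 and faults 4 MiB in, and
a second mapping that starts at file offset 2 MiB (vm_pgoff 512) and faults
2 MiB in, both yield index 1024, so both lookups would hit the same entry.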