From: 黄杰
Date: Mon, 17 Oct 2022 11:35:09 +0800
Subject: Re: [External] Re: [PATCH] mm: hugetlb: support get/set_policy for hugetlb_vm_ops
To: Mike Kravetz
Cc: Hugh Dickins, Muchun Song, Andi Kleen, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20221012081526.73067-1-huangjie.albert@bytedance.com> <5f7ef6ee-6241-9912-f434-962be53272c@google.com>

On Sat, Oct 15, 2022 at 00:56, Mike Kravetz wrote:
>
> On 10/12/22 12:45, Hugh Dickins wrote:
> > On Wed, 12 Oct 2022, Albert Huang wrote:
> >
> > > From: "huangjie.albert"
> > >
> > > implement these two functions so that we can set the mempolicy to
> > > the inode of the hugetlb file. This ensures that the mempolicy of
> > > all processes sharing this huge page file is consistent.
> > >
> > > In some scenarios where huge pages are shared:
> > > if we need to limit the memory usage of vm within node0, so I set qemu's
> > > mempolicy bind to node0, but if there is a process (such as virtiofsd)
> > > shared memory with the vm, in this case. If the page fault is triggered
> > > by virtiofsd, the allocated memory may go to node1 which depends on
> > > virtiofsd.
> > >
> > > Signed-off-by: huangjie.albert
>
> Thanks for the patch Albert, and thank you Hugh for the comments!
>
> > Aha! Congratulations for noticing, after all this time. hugetlbfs
> > contains various little pieces of code that pretend to be supporting
> > shared NUMA mempolicy, but in fact there was nothing connecting it up.
>
> I actually had to look this up to verify it was not supported. However, the
> documentation is fairly clear.
> From admin-guide/mm/numa_memory_policy.rst.
>
> "As of 2.6.22, only shared memory segments, created by shmget() or
> mmap(MAP_ANONYMOUS|MAP_SHARED), support shared policy. When shared
> policy support was added to Linux, the associated data structures were
> added to hugetlbfs shmem segments. At the time, hugetlbfs did not
> support allocation at fault time--a.k.a lazy allocation--so hugetlbfs
> shmem segments were never "hooked up" to the shared policy support.
> Although hugetlbfs segments now support lazy allocation, their support
> for shared policy has not been completed."
>
> It is somewhat embarrassing that this has been known for so long and
> nothing has changed.
>
> > It will be for Mike to decide, but personally I oppose adding
> > shared NUMA mempolicy support to hugetlbfs, after eighteen years.
> >
> > The thing is, it will change the behaviour of NUMA on hugetlbfs:
> > in ways that would have been sensible way back then, yes; but surely
> > those who have invested in NUMA and hugetlbfs have developed other
> > ways of administering it successfully, without shared NUMA mempolicy.
> >
> > At the least, I would expect some tests to break (I could easily be
> > wrong), and there's a chance that some app or tool would break too.
> >
> > I have carried the reverse of Albert's patch for a long time, stripping
> > out the pretence of shared NUMA mempolicy support from hugetlbfs: I
> > wanted that, so that I could work on modifying the tmpfs implementation,
> > without having to worry about other users.
> >
> > Mike, if you would prefer to see my patch stripping out the pretence,
> > let us know: it has never been a priority to send in, but I can update
> > it to 6.1-rc1 if you'd like to see it. (Once upon a time, it removed
> > all need for struct hugetlbfs_inode_info, but nowadays that's still
> > required for the memfd seals.)
> >
> > Whether Albert's patch is complete and correct, I haven't begun to think
> > about: I am not saying it isn't, but shared NUMA mempolicy adds another
> > dimension of complexity, and need for support, that I think hugetlbfs
> > would be better off continuing to survive without.
>
> To be honest, I have not looked into the complexities of shared NUMA
> mempolicy and exactly what is required for its support. With my limited
> knowledge, it appears that this patch adds some type of support for shared
> policy, but it may not provide all support mentioned in the documentation.
>
> At the very least, this patch should also update documentation to state
> what type of support is provided.
>
> Albert, can you look into what would be required for full support? I can take
> a look as well but have some other higher priority tasks to work first.
>

I'm glad to take this on; let me think about it.

> TBH, I like Hugh's idea of removing the 'pretence of shared policy support'.
> We are currently wasting memory carrying around extra unused fields in
> hugetlbfs_inode_info. :(
> --
> Mike Kravetz