From: 黄杰
Date: Tue, 18 Oct 2022 17:24:31 +0800
Subject: Re: [PATCH] mm: hugetlb: support get/set_policy for hugetlb_vm_ops
To: Mike Kravetz
Cc: David Hildenbrand, Muchun Song, Andrew Morton, linux-mm@kvack.org, linux-kernel
References: <20221012081526.73067-1-huangjie.albert@bytedance.com> <2aaf2c3a-6e49-abb9-b9c8-19ce87404982@redhat.com> <2f41fc4c-68eb-ab7d-970b-fcb10f474fd4@redhat.com>

On Tue, Oct 18, 2022 at 01:59, Mike Kravetz wrote:
>
> On 10/17/22 13:33, David Hildenbrand wrote:
> > On 17.10.22 11:48, 黄杰 wrote:
> > > On Mon, Oct 17, 2022 at 16:44, David Hildenbrand wrote:
> > > >
> > > > On 12.10.22 10:15, Albert Huang wrote:
> > > > > From: "huangjie.albert"
> > > > >
> > > > > Implement these two functions so that we can set the mempolicy on
> > > > > the inode of the hugetlb file. This ensures that the mempolicy of
> > > > > all processes sharing this huge page file is consistent.
> > > > >
> > > > > In some scenarios where huge pages are shared:
> > > > > suppose we need to limit the memory usage of a VM to node0, so
> > > > > qemu's mempolicy is bound to node0. If another process (such as
> > > > > virtiofsd) shares memory with the VM and a page fault is triggered
> > > > > by virtiofsd, the allocated memory may go to node1, depending on
> > > > > virtiofsd.
> > > > >
> > > >
> > > > Any VM that uses hugetlb should be preallocating memory. For example,
> > > > this is the expected default under QEMU when using huge pages.
> > > >
> > > > Once preallocation does the right thing regarding NUMA policy, there is
> > > > no need to worry about it in other sub-processes.
> > > >
> > >
> > > Hi, David
> > > thanks for your reminder
> > >
> > > Yes, you are absolutely right: the pre-allocation mechanism
> > > does solve this problem.
> > > However, some scenarios are not a good fit for pre-allocation, such as
> > > scenarios that are sensitive to virtual machine startup time, or
> > > scenarios that require high memory utilization. For those, the
> > > on-demand allocation mechanism may be better,
> > > so the key point is to find a way to support a shared policy.
> >
> > Using hugetlb -- with a fixed pool size -- without preallocation is like
> > playing with fire. Hugetlb reservation makes one believe that on-demand
> > allocation is going to work, but there are various scenarios where that can
> > go seriously wrong, and you can run out of huge pages.
>
> I absolutely agree with this cautionary note.
>
> hugetlb reservations guarantee that a sufficient number of huge pages exist.
> However, there is no guarantee that those pages are on any specific node
> associated with a numa policy. Therefore, an 'on demand' allocation could
> fail resulting in SIGBUS being sent to the faulting process.
> --

Yes, supporting on-demand allocation would require adding a lot of other
code. I have thought about this, but there is currently no code that is
suitable for submitting to the community.

> Mike Kravetz
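
The point about reservations versus per-node availability can be illustrated with a small toy model. This is only a sketch under stated assumptions: `ToyHugePool`, `SigBus`, `reserve`, and `fault` are hypothetical names invented for this illustration, and nothing here reflects the kernel's actual hugetlb data structures or code paths. The model only captures the idea that a reservation is checked against the global pool, while a fault under a bind policy must be satisfied from specific nodes.

```python
from dataclasses import dataclass


class SigBus(Exception):
    """Stands in for the SIGBUS a faulting process would receive (toy model)."""


@dataclass
class ToyHugePool:
    """Hypothetical model of a fixed-size huge page pool split across NUMA nodes."""
    free_per_node: dict   # node id -> number of free huge pages on that node
    reserved: int = 0     # pages promised to mappings, with no node attached

    def reserve(self, pages: int) -> bool:
        # A reservation is checked against the global total only;
        # it says nothing about which node the pages live on.
        if self.reserved + pages > sum(self.free_per_node.values()):
            return False
        self.reserved += pages
        return True

    def fault(self, allowed_nodes: set) -> int:
        # On-demand fault: the page must come from a node the policy allows.
        for node in sorted(allowed_nodes):
            if self.free_per_node.get(node, 0) > 0:
                self.free_per_node[node] -= 1
                self.reserved -= 1
                return node
        # The reservation was granted, yet no allowed node can supply a page.
        raise SigBus("no free huge page on the nodes allowed by the policy")
```

With a pool of `{0: 1, 1: 3}`, `reserve(2)` succeeds because four pages exist in total. The first `fault({0})` is served from node 0, but a second `fault({0})` raises `SigBus` even though three pages remain free on node 1. Preallocating at mapping time would surface that failure up front instead of at an arbitrary later fault.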