From: 黄杰
Date: Mon, 17 Oct 2022 19:46:55 +0800
Subject: Re: [External] Re: [PATCH] mm: hugetlb: support get/set_policy for hugetlb_vm_ops
To: David Hildenbrand
Cc: songmuchun@bytedance.com, Mike Kravetz, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org
In-Reply-To: <2f41fc4c-68eb-ab7d-970b-fcb10f474fd4@redhat.com>
References: <20221012081526.73067-1-huangjie.albert@bytedance.com> <2aaf2c3a-6e49-abb9-b9c8-19ce87404982@redhat.com> <2f41fc4c-68eb-ab7d-970b-fcb10f474fd4@redhat.com>

On Mon, 17 Oct 2022 at 19:33, David Hildenbrand wrote:
>
> On 17.10.22 11:48, 黄杰 wrote:
> > On Mon, 17 Oct 2022 at 16:44, David Hildenbrand wrote:
> >>
> >> On 12.10.22 10:15, Albert Huang wrote:
> >>> From: "huangjie.albert"
> >>>
> >>> Implement these two functions so that we can set the mempolicy on
> >>> the inode of the hugetlb file. This ensures that the mempolicy of
> >>> all processes sharing this huge page file is consistent.
> >>>
> >>> In some scenarios where huge pages are shared:
> >>> if we need to limit the memory usage of a VM to node0, I bind
> >>> qemu's mempolicy to node0. But if there is a process (such as
> >>> virtiofsd) sharing memory with the VM, and the page fault is
> >>> triggered by virtiofsd, the allocated memory may go to node1,
> >>> depending on virtiofsd.
> >>>
> >>
> >> Any VM that uses hugetlb should be preallocating memory. For example,
> >> this is the expected default under QEMU when using huge pages.
> >>
> >> Once preallocation does the right thing regarding NUMA policy, there is
> >> no need to worry about it in other sub-processes.
> >>
> >
> > Hi David,
> > thanks for your reminder.
> >
> > Yes, you are absolutely right: the pre-allocation mechanism does
> > solve this problem.
> > However, some scenarios do not suit pre-allocation, such as those
> > that are sensitive to virtual machine startup time or that require
> > high memory utilization. There the on-demand allocation mechanism may
> > be better, so the key point is to find a way to support a shared policy.
>
> Using hugetlb -- with a fixed pool size -- without preallocation is like
> playing with fire. Hugetlb reservation makes one believe that on-demand
> allocation is going to work, but there are various scenarios where that
> can go seriously wrong, and you can run out of huge pages.
>
> If you're using hugetlb as memory backend for a VM without
> preallocation, you really have to be very careful. I can only advise
> against doing that.
>
>
> Also: why does another process read/write *first* to a guest physical
> memory location before the OS running inside the VM even initialized
> that memory? That sounds very wrong. What am I missing?
>

For example: the virtio ring buffer.
For an avail descriptor, the guest kernel only gives an address to the
backend and does not actually access the memory itself, so the backend
can be the first to touch it.

Thanks.

> --
> Thanks,
>
> David / dhildenb
>
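
To make the scenario in this thread concrete, here is a toy model of
first-touch allocation on a shared hugetlb file. It is only a sketch and
not kernel code: `HugetlbFile`, `fault`, `backend_touches_first`, and the
`shared_policy_node` field are hypothetical names invented for
illustration, and the node numbers are the ones used in the patch
description (qemu bound to node0, virtiofsd ending up on node1).

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class HugetlbFile:
    """Toy stand-in for a shared hugetlb file; not a kernel structure."""
    # Hypothetical per-inode (shared) policy; None means "no shared policy".
    shared_policy_node: Optional[int] = None
    # Page index -> NUMA node the page was allocated on.
    page_node: Dict[int, int] = field(default_factory=dict)


def fault(file: HugetlbFile, page: int, task_policy_node: int) -> int:
    """Return the node backing `page`, allocating it on first touch."""
    if page not in file.page_node:
        # First touch: a shared policy on the file wins if present,
        # otherwise the faulting task's own policy decides the node.
        node = file.shared_policy_node
        if node is None:
            node = task_policy_node
        file.page_node[page] = node
    return file.page_node[page]


QEMU_NODE = 0       # qemu's mempolicy is bound to node0
VIRTIOFSD_NODE = 1  # assumption: virtiofsd's policy resolves to node1


def backend_touches_first(file: HugetlbFile, page: int) -> int:
    # The guest only publishes the buffer address in the avail ring,
    # so the backend (virtiofsd) is the first to access the page.
    fault(file, page, VIRTIOFSD_NODE)
    # A later access from qemu finds the page already allocated.
    return fault(file, page, QEMU_NODE)


# Per-task policies only: the page lands on virtiofsd's node.
print(backend_touches_first(HugetlbFile(), page=0))  # prints 1
# Shared policy attached to the file: the page stays on node0.
print(backend_touches_first(HugetlbFile(shared_policy_node=0), page=0))  # prints 0
```

The model ignores reservation and pool exhaustion, which is the separate
risk raised above for on-demand allocation from a fixed hugetlb pool.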