From: David Hildenbrand
Subject: Re: [RFC v2 PATCH 0/4] speed up page allocation for __GFP_ZERO
Date: Mon, 4 Jan 2021 21:18:14 +0100
Message-Id: <96BB0656-F234-4634-853E-E2A747B6ECDB@redhat.com>
To: Liang Li
Cc: David Hildenbrand, Alexander Duyck, Mel Gorman, Andrew Morton,
    Andrea Arcangeli, Dan Williams, "Michael S. Tsirkin", Jason Wang,
    Dave Hansen, Michal Hocko, Liang Li, linux-mm, LKML,
    virtualization@lists.linux-foundation.org

> On 23.12.2020 at 13:12, Liang Li wrote:
>
> On Wed, Dec 23, 2020 at 4:41 PM David Hildenbrand wrote:
>>
>> [...]
>>
>>>> I was rather saying that for security it's of little use IMHO.
>>>> Application/VM start up time might be improved by using huge pages (and
>>>> pre-zeroing these). Free page reporting might be improved by using
>>>> MADV_FREE instead of MADV_DONTNEED in the hypervisor.
>>>>
>>>>> this feature, above all of them, which one is likely to become the
>>>>> most strong one?
>>>>> From the implementation, you will find it is
>>>>> configurable, users don't want to use it can turn it off. This is not
>>>>> an option?
>>>>
>>>> Well, we have to maintain the feature and sacrifice a page flag. For
>>>> example, do we expect someone explicitly enabling the feature just to
>>>> speed up startup time of an app that consumes a lot of memory? I highly
>>>> doubt it.
>>>
>>> In our production environment, there are three main applications have such
>>> requirement, one is QEMU [creating a VM with SR-IOV passthrough device],
>>> the other two are DPDK related applications, DPDK OVS and SPDK vhost,
>>> for best performance, they populate memory when starting up. For SPDK vhost,
>>> we make use of the VHOST_USER_GET/SET_INFLIGHT_FD feature for
>>> vhost 'live' upgrade, which is done by killing the old process and
>>> starting a new one with the new binary. In this case, we want the new
>>> process started as quick as possible to shorten the service downtime.
>>> We really enable this feature to speed up startup time for them :)

Am I wrong, or does using hugetlbfs/tmpfs (i.e., a file that is not deleted
between shutting down the old instance and firing up the new instance) just
solve this issue?

>>
>> Thanks for info on the use case!
>>
>> All of these use cases either already use, or could use, huge pages
>> IMHO. It's not your ordinary proprietary gaming app :) This is where
>> pre-zeroing of huge pages could already help.
>
> You are welcome. For some historical reason, some of our services are
> not using hugetlbfs, that is why I didn't start with hugetlbfs.
>
>> Just wondering, wouldn't it be possible to use tmpfs/hugetlbfs ...
>> creating a file and pre-zeroing it from another process, or am I missing
>> something important? At least for QEMU this should work AFAIK, where you
>> can just pass the file to be used using memory-backend-file.
>>
> If using another process to create a file, we can offload the overhead to
> another process, and there is no need to pre-zero its content, just
> populating the memory is enough.

Right, if non-zero memory can be tolerated (e.g., for VMs it usually has to).

> If we do it that way, then how to determine the size of the file? It depends
> on the RAM size of the VM the customer buys.
> Maybe we can create a file
> large enough in advance and truncate it to the right size just before the
> VM is created. Then, how many large files should be created on a host?

That's mostly already existing scheduling logic, no? (How many VMs can I
eventually put onto a specific machine?)

> You will find there are a lot of things that have to be handled properly.
> I think it's possible to make it work well, but we will transfer the
> management complexity to up layer components. It's a bad practice to let
> upper layer components process such low level details which should be
> handled in the OS layer.

It's bad practice to squeeze things into the kernel that can just be
handled on upper layers ;)
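
For illustration only, the helper-process idea discussed above (create a
backing file, size it for the VM, and populate it before the consumer starts)
could be sketched roughly as below. Nothing here comes from the patch series:
the function name prepopulate_backing_file and the demo path are hypothetical,
and the sketch assumes the file would in practice live on a tmpfs or hugetlbfs
mount, whereas the demo uses an ordinary temporary directory so it can run
anywhere.

```python
import mmap
import os
import tempfile


def prepopulate_backing_file(path: str, size: int) -> None:
    """Create or resize `path` to `size` bytes and touch every page.

    Hypothetical helper: on hugetlbfs, `size` and the stride would have to
    be multiples of the huge page size rather than mmap.PAGESIZE.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Size the file for this particular VM (grow or truncate).
        os.ftruncate(fd, size)
        with mmap.mmap(fd, size, flags=mmap.MAP_SHARED,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE) as m:
            # Write one byte per page so each page is actually allocated;
            # writing zero also leaves a freshly created file all-zero.
            for off in range(0, size, mmap.PAGESIZE):
                m[off] = 0
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "vm0-mem")  # hypothetical file name
        prepopulate_backing_file(path, 16 * mmap.PAGESIZE)
        print(os.path.getsize(path))
```

The consumer would then be pointed at the already populated file, e.g. for
QEMU via a memory-backend-file object whose mem-path refers to it. How many
such files to prepare per host, and at what sizes, is left to the management
layer, as discussed above.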