Subject: Re: [RFC v2 PATCH 0/4] speed up page allocation for __GFP_ZERO
To: Alexander Duyck, Mel Gorman, Andrew Morton, Andrea Arcangeli,
 Dan Williams, "Michael S. Tsirkin", Jason Wang, Dave Hansen,
 Michal Hocko, Liang Li, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, virtualization@lists.linux-foundation.org
References: <20201221162519.GA22504@open-light-1.localdomain>
From: David Hildenbrand
Organization: Red Hat GmbH
Message-ID: <7bf0e895-52d6-9e2d-294b-980c33cf08e4@redhat.com>
Date: Tue, 22 Dec 2020 09:47:27 +0100
In-Reply-To: <20201221162519.GA22504@open-light-1.localdomain>

On 21.12.20 17:25, Liang Li wrote:
> The first version can be found at: https://lkml.org/lkml/2020/4/12/42
> 
> Zeroing out page content usually happens when allocating pages with
> the __GFP_ZERO flag. This is a time-consuming operation and it makes
> populating a large VMA very slow. This patch introduces a new feature
> that zeroes out free pages before allocation, which helps to speed up
> page allocation with __GFP_ZERO.
> 
> My original intention for adding this feature was to shorten VM
> creation time when an SR-IOV device is attached. It works well and
> reduces VM creation time by about 90%.
> 
> Creating a VM [64G RAM, 32 CPUs] with GPU passthrough
> =====================================================
> QEMU uses 4K pages, THP is off
>                  round1    round2    round3
> w/o this patch:  23.5s     24.7s     24.6s
> w/  this patch:  10.2s     10.3s     11.2s
> 
> QEMU uses 4K pages, THP is on
>                  round1    round2    round3
> w/o this patch:  17.9s     14.8s     14.9s
> w/  this patch:  1.9s      1.8s      1.9s
> =====================================================
> 

I am still not convinced that we want/need this for this (main) use
case. Why can't we use huge pages for such use cases (that really care
about VM creation time) and rather deal with pre-zeroing of huge pages
instead?

If possible, I'd like to avoid GFP_ZERO (for reasons already discussed).

> Obviously, it can do more than this. We can benefit from this feature
> in the following cases:
> 
> Interactive scene
> =================
> Shortening application launch time on a desktop or mobile phone helps
> to improve the user experience. A test on a server
> [Intel(R) Xeon(R) CPU E5-2620 v3 @ 2.40GHz] shows that zeroing out 1GB
> of RAM in the kernel takes about 200ms, while commonly used
> applications like the Firefox browser or Office consume 100~300 MB of
> RAM just after launch. By pre-zeroing free pages, application launch
> time would be reduced by about 20~60ms (can it be visually sensed?).
> Maybe we can use this feature to speed up the launch of Android apps
> (I didn't do any test for Android).

I am not really sure if you can actually visually sense a difference in
your examples. Startup time of an application is not just memory
allocation (page zeroing) time. It would be interesting how much of a
difference this actually makes in practice (e.g., Firefox startup time).
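Just to put the quoted 200ms/GB number into perspective, here is a
minimal userspace sketch (purely illustrative, not part of the patch;
the 1 GiB size and 4 KiB stride are assumptions) that times the first
touch of anonymous memory, i.e., the fault-in path including the
kernel's page zeroing:

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

int main(void)
{
	const size_t size = 1UL << 30;	/* 1 GiB, illustrative */
	const size_t step = 4096;	/* assumed base page size */
	struct timespec t0, t1;
	double ms;
	size_t off;

	char *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (off = 0; off < size; off += step)
		buf[off] = 1;	/* first touch: fault + kernel zeroing */
	clock_gettime(CLOCK_MONOTONIC, &t1);

	ms = (t1.tv_sec - t0.tv_sec) * 1e3 +
	     (t1.tv_nsec - t0.tv_nsec) / 1e6;
	printf("first touch of 1 GiB took %.1f ms\n", ms);

	munmap(buf, size);
	return 0;
}

Comparing that number against the total startup time of, say, Firefox
would show how much of a launch is actually dominated by zeroing.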
> 
> Virtualization
> ==============
> Speed up VM creation and shorten guest boot time, especially for the
> PCI SR-IOV device passthrough scenario. Compared with some of the
> para-virtualization solutions, it is easy to deploy because it's
> transparent to the guest and can handle DMA properly in the BIOS
> stage, while the para-virtualization solutions can't handle it well.

What is the "para virtualization" approach you are talking about?

> 
> Improve guest performance when VIRTIO_BALLOON_F_REPORTING is used for
> memory overcommit. The VIRTIO_BALLOON_F_REPORTING feature reports
> guest pages to the VMM, and the VMM unmaps the corresponding host
> pages for reclaim. When the guest allocates a page that was just
> reclaimed, the host has to allocate a new page and zero it out for
> the guest. In this case pre-zeroing free pages helps to speed up the
> fault-in process and reduce the performance impact.

Such faults in the VMM are no different from other faults, when first
accessing a page to be populated. Again, I wonder how much of a
difference it actually makes.

> 
> Speed up kernel routines
> ========================
> This can't be guaranteed because we don't pre-zero all the free pages,
> but it is true for most cases. It can help to speed up some important
> system calls such as fork, which allocates zero pages for building
> page tables, and the handling of page faults, especially huge page
> faults. A POC of hugetlb free page pre-zeroing has been done.

Would be interesting to have an actual example with some numbers.

> 
> Security
> ========
> This is a weak version of "introduce init_on_alloc=1 and init_on_free=1
> boot options", which zeroes out pages in an asynchronous way. For users
> who can't tolerate the impact that 'init_on_alloc=1' or 'init_on_free=1'
> brings, this feature provides another choice.

"we don't pre-zero all the free pages", so this is of little actual use.

-- 
Thanks,

David / dhildenb