References: <20201221162519.GA22504@open-light-1.localdomain>
 <7bf0e895-52d6-9e2d-294b-980c33cf08e4@redhat.com>
 <840ff69d-20d5-970a-1635-298000196f3e@redhat.com>
In-Reply-To: <840ff69d-20d5-970a-1635-298000196f3e@redhat.com>
From: Liang Li
Date: Tue, 22 Dec 2020 22:00:56 +0800
Subject: Re: [RFC v2 PATCH 0/4] speed up page allocation for __GFP_ZERO
To: David Hildenbrand
Cc: Alexander Duyck, Mel Gorman, Andrew Morton, Andrea Arcangeli,
 Dan Williams, "Michael S. Tsirkin", Jason Wang, Dave Hansen,
 Michal Hocko, Liang Li, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 virtualization@lists.linux-foundation.org

https://static.sched.com/hosted_files/kvmforum2020/51/The%20Practice%20Method%20to%20Speed%20Up%2010x%20Boot-up%20Time%20for%20Guest%20in%20Alibaba%20Cloud.pdf
> >
> > and the following link is mine:
> > https://static.sched.com/hosted_files/kvmforum2020/90/Speed%20Up%20Creation%20of%20a%20VM%20With%20Passthrough%20GPU.pdf
>
> Thanks for the pointers! I actually did watch your presentation.

You're welcome! And thanks for your time! :)

> >>
> >>>
> >>> Improve guest performance when using VIRTIO_BALLOON_F_REPORTING for memory
> >>> overcommit. The VIRTIO_BALLOON_F_REPORTING feature reports free guest pages
> >>> to the VMM, and the VMM unmaps the corresponding host pages for reclaim.
> >>> When the guest allocates a page that was just reclaimed, the host has to
> >>> allocate a new page and zero it out for the guest. In this case,
> >>> pre-zeroing free pages helps to speed up the fault-in process and reduce
> >>> the performance impact.
> >>
> >> Such faults in the VMM are no different to other faults, when first
> >> accessing a page to be populated. Again, I wonder how much of a
> >> difference it actually makes.
> >>
> >
> > I am not just referring to faults in the VMM, I mean the whole process
> > that handles guest page faults.
> > Without VIRTIO_BALLOON_F_REPORTING, pages used by the guest are zeroed
> > out only once by the host. With VIRTIO_BALLOON_F_REPORTING, free pages
> > are reclaimed by the host and may return to the host buddy free list.
> > When the pages are given back to the guest, the host kernel needs to
> > zero them out again. This means that with VIRTIO_BALLOON_F_REPORTING,
> > guest memory performance will be degraded by frequent zeroing on the
> > host side. The degradation will be obvious in the huge page case.
> > Pre-zeroing free pages can make guest memory performance almost the
> > same as without VIRTIO_BALLOON_F_REPORTING.
>
> Yes, what I am saying is that this fault handling is no different to
> ordinary faults when accessing a virtual memory location the first time
> and populating a page. The only difference is that it happens
> continuously, not only the first time we touch a page.
>
> And we might be able to improve handling in the hypervisor in the
> future. We have been discussing using MADV_FREE instead of MADV_DONTNEED
> in QEMU for handling free page reporting. Then, guest reported pages
> will only get reclaimed by the hypervisor when there is actual memory
> pressure in the hypervisor (e.g., when about to swap). And zeroing a
> page is an obvious improvement over going to swap. The price for zeroing
> pages has to be paid at one point.
>
> Also note that we've been discussing cache-related things already. If
> you zero out before giving the page to the guest, the page will already
> be in the cache - where the guest directly wants to access it.
>

OK, that's very reasonable and much better. Looking forward to your work.
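To make the MADV_DONTNEED vs. MADV_FREE difference concrete, here is a minimal single-process sketch, assuming Linux and Python 3.8+ (the first release with `mmap.madvise()`). It only demonstrates the two madvise() semantics on a private anonymous mapping; it is not QEMU's free page reporting code, and the page count and fill pattern are arbitrary choices for illustration.

```python
import mmap
from typing import List

PAGE = mmap.PAGESIZE
NPAGES = 16
PATTERN = 0xAA  # arbitrary non-zero fill byte


def pages_after(advice: int) -> List[bytes]:
    """Fill a private anonymous mapping, apply madvise(advice), read it back.

    Returns the contents of each page as seen on the next access.
    """
    size = PAGE * NPAGES
    # fileno=-1 makes CPython add MAP_ANONYMOUS. MAP_PRIVATE matters: on a
    # shared (shmem-backed) mapping MADV_DONTNEED does not discard the data.
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    try:
        buf[:] = bytes([PATTERN]) * size  # write-fault every page in
        buf.madvise(advice)               # advice applies to the whole mapping
        return [buf[i * PAGE:(i + 1) * PAGE] for i in range(NPAGES)]
    finally:
        buf.close()


if __name__ == "__main__":
    zero, old = bytes(PAGE), bytes([PATTERN]) * PAGE

    # MADV_DONTNEED: the old contents are dropped right away, so every page
    # reads back as zero-fill-on-demand memory.
    dropped = pages_after(mmap.MADV_DONTNEED)
    print("MADV_DONTNEED zero pages:", sum(p == zero for p in dropped), "/", NPAGES)

    # MADV_FREE (Linux 4.5+): pages are only discarded if the kernel needs
    # to reclaim them, so each page reads back either unchanged or zeroed.
    madv_free = getattr(mmap, "MADV_FREE", None)
    if madv_free is not None:
        try:
            lazy = pages_after(madv_free)
        except OSError as err:
            print("MADV_FREE not supported here:", err)
        else:
            print("MADV_FREE pages still holding old data:",
                  sum(p == old for p in lazy), "/", NPAGES)
```

After MADV_DONTNEED, the next write has to be served by a freshly allocated, zeroed page, which is the repeated host-side zeroing described above; after MADV_FREE, the old page stays in place if no reclaim happened in between.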
> >>>
> >>> Security
> >>> ========
> >>> This is a weak version of "introduce init_on_alloc=1 and init_on_free=1
> >>> boot options", which zeroes out pages in an asynchronous way. For users
> >>> who can't tolerate the impact that 'init_on_alloc=1' or 'init_on_free=1'
> >>> brings, this feature provides another choice.
> >> "we don’t pre zero out all the free pages" so this is of little actual use.
> >
> > OK. It seems none of the reasons listed above is strong enough for
>
> I was rather saying that for security it's of little use IMHO.
> Application/VM start up time might be improved by using huge pages (and
> pre-zeroing these). Free page reporting might be improved by using
> MADV_FREE instead of MADV_DONTNEED in the hypervisor.
>
> > this feature. Above all of them, which one is likely to become the
> > strongest one? From the implementation, you will find it is
> > configurable; users who don't want to use it can turn it off. Is this
> > not an option?
>
> Well, we have to maintain the feature and sacrifice a page flag. For
> example, do we expect someone explicitly enabling the feature just to
> speed up startup time of an app that consumes a lot of memory? I highly
> doubt it.

In our production environment, there are three main applications that have
such a requirement. One is QEMU [creating a VM with an SR-IOV passthrough
device]; the other two are DPDK-related applications, DPDK OVS and SPDK
vhost. For best performance, they populate memory when starting up. For
SPDK vhost, we make use of the VHOST_USER_GET/SET_INFLIGHT_FD feature for
vhost 'live' upgrade, which is done by killing the old process and
starting a new one with the new binary. In this case, we want the new
process to start as quickly as possible to shorten the service downtime.
We really would enable this feature to speed up startup time for them :)

> I'd love to hear opinions of other people.
> (a lot of people are offline
> until beginning of January, including, well, actually me :) )

OK. I will wait some time for others' feedback. Happy holidays!

thanks!

Liang
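The startup cost discussed in this thread comes from the kernel having to allocate and zero every page before an application that populates its memory up front can use it. A rough sketch for measuring that cost, assuming Linux and Python 3.8+; Python's `mmap` module only exposes `MAP_POPULATE` from 3.10 on, so the sketch falls back to touching one byte per page. The sizes are arbitrary, the timings depend entirely on the machine, and this is not how QEMU, DPDK OVS or SPDK vhost actually set up their memory.

```python
import mmap
import time

PAGE = mmap.PAGESIZE
# Exposed by Python's mmap module from 3.10 on (Linux only); 0 otherwise.
MAP_POPULATE = getattr(mmap, "MAP_POPULATE", 0)


def populate_anon(size: int) -> mmap.mmap:
    """Map `size` bytes of private anonymous memory with every page faulted in.

    With MAP_POPULATE the kernel pre-faults the pages inside mmap() itself;
    otherwise one byte per page is written to trigger the faults.
    """
    buf = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE | MAP_POPULATE)
    if not MAP_POPULATE:
        for off in range(0, size, PAGE):
            buf[off] = 0  # write fault: the kernel allocates a zeroed page
    return buf


def time_populate(size: int) -> float:
    """Return the seconds spent mapping and populating `size` bytes."""
    start = time.perf_counter()
    buf = populate_anon(size)
    elapsed = time.perf_counter() - start
    buf.close()
    return elapsed


if __name__ == "__main__":
    for mib in (16, 64, 128):
        ms = time_populate(mib << 20) * 1e3
        print(f"{mib:4d} MiB populated in {ms:.1f} ms")
```

The patch series under discussion aims to take the zeroing part of this cost off the allocation path by pre-zeroing free pages; the sketch only measures the current cost and does not model that.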