Date: Wed, 21 Oct 2020 13:31:14 +0200
From: Michal Hocko
To: David Hildenbrand
Cc: "Guilherme G. Piccoli", Mike Kravetz, linux-mm@kvack.org,
	kernel-hardening@lists.openwall.com, linux-hardening@vger.kernel.org,
	linux-security-module@vger.kernel.org, kernel@gpiccoli.net,
	cascardo@canonical.com, Alexander Potapenko, James Morris, Kees Cook
Subject: Re: [PATCH] mm, hugetlb: Avoid double clearing for hugetlb pages
Message-ID: <20201021113114.GC23790@dhcp22.suse.cz>
References: <20201019182853.7467-1-gpiccoli@canonical.com>
 <20201020082022.GL27114@dhcp22.suse.cz>
 <9cecd9d9-e25c-4495-50e2-8f7cb7497429@canonical.com>
 <20201021061538.GA23790@dhcp22.suse.cz>
 <0ad2f879-7c72-3eef-5cb6-dee44265eb82@redhat.com>
In-Reply-To: <0ad2f879-7c72-3eef-5cb6-dee44265eb82@redhat.com>

On Wed 21-10-20 11:50:48, David Hildenbrand wrote:
> On 21.10.20 08:15, Michal Hocko wrote:
> > On Tue 20-10-20 16:19:06, Guilherme G. Piccoli wrote:
> >> On 20/10/2020 05:20, Michal Hocko wrote:
> >>>
> >>> Yes, zeroing is quite costly, and that is to be expected when the
> >>> feature is enabled. Hugetlb, like other allocator users, performs its
> >>> own initialization rather than going through the __GFP_ZERO path.
> >>> More on that below.
> >>>
> >>> Could you be more specific about why this is a problem? The hugetlb
> >>> pool is usually preallocated once during early boot. 24s for 65GB of
> >>> 2MB pages is a non-trivial amount of time, but it doesn't look like a
> >>> major disaster either. If the pool is allocated later, it can take
> >>> much more time due to memory fragmentation.
> >>>
> >>> I definitely do not want to downplay this, but I would like to hear
> >>> about real-life examples of the problem.
> >>
> >> Indeed, 24s of delay (!) is not so harmful for boot time, but... 64G
> >> was just my simple test in a guest; the real case is much worse!
> >> It aligns with Mike's comment: we have complaints of minute-like
> >> delays, due to a very big pool of hugepages being allocated.
> >
> > The cost of page clearing is mostly a constant overhead, so it is
> > quite natural to see the time scaling with the number of pages. That
> > overhead has to happen at some point in time. Sure, it is more visible
> > when allocating during boot, resp. when doing pre-allocation during
> > runtime. The page fault path would then be faster. The overhead just
> > moves to a different place. So I am not sure this is really a strong
> > argument to hold.
>
> We have people complaining that starting VMs backed by hugetlbfs takes
> too long; they would much rather have that initialization be done when
> booting the hypervisor ...

I can imagine. Everybody would love to have a free lunch ;) But more
seriously, the overhead of the initialization is unavoidable. The memory
has to be zeroed out by definition, and somebody has to pay for that.
Sure, one can think of a deferred context to do that, but this just
spreads the cost into the overall system overhead. Even if the zeroing
is done at allocation time, it is only the first user who benefits from
it. Any reuse of the hugetlb pool has to reinitialize again. One can
still be creative - e.g. prefault hugetlb files from userspace in
parallel and reuse that - but I am not sure the kernel should try to be
clever here.

-- 
Michal Hocko
SUSE Labs