Date: Tue, 24 Nov 2020 14:14:36 +0100
From: Michal Hocko
To: Muchun Song
Cc: Jonathan Corbet, Mike Kravetz, Thomas Gleixner, mingo@redhat.com,
	bp@alien8.de, x86@kernel.org, hpa@zytor.com,
	dave.hansen@linux.intel.com, luto@kernel.org, Peter Zijlstra,
	viro@zeniv.linux.org.uk, Andrew Morton, paulmck@kernel.org,
	mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com,
	Randy Dunlap, oneukum@suse.com, anshuman.khandual@arm.com,
	jroedel@suse.de, Mina Almasry, David Rientjes, Matthew Wilcox,
	Oscar Salvador, "Song Bao Hua (Barry Song)", Xiongchun duan,
	linux-doc@vger.kernel.org, LKML, Linux Memory Management List,
	linux-fsdevel
Subject: Re: [External] Re: [PATCH v6 09/16] mm/hugetlb: Defer freeing of HugeTLB pages
Message-ID: <20201124131436.GX27488@dhcp22.suse.cz>
References: <20201124095259.58755-1-songmuchun@bytedance.com>
	<20201124095259.58755-10-songmuchun@bytedance.com>
	<20201124115109.GW27488@dhcp22.suse.cz>

On Tue 24-11-20 20:45:30, Muchun Song wrote:
> On Tue, Nov 24, 2020 at 7:51 PM Michal Hocko wrote:
> >
> > On Tue 24-11-20 17:52:52, Muchun Song wrote:
> > > In the subsequent patch, we will allocate the vmemmap pages when free
> > > HugeTLB pages. But update_and_free_page() is called from a non-task
> > > context(and hold hugetlb_lock), so we can defer the actual freeing in
> > > a workqueue to prevent use GFP_ATOMIC to allocate the vmemmap pages.
> >
> > This has been brought up earlier without any satisfying answer. Do we
> > really have bother with the freeing from the pool and reconstructing the
> > vmemmap page tables? Do existing usecases really require such a dynamic
> > behavior? In other words, wouldn't it be much simpler to allow to use
>
> If someone wants to free a HugeTLB page, there is no way to do that if we
> do not allow this behavior.

Right. The question is how much that matters for the _initial_ feature
submission. Is this restriction so important that it would render the
feature unusable?

> When do we need this? On our server, we will
> allocate a lot of HugeTLB pages for SPDK or virtualization. Sometimes,
> we want to debug some issues and want to apt install some debug tools,
> but if the host has little memory and the install operation can be failed
> because of no memory. In this time, we can try to free some HugeTLB
> pages to buddy in order to continue debugging. So maybe we need this.

Or maybe you can still allocate hugetlb pages for debugging at runtime
and try to free those when you need to.

> > hugetlb pages with sparse vmemmaps only for the boot time reservations
> > and never allow them to be freed back to the allocator. This is pretty
> > restrictive, no question about that, but it would drop quite some code
>
> Yeah, if we do not allow freeing the HugeTLB page to buddy, it actually
> can drop some code. But I think that it only drop this one and next one
> patch. It seems not a lot. And if we drop this patch, we need to add some
> another code to do the boot time reservations and other code to disallow
> freeing HugeTLB pages.

You need a per-hugetlb-page flag to note the sparse vmemmap anyway, so
the freeing path should be a trivial check for that flag. Special-casing
the early boot reservation shouldn't be that hard either. But I haven't
checked closely.

> So why not support freeing now.

Because it adds some non-trivial challenges which would be better dealt
with on top of a stable, tested and feature-limited implementation.
The most obvious one is the problem with vmemmap allocations when
freeing a hugetlb page. Others, like vmemmap manipulation, are quite some
code but no surprises. Btw. that should be implemented in vmemmap proper
and be ready for other potential users. But this is a minor detail.
-- 
Michal Hocko
SUSE Labs