From: Muchun Song
Date: Fri, 20 Nov 2020 19:56:25 +0800
Subject: Re: [External] Re: [PATCH v5 11/21] mm/hugetlb: Allocate the vmemmap pages associated with each hugetlb page
In-Reply-To: <20201120111033.GN3200@dhcp22.suse.cz>
To: Michal Hocko
Cc: Jonathan Corbet, Mike Kravetz, Thomas Gleixner, mingo@redhat.com,
    bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com,
    luto@kernel.org, Peter Zijlstra, viro@zeniv.linux.org.uk, Andrew Morton,
    paulmck@kernel.org, mchehab+huawei@kernel.org,
    pawan.kumar.gupta@linux.intel.com, Randy Dunlap, oneukum@suse.com,
    anshuman.khandual@arm.com, jroedel@suse.de, Mina Almasry, David Rientjes,
    Matthew Wilcox, Oscar Salvador, "Song Bao Hua (Barry Song)",
    Xiongchun duan, linux-doc@vger.kernel.org, LKML,
    Linux Memory Management List, linux-fsdevel

On Fri, Nov 20, 2020 at 7:10 PM Michal Hocko wrote:
>
> On Fri 20-11-20 17:37:09, Muchun Song wrote:
> > On Fri, Nov 20, 2020 at 5:28 PM Michal Hocko wrote:
> > >
> > > On Fri 20-11-20 16:51:59, Muchun Song wrote:
> > > > On Fri, Nov 20, 2020 at 4:11 PM Michal Hocko wrote:
> > > > >
> > > > > On Fri 20-11-20 14:43:15, Muchun Song wrote:
> > > > > [...]
> > > > > > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> > > > > > index eda7e3a0b67c..361c4174e222 100644
> > > > > > --- a/mm/hugetlb_vmemmap.c
> > > > > > +++ b/mm/hugetlb_vmemmap.c
> > > > > > @@ -117,6 +117,8 @@
> > > > > >  #define RESERVE_VMEMMAP_NR 2U
> > > > > >  #define RESERVE_VMEMMAP_SIZE (RESERVE_VMEMMAP_NR << PAGE_SHIFT)
> > > > > >  #define TAIL_PAGE_REUSE -1
> > > > > > +#define GFP_VMEMMAP_PAGE \
> > > > > > + (GFP_KERNEL | __GFP_NOFAIL | __GFP_MEMALLOC)
> > > > >
> > > > > This is really dangerous! __GFP_MEMALLOC would allow a complete memory
> > > > > depletion. I am not even sure triggering the OOM killer is a reasonable
> > > > > behavior. It is just unexpected that shrinking a hugetlb pool can have
> > > > > destructive side effects. I believe it would be more reasonable to
> > > > > simply refuse to shrink the pool if we cannot free those pages up. This
> > > > > sucks as well but it isn't destructive at least.
> > > >
> > > > I find the instructions of __GFP_MEMALLOC from the kernel doc.
> > > >
> > > > %__GFP_MEMALLOC allows access to all memory. This should only be used when
> > > > the caller guarantees the allocation will allow more memory to be freed
> > > > very shortly.
> > > >
> > > > Our situation is in line with the description above. We will free a HugeTLB page
> > > > to the buddy allocator which is much larger than that we allocated shortly.
> > >
> > > Yes that is a part of the description. But read it in its full entirety.
> > > * %__GFP_MEMALLOC allows access to all memory. This should only be used when
> > > * the caller guarantees the allocation will allow more memory to be freed
> > > * very shortly e.g. process exiting or swapping. Users either should
> > > * be the MM or co-ordinating closely with the VM (e.g. swap over NFS).
> > > * Users of this flag have to be extremely careful to not deplete the reserve
> > > * completely and implement a throttling mechanism which controls the
> > > * consumption of the reserve based on the amount of freed memory.
> > > * Usage of a pre-allocated pool (e.g. mempool) should be always considered
> > > * before using this flag.
> > >
> > > GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_HIGH
> >
> > We want to free the HugeTLB page to the buddy allocator, but before that,
> > we need to allocate some pages as vmemmap pages, so here we cannot
> > handle allocation failures.
>
> Why cannot you simply refuse to shrink the pool size?
>
> > I think that we should replace the
> > __GFP_RETRY_MAYFAIL to __GFP_NOFAIL.
> >
> > GFP_KERNEL | __GFP_NOFAIL | __GFP_HIGH
> >
> > This meets our needs here. Thanks.
>
> Please read again my concern about the disruptive behavior or explain
> why it is desirable.

OK, I will come up with a solution which does not use the __GFP_NOFAIL.

Thanks.

>
> --
> Michal Hocko
> SUSE Labs

--
Yours,
Muchun
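
The "refuse to shrink the pool" behavior suggested in the thread can be sketched in user space as follows. This is only an illustration under stated assumptions, not the kernel code and not the solution eventually posted: `try_alloc_page`, `struct pool`, `shrink_pool_by_one`, `MAX_VMEMMAP_PAGES` and the `budget` counter are all hypothetical names, and `malloc` stands in for a page allocation that is allowed to fail (that is, one made without __GFP_NOFAIL).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define PAGE_SIZE_BYTES   4096u
#define MAX_VMEMMAP_PAGES 8u    /* hypothetical bound, for this sketch only */

/* Hypothetical pool state: just the number of huge pages it holds. */
struct pool {
	size_t nr_huge_pages;
};

/*
 * Hypothetical fallible allocator. 'budget' models how many pages are
 * still available; once it reaches zero the allocation fails instead of
 * dipping into reserves or retrying forever.
 */
static void *try_alloc_page(size_t *budget)
{
	void *page;

	if (*budget == 0)
		return NULL;
	page = malloc(PAGE_SIZE_BYTES);
	if (page)
		(*budget)--;
	return page;
}

/*
 * Try to shrink the pool by one huge page. All nr_vmemmap pages must be
 * obtained first. If any allocation fails, everything obtained so far is
 * given back and the pool is left untouched, so the caller sees a plain
 * failure rather than a destructive side effect.
 */
static bool shrink_pool_by_one(struct pool *p, size_t nr_vmemmap,
			       size_t *budget)
{
	void *pages[MAX_VMEMMAP_PAGES];
	size_t i;

	assert(nr_vmemmap <= MAX_VMEMMAP_PAGES);
	if (p->nr_huge_pages == 0)
		return false;

	for (i = 0; i < nr_vmemmap; i++) {
		pages[i] = try_alloc_page(budget);
		if (!pages[i])
			goto rollback;
	}

	/* The real code would install these as vmemmap pages; the sketch
	 * simply releases the host memory and keeps the budget consumed. */
	for (i = 0; i < nr_vmemmap; i++)
		free(pages[i]);
	p->nr_huge_pages--;
	return true;

rollback:
	/* Return the pages obtained before the failure and restore budget. */
	while (i-- > 0) {
		free(pages[i]);
		(*budget)++;
	}
	return false;
}
```

A caller would treat a false return as "the pool could not be shrunk right now" and report that to the user, leaving the system otherwise undisturbed. The page counts passed in are up to the caller and are illustrative here.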