From: David Hildenbrand
Subject: Re: [PATCH v8 06/12] mm/hugetlb: Allocate the vmemmap pages associated with each HugeTLB page
Date: Fri, 11 Dec 2020 11:52:52 +0100
Message-Id: <58B0C89E-DD34-4D59-83A4-5DAAF0D617AE@redhat.com>
References: <20201211093517.GA22210@linux>
In-Reply-To: <20201211093517.GA22210@linux>
To: Oscar Salvador
Cc: Muchun Song, corbet@lwn.net, mike.kravetz@oracle.com, tglx@linutronix.de, mingo@redhat.com, bp@alien8.de, x86@kernel.org, hpa@zytor.com, dave.hansen@linux.intel.com, luto@kernel.org, peterz@infradead.org, viro@zeniv.linux.org.uk, akpm@linux-foundation.org, paulmck@kernel.org, mchehab+huawei@kernel.org, pawan.kumar.gupta@linux.intel.com, rdunlap@infradead.org, oneukum@suse.com, anshuman.khandual@arm.com, jroedel@suse.de, almasrymina@google.com, rientjes@google.com, willy@infradead.org, mhocko@suse.com, song.bao.hua@hisilicon.com, david@redhat.com, duanxiongchun@bytedance.com, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org

> On 11.12.2020 at 10:35, Oscar Salvador wrote:
>
> On Thu, Dec 10, 2020 at 11:55:20AM +0800, Muchun Song wrote:
>> When we free a HugeTLB page to the buddy allocator, we should allocate the
>> vmemmap pages associated with it. We can do that in the __free_hugepage()
> "vmemmap pages that describe the range" would look better to me, but it is ok.
>
>> +#define GFP_VMEMMAP_PAGE \
>> +    (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_HIGH | __GFP_NOWARN)
>>
>> #ifndef VMEMMAP_HPAGE_SHIFT
>> #define VMEMMAP_HPAGE_SHIFT HPAGE_SHIFT
>> @@ -197,6 +200,11 @@
>>     (__boundary - 1 < (end) - 1) ? __boundary : (end); \
>> })
>>
>> +typedef void (*vmemmap_remap_pte_func_t)(struct page *reuse, pte_t *pte,
>> +                                         unsigned long start, unsigned long end,
>> +                                         void *priv);
>
> Any reason to not have defined GFP_VMEMMAP_PAGE and the new typedef into
> hugetlb_vmemmap.h?
>
>
>> +static void vmemmap_restore_pte_range(struct page *reuse, pte_t *pte,
>> +                                      unsigned long start, unsigned long end,
>> +                                      void *priv)
>> +{
>> +    pgprot_t pgprot = PAGE_KERNEL;
>> +    void *from = page_to_virt(reuse);
>> +    unsigned long addr;
>> +    struct list_head *pages = priv;
> [...]
>> +
>> +        /*
>> +         * Make sure that any data that writes to the @to is made
>> +         * visible to the physical page.
>> +         */
>> +        flush_kernel_vmap_range(to, PAGE_SIZE);
>
> Correct me if I am wrong, but flush_kernel_vmap_range is a NOOP under arches which
> do not have ARCH_HAS_FLUSH_KERNEL_DCACHE_PAGE.
> Since we only enable support for x86_64, and x86_64 is one of those arches,
> could we remove this, and introduced later on in case we enable this feature
> on an arch that needs it?
>
> I am not sure if you need to flush the range somehow, as you did in
> vmemmap_remap_range.
>
>> +retry:
>> +        page = alloc_page(GFP_VMEMMAP_PAGE);
>> +        if (unlikely(!page)) {
>> +            msleep(100);
>> +            /*
>> +             * We should retry infinitely, because we cannot
>> +             * handle allocation failures. Once we allocate
>> +             * vmemmap pages successfully, then we can free
>> +             * a HugeTLB page.
>> +             */
>> +            goto retry;
>
> I think this is the trickiest part.
> With 2MB HugeTLB pages we only need 6 pages, but with 1GB, the number of pages
> we need to allocate increases significantly (4088 pages IIRC).
> And you are using __GFP_HIGH, which will allow us to use more memory (by
> cutting down the watermark), but it might lead to putting the system
> on its knees wrt. memory.
> And yes, I know that once we allocate the 4088 pages, 1GB gets freed, but
> still.

Similar to memory hotplug, no? I don't think this is really an issue that
cannot be mitigated. Yeah, we might want to tweak allocation flags.

>
> I would like to hear Michal's thoughts on this one, but I wonder if it makes
> sense to not let 1GB-HugeTLB pages be freed.
>
> --
> Oscar Salvador
> SUSE L3
>
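
For reference, the page counts quoted above (6 for 2MB, 4088 for 1GB) can be
sanity-checked with a little arithmetic. The userspace sketch below is only an
illustration and is not code from the patch. It assumes 4 KiB base pages and a
64-byte struct page, and the helper names are made up for this example. The
"kept" values (2 and 8) are simply what the quoted figures imply under those
assumptions. They are not taken from the series itself.

```c
#include <assert.h>

/* Assumptions for illustration only: 4 KiB base pages, 64-byte struct page. */
#define BASE_PAGE_SIZE   4096UL
#define STRUCT_PAGE_SIZE 64UL

/*
 * Hypothetical helper: number of base pages needed to hold the struct pages
 * that describe one huge page of hpage_size bytes.
 */
static unsigned long vmemmap_pages_per_hpage(unsigned long hpage_size)
{
	unsigned long nr_base_pages;

	/* A huge page is expected to be a whole number of base pages. */
	assert(hpage_size % BASE_PAGE_SIZE == 0);

	nr_base_pages = hpage_size / BASE_PAGE_SIZE;
	return nr_base_pages * STRUCT_PAGE_SIZE / BASE_PAGE_SIZE;
}

/*
 * Hypothetical helper: pages that would have to be allocated again when the
 * huge page is freed, if `kept` vmemmap pages were never given back.
 */
static unsigned long vmemmap_pages_to_alloc(unsigned long hpage_size,
					    unsigned long kept)
{
	unsigned long total = vmemmap_pages_per_hpage(hpage_size);

	assert(kept <= total);
	return total - kept;
}
```

Under these assumptions a 2MB page is described by 8 vmemmap pages and a 1GB
page by 4096, so vmemmap_pages_to_alloc(2UL << 20, 2) gives 6 and
vmemmap_pages_to_alloc(1UL << 30, 8) gives 4088. Either way the 1GB case needs
several hundred times more order-0 allocations than the 2MB case, which is why
the allocation flags matter much more there.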