To: Mike Kravetz, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Michal Hocko, Oscar Salvador, Zi Yan, David Rientjes, Andrew Morton
References: <20210309001855.142453-1-mike.kravetz@oracle.com>
From: David Hildenbrand
Organization: Red Hat GmbH
Subject: Re: [RFC PATCH 0/3] hugetlb: add demote/split page functionality
Message-ID: <29cb78c5-4fca-0f0a-c603-0c75f9f50d05@redhat.com>
Date: Tue, 9 Mar 2021 10:01:09 +0100
In-Reply-To: <20210309001855.142453-1-mike.kravetz@oracle.com>

On 09.03.21 01:18, Mike Kravetz wrote:
> The concurrent use of multiple hugetlb page sizes on a single system
> is becoming more common. One of the reasons is better TLB support for
> gigantic page sizes on x86 hardware. In addition, hugetlb pages are
> being used to back VMs in hosting environments.
> 
> When using hugetlb pages to back VMs in such environments, it is
> sometimes desirable to preallocate hugetlb pools. This avoids the delay
> and uncertainty of allocating hugetlb pages at VM startup. In addition,
> preallocating huge pages minimizes the issue of memory fragmentation that
> increases the longer the system is up and running.
> 
> In such environments, a combination of larger and smaller hugetlb pages
> is preallocated in anticipation of backing VMs of various sizes. Over
> time, the preallocated pool of smaller hugetlb pages may become
> depleted while larger hugetlb pages still remain. In such situations,
> it may be desirable to convert larger hugetlb pages to smaller hugetlb
> pages.
> 
> Converting larger to smaller hugetlb pages can be accomplished today by
> first freeing the larger page to the buddy allocator and then allocating
> the smaller pages. However, there are two issues with this approach:
> 1) This process can take quite some time, especially if allocation of
>    the smaller pages is not immediate and requires migration/compaction.
> 2) There is no guarantee that the total size of smaller pages allocated
>    will match the size of the larger page which was freed. This is
>    because the area freed by the larger page could quickly be
>    fragmented.
> 
> To address these issues, introduce the concept of hugetlb page demotion.
> Demotion provides a means of splitting a hugetlb page 'in place' into
> pages of a smaller size. For example, on x86 one 1G page can be
> demoted to 512 2M pages. Page demotion is controlled via sysfs files:
> - demote_size   Read-only target page size for demotion
> - demote        Writable number of hugetlb pages to be demoted
> 
> Only hugetlb pages which are free at the time of the request can be demoted.
> Demotion does not add to the complexity of surplus pages. Demotion also
> honors reserved huge pages. Therefore, when a value is written to the sysfs
> demote file, that value is only the maximum number of pages which will be
> demoted. It is possible fewer will actually be demoted.
> 
> If demote_size is PAGESIZE, demote will simply free pages to the buddy
> allocator.
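For reference, I assume the interface would be driven from user space
roughly like the sketch below; the sysfs path is only my guess based on
the existing /sys/kernel/mm/hugepages/hugepages-<size>kB/ layout, and only
the "demote_size"/"demote" file names come from the cover letter:

#include <stdio.h>
#include <stdlib.h>

/* Path is an assumption; only the file names are from the cover letter. */
#define HPAGE_1G_DIR "/sys/kernel/mm/hugepages/hugepages-1048576kB/"

int main(void)
{
	char buf[64];
	FILE *f;

	/* Read the (read-only) page size that demotion would produce. */
	f = fopen(HPAGE_1G_DIR "demote_size", "r");
	if (!f || !fgets(buf, sizeof(buf), f)) {
		perror("demote_size");
		return EXIT_FAILURE;
	}
	fclose(f);
	printf("demote target size: %s", buf);

	/*
	 * Request demotion of up to 4 free 1G pages; fewer may actually
	 * be demoted if not enough free, unreserved pages are available.
	 */
	f = fopen(HPAGE_1G_DIR "demote", "w");
	if (!f || fprintf(f, "4\n") < 0) {
		perror("demote");
		return EXIT_FAILURE;
	}
	return fclose(f) ? EXIT_FAILURE : EXIT_SUCCESS;
}

So writing to "demote" is a best-effort request, matching the "maximum
number of pages" semantics described above.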
With the vmemmap optimizations you will have to rework the vmemmap
layout. How is that handled? Couldn't it happen that you are half-way
through splitting a PUD into PMDs when you realize that you cannot
allocate vmemmap pages for properly handling the remaining PMDs? What
would happen then?

Or are you planning on making both features mutually exclusive?

Of course, one approach would be first completely restoring the vmemmap
for the whole PUD (allocating more pages than necessary in the end) and
then freeing individual pages again when optimizing the layout per PMD.

-- 
Thanks,

David / dhildenb
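As a rough illustration of that last suggestion (restore the full vmemmap
for the gigantic page first, then split, then re-optimize per 2M page):
struct hstate and pages_per_huge_page() are existing kernel interfaces,
while restore_hugetlb_vmemmap(), split_free_gigantic_page() and
optimize_hugetlb_vmemmap() are hypothetical placeholder names, not
functions from the series:

/*
 * Sketch only: the three helpers below are placeholders for whatever
 * the series ends up providing.
 */
static int demote_free_gigantic_page(struct hstate *src, struct hstate *dst,
				     struct page *page)
{
	unsigned long i;
	unsigned long nr = pages_per_huge_page(src) / pages_per_huge_page(dst);
	int ret;

	/*
	 * Step 1: restore the complete vmemmap for the gigantic page up
	 * front, so an allocation failure cannot hit half-way through the
	 * split. This temporarily allocates more vmemmap pages than the
	 * end result needs.
	 */
	ret = restore_hugetlb_vmemmap(src, page);
	if (ret)
		return ret;	/* nothing has been split yet */

	/* Step 2: split the free gigantic page into dst-sized pages. */
	split_free_gigantic_page(src, dst, page);

	/*
	 * Step 3: re-apply the vmemmap optimization per dst-sized page,
	 * freeing the vmemmap pages that are no longer needed.
	 */
	for (i = 0; i < nr; i++)
		optimize_hugetlb_vmemmap(dst, page + i * pages_per_huge_page(dst));

	return 0;
}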