From: David Hildenbrand
Organization: Red Hat GmbH
To: Zi Yan, linux-mm@kvack.org
Cc: Matthew Wilcox, "Kirill A . Shutemov", Roman Gushchin, Andrew Morton,
    Yang Shi, Michal Hocko, John Hubbard, Ralph Campbell, David Nellans,
    Jason Gunthorpe, David Rientjes, Vlastimil Babka, Mike Kravetz, Song Liu
Subject: Re: [RFC PATCH v3 00/49] 1GB PUD THP support on x86_64
Date: Thu, 25 Feb 2021 12:02:29 +0100
References: <20210224223536.803765-1-zi.yan@sent.com>
In-Reply-To: <20210224223536.803765-1-zi.yan@sent.com>

On 24.02.21 23:35, Zi Yan wrote:
> From: Zi Yan
>
> Hi all,
>
> I have rebased my 1GB PUD THP support patches on v5.11-mmotm-2021-02-18-18-29
> and the code is available at
> https://github.com/x-y-z/linux-1gb-thp/tree/1gb_thp_v5.11-mmotm-2021-02-18-18-29
> if you want to give it a try. The actual 49 patches are not sent out with this
> cover letter. :)
>
> Instead of asking for code review, I would like to discuss on the concerns I got
> from previous RFCs. I think there are two major ones:
>
> 1. 1GB page allocation.
>    Current implementation allocates 1GB pages from CMA
>    regions that are reserved at boot time like hugetlbfs. The concerns on
>    using CMA is that an educated guess is needed to avoid depleting kernel
>    memory in case CMA regions are set too large. Recently David Rientjes
>    proposes to use process_madvise() for hugepage collapse, which is an
>    alternative [1] but might not work for 1GB pages, since there is no way of

I see two core ideas of THP:

1) Transparent to the user: you get the speedup without really caring,
*except* for sometimes having to enable/disable the optimization manually
(e.g., MADV_HUGEPAGE) - because in corner cases (e.g., userfaultfd)
it's not completely transparent and might have performance impacts.
mprotect(), mmap(MAP_FIXED), mremap() work as expected.

2) Transparent to other subsystems of the kernel: the page size of the
mapping is in base pages - we can split anytime on demand in case we
cannot handle a THP. In addition, there are no special requirements: no
CMA, no movability restrictions, no swappability restrictions, ... most
stuff works transparently by splitting.

Your current approach messes with 2). Your proposal here messes with 1).

Any kind of explicit placement by the user can silently get reverted at
any time. So process_madvise() would really only be useful in cases where
a temporary split might get reverted later on by the OS automatically -
like we have for 2MB THP right now.

So process_madvise() is less likely to help if the system won't try
collapsing automatically (more below).

>    _allocating_ a 1GB page to which collapse pages. I proposed a similar
>    approach at LSF/MM 2019, generating physically contiguous memory after pages
>    are allocated [2], which is usable for 1GB THPs. This approach does in-place
>    huge page promotion thus does not require page allocation.

I like the idea of forming a 1GB THP at a location where pages that are
already consecutive allow for it.
It can be applied generically - and both 1) and 2) keep working as
expected. Anytime there was a split, we can retry forming a THP later.

However, I don't follow how this is actually feasible at scale. You could
only ever collapse into a 1GB THP if you happen to already have 1GB worth
of physically consecutive 2MB THPs / 4k pages. Sounds to me like this
happens when the stars align.

-- 
Thanks,

David / dhildenb
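
To put rough numbers on the "stars align" concern, here is a minimal
sketch in Python (an illustration only, not kernel code). It assumes 4k
base pages, 2MB PMD THPs and 1GB PUD THPs as on x86_64; the function
can_promote_in_place() and its "pfns" argument (the page frame numbers
backing one 1GB-aligned virtual range, in virtual order) are hypothetical
names introduced here, not anything from the patch series.

```python
PAGE_SIZE = 4 * 1024                     # 4k base page
PMD_SIZE = 2 * 1024 * 1024               # 2MB THP
PUD_SIZE = 1024 * 1024 * 1024            # 1GB THP
PAGES_PER_PUD = PUD_SIZE // PAGE_SIZE    # base pages in one 1GB THP
PMDS_PER_PUD = PUD_SIZE // PMD_SIZE      # 2MB THPs in one 1GB THP


def can_promote_in_place(pfns):
    """Hypothetical check: do these page frame numbers already form a
    naturally aligned, physically consecutive 1GB block?"""
    if len(pfns) != PAGES_PER_PUD:
        return False
    first = pfns[0]
    # A 1GB page must start on a 1GB-aligned physical address.
    if first % PAGES_PER_PUD != 0:
        return False
    # Every following page must sit directly behind its predecessor.
    return all(pfn == first + i for i, pfn in enumerate(pfns))
```

Under these assumptions a single in-place promotion needs 262144 base
pages (or 512 2MB THPs) to be physically consecutive and to start on a
1GB boundary; one misplaced page anywhere in the range rules it out.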