From: David Hildenbrand
Organization: Red Hat GmbH
To: Mike Rapoport, Michal Hocko
Cc: James Bottomley, Andrew Morton, Alexander Viro, Andy Lutomirski,
 Arnd Bergmann, Borislav Petkov, Catalin Marinas, Christopher Lameter,
 Dan Williams, Dave Hansen, Elena Reshetova, "H. Peter Anvin",
 Ingo Molnar, "Kirill A. Shutemov", Matthew Wilcox, Mark Rutland,
 Mike Rapoport, Michael Kerrisk, Palmer Dabbelt, Paul Walmsley,
 Peter Zijlstra, Rick Edgecombe, Roman Gushchin, Shakeel Butt,
 Shuah Khan, Thomas Gleixner, Tycho Andersen, Will Deacon,
 linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
 linux-arm-kernel@lists.infradead.org, linux-fsdevel@vger.kernel.org,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-nvdimm@lists.01.org,
 linux-riscv@lists.infradead.org, x86@kernel.org, Hagen Paul Pfeifer
Subject: Re: [PATCH v16 07/11] secretmem: use PMD-size pages to amortize direct map fragmentation
Date: Tue, 2 Feb 2021 14:14:09 +0100
Message-ID: <6653288a-dd02-f9de-ef6a-e8d567d71d53@redhat.com>
In-Reply-To: <20210202124857.GN242749@kernel.org>

On 02.02.21 13:48, Mike Rapoport wrote:
> On Tue, Feb 02, 2021 at 10:35:05AM +0100, Michal Hocko wrote:
>> On Mon 01-02-21 08:56:19, James Bottomley wrote:
>>
>> I have also proposed potential ways out of this.
>> Either the pool is not
>> fixed sized and you make it a regular unevictable memory (if direct map
>> fragmentation is not considered a major problem)
>
> I think that the direct map fragmentation is not a major problem, and the
> data we have confirms it, so I'd be more than happy to entirely drop the
> pool, allocate memory page by page and remove each page from the direct
> map.
>
> Still, we cannot prove negative and it could happen that there is a
> workload that would suffer a lot from the direct map fragmentation, so
> having a pool of large pages upfront is better than trying to fix it
> afterwards. As we get more confidence that the direct map fragmentation is
> not an issue as it is common to believe we may remove the pool altogether.
>
> I think that using PMD_ORDER allocations for the pool with a fallback to
> order 0 will do the job, but unfortunately I doubt we'll reach a consensus
> about this because dogmatic beliefs are hard to shake...
>
> A more restrictive possibility is to still use plain PMD_ORDER allocations
> to fill the pool, without relying on CMA. In this case there will be no
> global secretmem specific pool to exhaust, but then it's possible to drain
> high order free blocks in a system, so CMA has an advantage of limiting
> secretmem pools to certain amount of memory with somewhat higher
> probability for high order allocation to succeed.

I am not really concerned about fragmenting/breaking up the direct map
as long as the feature has to be explicitly enabled (similar to
fragmenting the vmemmap).

As already expressed, I dislike allowing user space to consume an
unlimited number of unmovable/unmigratable allocations. We already have
that in some cases with huge pages (when the arch does not support
migration) - but there we can at least manage the consumption using the
whole max/reserved/free/... infrastructure. In addition, adding arch
support for migration shouldn't be too complicated.
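
To illustrate what max/reserved/free style accounting buys us, here is a
minimal user-space sketch. It is an illustration only: struct pool_acct
and the pool_*() helpers are hypothetical names I made up for this mail;
they are not taken from the secretmem series or from the hugetlb code,
and a real implementation would need locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical accounting for a bounded pool of unmovable pages. */
struct pool_acct {
	size_t max;      /* hard limit configured by the administrator */
	size_t reserved; /* pages promised to users, not yet allocated */
	size_t used;     /* pages currently allocated */
};

/* Pages that are neither reserved nor allocated. */
size_t pool_available(const struct pool_acct *p)
{
	return p->max - p->reserved - p->used;
}

/* Promise nr pages to a user; fail instead of exceeding the limit. */
bool pool_reserve(struct pool_acct *p, size_t nr)
{
	if (nr > pool_available(p))
		return false;
	p->reserved += nr;
	return true;
}

/* Turn nr previously reserved pages into allocated pages. */
bool pool_commit(struct pool_acct *p, size_t nr)
{
	if (nr > p->reserved)
		return false;
	p->reserved -= nr;
	p->used += nr;
	return true;
}

/* Give nr allocated pages back to the pool. */
void pool_release(struct pool_acct *p, size_t nr)
{
	assert(nr <= p->used);
	p->used -= nr;
}
```

The point of the sketch is only that user space can never hold more than
"max" unmovable pages, and that a reservation fails early instead of the
allocation failing later.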

The idea of using CMA is quite good IMHO, because there we can locally
limit the direct map fragmentation and don't have to bother about
migration at all. We own the area, so we can place as many unmovable
allocations on it as we can fit.

But it sounds like we would also need some kind of reservation
mechanism in either scenario (CMA vs. no CMA).

If we don't want to go full-circle on max/reserved/free/..., allowing
for migration of secretmem pages would make sense. Then, these pages
become "less special". Map source, copy, unmap destination. The security
implications are the ugly part. I wonder if we could temporarily map
somewhere else, thus avoiding touching the direct map during migration.

-- 
Thanks,

David / dhildenb