Date: Mon, 21 Mar 2022 15:37:59 +0100
From: Michal Hocko
To: Zach O'Keefe
Cc: Alex Shi, David Hildenbrand, David Rientjes, Pasha Tatashin,
 SeongJae Park, Song Liu, Vlastimil Babka, Zi Yan, linux-mm@kvack.org,
 Andrea Arcangeli, Andrew Morton, Arnd Bergmann, Axel Rasmussen,
 Chris Kennelly, Chris Zankel, Helge Deller, Hugh Dickins,
 Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe,
 "Kirill A. Shutemov", Matthew Wilcox, Matt Turner, Max Filippov,
 Miaohe Lin, Minchan Kim, Patrick Xia, Pavel Begunkov, Peter Xu,
 Thomas Bogendoerfer, Yang Shi
Subject: Re: [RFC PATCH 00/14] mm: userspace hugepage collapse
In-Reply-To: <20220308213417.1407042-1-zokeefe@google.com>

[ Removed Richard Henderson from the CC list as the delivery fails for
  his address]

On Tue 08-03-22 13:34:03, Zach O'Keefe wrote:
> Introduction
> --------------------------------
>
> This series provides a mechanism for userspace to induce a collapse of
> eligible ranges of memory into transparent hugepages in process context,
> thus permitting users to more tightly control their own hugepage
> utilization policy at their own expense.
>
> This idea was previously introduced by David Rientjes, and thanks to
> everyone for your patience while I prepared these patches resulting from
> that discussion[1].
>
> [1] https://lore.kernel.org/all/C8C89F13-3F04-456B-BA76-DE2C378D30BF@nvidia.com/
>
> Interface
> --------------------------------
>
> The proposed interface adds a new madvise(2) mode, MADV_COLLAPSE, and
> leverages the new process_madvise(2) call.
>
> (*) process_madvise(2)
>
>     Performs a synchronous collapse of the native pages mapped by
>     the list of iovecs into transparent hugepages. The default gfp
>     flags used will be the same as those used at-fault for the VMA
>     region(s) covered.

Could you expand on the reasoning here? The default allocation mode for
#PF is rather light. Madvised regions will try harder. The reasoning is
that we want to make stalls due to #PF as small as possible and only try
harder for madvised areas (also a subject of configuration). Wouldn't it
make more sense to try harder for an explicit call like madvise?

>     When multiple VMA regions are spanned, if
>     faulting-in memory from any VMA would permit synchronous
>     compaction and reclaim, then all hugepage allocations required
>     to satisfy the request may enter compaction and reclaim.

I am not sure I follow here. Let's have a memory range spanning two
vmas, one with MADV_HUGEPAGE.

>     Diverging from the at-fault semantics, VM_NOHUGEPAGE is ignored
>     by default, as the user is explicitly requesting this action.
>     Define two flags to control collapse semantics, passed through
>     process_madvise(2)’s optional flags parameter:

This part is discussed later in the thread.
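For reference, the multi-VMA rule quoted earlier can be sketched as a small userspace model. This is not kernel code and not taken from the series. It assumes the at-fault behaviour follows the documented settings of the transparent_hugepage/defrag knob, and every name in it (enum thp_defrag, struct model_vma, fault_may_stall(), collapse_may_stall()) is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the THP "defrag" sysfs setting. */
enum thp_defrag {
	DEFRAG_ALWAYS,
	DEFRAG_DEFER,
	DEFRAG_DEFER_MADVISE,
	DEFRAG_MADVISE,
	DEFRAG_NEVER,
};

/* Hypothetical stand-in for a VMA: only records MADV_HUGEPAGE. */
struct model_vma {
	bool madv_hugepage;
};

/*
 * Would a page fault in this VMA be allowed to stall in direct
 * compaction and reclaim? "defer" only wakes background threads,
 * so it counts as "no stall" here.
 */
static bool fault_may_stall(enum thp_defrag mode, const struct model_vma *vma)
{
	switch (mode) {
	case DEFRAG_ALWAYS:
		return true;
	case DEFRAG_DEFER_MADVISE:
	case DEFRAG_MADVISE:
		return vma->madv_hugepage;
	case DEFRAG_DEFER:
	case DEFRAG_NEVER:
	default:
		return false;
	}
}

/*
 * The rule as quoted: if any spanned VMA would permit synchronous
 * compaction and reclaim at fault, the whole request may use it.
 */
static bool collapse_may_stall(enum thp_defrag mode,
			       const struct model_vma *vmas, size_t n)
{
	assert(vmas != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (fault_may_stall(mode, &vmas[i]))
			return true;
	}
	return false;
}
```

With defrag set to "madvise" and a range spanning one plain VMA and one MADV_HUGEPAGE VMA, this model lets allocations for the plain VMA stall as well, which is the two-vma case raised in the reply above.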
>
>     MADV_F_COLLAPSE_LIMITS
>
>         If supplied, collapse respects pte collapse limits set via
>         sysfs:
>         /transparent_hugepage/khugepaged/max_ptes_[none|swap|shared].
>         Required if calling on behalf of another process and not
>         CAP_SYS_ADMIN.
>
>     MADV_F_COLLAPSE_DEFRAG
>
>         If supplied, permit synchronous compaction and reclaim,
>         regardless of VMA flags.

Why do we need this?
-- 
Michal Hocko
SUSE Labs