From: Vlastimil Babka
To: Michal Hocko, David Rientjes
Cc: Alex Shi, Hugh Dickins, Andrea Arcangeli, "Kirill A. Shutemov",
    Song Liu, Matthew Wilcox, Minchan Kim, Chris Kennelly,
    linux-mm@kvack.org, linux-api@vger.kernel.org, David Hildenbrand
Subject: Re: [RFC] Hugepage collapse in process context
Message-ID: <0b51a213-650e-7801-b6ed-9545466c15db@suse.cz>
Date: Thu, 18 Feb 2021 14:43:57 +0100

On 2/17/21 9:21 AM, Michal Hocko wrote:
> [Cc linux-api]
>
> On Tue 16-02-21 20:24:16, David Rientjes wrote:
>> Hi everybody,
>>
>> Khugepaged is slow by default, it scans at most 4096 pages every 10s.
>> That's normally fine as a system-wide setting, but some applications would
>> benefit from a more aggressive approach (as long as they are willing to
>> pay for it).
>>
>> Instead of adding priorities for eligible ranges of memory to khugepaged,
>> temporarily speeding khugepaged up for the whole system, or sharding its
>> work for memory belonging to a certain process, one approach would be to
>> allow userspace to induce hugepage collapse.
>>
>> The benefit to this approach would be that this is done in process context
>> so its cpu is charged to the process that is inducing the collapse.
>> Khugepaged is not involved.
>
> Yes, this makes a lot of sense to me.
>
>> Idea was to allow userspace to induce hugepage collapse through the new
>> process_madvise() call. This allows us to collapse hugepages on behalf of
>> current or another process for a vectored set of ranges.
>
> Yes, madvise sounds like a good fit for the purpose.

Agreed on both points.

>> This could be done through a new process_madvise() mode *or* it could be a
>> flag to MADV_HUGEPAGE since process_madvise() allows for a flag parameter
>> to be passed. For example, MADV_F_SYNC.
>
> Would this MADV_F_SYNC be applicable to other madvise modes? Most
> existing madvise modes do not seem to make much sense. We can argue that
> MADV_PAGEOUT would guarantee the range was indeed reclaimed but I am not
> sure we want to provide such a strong semantic because it can limit
> future reclaim optimizations.
>
> To me MADV_HUGEPAGE_COLLAPSE sounds like the easiest way forward.

I guess in the old madvise(2) we could create a new combo of MADV_HUGEPAGE |
MADV_WILLNEED with this semantic? But you are probably more interested in
process_madvise() anyway. There the new flag would make more sense. But there's
also David H.'s proposal for MADV_POPULATE and there might be benefit in
considering both at the same time? Should e.g. MADV_POPULATE with MADV_HUGEPAGE
have the collapse semantics? But would MADV_POPULATE be added to
process_madvise() as well? Just thinking out loud so we don't end up with more
flags than necessary, it's already confusing enough as it is.
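
For scale, here is a minimal sketch (Python, illustrative only) of the baseline
this thread wants to improve on. The first helper turns the quoted default
(4096 pages every 10s) into a rough lower bound on how long khugepaged needs to
scan a range, assuming 4 KiB base pages and ignoring the cost of the scan work
itself. The second applies the existing MADV_HUGEPAGE hint through Python's
mmap.madvise(). That hint only marks the range as eligible, so memory already
populated with small pages still waits for khugepaged. The helper names and the
2 MiB huge page size are assumptions of this sketch. It does not use MADV_F_SYNC
or MADV_HUGEPAGE_COLLAPSE, which are only proposals in this thread.

```python
import mmap

# Assumption: 2 MiB PMD-sized transparent huge pages, as on x86_64.
HUGEPAGE_SIZE = 2 * 1024 * 1024


def khugepaged_scan_seconds(num_pages, pages_per_pass=4096, pass_interval_s=10.0):
    """Rough lower bound on the time khugepaged needs to scan num_pages.

    Defaults follow the figures quoted in the thread: at most 4096 pages
    scanned per pass, one pass every 10 seconds.
    """
    return num_pages / pages_per_pass * pass_interval_s


def hint_hugepage(region):
    """Apply the existing, asynchronous MADV_HUGEPAGE hint to a whole mapping.

    Returns True if the kernel accepted the hint, and False if the hint is
    not available (non-Linux build, Python older than 3.8) or was rejected
    (for example a kernel built without transparent hugepage support).
    """
    advice = getattr(mmap, "MADV_HUGEPAGE", None)
    if advice is None or not hasattr(region, "madvise"):
        return False
    try:
        region.madvise(advice)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # 1 GiB of 4 KiB pages is 262144 pages, i.e. 64 passes at the default rate.
    print("scan time for 1 GiB:", khugepaged_scan_seconds(262144), "seconds")

    # Anonymous mapping spanning two huge pages; only the hint is applied here,
    # no synchronous collapse is requested or implied.
    region = mmap.mmap(-1, 2 * HUGEPAGE_SIZE)
    print("MADV_HUGEPAGE hint accepted:", hint_hugepage(region))
    region.close()
```

At the default rate a single pass over 1 GiB takes on the order of ten minutes,
which is the gap a collapse driven from process context is meant to close.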