From: Yang Shi
Date: Tue, 31 May 2022 16:47:49 -0700
Subject: Re: [RFC] mm: MADV_COLLAPSE semantics
To: Michal Hocko
Cc: "Zach O'Keefe", Alex Shi, David Hildenbrand, David Rientjes,
 Matthew Wilcox, Peter Xu, Song Liu, Linux MM, Rongwei Wang,
 Andrea Arcangeli, Axel Rasmussen, Hugh Dickins, "Kirill A. Shutemov",
 Minchan Kim, SeongJae Park, Pasha Tatashin

On Fri, May 27, 2022 at 2:46 AM Michal Hocko wrote:
>
> On Thu 26-05-22 10:39:42, Yang Shi wrote:
> > On Thu, May 26, 2022 at 12:12 AM Michal Hocko wrote:
> > >
> > > On Wed 25-05-22 10:32:44, Yang Shi wrote:
> > > > On Wed, May 25, 2022 at 1:24 AM Michal Hocko wrote:
> > > > >
> > > > > On Mon 23-05-22 17:18:32, Zach O'Keefe wrote:
> > > > > [...]
> > > > > > Idea: MADV_COLLAPSE should respect VM_NOHUGEPAGE and "never" THP mode,
> > > > > > but otherwise would attempt to collapse.
> > > > >
> > > > > I do agree that {process_}madvise should fail on VM_NOHUGEPAGE. The
> > > > > process has explicitly noted that THP shouldn't be used on such a VMA
> > > > > and seeing THP could be observed as not complying with that contract.
> > > > >
> > > > > I am not so sure about the global "never" policy, though. The global
> > > > > policy controls _kernel_ driven THPs. As the request to collapse memory
> > > > > comes from the userspace I do not think it should be limited by the
> > > > > kernel policy. I also think it can be beneficial to implement userspace
> > > > > based THP policies and exclude any kernel interference and that could be
> > > > > achieved by global kernel "never" policy and implement the whole
> > > > > functionality by process_madvise.
> > > >
> > > > I'd prefer to respect "never" for now since it is typically used to
> > > > disable THP globally even though the mappings are madvised
> > > > (MADV_HUGEPAGE). IMHO I treat MADV_COLLAPSE as weaker MADV_HUGEPAGE
> > > > (take effect for non-madvised mappings but not flip VM_NOHUGEPAGE) +
> > > > best-effort synchronous THP collapse.
> > >
> > > MADV_HUGEPAGE is a way to tell the kernel what and how to do in future
> > > time by the kernel. MADV_COLLAPSE is a way to tell what the userspace wants
> > > at the moment of the call. So I do not really think they are directly
> > > related in any way except they somehow control THP.
> > >
> > > The primary question here is whether we want to support usecases which
> > > want to completely rule out THP handling by the kernel and only rely on
> > > the userspace. If yes, I do not see other way than using never global
> > > policy and rely on MADV_COLLAPSE from the userspace. Or am I missing
> > > something?
> >
> > I'm not sure whether we want to reach that eventually.
>
> My experience tells me that sooner or later somebody comes with a
> usecase for that. We are not sure that is just a sign somebody will
> have that idea. So either we have very good reasons to not allow that
> possibility now and ideally we also document that or we should simply
> assume it will happen.

Yeah, it is definitely possible and nothing prevents that from happening.

>
> > But isn't "madvise" good enough? "madvise" also means to give the
> > delegation to the users IMHO. The users decide whether huge page is
> > preferred or not. The users could implement policies:
> >
> > No - MADV_NOHUGEPAGE
> > Yes - MADV_HUGEPAGE
> >
> > But the THP allocation is deferred to real access (page fault) or
> > khugepaged. So I treated MADV_COLLAPSE as weaker MADV_HUGEPAGE +
> > synchronous THP allocation.
>
> I really do not see any good reason to tightly couple kernel and user
> policies.
> Hints like MADV_{NO}HUGEPAGE are one thing and both kernel
> and userspace might decide to interpret them. But binding MADV_COLLAPSE
> to in kernel THP tunables just seems like pushing ourselves into the
> corner.

I don't mean we should tightly couple kernel and user policies. I
think it is about how "never" is treated. AFAICT, sysadmins typically
treat "never" as a global switch and do not expect any THP allocation
to happen in "never" mode, even when it is requested by the users.
Maybe they should not expect that in the first place.

> --
> Michal Hocko
> SUSE Labs
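
For illustration only, the userspace-side policy discussed in this thread
(No -> MADV_NOHUGEPAGE, Yes -> MADV_HUGEPAGE, plus an explicit synchronous
collapse request) could be sketched roughly as below. This is a minimal
sketch and not code from the thread: the policy names, POLICY_TO_ADVICE,
advice_for() and apply_policy() are hypothetical, and the numeric fallback
for MADV_COLLAPSE (25, the value later merged in Linux 6.1) is an
assumption, since the advice was still an RFC here. Whether the kernel
honours the request under the "never" setting is exactly the open question
above, so the sketch only reports whether the call was accepted.

```python
import mmap

# Prefer the constants Python exposes; otherwise fall back to the Linux UAPI
# numbers. MADV_COLLAPSE was only proposed in this thread; 25 is the value
# later merged in Linux 6.1 and is an assumption of this sketch.
MADV_HUGEPAGE = getattr(mmap, "MADV_HUGEPAGE", 14)
MADV_NOHUGEPAGE = getattr(mmap, "MADV_NOHUGEPAGE", 15)
MADV_COLLAPSE = getattr(mmap, "MADV_COLLAPSE", 25)

# Hypothetical policy names; they are not part of any kernel interface.
POLICY_TO_ADVICE = {
    "no": MADV_NOHUGEPAGE,          # never use THP for this range
    "yes": MADV_HUGEPAGE,           # THP wanted; done later by fault/khugepaged
    "collapse_now": MADV_COLLAPSE,  # best-effort synchronous collapse
}


def advice_for(policy):
    """Translate a policy name into the madvise advice value."""
    return POLICY_TO_ADVICE[policy]


def apply_policy(region, policy):
    """Issue the advice on a mapping; True if accepted, False if refused."""
    try:
        region.madvise(advice_for(policy))
        return True
    except OSError:
        # For example EINVAL on a kernel that does not know the advice.
        return False


if __name__ == "__main__":
    region = mmap.mmap(-1, 4 * 1024 * 1024)  # anonymous 4 MiB mapping
    print("collapse accepted:", apply_policy(region, "collapse_now"))
    region.close()
```

A False return does not say why the kernel refused; a real policy daemon
would need to look at errno to tell "unsupported" from "not eligible".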