From: Zach O'Keefe
Date: Wed, 25 May 2022 11:09:04 -0700
Subject: Re: [RFC] mm: MADV_COLLAPSE semantics
To: Yang Shi, Michal Hocko
Cc: Alex Shi, David Hildenbrand, David Rientjes, Matthew Wilcox,
    Peter Xu, Song Liu, Linux MM, Rongwei Wang, Andrea Arcangeli,
    Axel Rasmussen, Hugh Dickins, "Kirill A. Shutemov", Minchan Kim,
    SeongJae Park, Pasha Tatashin

Hey Michal and Yang,

Thanks for the feedback!

On Tue, May 24, 2022 at 1:02 PM Yang Shi wrote:
> [...]
> Page reclaim could also cause the THP split. And it may happen at any
> time. I'm not sure how the users or callers could monitor it.

I don't have a good idea of what monitoring would look like, but this
is a great example that shows splitting can happen from underneath us
and we'll have to design accordingly. Luckily in this example, the page
is likely cold and therefore of less interest to be backed by THPs.

On Wed, May 25, 2022 at 10:33 AM Yang Shi wrote:
>
> On Wed, May 25, 2022 at 1:24 AM Michal Hocko wrote:
> >
> > On Mon 23-05-22 17:18:32, Zach O'Keefe wrote:
> > [...]
> > > Idea: MADV_COLLAPSE should respect VM_NOHUGEPAGE and "never" THP mode,
> > > but otherwise would attempt to collapse.
> >
> > I do agree that {process_}madvise should fail on VM_NOHUGEPAGE. The
> > process has explicitly noted that THP shouldn't be used on such a VMA
> > and seeing THP could be observed as not complying with that contract.
> >
> > I am not so sure about the global "never" policy, though. The global
> > policy controls _kernel_ driven THPs. As the request to collapse memory
> > comes from the userspace I do not think it should be limited by the
> > kernel policy.

Ya, I agree this would be ideal / is the cleanest. However, Peter
mentioned a non-debug example where users wouldn't be expecting THPs
after setting "never". Though, as Peter points out, I'm not sure how
many users do this with CONFIG_TRANSPARENT_HUGEPAGE=y.

> > I also think it can be beneficial to implement userspace
> > based THP policies and exclude any kernel interference and that could be
> > achieved by global kernel "never" policy and implement the whole
> > functionality by process_madvise.

I don't have a clear picture yet, but even if we move THP collapse
policy to userspace, I imagine we'll still want an informed
application/allocator to be able to MADV_HUGEPAGE known hot memory and
fault in THPs rather than MADV_COLLAPSE after the fact. IOW, I don't
know if we'll ever want "never". When I get started on this work, I
plan to add some prctl(2) interface to disable khugepaged on processes
where the userspace agent has taken responsibility for THP utilization.

> I'd prefer to respect "never" for now since it is typically used to
> disable THP globally even though the mappings are madvised
> (MADV_HUGEPAGE). IMHO I treat MADV_COLLAPSE as weaker MADV_HUGEPAGE
> (take effect for non-madvised mappings but not flip VM_NOHUGEPAGE) +
> best-effort synchronous THP collapse.

I'm likewise in favor of respecting it until proven otherwise - even
though I agree with Michal that it would be nice to not depend on the
kernel policy / sysfs settings here.

> We could lift the restriction in the future if it turns out non
> respecting "never" is more useful.
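
To make the rule under discussion concrete, here is a rough userspace
model of the "respect VM_NOHUGEPAGE and 'never', otherwise attempt to
collapse" idea. It is only a sketch and not kernel code: the function
names are hypothetical, and the only real interface it touches is the
sysfs file /sys/kernel/mm/transparent_hugepage/enabled, whose active
mode is shown in square brackets.

```python
import re

# Real sysfs file; its contents look like "always [madvise] never".
THP_ENABLED_PATH = "/sys/kernel/mm/transparent_hugepage/enabled"


def parse_thp_mode(text):
    """Return the bracketed (active) mode from the sysfs 'enabled' string."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError(f"no bracketed mode found in {text!r}")
    return match.group(1)


def read_thp_mode(path=THP_ENABLED_PATH):
    """Read the global THP mode from sysfs (requires a THP-enabled kernel)."""
    with open(path) as f:
        return parse_thp_mode(f.read())


def collapse_allowed(vma_nohugepage, thp_mode):
    """Hypothetical model of the proposed MADV_COLLAPSE eligibility rule.

    vma_nohugepage: True if the mapping was marked with MADV_NOHUGEPAGE.
    thp_mode: the global mode string, e.g. "always", "madvise" or "never".
    """
    if vma_nohugepage:
        # The process explicitly opted this VMA out of THP.
        return False
    if thp_mode == "never":
        # Assumption from this thread: respect the global policy for now.
        return False
    # Otherwise attempt a best-effort collapse, madvised or not.
    return True
```

If the "never" restriction were lifted later, as suggested at the end
of the thread, only the second check in collapse_allowed() would go
away; the VM_NOHUGEPAGE check would stay.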