From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 8E2E6C433EF for ; Thu, 30 Jun 2022 14:18:00 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0D1658E0002; Thu, 30 Jun 2022 10:18:00 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 082D28E0001; Thu, 30 Jun 2022 10:18:00 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E64558E0002; Thu, 30 Jun 2022 10:17:59 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0013.hostedemail.com [216.40.44.13]) by kanga.kvack.org (Postfix) with ESMTP id D26768E0001 for ; Thu, 30 Jun 2022 10:17:59 -0400 (EDT) Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 322B9606D2 for ; Thu, 30 Jun 2022 14:17:59 +0000 (UTC) X-FDA: 79635106278.20.B6B95B7 Received: from mail-pl1-f177.google.com (mail-pl1-f177.google.com [209.85.214.177]) by imf21.hostedemail.com (Postfix) with ESMTP id 8654C1C0038 for ; Thu, 30 Jun 2022 14:17:58 +0000 (UTC) Received: by mail-pl1-f177.google.com with SMTP id m14so17169012plg.5 for ; Thu, 30 Jun 2022 07:17:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=MmO3/GzX/orRz1EK+ezCC+5Hs4cwrkBpc9YnZwjm28s=; b=aT1IALruh58LnywVxFXZP3gdhRZrDKHNh27fSR9/du2uLltQ7JfTxk4tEpgTuJg7mF Sjmy0LF4LCHq7Li+2T5lTnQwTrOBiXTJPb8etDVj+k1TNPVZTM+YPFckda47F85vW1ww EATBersdX978CySRyuBuAnZHAb4IJONsWuhjf9JD43OelZaIQyzgiOhy9+817pSOyAQC wJB3B2NeBwMytJ1C2QVS5lAyPD+L1XxxR0nPh1/jQ9eYMGy43LbxBz949ZRNvWdGkxo5 GvXjodAgoZoJUsI5Q/mWRrCAd8WiNzmu+3kG2f9BySCGXNK48kH7ws9uQzhZoHYMD/q6 7pXA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=MmO3/GzX/orRz1EK+ezCC+5Hs4cwrkBpc9YnZwjm28s=; b=GFYX6115R4Xn8RMTdAk5BGZfKr4LTqh+SOBva3p7M7NKIyYZNWn1UKSug41KKVC9Fw WXlYTEcQ72gH2Cz8fpQ6R9eVebwNr+TjghNj5UYV2UuJO3W3bZ/eA+bJQyiNlaBBDwCI zhW/15Lm5SrnXT4+up/UCI41LpycVaFNsGoZqq+A5teU1habB5n7WTs1UsI66rJcRjDF V+QKapNebB18jgQXOV/8IURIcI8+x861ki8WAU9n032EG1hyXhhKrb3eqJuJGLgiYCdh b5Tc6I5DYlho/0KYDdR799Tf+b0+2Ccr4XSnTrwKhHlNpAnD/43ilxudOj3etahx86T2 Ge3A== X-Gm-Message-State: AJIora/eZE1ILLCPOr6xtJGlkgvRSuuuGcNPgOFAfkbE1S8IRgCkRqMd 4IchfNxBDnZv556R34S5LRwFng== X-Google-Smtp-Source: AGRyM1tl6C4fxmw53uVJv8bmUnpsH8sSm5S9N/UtAJzl9RbHc/qokSk98gBfTOCvP8xsOYf6FOw//g== X-Received: by 2002:a17:90b:4b84:b0:1ef:27b:d3d1 with SMTP id lr4-20020a17090b4b8400b001ef027bd3d1mr12903722pjb.30.1656598677103; Thu, 30 Jun 2022 07:17:57 -0700 (PDT) Received: from google.com (55.212.185.35.bc.googleusercontent.com. [35.185.212.55]) by smtp.gmail.com with ESMTPSA id x20-20020aa79574000000b0050dc7628183sm14041423pfq.93.2022.06.30.07.17.56 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 30 Jun 2022 07:17:56 -0700 (PDT) Date: Thu, 30 Jun 2022 07:17:52 -0700 From: Zach O'Keefe To: Peter Xu Cc: Alex Shi , David Hildenbrand , David Rientjes , Matthew Wilcox , Michal Hocko , Pasha Tatashin , Rongwei Wang , SeongJae Park , Song Liu , Vlastimil Babka , Yang Shi , Zi Yan , linux-mm@kvack.org, Andrea Arcangeli , Andrew Morton , Arnd Bergmann , Axel Rasmussen , Chris Kennelly , Chris Zankel , Helge Deller , Hugh Dickins , Ivan Kokshaysky , "James E.J. Bottomley" , Jens Axboe , "Kirill A. Shutemov" , Matt Turner , Max Filippov , Miaohe Lin , Minchan Kim , Patrick Xia , Pavel Begunkov , Thomas Bogendoerfer Subject: Re: [PATCH v6 08/15] mm/khugepaged: add flag to ignore THP sysfs enabled Message-ID: References: <20220604004004.954674-1-zokeefe@google.com> <20220604004004.954674-9-zokeefe@google.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: ARC-Authentication-Results: i=1; imf21.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=aT1IALru; spf=pass (imf21.hostedemail.com: domain of zokeefe@google.com designates 209.85.214.177 as permitted sender) smtp.mailfrom=zokeefe@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1656598678; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=MmO3/GzX/orRz1EK+ezCC+5Hs4cwrkBpc9YnZwjm28s=; b=xmWQ1ktECwA9kCRdmNUcbahO8Qp97mBI6jG0/E+Ei0ozPPs0BOjcsyWivqFBtf/xvp4VjJ Dpe+Shx36SCY7X2TYIXG9lIZ8GugHs2CCI/0ZuJPI84l028ZrzaVt2bguZlqKZuEJtPP+o Tl7btfUSIx1jdTSaqdFub8KidvL1wIw= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1656598678; a=rsa-sha256; cv=none; b=8bECBOqXMVZlTL58fROu6+Wc6XUDF2RQ/lSlTjla/5Wvd3hgA9v9hXvRq1o1R7cGl4+gHS sgPGRtRSSJOMuuaZo3UdoqZAtfzgY3sQBUtNnqVvxP7fWp5c0q3gkiVJPtP2XmHpfUhwMX v7J1fZbqz9L6Q7rDbZvqbvlijG2oZik= X-Stat-Signature: pbkhoxtmdg51nz15zqxazh8ajofxezsz X-Rspamd-Queue-Id: 8654C1C0038 Authentication-Results: imf21.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=aT1IALru; spf=pass (imf21.hostedemail.com: domain of zokeefe@google.com designates 209.85.214.177 as permitted sender) smtp.mailfrom=zokeefe@google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspamd-Server: rspam12 X-Rspam-User: X-HE-Tag: 1656598678-124945 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Jun 29 22:32, Peter Xu wrote: > On Wed, Jun 29, 2022 at 06:42:25PM -0700, Zach O'Keefe wrote: > > On Jun 29 19:21, Peter Xu wrote: > > > On Fri, Jun 03, 2022 at 05:39:57PM -0700, Zach O'Keefe wrote: > > > > Add enforce_thp_enabled flag to struct collapse_control that allows context > > > > to ignore constraints imposed by /sys/kernel/transparent_hugepage/enabled. > > > > > > > > This flag is set in khugepaged collapse context to preserve existing > > > > khugepaged behavior. > > > > > > > > This flag will be used (unset) when introducing madvise collapse > > > > context since the desired THP semantics of MADV_COLLAPSE aren't coupled > > > > to sysfs THP settings. Most notably, for the purpose of eventual > > > > madvise_collapse(2) support, this allows userspace to trigger THP collapse > > > > on behalf of another processes, without adding support to meddle with > > > > the VMA flags of said process, or change sysfs THP settings. > > > > > > > > For now, limit this flag to /sys/kernel/transparent_hugepage/enabled, > > > > but it can be expanded to include > > > > /sys/kernel/transparent_hugepage/shmem_enabled later. > > > > > > > > Link: https://lore.kernel.org/linux-mm/CAAa6QmQxay1_=Pmt8oCX2-Va18t44FV-Vs-WsQt_6+qBks4nZA@mail.gmail.com/ > > > > > > > > Signed-off-by: Zach O'Keefe > > > > --- > > > > mm/khugepaged.c | 34 +++++++++++++++++++++++++++------- > > > > 1 file changed, 27 insertions(+), 7 deletions(-) > > > > > > > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > > > > index c3589b3e238d..4ad04f552347 100644 > > > > --- a/mm/khugepaged.c > > > > +++ b/mm/khugepaged.c > > > > @@ -94,6 +94,11 @@ struct collapse_control { > > > > */ > > > > bool enforce_page_heuristics; > > > > > > > > + /* Enforce constraints of > > > > + * /sys/kernel/mm/transparent_hugepage/enabled > > > > + */ > > > > + bool enforce_thp_enabled; > > > > > > Small nitpick that we could have merged the two booleans if they always > > > match, but no strong opinions if you think these two are clearer. Or maybe > > > there's other plan of using them? > > > > > > > + > > > > /* Num pages scanned per node */ > > > > int node_load[MAX_NUMNODES]; > > > > > > > > @@ -893,10 +898,12 @@ static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node) > > > > */ > > > > > > > > static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, > > > > - struct vm_area_struct **vmap) > > > > + struct vm_area_struct **vmap, > > > > + struct collapse_control *cc) > > > > { > > > > struct vm_area_struct *vma; > > > > unsigned long hstart, hend; > > > > + unsigned long vma_flags; > > > > > > > > if (unlikely(khugepaged_test_exit(mm))) > > > > return SCAN_ANY_PROCESS; > > > > @@ -909,7 +916,18 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, > > > > hend = vma->vm_end & HPAGE_PMD_MASK; > > > > if (address < hstart || address + HPAGE_PMD_SIZE > hend) > > > > return SCAN_ADDRESS_RANGE; > > > > - if (!hugepage_vma_check(vma, vma->vm_flags)) > > > > + > > > > + /* > > > > + * If !cc->enforce_thp_enabled, set VM_HUGEPAGE so that > > > > + * hugepage_vma_check() can pass even if > > > > + * TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG is set (i.e. "madvise" mode). > > > > + * Note that hugepage_vma_check() doesn't enforce that > > > > + * TRANSPARENT_HUGEPAGE_FLAG or TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG > > > > + * must be set (i.e. "never" mode). > > > > + */ > > > > + vma_flags = cc->enforce_thp_enabled ? vma->vm_flags > > > > + : vma->vm_flags | VM_HUGEPAGE; > > > > > > Another nitpick.. > > > > > > We could get a weird vm_flags when VM_NOHUGEPAGE is set. I don't think > > > it'll go wrong since hugepage_vma_check() checks NOHUGEPAGE first, but IMHO > > > we shouldn't rely on that as it seems error prone (e.g. when accidentally > > > moved things around). > > > > > > So maybe nicer to only apply VM_HUGEPAGE if !VM_NOHUGEPAGE? Or pass over > > > "enforce_thp_enabled" into hugepage_vma_check() should work too, iiuc. > > > Passing in the boolean has one benefit that we don't really need the > > > complicated comment above since the code should be able to explain itself. > > > > Hey Peter, thanks again for taking the time to review. > > > > Answering both of the above at the time: > > > > As in this series so far, I've tried to keep context functionally-declarative - > > specifying the intended behavior (e.g. "enforce_page_heuristics") rather than > > adding "if (khugepaged) { .. } else if (madv_collapse) { .. } else if { .. }" > > around the code which, IMO, makes it difficult to follow. Unfortunately, I've > > ran into the 2 problems you've stated here: > > > > 1) *Right now* all the behavior knobs are either off/on at the same time > > 2) For hugepage_vma_check() (now in mm/huge_memory.c and acting as the central > > authority on THP eligibility), things are complicated enough that I > > couldn't find a clean way to describe the parameters of the context without > > explicitly mentioning the caller. > > > > For (2), instead of adding another arg to specify MADV_COLLAPSE's behavior, > > I think we need to package these contexts into a single set of flags: > > > > enum thp_ctx_flags { > > THP_CTX_ANON_FAULT = 1 << 1, > > THP_CTX_KHUGEPAGED = 1 << 2, > > THP_CTX_SMAPS = 1 << 3, > > THP_CTX_MADVISE_COLLAPSE = 1 << 4, > > }; > > > > That will avoid hacking vma flags passed to hugepage_vma_check(). > > > > And, if we have these anyways, I might as well do away with some of the > > (semantically meaningful but functionally redundant) flags in > > struct collapse_control and just specify a single .thp_ctx_flags member. I'm > > not entirely happy with it - but that's what I'm planning. > > > > WDYT? > > Firstly I think I wrongly sent previous email privately.. :( Let me try to > add the list back.. > > IMHO we don't need to worry too much on the "if... else if... else", > because they shouldn't be more complicated than when you spread the > meanings into multiple flags, or how could it be? :) IMHO it should > literally be as simple as applying: > > s/enforce_{A|B|C|...}/khugepaged_initiated/g > > Throughout the patches, then we squash the patches introducing enforce_X. Right, the code today will be virtually identical. The attempt to describe contexts in terms of behaviors is based on an unfounded assumption that successive contexts could reuse said behaviors - but there are currently no plans for other collapsing contexts. > If you worry it's not clear on "what does khugepaged_initiated mean", we > could add whatever comment above the variable explaining A/B/C/D will be > covered when this is set, and we could postpone to do the flag split only > until there're real user. > > Adding these flags could add unnecessary bit-and instructions into the code > generated at last, and if it's only about readability issue that's really > what comment is for? Ya maybe I'm overthinking it and will just do the most straightforward thing for now. As always, thanks for taking the time to review / offer suggestions! Best, Zach > Thanks, > > -- > Peter Xu >