From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 75CC9C433EF for ; Thu, 28 Apr 2022 15:37:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B7AE76B00A8; Thu, 28 Apr 2022 11:37:45 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B02936B00A9; Thu, 28 Apr 2022 11:37:45 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9552F6B00AA; Thu, 28 Apr 2022 11:37:45 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (relay.hostedemail.com [64.99.140.28]) by kanga.kvack.org (Postfix) with ESMTP id 7FDF36B00A8 for ; Thu, 28 Apr 2022 11:37:45 -0400 (EDT) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay12.hostedemail.com (Postfix) with ESMTP id 534DE1220A2 for ; Thu, 28 Apr 2022 15:37:45 +0000 (UTC) X-FDA: 79406692890.28.E229AD2 Received: from mail-lf1-f51.google.com (mail-lf1-f51.google.com [209.85.167.51]) by imf20.hostedemail.com (Postfix) with ESMTP id ED3AC1C0071 for ; Thu, 28 Apr 2022 15:37:40 +0000 (UTC) Received: by mail-lf1-f51.google.com with SMTP id bu29so9406963lfb.0 for ; Thu, 28 Apr 2022 08:37:44 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=atgwk6ZuBK5EUmVdF3Wj8sZVXMGonjnOSWYbWMnvvUE=; b=VqSehxHxBp0A6i/VfLuCzJVfbSQCFpWoYq3wpc28+NW0nsm3+ZJHCxqVX6ss6QQFiv GOfx/TKW/k4mB5TTrNyKhb9yhLMWGL5wH/SVoOOYmSb7rBcMrgejcc3nkciNQiRu4xfQ vD+RspvcW+6k5bE8xJ4LSEVQlu9c6lJo+Le2buC6OwOZC5j3W5YHHh4Xe8zTu6EawBeG JDqxCVRgn/NVlMSUfOKF1LXPp7mElbfiPg0ND7Tgt9hjVZZprfNCbqIimXcYXZ8Wr5q1 BnlubhBgYhpsVhxOnZsW6KK1hoOOnSnh/5sDUZZSesnFRbED41FwUK+gn/x95TCQ0kmg n5rA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=atgwk6ZuBK5EUmVdF3Wj8sZVXMGonjnOSWYbWMnvvUE=; b=UpMLWxrAzLmZQPt/gqZJHC34ah/KOykp+wMeSOR4i9jdKqioNb09vJZ5aQaanuJY+t f0/s25OroT5J89xUMu434G+VMX9JUEJkqZgfvfeahKvPV78Cd0Cl+6ucsnPxCfCkAPzM MSJd5Hp8lCV8FiDRV71d5CE5JytxTg1LjCUMhcxO+EGGNfIR+5SvaQvlU87AX/NpOvfP jMk+w5mJCkUos8vVa2h/GVBg2JOzWUSHV77LMAVzoAJB2aU+IAds/SYeQ/1jH870UIGG KcpPnHSs3u1j4P9eCISgPBV9ID9D5MkMVJ844d7NxNxkadKhWxh/ATEeasm/PVER2ONc 8lEw== X-Gm-Message-State: AOAM533hMUvLypbD4uOLQ1F/BEFq9ry5Y9aM3WZSiFPd6Qqe1nU9rOKP 8jNOZOV9j2M+Df2K7uAGYm0kvaiyK/MSuS9tzCITmQ== X-Google-Smtp-Source: ABdhPJzmO/37qH8OfGoCODT6NBl1OCcH0fFK6RXpdrV1ARgsDwqSKqi2lbNJOnO7NdG1Ny8QiIsGfL7r1VQhhnO1sPc= X-Received: by 2002:a05:6512:3da1:b0:472:28c9:851b with SMTP id k33-20020a0565123da100b0047228c9851bmr7075575lfv.359.1651160262913; Thu, 28 Apr 2022 08:37:42 -0700 (PDT) MIME-Version: 1.0 References: <20220426144412.742113-1-zokeefe@google.com> <20220426144412.742113-4-zokeefe@google.com> In-Reply-To: From: "Zach O'Keefe" Date: Thu, 28 Apr 2022 08:37:06 -0700 Message-ID: Subject: Re: [PATCH v3 03/12] mm/khugepaged: make hugepage allocation context-specific To: Peter Xu Cc: Alex Shi , David Hildenbrand , David Rientjes , Matthew Wilcox , Michal Hocko , Pasha Tatashin , SeongJae Park , Song Liu , Vlastimil Babka , Yang Shi , Zi Yan , linux-mm@kvack.org, Andrea Arcangeli , Andrew Morton , Arnd Bergmann , Axel Rasmussen , Chris Kennelly , Chris Zankel , Helge Deller , Hugh Dickins , Ivan Kokshaysky , "James E.J. Bottomley" , Jens Axboe , "Kirill A. Shutemov" , Matt Turner , Max Filippov , Miaohe Lin , Minchan Kim , Patrick Xia , Pavel Begunkov , Thomas Bogendoerfer , kernel test robot Content-Type: text/plain; charset="UTF-8" X-Stat-Signature: 3ftfb5f4i6q1d7uuu4uotudsbzdsijjj X-Rspamd-Server: rspam07 X-Rspamd-Queue-Id: ED3AC1C0071 X-Rspam-User: Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=VqSehxHx; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf20.hostedemail.com: domain of zokeefe@google.com designates 209.85.167.51 as permitted sender) smtp.mailfrom=zokeefe@google.com X-HE-Tag: 1651160260-370383 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Apr 28, 2022 at 7:51 AM Peter Xu wrote: > > On Wed, Apr 27, 2022 at 05:51:53PM -0700, Zach O'Keefe wrote: > > > > @@ -1110,13 +1115,14 @@ static void collapse_huge_page(struct mm_struct *mm, > > > > */ > > > > mmap_read_unlock(mm); > > > > > > > > + node = khugepaged_find_target_node(cc); > > > > /* sched to specified node before huage page memory copy */ > > > > if (task_node(current) != node) { > > > > cpumask = cpumask_of_node(node); > > > > if (!cpumask_empty(cpumask)) > > > > set_cpus_allowed_ptr(current, cpumask); > > > > } > > > > - new_page = khugepaged_alloc_page(hpage, gfp, node); > > > > + new_page = cc->alloc_hpage(cc, gfp, node); > > > > > > AFAICT you removed all references of khugepaged_alloc_page() in this patch, > > > then you'd better drop the function for both NUMA and UMA. > > > > > > > Sorry, I'm not sure I follow. In khugepaged context, logic WRT > > khugepaged_alloc_page() is unchanged - it's just called indirectly > > through ->alloc_hpage(). > > Ah you're right, sorry for the confusion. > > > > > > Said that, I think it's actually better to keep them, as they do things > > > useful. For example,AFAICT this ->alloc_hpage() change can leak the hpage > > > alloacted for UMA already so that's why I think keeping them makes sense, > > > then iiuc the BUG_ON would trigger with UMA already. > > > > > > I saw that you've moved khugepaged_find_target_node() into this function > > > which looks definitely good, but if we could keep khugepaged_alloc_page() > > > and then IMHO we could even move khugepaged_find_target_node() into that > > > and drop the "node" parameter in khugepaged_alloc_page(). > > > > > > > I actually had done this, until commit 4272063db38c ("mm/khugepaged: > > sched to numa node when collapse huge page") which forced me to keep > > "node" visible in this function. > > Right, I was looking at my local tree and that patch seems to be very > lately added into -next. > > I'm curious why it wasn't applying to file thps too if it is worthwhile, > since if so that's also a suitable candidate to be moved into the same > hook. I've asked in that thread instead. > > Before that, feel free to keep your code as is. > Thanks for checking on that. On second thought, it seems like we would actually want to move this sched logic into the khupaged hook impl since we probably don't want to be moving around the calling process if in madvise context. > > > > > I'd even consider moving cc->gfp() all into it if possible, since the gfp > > > seems to be always with __GFP_THISNODE anyways. > > > > > > > I would have liked to do this, but the gfp flags are referenced again > > when calling mem_cgroup_charge(), so I couldn't quite get rid of them > > from here. > > Yeah, maybe we could move mem_cgroup_charge() into the hook too? As below > codes are duplicated between file & anon and IMHO they're good candidate to > a new helper already anyway: > > /* Only allocate from the target node */ > gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE; > > new_page = khugepaged_alloc_page(hpage, gfp, node); > if (!new_page) { > result = SCAN_ALLOC_HUGE_PAGE_FAIL; > goto out; > } > > if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) { > result = SCAN_CGROUP_CHARGE_FAIL; > goto out; > } > count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC); > > If we want to generalize it maybe we want to return the "result" instead of > the new page though, so the newpage can be passed in as a pointer. > > There's one mmap_read_unlock(mm) for the anonymous code but I think that > can simply be moved above the chunk. > > No strong opinion here, please feel free to think about what's the best > approach for landing this series. > Thanks for the suggestion. I'll play around with it a little and see what looks good. > [...] > > > > > I could (?) make .gfp of type gfp_t and just update it on every > > khugepaged scan (in case it changed) and also remove the gfp parameter > > for ->alloc_hpage(). > > If that's the case I'd prefer you keep you code as-is; gfp() is perfectly > fine and gfp() is light, I'm afraid that caching thing could make things > complicated. > Ack. Thanks again for your time and thoughts, Peter! Best, Zach > Thanks, > > -- > Peter Xu >