From: Yang Shi
Date: Wed, 2 Nov 2022 13:08:08 -0700
Subject: Re: [PATCH] mm: don't warn if the node is offlined
To: "Zach O'Keefe"
Cc: Michal Hocko, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Davidoff, Bob Liu
References: <20221031183122.470962-1-shy828301@gmail.com>

On Wed, Nov 2, 2022 at 11:59 AM Zach O'Keefe wrote:
>
> On Wed, Nov 2, 2022 at 11:18 AM Yang Shi wrote:
> >
> > On Wed, Nov 2, 2022 at 10:47 AM Michal Hocko wrote:
> > >
> > > On Wed 02-11-22 10:36:07, Yang Shi wrote:
> > > > On Wed, Nov 2, 2022 at 9:15 AM Michal Hocko wrote:
> > > > >
> > > > > On Wed 02-11-22 09:03:57, Yang Shi wrote:
> > > > > > On Wed, Nov 2, 2022 at 12:39 AM Michal Hocko wrote:
> > > > > > >
> > > > > > > On Tue 01-11-22 12:13:35, Zach O'Keefe wrote:
> > > > > > > [...]
> > > > > > > > This is slightly tangential - but I don't want to send a new mail
> > > > > > > > about it -- but I wonder if we should be doing __GFP_THISNODE +
> > > > > > > > explicit node vs having hpage_collapse_find_target_node() set a
> > > > > > > > nodemask. We could then provide fallback nodes for ties, or if some
> > > > > > > > node contained > some threshold number of pages.
> > > > > > >
> > > > > > > I would simply go with something like this (not even compile tested):
> > > > > >
> > > > > > Thanks, Michal. It is definitely an option. As I talked with Zach, I'm
> > > > > > not sure whether it is worth making the code more complicated for such
> > > > > > micro optimization or not. Removing __GFP_THISNODE or even removing
> > > > > > the node balance code should be fine too IMHO. TBH I doubt there would
> > > > > > be any noticeable difference.
> > > > >
> > > > > I do agree that an explicit nodes (quasi)round robin sounds over
> > > > > engineered. It makes some sense to try to target the prevalent node
> > > > > though because this code can be executed from khugepaged and therefore
> > > > > allocating with a completely different affinity than the original fault.
> > > >
> > > > Yeah, the corner case comes from the node balance code, it just tries
> > > > to balance between multiple prevalent nodes, so you agree to remove it
> > > > IIRC?
> > >
> > > Yeah, let's just collect all good nodes into a nodemask and keep
> > > __GFP_THISNODE in place. You can consider having the nodemask per collapse_control
> > > so that you allocate it only once in the struct lifetime.
> >
> > Actually my intention is more aggressive, just remove that node balance code.
> >
>
> The balancing code dates back to 2013 commit 9f1b868a13ac ("mm: thp:
> khugepaged: add policy for finding target node") where it was made to
> satisfy "numactl --interleave=all". I don't know why any real
> workloads would want this -- but there very well could be a valid use
> case.
> If not, I think it could be removed independent of what we do
> with __GFP_THISNODE and nodemask.

Hmm... if the code is used for interleave, I don't think a nodemask
could preserve the behavior, IIUC. With a nodemask, the allocator still
tries the preferred node first and falls back to the other allowed
nodes in the mask only when the allocation fails on the preferred node.
The round-robin style node balance, by contrast, tries to distribute
the THPs evenly across the nodes.

I also just realized that __GFP_THISNODE + nodemask is not the right
combination, IIUC, right? __GFP_THISNODE disallows any fallback, so the
nodemask is effectively useless.

So I think we have narrowed it down to two options:
1. Preserve the interleave behavior but bail out if the target node is
not online (the check is also racy, but that doesn't hurt)
2. Remove the node balance code entirely

>
> Balancing aside -- I haven't fully thought through what an ideal (and
> further overengineered) solution would be for numa, but one (perceived
> - not measured) issue that khugepaged might have (MADV_COLLAPSE
> doesn't have the choice) is on systems with many, many nodes with
> source pages sprinkled across all of them. Should we collapse these
> pages into a single THP from the node with the most (but could still
> be a small %) pages? Probably there are better candidates. So, maybe a
> khugepaged-only check for max_value > (HPAGE_PMD_NR >> 1) or something
> makes sense.

In any case, a THP has to be allocated on a single node, and I can't
think of a better way to make the node selection fairer. I'd prefer to
wait until a real-life use case surfaces.

>
> > >
> > > And as mentioned in other reply it would be really nice to hide this
> > > under CONFIG_NUMA (in a standalone follow up of course).
> >
> > The hpage_collapse_find_target_node() function itself is defined under
> > CONFIG_NUMA.
> >
> > >
> > > --
> > > Michal Hocko
> > > SUSE Labs
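
For reference, the two options discussed in this thread can be modeled in a few lines of userspace C. This is only a rough sketch and not kernel code: struct collapse_state, pick_prevalent_node(), pick_balanced_node(), node_holds_majority(), NR_NODES and NO_NODE are hypothetical names made up for the illustration, the online[] array stands in for whatever node-online check the kernel would use, and HPAGE_PMD_NR is assumed to be 512 (2 MiB THP with 4 KiB base pages).

```c
#include <stdbool.h>

#define NR_NODES      4    /* hypothetical node count for this model */
#define NO_NODE       (-1) /* hypothetical "bail out" return value */
#define HPAGE_PMD_NR  512  /* assumption: 2 MiB THP, 4 KiB base pages */

/* Hypothetical stand-in for the per-collapse bookkeeping. */
struct collapse_state {
    int node_load[NR_NODES]; /* source pages found on each node */
    int last_target;         /* node picked by the previous collapse */
};

/* Option 2: no balancing, take the first node with the highest load. */
static int pick_prevalent_node(const struct collapse_state *cs)
{
    int target = 0;

    for (int nid = 1; nid < NR_NODES; nid++)
        if (cs->node_load[nid] > cs->node_load[target])
            target = nid;
    return target;
}

/*
 * Option 1: keep the round-robin tie break between equally loaded
 * nodes, but bail out when the chosen node is not online. The online
 * state can still change right after the check, so this is racy.
 */
static int pick_balanced_node(struct collapse_state *cs,
                              const bool online[NR_NODES])
{
    int target = pick_prevalent_node(cs);
    int max_value = cs->node_load[target];

    /* Move on to the next tied node after the previous target. */
    if (target <= cs->last_target) {
        for (int nid = cs->last_target + 1; nid < NR_NODES; nid++) {
            if (cs->node_load[nid] == max_value) {
                target = nid;
                break;
            }
        }
    }

    if (!online[target])
        return NO_NODE;

    cs->last_target = target;
    return target;
}

/* The khugepaged-only idea: does one node hold more than half the pages? */
static bool node_holds_majority(const struct collapse_state *cs, int nid)
{
    return cs->node_load[nid] > (HPAGE_PMD_NR >> 1);
}
```

With two nodes tied for the highest load, pick_balanced_node() alternates between them on successive calls, which is the interleave-like behavior a nodemask would not give, while pick_prevalent_node() always returns the first of the two.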