Date: Fri, 2 Feb 2024 10:55:05 +0100
From: Michal Hocko
To: Baolin Wang
Cc: akpm@linux-foundation.org, muchun.song@linux.dev, osalvador@suse.de, david@redhat.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH] mm: hugetlb: remove __GFP_THISNODE flag when dissolving the old hugetlb

On Fri 02-02-24 17:29:02, Baolin Wang wrote:
> On 2/2/2024 4:17 PM, Michal Hocko wrote:
[...]
> > > Agree. So how about the change below?
> > > (1) disallow falling back to other nodes when handling in-use hugetlb, which
> > > can ensure consistent behavior in handling hugetlb.
> >
> > I can see two cases here: alloc_contig_range, which is an internal kernel
> > user, and then we have memory offlining.
> > The former shouldn't break the per-node hugetlb pool reservations; the
> > latter might not have any other choice (the whole node could go offline,
> > and that resembles breaking cpu affinity if the cpu is gone).
>
> IMO, that is not always true for memory offlining: when handling a free
> hugetlb, it disallows falling back to other nodes, which is inconsistent.

It's been some time since I looked into that code, so I am not 100% sure
how the free pool is currently handled. The above is the way I _think_ it
should work from the usability POV.

> Not only memory offlining, but also the longterm pinning (in
> migrate_longterm_unpinnable_pages()) and memory failure (in
> soft_offline_in_use_page()) can break the per-node hugetlb pool
> reservations.

Bad.

> > Now I can see how a hugetlb page sitting inside a CMA region breaks CMA
> > users' expectations, but hugetlb migration already tries hard to allocate
> > a replacement hugetlb, so the system must be under heavy memory
> > pressure if that fails, right? Is it possible that the hugetlb
> > reservation is just overshot here? Maybe the memory is just terribly
> > fragmented, though?
> >
> > Could you be more specific about numbers in your failure case?
>
> Sure. Our customer's machine contains several NUMA nodes, and the system
> reserves a large amount of CMA memory, occupying 50% of the total memory,
> which is used for the virtual machine; meanwhile it also reserves lots of
> hugetlb, which can occupy 50% of the CMA. So before the virtual machine
> starts, hugetlb can use 50% of the CMA, but once the virtual machine
> starts, the CMA will be used by the virtual machine and the hugetlb pages
> must be migrated out of CMA.

Would it make more sense for hugetlb pages to _not_ use CMA in this
case? I mean, would we be better off overall if the hugetlb pool was
preallocated before the CMA is reserved? I do realize this is just
working around the current limitations, but it could be better than
nothing.

> Due to several nodes in the system, one node's memory can be exhausted,
> which will fail the hugetlb migration with the __GFP_THISNODE flag.

Is the workload NUMA aware? I.e. do you bind virtual machines to
specific nodes?

-- 
Michal Hocko
SUSE Labs