Date: Tue, 27 Sep 2022 15:58:52 +0200
From: Michal Hocko
To: Abel Wu
Cc: Zhongkun He, corbet@lwn.net, akpm@linux-foundation.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-doc@vger.kernel.org
Subject: Re: [External] Re: [RFC] proc: Add a new isolated /proc/pid/mempolicy type.
In-Reply-To: <9a0130ce-6528-6652-5a8e-3612c5de2d96@bytedance.com>

On Tue 27-09-22 21:07:02, Abel Wu wrote:
> On 9/27/22 6:49 PM, Michal Hocko wrote:
> > On Tue 27-09-22 11:20:54, Abel Wu wrote:
> > [...]
> > > > > Btw. in order to add per-thread-group mempolicy, is it possible to add
> > > > > mempolicy in mm_struct?
> > > >
> > > > I dunno. This would make the mempolicy interface even more confusing.
> > > > Per mm behavior makes a lot of sense but we already do have per-thread
> > > > semantic so I would stick to it rather than introducing a new semantic.
> > > >
> > > > Why is this really important?
> > >
> > > We want soft control on memory footprint of background jobs by applying
> > > NUMA preferences when necessary, so the impact on different NUMA nodes
> > > can be managed to some extent.
> > > These NUMA preferences are given by the
> > > control panel, and it might not be suitable to overwrite the tasks with
> > > specific memory policies already (or vice versa).
> >
> > Maybe the answer is somehow implicit but I do not really see any
> > argument for the per thread-group semantic here. In other words why a
> > new interface has to cover more than the local [sg]et_mempolicy?
> > I can see convenience as one potential argument. Also if there is a
> > requirement to change the policy in an atomic way then this would
> > require a single syscall.
>
> Convenience is not our major concern. A well-tuned workload can have
> specific memory policies for different tasks/vmas in one process, and
> this can be achieved by set_mempolicy()/mbind() respectively. Other
> workloads are not tuned this way; they don't care where the memory
> resides, so the impact they bring on the co-located workloads might
> vary in different NUMA nodes.
>
> The control panel, which has a full knowledge of workload profiling,
> may want to interfere with the behavior of the non-mempolicied processes
> by giving them NUMA preferences, to better serve the co-located jobs.
>
> So in this scenario, a process's memory policy can be assigned by two
> objects dynamically:
>
>   a) the process itself, through set_mempolicy()/mbind()
>   b) the control panel, but API is not available right now
>
> Considering the two policies should not fight each other, it sounds
> reasonable to introduce a new syscall to assign memory policy to a
> process through struct mm_struct.

So you want to allow restoring the original local policy if the external
one is disabled?

Anyway, pidfd_$FOO behavior should be semantically very similar to the
original $FOO. Moving from per-task to per-mm is a major shift in the
semantics. I can imagine having a dedicated flag for the syscall to
enforce the policy on the full thread group.
But having different semantics is both tricky and constraining, because
per-thread binding is then impossible.
-- 
Michal Hocko
SUSE Labs
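
The two designs weighed in this thread can be compared with a toy model.
The sketch below is not kernel code, and every name in it (Mm, Task,
external_policy, local_policy, effective_policy,
set_policy_with_group_flag) is hypothetical. It assumes one possible
precedence for the per-mm proposal: a task's own policy wins, the
externally assigned per-mm policy applies only to tasks without one, and
the default is used otherwise. The thread itself does not settle that
ordering.

```python
"""Toy model of the per-mm slot vs. the per-task group flag. Not kernel code."""
from dataclasses import dataclass, field
from typing import List, Optional

DEFAULT = "default"  # stands in for the default memory policy


@dataclass
class Mm:
    # Hypothetical per-mm slot written by the external control panel.
    external_policy: Optional[str] = None
    # Threads sharing this mm; excluded from repr/eq to avoid recursion.
    threads: List["Task"] = field(default_factory=list, repr=False, compare=False)


@dataclass
class Task:
    mm: Mm
    # Per-thread slot, modelling what set_mempolicy() operates on today.
    local_policy: Optional[str] = None

    def __post_init__(self) -> None:
        self.mm.threads.append(self)


def effective_policy(task: Task) -> str:
    """Assumed precedence: own policy, then the per-mm policy, then default."""
    if task.local_policy is not None:
        return task.local_policy
    if task.mm.external_policy is not None:
        return task.mm.external_policy
    return DEFAULT


def set_policy_with_group_flag(task: Task, policy: Optional[str],
                               whole_group: bool = False) -> None:
    """Per-task semantics with an opt-in flag covering the whole thread group."""
    targets = task.mm.threads if whole_group else [task]
    for t in targets:
        t.local_policy = policy
```

In this model, a thread that set its own policy keeps it both while the
per-mm slot is populated and after it is cleared, so the local policy is
restored for free. The group-flag variant keeps a single per-task slot,
so applying it to the whole group overwrites any policy a thread had set
for itself, and that policy cannot be recovered afterwards.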