From: Abel Wu
To: Michal Hocko
Cc: Zhongkun He, corbet@lwn.net, akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org
Date: Wed, 28 Sep 2022 11:09:47 +0800
Subject: Re: [RFC] proc: Add a new isolated /proc/pid/mempolicy type.
Message-ID: <4e2aa5c2-3d8c-2a2f-691b-218e23e7271f@bytedance.com>

On 9/27/22 9:58 PM, Michal Hocko wrote:
> On Tue 27-09-22 21:07:02, Abel Wu wrote:
>> On 9/27/22 6:49 PM, Michal Hocko wrote:
>>> On Tue 27-09-22 11:20:54, Abel Wu wrote:
>>> [...]
>>>>>> Btw. in order to add per-thread-group mempolicy, is it possible to add
>>>>>> mempolicy in mm_struct?
>>>>>
>>>>> I dunno. This would make the mempolicy interface even more confusing.
>>>>> Per mm behavior makes a lot of sense but we already do have per-thread
>>>>> semantic so I would stick to it rather than introducing a new semantic.
>>>>>
>>>>> Why is this really important?
>>>>
>>>> We want soft control on memory footprint of background jobs by applying
>>>> NUMA preferences when necessary, so the impact on different NUMA nodes
>>>> can be managed to some extent. These NUMA preferences are given by the
>>>> control panel, and it might not be suitable to overwrite the tasks with
>>>> specific memory policies already (or vice versa).
>>>
>>> Maybe the answer is somehow implicit but I do not really see any
>>> argument for the per thread-group semantic here. In other words why a
>>> new interface has to cover more than the local [sg]et_mempolicy?
>>> I can see convenience as one potential argument. Also if there is a
>>> requirement to change the policy in atomic way then this would require a
>>> single syscall.
>>
>> Convenience is not our major concern. A well-tuned workload can have
>> specific memory policies for different tasks/vmas in one process, and
>> this can be achieved by set_mempolicy()/mbind() respectively. Other
>> workloads are not tuned; they don't care where their memory resides,
>> so the impact they have on the co-located workloads might vary across
>> different NUMA nodes.
>>
>> The control panel, which has full knowledge of workload profiling,
>> may want to interfere with the behavior of the non-mempolicied processes
>> by giving them NUMA preferences, to better serve the co-located jobs.
>>
>> So in this scenario, a process's memory policy can be assigned by two
>> objects dynamically:
>>
>> a) the process itself, through set_mempolicy()/mbind()
>> b) the control panel, but no API is available for this right now
>>
>> Considering the two policies should not fight each other, it sounds
>> reasonable to introduce a new syscall to assign memory policy to a
>> process through struct mm_struct.
>
> So you want to allow restoring the original local policy if the external
> one is disabled?

Pretty much, but the internal policies are expected to take precedence
over the external ones, since they are set for a reason, to meet the
process's specific requirements. The external ones are used only when
no internal policy is active.

>
> Anyway, pidfd_$FOO behavior should be semantically very similar to the
> original $FOO. Moving from per-task to per-mm is a major shift in the
> semantic. I can imagine to have a dedicated flag for the syscall to
> enforce the policy to the full thread group. But having a different
> semantic is both tricky and also constrained because per-thread binding
> is then impossible.

Agreed. What about a syscall that applies only per-mm? There are
precedents like process_madvise(2).