From: "Huang, Ying"
To: Gregory Price
Cc: linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org,
    linux-mm@kvack.org, cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
    akpm@linux-foundation.org, mhocko@kernel.org, tj@kernel.org,
    lizefan.x@bytedance.com, hannes@cmpxchg.org, corbet@lwn.net,
    roman.gushchin@linux.dev, shakeelb@google.com, muchun.song@linux.dev,
    Gregory Price
Subject: Re: [RFC PATCH v4 0/3] memcg weighted interleave mempolicy control
In-Reply-To: <20231109002517.106829-1-gregory.price@memverge.com> (Gregory Price's message of "Wed, 8 Nov 2023 19:25:14 -0500")
References: <20231109002517.106829-1-gregory.price@memverge.com>
Date: Fri, 10 Nov 2023 14:16:05 +0800
Message-ID: <87zfzmf80q.fsf@yhuang6-desk2.ccr.corp.intel.com>

Gregory Price writes:

> This patchset implements weighted interleave and adds a new cgroup
> sysfs entry: cgroup/memory.interleave_weights (excluded from root).
>
> The il_weight of a node is used by mempolicy to implement weighted
> interleave when `numactl --interleave=...` is invoked.  By default
> il_weight for a node is always 1, which preserves the default round
> robin interleave behavior.

IIUC, this makes it almost impossible to set the default weight of a
node from the node memory bandwidth information.  This will make life
a little harder for users.

If so, how about using a new memory policy mode, for example
MPOL_WEIGHTED_INTERLEAVE?

> Interleave weights denote the number of pages that should be
> allocated from the node when interleaving occurs and have a range
> of 1-255.  The weight of a node can never be 0, and instead the
> preferred way to prevent allocation is to remove the node from the
> cpuset or mempolicy altogether.
>
> For example, if a node's interleave weight is set to 5, 5 pages
> will be allocated from that node before the next node is scheduled
> for allocations.
>
> # Set node weight for node 0 to 5
> echo 0:5 > /sys/fs/cgroup/user.slice/memory.interleave_weights
>
> # Set node weight for node 1 to 3
> echo 1:3 > /sys/fs/cgroup/user.slice/memory.interleave_weights
>
> # View the currently set weights
> cat /sys/fs/cgroup/user.slice/memory.interleave_weights
> 0:5,1:3
>
> Weights will only be displayed for possible nodes.
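To make the quoted allocation order concrete, here is a minimal user-space sketch of what weights 0:5,1:3 would mean, assuming nodes are simply visited in ascending node-id order.  It is only an illustration of the description above, not the patchset's kernel code; the function name weighted_interleave is hypothetical.

```python
from itertools import islice


def weighted_interleave(weights):
    """Yield node ids forever, taking weights[node] pages from each node in turn.

    `weights` maps node id -> interleave weight (1-255, as in the quoted text).
    Visiting nodes in ascending id order is an assumption of this sketch.
    """
    for node, weight in weights.items():
        if not 1 <= weight <= 255:
            raise ValueError(f"weight for node {node} must be in 1-255")
    while True:
        for node in sorted(weights):
            for _ in range(weights[node]):
                yield node


# Weights from the quoted example: node 0 -> 5 pages, node 1 -> 3 pages.
order = list(islice(weighted_interleave({0: 5, 1: 3}), 16))
print(order)  # [0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1]

# With every weight left at 1, this degenerates to plain round robin.
default_order = list(islice(weighted_interleave({0: 1, 1: 1}), 4))
print(default_order)  # [0, 1, 0, 1]
```

With all weights at 1 the sequence is ordinary round robin, which matches the stated default behavior.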
>
> With this it becomes possible to set an interleaving strategy
> that fits the available bandwidth for the devices available on
> the system.  An example system:
>
> Node 0 - CPU+DRAM, 400GB/s BW (200 cross socket)
> Node 1 - CXL Memory. 64GB/s BW, on Node 0 root complex
>
> In this setup, the effective weights for a node set of [0,1]
> may be [86, 14] (86% of memory on Node 0, 14% on node 1)
> or some smaller fraction thereof to encourage quicker rounds
> for better overall distribution.
>
> This spreads memory out across devices which all have different
> latency and bandwidth attributes in a way that can maximize the
> available resources.
>

--
Best Regards,
Huang, Ying
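
For reference, the [86, 14] figures quoted above follow from each node's share of the total bandwidth (400 + 64 = 464 GB/s).  The sketch below shows that arithmetic and one possible way to get a "smaller fraction" by dividing out the common factor.  Both helper names are hypothetical, and the clamp to 1-255 is taken from the weight range in the quoted text; nothing here is derived from the patches themselves.

```python
from functools import reduce
from math import gcd


def bandwidth_to_weights(bandwidth, scale=100):
    """Map node id -> bandwidth (GB/s) to weights proportional to bandwidth.

    Each weight is the node's share of total bandwidth times `scale`,
    rounded and clamped to the 1-255 range given in the quoted text.
    """
    total = sum(bandwidth.values())
    return {
        node: max(1, min(255, round(bw * scale / total)))
        for node, bw in bandwidth.items()
    }


def reduce_weights(weights):
    """Divide all weights by their greatest common divisor.

    This keeps the ratio but shortens each interleave round.
    """
    divisor = reduce(gcd, weights.values())
    return {node: w // divisor for node, w in weights.items()}


# Example system from the quoted text: node 0 = 400 GB/s, node 1 = 64 GB/s.
weights = bandwidth_to_weights({0: 400, 1: 64})
print(weights)                  # {0: 86, 1: 14}
print(reduce_weights(weights))  # {0: 43, 1: 7}
```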