Date: Tue, 31 Oct 2023 10:53:41 +0100
From: Michal Hocko
To: Gregory Price
Cc: linux-kernel@vger.kernel.org, linux-cxl@vger.kernel.org, linux-mm@kvack.org, ying.huang@intel.com, akpm@linux-foundation.org, aneesh.kumar@linux.ibm.com, weixugc@google.com, apopple@nvidia.com, hannes@cmpxchg.org, tim.c.chen@intel.com, dave.hansen@intel.com, shy828301@gmail.com, gregkh@linuxfoundation.org, rafael@kernel.org, Gregory Price
Subject: Re: [RFC PATCH v3 0/4] Node Weights and Weighted Interleave
References: <20231031003810.4532-1-gregory.price@memverge.com>
In-Reply-To: <20231031003810.4532-1-gregory.price@memverge.com>

On Mon 30-10-23 20:38:06, Gregory Price wrote:
> This patchset implements weighted interleave and adds a new sysfs
> entry: /sys/devices/system/node/nodeN/accessM/il_weight.
>
> The il_weight of a node is used by mempolicy to implement weighted
> interleave when `numactl --interleave=...` is invoked. By default
> il_weight for a node is always 1, which preserves the default round
> robin interleave behavior.
>
> Interleave weights may be set from 0-100, and denote the number of
> pages that should be allocated from the node when interleaving
> occurs.
>
> For example, if a node's interleave weight is set to 5, 5 pages
> will be allocated from that node before the next node is scheduled
> for allocations.

I find this semantic rather weird TBH. First of all, why do you think it
makes sense to have those weights global for all users? What if
different applications have different views on how to spread their
interleaved memory?

I do get that you might have different tiers with largely different
runtime characteristics, but why would you want to interleave them into
a single mapping and have hard to predict runtime behavior?

[...]
> In this way it becomes possible to set an interleaving strategy
> that fits the available bandwidth for the devices available on
> the system. An example system:
>
> Node 0 - CPU+DRAM, 400GB/s BW (200 cross socket)
> Node 1 - CPU+DRAM, 400GB/s BW (200 cross socket)
> Node 2 - CXL Memory. 64GB/s BW, on Node 0 root complex
> Node 3 - CXL Memory. 64GB/s BW, on Node 1 root complex
>
> In this setup, the effective weights for nodes 0-3 for a task
> running on Node 0 may be [60, 20, 10, 10].
>
> This spreads memory out across devices which all have different
> latency and bandwidth attributes at a way that can maximize the
> available resources.

OK, so why is this any better than not using any memory policy and
relying on demotion to push cold memory down the tier hierarchy?

What is the actual real life usecase and what kind of benefits can you
present?
-- 
Michal Hocko
SUSE Labs
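
For readers following the thread, the semantics quoted above (a node with interleave weight w receives w consecutive pages before the next node is scheduled) can be sketched in userspace as below. This is only an illustration of the described behavior, not the patchset's kernel code: the names `weighted_interleave` and `page_share` are hypothetical, and the weights are the example values [60, 20, 10, 10] from the cover letter.

```python
from itertools import islice


def weighted_interleave(weights):
    """Yield node ids forever; node n is yielded weights[n] times per round.

    Hypothetical helper: a weight of 0 means the node is skipped, and a
    weight of 1 everywhere degenerates to plain round-robin interleave.
    """
    if not any(w > 0 for w in weights):
        raise ValueError("at least one node needs a non-zero weight")
    while True:
        for node, weight in enumerate(weights):
            # Allocate `weight` consecutive pages from this node,
            # then move on to the next node in the list.
            for _ in range(weight):
                yield node


def page_share(weights, npages):
    """Count how many of the first npages allocations land on each node."""
    counts = [0] * len(weights)
    for node in islice(weighted_interleave(weights), npages):
        counts[node] += 1
    return counts
```

With the example weights, `page_share([60, 20, 10, 10], 100)` places 60, 20, 10 and 10 pages on nodes 0-3 over one full round. Note that the sketch takes the weights as a per-call argument, whereas the proposal stores them globally in sysfs, which is the point the reply questions.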