From: Gregory Price
To: linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org, ying.huang@intel.com,
    hannes@cmpxchg.org, dan.j.williams@intel.com, dave.jiang@intel.com,
    Gregory Price
Subject: [RCF 0/1] mm/mempolicy: weighted interleave system default weights
Date: Tue, 20 Feb 2024 15:25:28 -0500
Message-Id: <20240220202529.2365-1-gregory.price@memverge.com>

Weighted interleave added a sysfs interface that lets users set the
interleave weights, with every weight defaulting to `1` until code for
reasonable system defaults could be agreed upon.

This RFC series suggests, and solicits ideas for, how to generate these
system defaults, and lays out some challenges in generating them.

Future work on the CXL driver (drivers/cxl) will introduce additional
code which registers HMAT information for hotplug memory provided by
CXL devices. This RFC does not presently provide that integration, but
will after that work is upstream.

Interfaces introduced:
- mempolicy_set_node_perf
  Called when HMAT data for a node is reported to the system

Integration points:
- node_set_perf_attrs - for reporting bandwidth info to mempolicy
- get_il_weight and the weighted interleave allocation interfaces,
  to provide system defaults when applying weighted interleave.

New data in mempolicy:
- node_bw_table - cached bandwidth information about each node
- default_iw_table - the system default interleave weights

Note that because there are now multiple tables (default and sysfs),
the allocators fetch each weight individually, rather than via memcpy.
This means that if weights change at runtime (extremely unlikely), the
allocators may temporarily see an "incorrect distribution" while the
system is being reweighted. This is not harmful (simply inaccurate) and
is a result of providing a clean way to revert to the system default.

v1: Simple GCD reduction of basic bandwidth distribution.

Approach:
- whenever new coordinates are reported, recalculate all weights
- cache each node's min(read, write) bandwidth
- calculate the percentage each node's bandwidth is of the whole
- use GCD to reduce all percentages down to the minimum possible

The approach is simple and fast, and works reasonably well if the
numbers reported by HMAT for each node happen to land on easily
reducible percentages.

For example, a system presenting 88% of its bandwidth on DRAM and 11%
of its bandwidth on CXL (floored for simplicity) will end up with
default weights of (8:1), which keeps each weight preferably small.

The downside of this approach is that it is susceptible to prime and
co-prime numbers keeping interleave weights large (e.g. 89:11 vs 8:1).
We prefer finer grained interleaves to prevent large swaths of
contiguous memory from landing on the same device.

This approach also hides the fact that multi-socket systems experience
chokepoints across sockets.
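
The v1 calculation described above (min(read, write) per node,
percentage of the whole, GCD reduction) can be illustrated with a
small userspace sketch. This is Python for brevity, not kernel code
and not the patch itself. `default_weights` and the `node_bw` mapping
are hypothetical names, and the clamp to a minimum weight of 1 and the
fallback when no bandwidth is known are assumptions made for the
sketch.

```python
from functools import reduce
from math import gcd


def default_weights(node_bw):
    """Map node id -> (read_bw, write_bw) to node id -> interleave weight.

    Hypothetical helper for illustration; names are not from the patch.
    """
    # Cache the more conservative of read and write bandwidth per node.
    cached = {node: min(read, write) for node, (read, write) in node_bw.items()}
    total = sum(cached.values())
    if total == 0:
        # Assumption: with no bandwidth data, keep the existing default of 1.
        return {node: 1 for node in cached}
    # Floored integer percentage of the whole for each node.
    # Assumption: clamp to 1 so that no node ends up with a weight of 0.
    pct = {node: max(1, bw * 100 // total) for node, bw in cached.items()}
    # Reduce all percentages by their common divisor.
    divisor = reduce(gcd, pct.values())
    return {node: p // divisor for node, p in pct.items()}
```

With bandwidths of 800 and 100 the floored percentages are 88 and 11,
which reduce to (8:1). With 89 and 11 the percentages are co-prime and
stay at (89:11). Two nodes with equal bandwidth reduce to (1:1).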

To illustrate the cross-socket chokepoint: a 2-socket system with
200GB/s of DDR bandwidth on each socket does not give a given socket an
aggregate of 400GB/s. Interconnects between sockets provide less
aggregate bandwidth than the DDR they provide access to (e.g. 3 UPI
lanes vs 8 DDR channels).

Because it ignores this, the approach will reduce multi-socket
interleave weights to (1:1) by default if all sockets provide the same
bandwidth.

Signed-off-by: Gregory Price

Gregory Price (1):
  mm/mempolicy: introduce system default interleave weights

 drivers/acpi/numa/hmat.c  |   1 +
 drivers/base/node.c       |   7 +++
 include/linux/mempolicy.h |   4 ++
 mm/mempolicy.c            | 129 ++++++++++++++++++++++++++++++--------
 4 files changed, 116 insertions(+), 25 deletions(-)

--
2.39.1