From mboxrd@z Thu Jan  1 00:00:00 1970
From: Joshua Hahn <joshua.hahnjy@gmail.com>
To: Yunjeong Mun <yunjeong.mun@sk.com>
Cc: honggyu.kim@sk.com, gregkh@linuxfoundation.org, rakie.kim@sk.com,
	akpm@linux-foundation.org, rafael@kernel.org, lenb@kernel.org,
	dan.j.williams@intel.com, Jonathan.Cameron@huawei.com,
	dave.jiang@intel.com, horen.chuang@linux.dev, hannes@cmpxchg.org,
	linux-kernel@vger.kernel.org, linux-acpi@vger.kernel.org,
	linux-mm@kvack.org, kernel-team@meta.com, kernel_team@skhynix.com
Subject: Re: [PATCH 1/2 v6] mm/mempolicy: Weighted Interleave Auto-tuning
Date: Fri, 28 Feb 2025 08:24:45 -0800
Message-ID: <20250228162447.3850305-1-joshua.hahnjy@gmail.com>
In-Reply-To: <20250228064016.1325-1-yunjeong.mun@sk.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
On Fri, 28 Feb 2025 15:39:55 +0900 Yunjeong Mun <yunjeong.mun@sk.com> wrote:

Hi Yunjeong, thank you for taking time to review my work!

> Hi, Joshua.
>
> First of all I accidentally sent the wrong email a few hours ago.
> Please disregard it. Sorry for the confusion.

No worries at all!

> On Wed, 26 Feb 2025 13:35:17 -0800 Joshua Hahn <joshua.hahnjy@gmail.com> wrote:
>
> [...snip...]
>
> > +/*
> > + * Convert bandwidth values into weighted interleave weights.
> > + * Call with iw_table_lock.
> > + */
> > +static void reduce_interleave_weights(unsigned int *bw, u8 *new_iw)
> > +{
> > +	u64 sum_bw = 0;
> > +	unsigned int cast_sum_bw, sum_iw = 0;
> > +	unsigned int scaling_factor = 1, iw_gcd = 1;
> > +	int nid;
> > +
> > +	/* Recalculate the bandwidth distribution given the new info */
> > +	for_each_node_state(nid, N_MEMORY)
> > +		sum_bw += bw[nid];
> > +
> > +	for (nid = 0; nid < nr_node_ids; nid++) {
> > +		/* Set memoryless nodes' weights to 1 to prevent div/0 later */
> > +		if (!node_state(nid, N_MEMORY)) {
> > +			new_iw[nid] = 1;
> > +			continue;
> > +		}
> > +
> > +		scaling_factor = 100 * bw[nid];
> > +
> > +		/*
> > +		 * Try not to perform 64-bit division.
> > +		 * If sum_bw < scaling_factor, then sum_bw < U32_MAX.
> > +		 * If sum_bw > scaling_factor, then bw[nid] is less than
> > +		 * 1% of the total bandwidth. Round up to 1%.
> > +		 */
> > +		if (bw[nid] && sum_bw < scaling_factor) {
> > +			cast_sum_bw = (unsigned int)sum_bw;
> > +			new_iw[nid] = scaling_factor / cast_sum_bw;
> > +		} else {
> > +			new_iw[nid] = 1;
> > +		}
> > +		sum_iw += new_iw[nid];
> > +	}
> > +
> > +	/*
> > +	 * Scale each node's share of the total bandwidth from percentages
> > +	 * to whole numbers in the range [1, weightiness]
> > +	 */
> > +	for_each_node_state(nid, N_MEMORY) {
> > +		scaling_factor = weightiness * new_iw[nid];
> > +		new_iw[nid] = max(scaling_factor / sum_iw, 1);
> > +		if (nid == 0)
> > +			iw_gcd = new_iw[0];
> > +		iw_gcd = gcd(iw_gcd, new_iw[nid]);
> > +	}
> > +
> > +	/* 1:2 is strictly better than 16:32. Reduce by the weights' GCD. */
> > +	for_each_node_state(nid, N_MEMORY)
> > +		new_iw[nid] /= iw_gcd;
> > +}

> In my understanding, new_iw[nid] values are scaled twice, first to 100 and
> then to a weightiness value of 32. I think this scaling can be done just
> once, directly to the weightiness value as follows:

Yes, you are correct. I want to provide a bit of context on how this patch
has changed over time: in the first few iterations of this patch,
"weightiness" was actually exposed as a sysfs interface that users could
change to control the scaling, trading between high values (better weight
accuracy, but worse local page allocation distribution fairness) and small
values (bigger rounding errors, but better local fairness).

The reason this matters is that we use a heuristic of "round all weights
that are less than 1% of the total weight sum up to 1%". So if we have
bandwidth ratios of 100 : 1000 : 3000 : 4000 : 6000, we have a sum total
of 14100. Then 100/14100 is only ~0.7%, and we would want to round it up
to 1% before moving on (since weights that are too small don't end up
helping). This problem only gets worse for machines with more nodes, where
it becomes possible for a node to have something like 0.1% of the total
bandwidth.
When users could set weightiness up to 255, this was problematic, because
scenarios where the weights become 1:255:255:255:255... were possible,
where we allocate a single page from one node, then allocate 255 pages
from each of the remaining nr_node_ids - 1 nodes (which is, of course, not
ideal). However, with weightiness fixed at 32, maybe this heuristic makes
less sense, since the worst-case scenario looks like 1:32:32:32:32...

I think this proposed change makes a lot of sense. It does seem silly to
round twice, and without giving users the ability to set their own
weightiness value, rounding just once seems to be enough to prevent the
worst-case scenario. I will incorporate this into a v7. I'm also going to
wait a bit for more feedback to come in for this version, so it may be a
little while before I send v7 out :-)

Thanks again for your review and the proposed change. Have a great day!
Joshua

> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 50cbb7c047fa..65a7e2baf161 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -176,47 +176,22 @@ static u8 get_il_weight(int node)
>  static void reduce_interleave_weights(unsigned int *bw, u8 *new_iw)
>  {
>  	u64 sum_bw = 0;
> -	unsigned int cast_sum_bw, sum_iw = 0;
> -	unsigned int scaling_factor = 1, iw_gcd = 1;
> +	unsigned int scaling_factor = 1, iw_gcd = 0;
>  	int nid;
>
>  	/* Recalculate the bandwidth distribution given the new info */
>  	for_each_node_state(nid, N_MEMORY)
>  		sum_bw += bw[nid];
>
> -	for (nid = 0; nid < nr_node_ids; nid++) {
> [...snip...]
> -		/*
> -		 * Try not to perform 64-bit division.
> -		 * If sum_bw < scaling_factor, then sum_bw < U32_MAX.
> -		 * If sum_bw > scaling_factor, then bw[nid] is less than
> -		 * 1% of the total bandwidth. Round up to 1%.
> -		 */
> [...snip...]
> -		sum_iw += new_iw[nid];
> -	}
> -
>
>  	/*
>  	 * Scale each node's share of the total bandwidth from percentages
>  	 * to whole numbers in the range [1, weightiness]
>  	 */
>  	for_each_node_state(nid, N_MEMORY) {
> -		scaling_factor = weightiness * new_iw[nid];
> -		new_iw[nid] = max(scaling_factor / sum_iw, 1);
> -		if (nid == 0)
> -			iw_gcd = new_iw[0];
> +		scaling_factor = weightiness * bw[nid];
> +		new_iw[nid] = max(scaling_factor / sum_bw, 1);
> +		if (!iw_gcd)
> +			iw_gcd = new_iw[nid];
>  		iw_gcd = gcd(iw_gcd, new_iw[nid]);
>  	}
>
> Please let me know what you think about this.
>
> Best regards,
> Yunjeong

Sent using hkml (https://github.com/sjp38/hackermail)