Date: Thu, 19 Oct 2023 09:26:15 -0400
From: Gregory Price
To: "Huang, Ying"
Cc: Gregory Price, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    linux-cxl@vger.kernel.org, akpm@linux-foundation.org,
    sthanneeru@micron.com, "Aneesh Kumar K.V", Wei Xu, Alistair Popple,
    Dan Williams, Dave Hansen, Johannes Weiner, Jonathan Cameron,
    Michal Hocko, Tim Chen, Yang Shi
Subject: Re: [RFC PATCH v2 0/3] mm: mempolicy: Multi-tier weighted interleaving
References: <20231009204259.875232-1-gregory.price@memverge.com>
    <87o7gzm22n.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <87pm1cwcz5.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <87edhrunvp.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <87fs25g6w3.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87fs25g6w3.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Fri, Oct 20, 2023 at 02:11:40PM +0800, Huang, Ying wrote:
> Gregory Price writes:
>
> > [...snip...]
> > Example 2: A dual-socket system with 1 CXL device per socket
> > ===
> > CPU Nodes: node0, node1
> > CXL Nodes: node2, node3 (on sockets 0 and 1 respective)
> > [...snip...]
> > This is similar to example #1, but with one difference: A task running
> > on node 0 should not treat nodes 0 and 1 the same, nor nodes 2 and 3.
[...snip...]
> > This leaves us with weights of:
> >
> > node0 - 57%
> > node1 - 26%
> > node2 - 12%
> > node3 - 5%
> >
>
> Does the workload run on CPU of node 0 only?  This appears unreasonable.

Depends. If a user explicitly launches with `numactl --cpunodebind=0`,
then yes, you can force a task (and all its children) to run on node0.

If a workload is multi-threaded enough to run on both sockets, then you
are right that you'd want to limit cross-socket traffic by binding
individual threads to nodes that don't cross sockets, if that is
feasible at all (it may not be).

But at that point, we're getting into the area of NUMA-aware software.
That's a bit beyond the scope of this, which is to enable a
coarse-grained interleaving solution that can easily be accessed with
something like `numactl --interleave` or `numactl --weighted-interleave`.

> If the memory bandwidth requirement of the workload is so large that CXL
> is used to expand bandwidth, why not run workload on CPU of node 1 and
> use the full memory bandwidth of node 1?

Settings are NOT one size fits all. You can certainly come up with
another scenario in which these weights are not optimal.

If we're running enough threads that we need multiple sockets to run
them concurrently, then the memory distribution weights become much more
complex. Without more precise control over task placement and
preventing task migration, you can't really get an "optimal" placement.

What I'm really saying is "Task placement is a more powerful function
for predicting performance than memory placement". However, user
software would need to implement a pseudo-scheduler and explicit data
placement to be the most optimized. Beyond this, there is only so much
we can do from a `numactl` perspective.

tl;dr: We can't get a perfect system here, because getting a best case
for all possible scenarios is probably an undecidable problem. You will
always be able to generate an example wherein the system is not optimal.
>
> If the workload run on CPU of node 0 and node 1, then the cross-socket
> traffic should be minimized if possible.  That is, threads/processes on
> node 0 should interleave memory of node 0 and node 2, while that on node
> 1 should interleave memory of node 1 and node 3.

This can be done with set_mempolicy() and MPOL_INTERLEAVE, with the
nodemask set to what you describe. Those tasks also need to prevent
themselves from being migrated. But this can absolutely be done.

In this scenario, the weights need to be re-calculated based on the
bandwidth of the nodes in the mempolicy nodemask, which is what I
described in the last email.

~Gregory
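
P.S. To make the re-calculation concrete, here is a rough sketch in
Python. It is not code from the patch set; it only assumes that weights
are proportional to each node's bandwidth among the nodes in the
nodemask. The function name `interleave_weights` and the bandwidth
figures are hypothetical placeholders, not measurements.

```python
def interleave_weights(bandwidth_by_node, nodemask):
    """Return {node: percent}, proportional to bandwidth, for nodes in nodemask.

    bandwidth_by_node: {node_id: bandwidth} in any consistent unit
                       (hypothetical values, as seen from the task's socket).
    nodemask:          iterable of node ids the mempolicy is restricted to.
    """
    selected = {node: bandwidth_by_node[node] for node in nodemask}
    total = sum(selected.values())
    if total <= 0:
        raise ValueError("nodemask must select nodes with positive bandwidth")
    return {node: 100.0 * bw / total for node, bw in selected.items()}


# Hypothetical bandwidths seen by a task on socket 0 (made-up numbers).
bandwidth = {0: 300, 1: 150, 2: 100, 3: 50}

# All four nodes in the nodemask: weights are spread across both sockets.
print(interleave_weights(bandwidth, [0, 1, 2, 3]))

# Nodemask restricted to socket 0's DRAM and CXL nodes: the weights are
# re-normalized over just those two nodes.
print(interleave_weights(bandwidth, [0, 2]))  # {0: 75.0, 2: 25.0}
```

The point is only that the same per-node bandwidths give different
weights once the nodemask shrinks, so the weights can't be computed once
for all nodes and then reused for a subset.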