Date: Thu, 21 Nov 2024 09:24:43 -0500
From: Gregory Price
To: Jonathan Cameron
Cc: linux-cxl@vger.kernel.org, linux-mm@kvack.org,
 linux-perf-users@vger.kernel.org, linux-kernel@vger.kernel.org,
 linuxarm@huawei.com, tongtiangen@huawei.com, Yicong Yang, Niyas Sait,
 ajayjoshi@micron.com, Vandana Salve, Davidlohr Bueso, Dave Jiang,
 Alison Schofield, Ira Weiny, Dan Williams, Alexander Shishkin,
 Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Mark Rutland,
 Huang Ying
Subject: Re: [RFC PATCH 0/4] CXL Hotness Monitoring Unit perf driver
In-Reply-To: <20241121101845.1815660-1-Jonathan.Cameron@huawei.com>

On Thu, Nov 21, 2024 at 10:18:41AM +0000, Jonathan Cameron wrote:
> The CXL specification release 3.2 is now available under a click through at
> https://computeexpresslink.org/cxl-specification/ and it brings new
> shiny toys.
>
> RFC reason
> - Whilst trace capture with a particular configuration is potentially useful
>   the intent is that CXL HMU units will be used to drive various forms of
>   hotpage migration for memory tiering setups. This driver doesn't do this
>   (yet), but rather provides data capture etc for experimentation and
>   for working out how to mostly put the allocations in the right place to
>   start with by tuning applications.
>
> CXL r3.2 introduces a CXL Hotness Monitoring Unit definition. The intent
> of this is to provide a way to establish which units of memory (typically
> pages or larger) in CXL attached memory are hot. The implementation details
> and algorithm are all implementation defined. The specification simply
> describes the 'interface' which takes the form of ring buffer of hotness
> records in a PCI BAR and defined capability, configuration and status
> registers.
>
> The hardware may have constraints on what it can track, granularity etc
> and on how accurately it tracks (e.g. counter exhaustion, inaccurate
> trackers). Some of these constraints are discoverable from the hardware
> registers, others such as loss of accuracy have no universally accepted
> measures as they are typically access pattern dependent. Sadly it is
> very unlikely any hardware will implement a truly precise tracker given
> the large resource requirements for tracking at a useful granularity.
>
> There are two fundamental operation modes:
>
> * Epoch based. Counters are checked after a period of time (Epoch) and
>   if over a threshold added to the hotlist.
> * Always on. Counters run until a threshold is reached, after that the
>   hot unit is added to the hotlist and the counter released.
>
> Counting can be filtered on:
>
> * Region of CXL DPA space (256MiB per bit in a bitmap).
> * Type of access - Trusted and non trusted or non trusted only, R/W/RW
>
> Sampling can be modified by:
>
> * Downsampling including potentially randomized downsampling.
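
For anyone skimming, here is a rough software model of the two operation
modes quoted above. It is only a sketch and is not taken from the patches or
the spec text: the function names, the at-or-above threshold comparison, and
the assumption that counters restart from zero every epoch are all made up
for illustration. Real hardware will also have a limited number of counters,
which this ignores.

```python
from collections import Counter


def epoch_hotlist(accesses_by_epoch, threshold):
    """Epoch based (hypothetical model): counters are only inspected when an
    epoch ends, and units at or above the threshold go on the hotlist."""
    hotlist = []
    for epoch, accesses in enumerate(accesses_by_epoch):
        counts = Counter(accesses)  # assumption: fresh counters every epoch
        for unit, count in sorted(counts.items()):
            if count >= threshold:
                hotlist.append((epoch, unit, count))
    return hotlist


def always_on_hotlist(accesses, threshold):
    """Always on (hypothetical model): a unit is reported as soon as its
    counter reaches the threshold, and the counter is then released."""
    counts = {}
    hotlist = []
    for unit in accesses:
        counts[unit] = counts.get(unit, 0) + 1
        if counts[unit] >= threshold:
            hotlist.append(unit)
            del counts[unit]  # counter released so it can track another unit
    return hotlist
```

The practical difference is when a unit shows up: in the epoch model nothing
is reported until the epoch boundary, while in the always-on model the same
unit can be reported repeatedly as its counter keeps refilling.
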
>
> The driver presented here is intended to be useful in its own right but
> also to act as the first step of a possible path towards hotness monitoring
> based hot page migration. Those steps might look like.
>
> 1. Gather data - drivers provide telemetry like solutions to get that
>    data. May be enhanced, for example in this driver by providing the
>    HPA address rather than DPA Unit Address. Userspace can access enough
>    information to do this so maybe not.
> 2. Userspace algorithm development, possibly combined with userspace
>    triggered migration by PA. Working out how to use different levels
>    of constrained hardware resources will be challenging.

FWIW this is what I was thinking about for this extension:

https://lore.kernel.org/all/20240319172609.332900-1-gregory.price@memverge.com/

At least for testing CHMU stuff. So if anyone is poking at testing such
things, they can feel free to use that for prototyping.

However, I think there is general discomfort around userspace handling
HPA/DPA.

So it might look more like

  echo nr_pages > /sys/.../tiering/nodeN/promote_pages

rather than handling the raw data from the CHMU to make decisions.

> 3. Move those algorithms in kernel. Will require generalization across
>    different hotpage trackers etc.
>

In a longer discussion with Dan, we considered something a little more
abstract - like a system that monitors bandwidth and memory access stalls
and decides to promote X pages from Y device. This carries a pretty tall
generalization cost, but it's pretty exciting to say the least.

Definitely worth a discussion for later.

>
> So far this driver just gives access to the raw data. I will probably kick
> of a longer discussion on how to do adaptive sampling needed to actually
> use these units for tiering etc, sometime soon (if no one one else beats
> me too it). There is a follow up topic of how to virtualize this stuff
> for memory stranding cases (VM gets a fixed mixture of fast and slow
> memory and should do it's own tiering).
>

Without having looked at the patches yet, I would presume this interface
is at least gated to admin/root? (raw data is physical address info)

~Gregory