From: Wei Xu
Date: Fri, 26 Aug 2022 01:00:57 -0700
Subject: Re: [RFC PATCH 1/2] mm/demotion: Expose memory type details via sysfs
To: Aneesh Kumar K V
Cc: "Huang, Ying", Linux MM, Andrew Morton, Yang Shi, Davidlohr Bueso, Tim C Chen, Michal Hocko, Linux Kernel Mailing List, Hesham Almatary, Dave Hansen, Jonathan Cameron, Alistair Popple, Dan Williams, Johannes Weiner, jvgediya.oss@gmail.com, Bharata B Rao
References: <20220825092325.381517-1-aneesh.kumar@linux.ibm.com> <877d2v3h8s.fsf@yhuang6-desk2.ccr.corp.intel.com>

On Thu, Aug 25, 2022 at 8:00 PM Aneesh Kumar K V wrote:
>
> On 8/26/22 7:20 AM, Huang, Ying wrote:
> > "Aneesh Kumar K.V" writes:
> >
> >> This patch adds /sys/devices/virtual/memtier/ where all memory tier related
> >> details can be found. All allocated memory types will be listed there as
> >> /sys/devices/virtual/memtier/memtypeN/
> >
> > Another choice is to make memory types and memory tiers system devices.
> > That is,
> >
> > /sys/devices/system/memory_type/memory_typeN
> > /sys/devices/system/memory_tier/memory_tierN
> >
>
> subsys_system_register() documentation says
>
> * Do not use this interface for anything new, it exists for compatibility
> * with bad ideas only. New subsystems should use plain subsystems; and
> * add the subsystem-wide attributes should be added to the subsystem
> * directory itself and not some create fake root-device placed in
> * /sys/devices/system/.
>
> memtier being a virtual device, I was under the impression that /sys/devices/virtual
> is the recommended place.
>
> > That looks more natural to me. Because we already have "node" and
> > "memory" devices there. Why don't you put memory types and memory tiers
> > there?
> >
> > And, I think we shouldn't put "memory_type" in the "memory_tier"
> > directory. "memory_type" isn't a part of "memory_tier".
> >
>
> I was looking consolidating both memory tier and memory type into the same sysfs subsystem.
> Your recommendation imply we create two subsystem memory_tier and memtype. I was
> trying to avoid that. May be a generic term like "memory_tiering" can help to
> consolidate all tiering related details there?
>

A generic term "memory_tiering" sounds good to me.

Given that this will be a user-facing, stable kernel API, I think we had
better only add what is most useful for userspace and not mirror the
kernel-internal data structures in this interface.

My understanding is that we haven't fully settled on how to customize
memory tiers from userspace, so we don't have to expose memory_type yet;
at this point it is a kernel-internal data structure.

Userspace does need to know what the memory tiers are and which NUMA
nodes are included in each memory tier. How about we provide the
"nodelist" interface for each memory tier, as in the original proposal?

Userspace would also like to know which memory tiers/nodes belong to the
top tiers (the promotion targets). We can provide a "toptiers" or
"toptiers_nodelist" interface to report that.

Both should still be useful even if we decide to add memory_type for
memory tier customization.

> >> The nodes which are part of a specific memory type can be listed via
> >> /sys/devices/system/memtier/memtypeN/nodes.
> >
> > How about create links to /sys/devices/system/node/nodeN in
> > "memory_type". But I'm OK to have "nodes" file too.
> >
> >> The adistance value of a specific memory type can be listed via
> >> /sys/devices/system/memtier/memtypeN/adistance.
> >>
> >> A directory listing looks like:
> >> :/sys/devices/virtual/memtier# tree memtype1
> >> memtype1
> >> ├── adistance
> >
> > Why not just use "abstract_distance"? This is user space interface,
> > it's better to be intuitive.
> >
> >> ├── nodes
> >> ├── subsystem -> ../../../../bus/memtier
> >> └── uevent
> >>
> >> Since we will be using struct device to expose details via sysfs, drop struct
> >> kref and use struct device for refcounting the memtype.
> >>
> >
> > Best Regards,
> > Huang, Ying
>
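To make the proposed per-tier "nodelist" and "toptiers_nodelist" files concrete, here is a minimal userspace sketch of how a consumer might read them. It rests on assumptions not settled in this thread: each tier is assumed to appear as a hypothetical /sys/devices/virtual/memory_tiering/memory_tierN/nodelist, and both files are assumed to use the usual kernel node-list format (for example "0-1,4", as in /sys/devices/system/node/online). The helper names parse_nodelist, tier_nodelists and is_promotion_target are illustrative only, not an existing API.

```python
from pathlib import Path

# Hypothetical layout, following the "memory_tiering" name suggested in this
# thread; this is an assumption, not a merged kernel ABI.
TIERING_ROOT = Path("/sys/devices/virtual/memory_tiering")


def parse_nodelist(text: str) -> set[int]:
    """Parse a kernel-style node list such as "0-1,4" into {0, 1, 4}."""
    nodes: set[int] = set()
    text = text.strip()
    if not text:
        return nodes  # an empty file means the tier has no nodes
    for part in text.split(","):
        if "-" in part:
            start, end = part.split("-", 1)
            nodes.update(range(int(start), int(end) + 1))  # inclusive range
        else:
            nodes.add(int(part))
    return nodes


def tier_nodelists(root: Path = TIERING_ROOT) -> dict[int, set[int]]:
    """Map each (hypothetical) memory_tierN directory to its nodelist nodes."""
    tiers: dict[int, set[int]] = {}
    if not root.is_dir():
        return tiers  # interface not present on this kernel
    for tier_dir in root.glob("memory_tier*"):
        suffix = tier_dir.name[len("memory_tier"):]
        nodelist = tier_dir / "nodelist"
        if suffix.isdigit() and nodelist.is_file():
            tiers[int(suffix)] = parse_nodelist(nodelist.read_text())
    return dict(sorted(tiers.items()))  # order tiers by their numeric ID


def is_promotion_target(node: int, toptiers_nodelist: str) -> bool:
    """True if node appears in the contents of the proposed toptiers_nodelist."""
    return node in parse_nodelist(toptiers_nodelist)


if __name__ == "__main__":
    for tier, nodes in tier_nodelists().items():
        print(f"memory_tier{tier}: nodes {sorted(nodes)}")
```

Parsing everything into plain sets of node IDs reflects the point above: userspace mainly needs tier membership and the set of promotion targets, and neither requires memory_type to be exposed.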