From: "Huang, Ying"
To: Aneesh Kumar K V
Cc: Wei Xu, Johannes Weiner, Linux MM, Andrew Morton, Yang Shi,
    Davidlohr Bueso, Tim C Chen, Michal Hocko, Linux Kernel Mailing List,
    Hesham Almatary, Dave Hansen, Jonathan Cameron, Alistair Popple,
    Dan Williams, jvgediya.oss@gmail.com, Bharata B Rao, Greg Thelen
Subject: Re: [PATCH v3 updated] mm/demotion: Expose memory tier details via sysfs
References: <20220830081736.119281-1-aneesh.kumar@linux.ibm.com>
    <87tu5rzigc.fsf@yhuang6-desk2.ccr.corp.intel.com>
    <87pmgezkhp.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Fri, 02 Sep 2022 13:40:50 +0800
In-Reply-To: (Aneesh Kumar K. V.'s message of "Fri, 2 Sep 2022 10:53:40 +0530")
Message-ID: <87fshaz63h.fsf@yhuang6-desk2.ccr.corp.intel.com>

Aneesh Kumar K V writes:

> On 9/2/22 10:39 AM, Wei Xu wrote:
>> On Thu, Sep 1, 2022 at 5:33 PM Huang, Ying wrote:
>>>
>>> Aneesh Kumar K V writes:
>>>
>>>> On 9/1/22 12:31 PM, Huang, Ying wrote:
>>>>> "Aneesh Kumar K.V" writes:
>>>>>
>>>>>> This patch adds /sys/devices/virtual/memory_tiering/ where all memory tier
>>>>>> related details can be found. All allocated memory tiers will be listed
>>>>>> there as /sys/devices/virtual/memory_tiering/memory_tierN/
>>>>>>
>>>>>> The nodes which are part of a specific memory tier can be listed via
>>>>>> /sys/devices/virtual/memory_tiering/memory_tierN/nodes
>>>>>
>>>>> I think "memory_tier" is a better subsystem/bus name than
>>>>> memory_tiering, because we have a set of memory_tierN devices inside.
>>>>> "memory_tier" sounds more natural. I know this is subjective, just my
>>>>> preference.
>>>>>
>
>
> I missed replying to this earlier. I will keep memory_tiering as subsystem name in v4
> because we would want it to be a subsystem where all memory tiering related details can be found
> including memory type in the future. This is as per discussion
>
> https://lore.kernel.org/linux-mm/CAAPL-u9TKbHGztAF=r-io3gkX7gorUunS2UfstudCWuihrA=0g@mail.gmail.com

I don't think that it's a good idea to mix 2 types of devices in one
subsystem (bus). If my understanding is correct, that breaks the
driver core convention.

>>>>>>
>>>>>> A directory hierarchy looks like
>>>>>> :/sys/devices/virtual/memory_tiering$ tree memory_tier4/
>>>>>> memory_tier4/
>>>>>> ├── nodes
>>>>>> ├── subsystem -> ../../../../bus/memory_tiering
>>>>>> └── uevent
>>>>>>
>>>>>> All toptier nodes are listed via
>>>>>> /sys/devices/virtual/memory_tiering/toptier_nodes
>>>>>>
>>>>>> :/sys/devices/virtual/memory_tiering$ cat toptier_nodes
>>>>>> 0,2
>>>>>> :/sys/devices/virtual/memory_tiering$ cat memory_tier4/nodes
>>>>>> 0,2
>>>>>
>>>>> I don't think that it is a good idea to show toptier information in the
>>>>> user space interface, because it is just an in-kernel implementation
>>>>> detail. Now, we only promote pages from !toptier to toptier. But
>>>>> there may be multiple memory tiers in toptier and !toptier, and we may
>>>>> change the implementation in the future. For example, we may promote
>>>>> pages from DRAM to HBM in the future.
>>>>>
>>>>
>>>>
>>>> In the case you describe above and others, we will always have a list of
>>>> NUMA nodes from which memory promotion is not done.
>>>> /sys/devices/virtual/memory_tiering/toptier_nodes shows that list.
>>>
>>> I don't think we will need that interface if we don't restrict promotion
>>> in the future. For example, one can just check the memory tier with
>>> the smallest number.
>>>
>>> TBH, I don't know why we need that interface. What is it for? We
>>> don't want to expose unnecessary information that restricts our in-kernel
>>> implementation in the future.
>>>
>>> So, please remove that interface, at least until we have discussed it
>>> thoroughly.
>>
>> I have asked for this interface to allow the userspace to query a list
>> of top-tier nodes as the targets of userspace-driven promotions. The
>> idea is that demotion can gradually go down tier by tier, but we
>> promote hot pages directly to the top-tier and bypass the intermediate
>> tiers.
>>
>> Certainly, this can be viewed as a policy choice. Given that now we
>> have a clearly defined memory tier hierarchy in sysfs and the
>> toptier_nodes content can be constructed from this memory tier
>> hierarchy and other information from the node sysfs interfaces, I am
>> fine if we want to remove toptier_nodes and keep the current memory
>> tier sysfs interfaces to a minimum.
>>
>
>
> Ok I can do a v4 with toptier_nodes dropped.

Thanks!

Best Regards,
Huang, Ying
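
For illustration only, here is a rough userspace sketch of how the
toptier_nodes content might be reconstructed from the per-tier "nodes"
files once that file is dropped. It assumes the v3 layout quoted above
(memory_tierN/nodes under /sys/devices/virtual/memory_tiering) and, as a
simplification taken from the remark in this thread, treats the
lowest-numbered tier as the top tier. The helper names (parse_nodelist,
read_tiers, top_tier_nodes) are hypothetical, and the range form "0-1" in
a node list is an assumption about the file format; the thread only shows
the comma form "0,2". A real policy may also need information from the
node sysfs interfaces, which this sketch does not consult.

```python
import os
import re

# Path taken from the patch description; the layout may change in later versions.
SYSFS_ROOT = "/sys/devices/virtual/memory_tiering"


def parse_nodelist(text):
    """Parse a node list such as "0,2" (or, by assumption, "0-1,3") into sorted ints."""
    nodes = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-", 1)
            nodes.update(range(int(lo), int(hi) + 1))
        else:
            nodes.add(int(part))
    return sorted(nodes)


def read_tiers(root=SYSFS_ROOT):
    """Return {tier_number: [node, ...]} for every memory_tierN directory under root."""
    tiers = {}
    for name in os.listdir(root):
        match = re.fullmatch(r"memory_tier(\d+)", name)
        if match is None:
            continue  # skip anything that is not a memory_tierN directory
        with open(os.path.join(root, name, "nodes")) as f:
            tiers[int(match.group(1))] = parse_nodelist(f.read())
    return tiers


def top_tier_nodes(tiers):
    """Assumption: the lowest-numbered tier is the top tier."""
    if not tiers:
        return []
    return tiers[min(tiers)]
```

On a system exposing this layout, top_tier_nodes(read_tiers()) would give
the candidate promotion targets; read_tiers() also accepts another root
directory so the parsing can be exercised against a mock tree.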