From: "Huang, Ying"
To: Alistair Popple
Cc: Andrew Morton, "Aneesh Kumar K . V", Wei Xu, Dan Williams, Dave Hansen, "Davidlohr Bueso", Johannes Weiner, "Jonathan Cameron", Michal Hocko, Yang Shi, Rafael J Wysocki
Subject: Re: [PATCH RESEND 3/4] acpi, hmat: calculate abstract distance with HMAT
References: <20230721012932.190742-1-ying.huang@intel.com> <20230721012932.190742-4-ying.huang@intel.com> <87ila8zo80.fsf@nvdebian.thelocal> <87h6psxzak.fsf@yhuang6-desk2.ccr.corp.intel.com> <878ra4wqz0.fsf@nvdebian.thelocal>
Date: Tue, 22 Aug 2023 07:28:49 +0800
In-Reply-To: <878ra4wqz0.fsf@nvdebian.thelocal> (Alistair Popple's message of "Mon, 21 Aug 2023 21:53:13 +1000")
Message-ID: <87edjwc6vi.fsf@yhuang6-desk2.ccr.corp.intel.com>

Alistair Popple writes:

> "Huang, Ying" writes:
>
>> Alistair Popple writes:
>>
>>> Huang Ying writes:
>>>
>>>> A memory tiering abstract distance calculation algorithm based on ACPI
>>>> HMAT is implemented. The basic idea is as follows.
>>>>
>>>> The performance attributes of system default DRAM nodes are recorded
>>>> as the baseline. Their abstract distance is MEMTIER_ADISTANCE_DRAM.
>>>> Then, the ratio of the abstract distance of a memory node (target) to
>>>> MEMTIER_ADISTANCE_DRAM is scaled based on the ratio of the performance
>>>> attributes of the node to that of the default DRAM nodes.
>>>
>>> The problem I encountered here with the calculations is that HBM memory
>>> ended up in a lower-tiered node which isn't what I wanted (at least when
>>> that HBM is attached to a GPU say).
>>
>> I have tested the series on a server machine with HBM (pure HBM, not
>> attached to a GPU). There, HBM is placed in a higher tier than DRAM.
>
> Good to know.
>
>>> I suspect this is because the calculations are based on the CPU
>>> point-of-view (access1) which still sees lower bandwidth to remote HBM
>>> than local DRAM, even though the remote GPU has higher bandwidth access
>>> to that memory. Perhaps we need to be considering access0 as well?
>>> Ie. HBM directly attached to a generic initiator should be in a higher
>>> tier regardless of CPU access characteristics?
>>
>> What are your requirements for memory tiers on the machine? I guess you
>> want to put GPU-attached HBM in a higher tier and put DRAM in a lower
>> tier, so that cold HBM pages can be demoted to DRAM when there is memory
>> pressure on HBM? This sounds reasonable from the GPU point of view.
>
> Yes, that is what I would like to implement.
>
>> The above requirements may be satisfied by calculating the abstract
>> distance based on access0 (or combined with access1). But I suspect
>> this will be a general solution. I guess that any memory devices that
>> are used mainly by memory initiators other than CPUs want to put
>> themselves in a higher memory tier than DRAM, regardless of their
>> access0.
>
> Right. I'm still figuring out how ACPI HMAT fits together but that
> sounds reasonable.
>
>> One solution is for the GPU device driver to always put GPU HBM in the
>> highest memory tier (with the smallest abstract distance), regardless
>> of its HMAT performance attributes. Is that possible?
>
> It's certainly possible and easy enough to do, although I think it would
> be good to provide upper and lower bounds for HMAT derived adistances to
> make that easier. It does make me wonder what the point of HMAT is if we
> have to ignore it in some scenarios though. But perhaps I need to dig
> deeper into the GPU values to figure out how it can be applied correctly
> there.

In the original design (page 11 of [1]),

[1] https://lpc.events/event/16/contributions/1209/attachments/1042/1995/Live%20In%20a%20World%20With%20Multiple%20Memory%20Types.pdf

the default memory tier hierarchy is based on performance from the CPU
point of view. Then the abstract distance of a memory type (e.g., GPU HBM)
can be adjusted via a sysfs knob (/abstract_distance_offset) based on the
requirements of the GPU. That's another possible solution.

-- 
Best Regards,
Huang, Ying
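For illustration only, the scaling described in the quoted patch description ("the ratio of the abstract distance of a memory node to MEMTIER_ADISTANCE_DRAM is scaled based on the ratio of the performance attributes") can be sketched in C as below. This is a hedged sketch, not the patch's code. It assumes that read and write latencies and bandwidths are summed, that the distance grows in proportion to latency and in inverse proportion to bandwidth, and that MEMTIER_ADISTANCE_DRAM has the value defined here. struct node_perf and perf_to_adistance() are hypothetical names, and the numbers in the usage note are made up.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed to match the kernel's include/linux/memory-tiers.h at the time
 * of this thread; illustrative only, verify before relying on it.
 */
#define MEMTIER_CHUNK_BITS 10
#define MEMTIER_CHUNK_SIZE (1 << MEMTIER_CHUNK_BITS)
#define MEMTIER_ADISTANCE_DRAM ((4 * MEMTIER_CHUNK_SIZE) + (MEMTIER_CHUNK_SIZE >> 1))

/* Hypothetical container for HMAT-style performance attributes. */
struct node_perf {
	uint64_t read_latency;    /* e.g. in ns */
	uint64_t write_latency;
	uint64_t read_bandwidth;  /* e.g. in MB/s */
	uint64_t write_bandwidth;
};

/*
 * Hypothetical helper: scale MEMTIER_ADISTANCE_DRAM by the target's
 * latency relative to the default DRAM baseline, and by the inverse of
 * its bandwidth relative to that baseline. Higher latency or lower
 * bandwidth gives a larger abstract distance, i.e. a lower memory tier.
 */
static int64_t perf_to_adistance(const struct node_perf *target,
				 const struct node_perf *dram)
{
	uint64_t t_lat = target->read_latency + target->write_latency;
	uint64_t d_lat = dram->read_latency + dram->write_latency;
	uint64_t t_bw = target->read_bandwidth + target->write_bandwidth;
	uint64_t d_bw = dram->read_bandwidth + dram->write_bandwidth;

	/* Both divisors must be non-zero. */
	assert(d_lat > 0 && t_bw > 0);

	/* Multiply before dividing to limit integer truncation. */
	return (int64_t)((uint64_t)MEMTIER_ADISTANCE_DRAM * t_lat * d_bw /
			 (d_lat * t_bw));
}
```

Under these assumptions, with a DRAM baseline of {100, 100, 100000, 100000}, a node with the same latency and twice the bandwidth gets MEMTIER_ADISTANCE_DRAM / 2 (a higher tier), while a node the CPU sees with twice the latency and half the bandwidth, as in the remote HBM case discussed above, gets 4 * MEMTIER_ADISTANCE_DRAM (a lower tier).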