From: "Huang, Ying" <ying.huang@intel.com>
To: Michal Hocko
Cc: Bharata B Rao, Aneesh Kumar K V, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, Andrew Morton, Alistair Popple,
	Dan Williams, Dave Hansen, Davidlohr Bueso, Hesham Almatary,
	Jagdish Gediya, Johannes Weiner, Jonathan Cameron, Tim Chen,
	Wei Xu, Yang Shi
Subject: Re: [RFC] memory tiering: use small chunk size and more tiers
References: <0d938c9f-c810-b10a-e489-c2b312475c52@amd.com>
	<87tu3oibyr.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<07912a0d-eb91-a6ef-2b9d-74593805f29e@amd.com>
	<87leowepz6.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<878rkuchpm.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<87bkppbx75.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<877d0dbw13.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Wed, 02 Nov 2022 16:45:38 +0800
In-Reply-To: (Michal Hocko's message of "Wed, 2 Nov 2022 09:39:25 +0100")
Message-ID: <8735b1bv7x.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii
Michal Hocko writes:

> On Wed 02-11-22 16:28:08, Huang, Ying wrote:
>> Michal Hocko writes:
>>
>> > On Wed 02-11-22 16:02:54, Huang, Ying wrote:
>> >> Michal Hocko writes:
>> >>
>> >> > On Wed 02-11-22 08:39:49, Huang, Ying wrote:
>> >> >> Michal Hocko writes:
>> >> >>
>> >> >> > On Mon 31-10-22 09:33:49, Huang, Ying wrote:
>> >> >> > [...]
>> >> >> >> In the upstream implementation, 4 tiers are possible below DRAM.
>> >> >> >> That's enough for now. But in the long run, it may be better to
>> >> >> >> define more. 100 possible tiers below DRAM may be too extreme.
>> >> >> >
>> >> >> > I am just curious. Is any configuration with more than a couple of
>> >> >> > tiers even manageable? I mean, applications have been struggling
>> >> >> > even with regular NUMA systems for years, and the vast majority of
>> >> >> > them are largely NUMA unaware. How are they going to configure
>> >> >> > themselves for a more complex system when a) there is no resource
>> >> >> > access control, so whatever you aim for might not be available,
>> >> >> > and b) in which situations will there be demand for only a subset
>> >> >> > of tiers (GPU memory?)?
>> >> >>
>> >> >> Sorry for the confusion. I think that there are only several (fewer
>> >> >> than 10) tiers in a system in practice. Yes, here I suggested
>> >> >> defining 100 (10 in the later text) POSSIBLE tiers below DRAM. My
>> >> >> intention isn't to manage a system with tens of memory tiers.
>> >> >> Instead, my intention is to avoid putting 2 memory types into one
>> >> >> memory tier by accident, by making the abstract distance range of
>> >> >> each memory tier as small as possible. The more possible memory
>> >> >> tiers, the smaller the abstract distance range of each memory tier.
>> >> >
>> >> > TBH I do not really understand how tweaking ranges helps anything.
>> >> > IIUC drivers are free to assign any abstract distance, so they will
>> >> > clash without any higher level coordination.
>> >>
>> >> Yes, that's possible. Each memory tier corresponds to one abstract
>> >> distance range. The larger the range is, the higher the possibility
>> >> of clashing. So I suggest making the abstract distance range smaller
>> >> to reduce the possibility of clashing.
>> >
>> > I am sorry, but I really do not understand how the size of the range
>> > addresses the fundamental issue that each driver simply picks what it
>> > wants. Is there any enumeration defining the basic characteristics of
>> > each tier? How does a driver developer know which tier to assign their
>> > driver to?
>>
>> The smaller range size will not guarantee anything. It just tries to
>> help the default behavior.
>>
>> The drivers are expected to assign the abstract distance based on the
>> memory latency/bandwidth, etc.
>
> Would it be possible/feasible to have a canonical way to calculate the
> abstract distance from these characteristics in the core kernel, so
> that drivers do not even have to fall into that trap?

Yes, that sounds like a good idea. We can provide a function that maps
from the memory latency/bandwidth to the abstract distance, for the
drivers to use.

Best Regards,
Huang, Ying