From: "Huang, Ying"
To: Michal Hocko
Cc: Bharata B Rao, Aneesh Kumar K V, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, Andrew Morton, Alistair Popple,
 Dan Williams, Dave Hansen, Davidlohr Bueso, Hesham Almatary,
 Jagdish Gediya, Johannes Weiner, Jonathan Cameron, Tim Chen, Wei Xu,
 Yang Shi
Subject: Re: [RFC] memory tiering: use small chunk size and more tiers
References: <59291b98-6907-0acf-df11-6d87681027cc@linux.ibm.com>
 <8735b8jy9k.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <0d938c9f-c810-b10a-e489-c2b312475c52@amd.com>
 <87tu3oibyr.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <07912a0d-eb91-a6ef-2b9d-74593805f29e@amd.com>
 <87leowepz6.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <878rkuchpm.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87bkppbx75.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Wed, 02 Nov 2022 16:28:08 +0800
In-Reply-To: (Michal Hocko's message of "Wed, 2 Nov 2022 09:17:38 +0100")
Message-ID: <877d0dbw13.fsf@yhuang6-desk2.ccr.corp.intel.com>

Michal Hocko writes:

> On Wed 02-11-22 16:02:54, Huang, Ying wrote:
>> Michal Hocko writes:
>>
>> > On Wed 02-11-22 08:39:49, Huang, Ying wrote:
>> >> Michal Hocko writes:
>> >>
>> >> > On Mon 31-10-22 09:33:49, Huang, Ying wrote:
>> >> > [...]
>> >> >> In the upstream implementation, 4 tiers are possible below DRAM. That's
>> >> >> enough for now. But in the long run, it may be better to define more.
>> >> >> 100 possible tiers below DRAM may be too extreme.
>> >> >
>> >> > I am just curious. Is any configurations with more than couple of tiers
>> >> > even manageable? I mean applications have been struggling even with
>> >> > regular NUMA systems for years and vast majority of them is largerly
>> >> > NUMA unaware. How are they going to configure for a more complex system
>> >> > when a) there is no resource access control so whatever you aim for
>> >> > might not be available and b) in which situations there is going to be a
>> >> > demand only for subset of tears (GPU memory?) ?
>> >>
>> >> Sorry for confusing. I think that there are only several (less than 10)
>> >> tiers in a system in practice. Yes, here, I suggested to define 100 (10
>> >> in the later text) POSSIBLE tiers below DRAM. My intention isn't to
>> >> manage a system with tens memory tiers. Instead, my intention is to
>> >> avoid to put 2 memory types into one memory tier by accident via make
>> >> the abstract distance range of each memory tier as small as possible.
>> >> More possible memory tiers, smaller abstract distance range of each
>> >> memory tier.
>> >
>> > TBH I do not really understand how tweaking ranges helps anything.
>> > IIUC drivers are free to assign any abstract distance so they will clash
>> > without any higher level coordination.
>>
>> Yes. That's possible. Each memory tier corresponds to one abstract
>> distance range. The larger the range is, the higher the possibility of
>> clashing is. So I suggest to make the abstract distance range smaller
>> to reduce the possibility of clashing.
>
> I am sorry but I really do not understand how the size of the range
> actually addresses a fundamental issue that each driver simply picks
> what it wants. Is there any enumeration defining basic characteristic of
> each tier? How does a driver developer knows which tear to assign its
> driver to?

A smaller range size does not guarantee anything. It only tries to
improve the default behavior.

Drivers are expected to assign the abstract distance based on memory
latency, bandwidth, etc. The abstract distance range of a memory tier
corresponds to a latency/bandwidth range too. So, if the abstract
distance range is smaller, two types of memory with different
latency/bandwidth are less likely to land in the same range and clash.

Clashing isn't a total disaster. We plan to provide a per-memory-type
knob to offset the abstract distance provided by the driver. Then, we
can move clashing memory types apart if necessary.

Best Regards,
Huang, Ying
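
To illustrate the range-size argument, here is a minimal sketch. It assumes a tier is found by rounding the abstract distance down to a chunk boundary. The chunk sizes, the distance values, and the names `tier_of`, `clash` and `offset` are hypothetical, chosen only for illustration, and are not taken from any driver or from the kernel source.

```python
def tier_of(adistance: int, chunk_size: int) -> int:
    """Return the tier index for an abstract distance.

    Assumption: each tier covers one chunk of the abstract distance space,
    so the tier is the distance rounded down to a chunk boundary.
    """
    return adistance // chunk_size


def clash(a: int, b: int, chunk_size: int) -> bool:
    """Two memory types clash when they fall into the same tier."""
    return tier_of(a, chunk_size) == tier_of(b, chunk_size)


# Hypothetical abstract distances reported by two different drivers.
TYPE_A = 400
TYPE_B = 460

# With a large chunk, both types land in the same tier.
large_chunk_clash = clash(TYPE_A, TYPE_B, chunk_size=128)

# With a smaller chunk, the same two distances fall into different tiers.
small_chunk_clash = clash(TYPE_A, TYPE_B, chunk_size=32)

# A small chunk is no guarantee: close distances can still clash.
TYPE_C = 410
still_clash = clash(TYPE_A, TYPE_C, chunk_size=32)

# Hypothetical per-memory-type offset knob: shift one type by a chunk
# to move it out of the clashing tier.
offset = 32
after_offset_clash = clash(TYPE_A, TYPE_C + offset, chunk_size=32)
```

With these made-up numbers, shrinking the chunk separates types A and B, while types A and C still share a tier until the offset is applied. That matches the point above: a smaller range lowers the chance of clashing but does not remove it.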