From: "Huang, Ying"
To: Johannes Weiner, Wei Xu, Jonathan Cameron, Yang Shi, Aneesh Kumar K V
Cc: Feng Tang, Andrew Morton, Tejun Heo, Zefan Li, Waiman Long,
 "linux-mm@kvack.org", "cgroups@vger.kernel.org",
 "linux-kernel@vger.kernel.org", "Hansen, Dave", "Chen, Tim C",
 "Yin, Fengwei", Michal Hocko
Subject: Re: [PATCH] mm/vmscan: respect cpuset policy during page demotion
References: <20221026074343.6517-1-feng.tang@intel.com>
 <87wn8lkbk5.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87o7txk963.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87fsf9k3yg.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Tue, 01 Nov 2022 11:17:04 +0800
In-Reply-To: <87fsf9k3yg.fsf@yhuang6-desk2.ccr.corp.intel.com> (Ying Huang's
 message of "Thu, 27 Oct 2022 17:31:35 +0800")
Message-ID: <87r0yncqj3.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii

"Huang, Ying" writes:

> Michal Hocko writes:
>
>> On Thu 27-10-22 15:39:00, Huang, Ying wrote:
>>> Michal Hocko writes:
>>>
>>> > On Thu 27-10-22 14:47:22, Huang, Ying wrote:
>>> >> Michal Hocko writes:
>>> > [...]
>>> >> > I can imagine workloads which wouldn't like to get their memory demoted
>>> >> > for some reason but wouldn't it be more practical to tell that
>>> >> > explicitly (e.g. via prctl) rather than configuring cpusets/memory
>>> >> > policies explicitly?
>>> >>
>>> >> If my understanding were correct, prctl() configures the process or
>>> >> thread.
>>> >
>>> > Not necessarily. There are properties which are per address space like
>>> > PR_[GS]ET_THP_DISABLE. This could be very similar.
>>> >
>>> >> How can we get process/thread configuration at demotion time?
>>> >
>>> > As already pointed out in previous emails. You could hook into
>>> > folio_check_references path, more specifically folio_referenced_one
>>> > where you have all that you need already - all vmas mapping the page and
>>> > then it is trivial to get the corresponding vm_mm. If at least one of
>>> > them has the flag set then the demotion is not allowed (essentially the
>>> > same model as VM_LOCKED).
>>>
>>> Got it!  Thanks for the detailed explanation.
>>>
>>> One bit may not be sufficient.  For example, if we want to avoid or
>>> control cross-socket demotion and still allow demoting to slow memory
>>> nodes in local socket, we need to specify a node mask to exclude some
>>> NUMA nodes from demotion targets.
>>
>> Isn't this something to be configured on the demotion topology side? Or
>> do you expect there will be per process/address space usecases? I mean
>> different processes running on the same topology, one requesting local
>> demotion while other ok with the whole demotion topology?
>
> I think that it's possible for different processes to have different
> requirements.
>
> - Some processes don't care about where the memory is placed, prefer
>   local, then fall back to remote if no free space.
>
> - Some processes want to avoid cross-socket traffic, bind to nodes of
>   local socket.
>
> - Some processes want to avoid using slow memory, bind to fast memory
>   node only.

Hi, Johannes, Wei, Jonathan, Yang, Aneesh,

We need your help.  Do you or your organization have requirements to
restrict the page demotion target nodes?  If so, can you share some
details of the requirements?  For example, to avoid cross-socket
traffic, or to avoid using slow memory.  And do you want to restrict
that with cpusets, memory policy, or some other interfaces?

Best Regards,
Huang, Ying
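
To make the node mask idea above concrete, here is a rough userspace
sketch, not kernel code.  Everything named toy_* is hypothetical: no
such per-address-space setting exists, and the 64-bit mask only stands
in for nodemask_t.  It assumes the same model as the single flag
discussed above, extended to a mask: the exclusions of every address
space mapping the page are combined, then removed from the targets
that the demotion topology offers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: one bit per NUMA node, at most 64 nodes (not nodemask_t). */
typedef uint64_t toy_nodemask;

#define TOY_MAX_NODES 64

/* Hypothetical per-address-space setting; nothing like this exists upstream. */
struct toy_mm {
	toy_nodemask demotion_excluded;	/* nodes refused as demotion targets */
};

/* Return the mask with only the bit for node 'nid' set. */
static toy_nodemask toy_node(int nid)
{
	assert(nid >= 0 && nid < TOY_MAX_NODES);
	return (toy_nodemask)1 << nid;
}

/*
 * Combine the exclusions of every address space mapping the page, then
 * remove them from the targets offered by the demotion topology.
 */
static toy_nodemask toy_allowed_targets(toy_nodemask topology_targets,
					const struct toy_mm *const *mms,
					size_t nr_mms)
{
	toy_nodemask excluded = 0;

	for (size_t i = 0; i < nr_mms; i++)
		excluded |= mms[i]->demotion_excluded;

	return topology_targets & ~excluded;
}

/* An empty mask means the page must stay where it is. */
static bool toy_may_demote(toy_nodemask allowed)
{
	return allowed != 0;
}
```

With topology targets {2, 3} and one mapper excluding node 3, only
node 2 remains.  If a mapper excludes both nodes, the mask is empty and
the page is not demoted, which is the single-flag behaviour as a
special case.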