From: "Huang, Ying"
To: Gregory Price
Cc: Gregory Price
Subject: Re: [RFC v3 0/3] move_phys_pages syscall - migrate page contents given
In-Reply-To: (Gregory Price's message of "Wed, 20 Mar 2024 00:39:34 -0400")
References: <20240319172609.332900-1-gregory.price@memverge.com> <87v85hsjn7.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Wed, 20 Mar 2024 14:01:17 +0800
Message-ID: <87r0g5saqa.fsf@yhuang6-desk2.ccr.corp.intel.com>

Gregory Price writes:

> On Wed, Mar 20, 2024 at 10:48:44AM +0800, Huang, Ying wrote:
>> Gregory Price writes:
>>
>> > Doing this reverse-translation outside of the kernel requires considerable
>> > space and compute, and it will have to be performed again by the existing
>> > system calls.  Much of this work can be avoided if the pages can be
>> > migrated directly with physical memory addressing.
>>
>> One difficulty with the idea of using the physical address is that we lack
>> some user space specified policy information to make a decision.  For
>> example, users may want to pin some pages in DRAM to improve latency, or
>> pin some pages in CXL memory to do some best effort work.  To make the
>> correct decision, we need the PID and virtual address.
>>
>
> I think of this as a second or third order problem.  The core problem
> right now isn't the practicality of how userland would actually use this
> interface - the core problem is whether the data generated by offloaded
> monitoring is even worth collecting and operating on in the first place.
>
> So this is a quick hack to do some research about whether it's even
> worth developing the whole abstraction described by Willy.
>
> This is why it's labeled RFC.  I upped a v3 because I know of two groups
> actively looking at using it for research, and because the folio updates
> broke the old version.  It's also easier for me to engage through the
> list than via private channels for this particular work.
>
>
> Do I suggest we merge this interface as-is?  No, too many concerns about
> side channels.  However, it's a clean reuse of move_pages code to
> bootstrap the investigation, and it at least gets the gears turning.

Got it!  Thanks for the detailed explanation.

I think that one of the difficulties of offloaded monitoring is that
it's hard to obey these user-specified policies.  The policies may
become more complex in the future, for example, allocating DRAM among
workloads.

> Example notes from a sidebar earlier today:
>
> * An interesting proposal from Dan Williams would be to provide some
>   sort of `/sys/.../memory_tiering/tierN/promote_hot` interface, with
>   a callback mechanism into the relevant hardware drivers that allows
>   for this to be abstracted.  This could be done on some interval and
>   some threshold (# pages, hotness threshold, etc).
>
>
> The code to execute promotions ends up looking like what I have now
>
> 1) Validate the page is eligible to be promoted by walking the vmas
> 2) invoking the existing move_pages code
>
> The above idea can be implemented trivially in userland without
> having to plumb through a whole brand new callback system.
>
>
> Sometimes you have to post stupid ideas to get to the good ones :]
>

--
Best Regards,
Huang, Ying