Date: Wed, 12 Oct 2022 15:07:47 +0200
From: Michal Hocko
To: Vinicius Petrucci
Cc: Frank van der Linden, Zhongkun He, corbet@lwn.net,
    akpm@linux-foundation.org, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, linux-api@vger.kernel.org,
    linux-doc@vger.kernel.org, wuyun.abel@bytedance.com
Subject: Re: [RFC] mm: add new syscall pidfd_set_mempolicy()
References: <20221010094842.4123037-1-hezhongkun.hzk@bytedance.com>

On Wed 12-10-22 07:34:06, Vinicius Petrucci wrote:
> > Well, per address range operation is a completely different beast I
> > would say. External tool would need to a) understand what that range is
> > used for (e.g. stack/heap ranges, mmaped shared files like libraries or
> > private mappings) and b) be in sync with memory layout modifications
> > done by applications (e.g. that an mmap has been issued to back malloc
> > request). Quite a lot of understanding about the specific process. I
> > would say that with that intimate knowledge it is quite better to be
> > part of the process and do those changes from within of the process
> > itself.
> 
> Sorry, this may be a digression, but just wanted to mention a
> particular use case from a project I recently collaborated on (to
> appear next month at IISWC 2022:
> http://www.iiswc.org/iiswc2022/index.html).
> 
> We carried out a performance analysis of the latest Linux AutoNUMA
> memory tiering on graph processing applications.
> We noticed that hot pages cannot be properly identified by the
> reactive approach used by AutoNUMA due to irregular/random memory
> access patterns.

Yes, I can see how a reactive approach might not be the best fit.
Automatic NUMA balancing can help quite a lot where memory regions are
accessed consistently. I can imagine situations where a user space agent
can tell much better which node is the best place for data when the
access pattern is not obvious or is hard to deduce from local metrics.

My main argument, though, is that those cases are rather specialized and
that it is much easier to implement the agent as part of the process, as
such agents are unlikely to be generic enough to serve many different
processes. I might be wrong about this, of course, and I am also not
saying that pidfd_mbind is a completely unreasonable idea. We just need a
strong use case before going that way.

> Thus, as a POC, we implemented and evaluated a simple idea of having an
> external user-level process/agent that, based on prior profiling results
> of memory regions, could make more effectively memory chunk/object-based
> mappings (instead of page-level allocation/migration) in advance on
> either DRAM or CXL/PMEM (via mbind calls). This kind of tiering
> solution could deliver up to 2x more performance for graph analytics
> workloads. We plan to evaluate other workloads as well.
> 
> Having a feature like "pidfd/process_mbind" would really simplify our
> user-level agent implementation moving forward, as right now we are
> adding a LD_PRELOAD wrapper (for signal handler) to listen and execute
> "mbind" requests from another process. If there's any other
> alternative solution to this already (via ptrace?), please let me
> know.

userfaultfd sounds like the closest match if page fault (#PF) handling
under the control of an external agent is viable.

-- 
Michal Hocko
SUSE Labs
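Point (a) of the quoted argument, that an external tool first has to understand what each address range of the target is used for, can be illustrated with a small sketch. It assumes the agent reads the target's /proc/<pid>/maps (a real procfs interface whose fields are start-end, perms, offset, major:minor, inode and an optional pathname). The enum, `struct range` and `parse_maps_line()` are hypothetical names chosen for illustration, not an existing kernel or libnuma API, and the classification is deliberately crude.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical categories an external placement agent might need to tell apart. */
enum range_kind {
    RANGE_INVALID,
    RANGE_HEAP,          /* "[heap]" (brk area) */
    RANGE_STACK,         /* "[stack]": only the main thread's stack is labelled */
    RANGE_FILE_PRIVATE,  /* file-backed, private mapping (e.g. library text) */
    RANGE_FILE_SHARED,   /* file-backed, shared mapping; note that shared
                            anonymous memory shows up as "/dev/zero (deleted)"
                            and therefore lands here too */
    RANGE_ANON,          /* no pathname: e.g. large malloc() arenas, thread stacks */
    RANGE_OTHER          /* [vdso], [vvar], [vsyscall] and other pseudo-paths */
};

struct range {
    unsigned long start, end;   /* [start, end) virtual addresses */
    unsigned long inode;
    enum range_kind kind;
};

/*
 * Parse one line of /proc/<pid>/maps:
 *   start-end perms offset major:minor inode [pathname]
 * Returns 0 on success, -1 if the line does not look like a maps entry.
 */
static int parse_maps_line(const char *line, struct range *r)
{
    char perms[5] = "";
    char path[256] = "";            /* longer pathnames are truncated */
    unsigned long offset;
    unsigned int dev_major, dev_minor;

    assert(line != NULL && r != NULL);
    r->kind = RANGE_INVALID;

    /* The pathname is optional, so 7 converted fields is still valid. */
    int n = sscanf(line, "%lx-%lx %4s %lx %x:%x %lu %255[^\n]",
                   &r->start, &r->end, perms, &offset,
                   &dev_major, &dev_minor, &r->inode, path);
    if (n < 7 || strlen(perms) != 4)
        return -1;

    int shared = perms[3] == 's';   /* 's' = shared, 'p' = private */

    if (strcmp(path, "[heap]") == 0)
        r->kind = RANGE_HEAP;
    else if (strncmp(path, "[stack", 6) == 0)
        r->kind = RANGE_STACK;
    else if (path[0] == '/')
        r->kind = shared ? RANGE_FILE_SHARED : RANGE_FILE_PRIVATE;
    else if (path[0] == '\0')
        r->kind = RANGE_ANON;
    else
        r->kind = RANGE_OTHER;
    return 0;
}
```

Even this toy classifier shows why the knowledge is process specific: on current kernels thread stacks appear as plain anonymous mappings, shared anonymous memory is reported as "/dev/zero (deleted)", and any later mmap()/munmap() in the target invalidates the snapshot, which is exactly point (b).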