Date: Wed, 5 Feb 2025 10:33:17 +0100
From: Oscar Salvador
To: David Hildenbrand
Cc: lsf-pc@lists.linux-foundation.org, Peter Xu, Muchun Song, linux-mm@kvack.org
Subject: Re: [LSF/MM/BPF TOPIC] HugeTLB generic pagewalk
References: <4c50a439-e2b8-4f54-ba3d-366d0e2961b2@redhat.com> <74ecaa8b-9e94-4ba8-a2f0-a312607516ba@redhat.com>
In-Reply-To: <74ecaa8b-9e94-4ba8-a2f0-a312607516ba@redhat.com>

On Tue, Feb 04, 2025
at 09:40:16PM +0100, David Hildenbrand wrote:
> Unfortunately not that easy. You can only have parts of a large folio
> mapped. It would be PAGE_SIZE * nr_ptes when batching.

I see, you are right.

> I think this is the wrong approach. We should replace that pagewalk API
> usage by something better that hides all the batching.
>
> The interface would look similar to folio_walk (no callbacks, handling of
> locking), but (a) work on ranges; (b) work also on non-folio entries; and
> (c) batch all suitable entries in the range.
>
> Something like a pt_range_walk_start() that returns a "type" (folio range,
> migration entries, swap range, ...) + stores other details (range, level,
> ptep, ...) in a structure like folio_walk, to then provide mechanisms to
> continue (pt_walk_continue()) to walk or abort (pt_walk_done()) it. Similar
> to page_vma_mapped_walk(), but not specific to a given page/folio.
>
> Then, we would simply process the output of that. With the hope that, for
> hugetlb it will just batch all cont-pte / cont-pmd entries into a single
> return value.
>
> That will make the R/O walking as in task_mmu.c easier, hopefully.
>
> Not so much with PTE/PMD modifications, like damon_mkold_ops ... :( But
> maybe that just has to be special-cased for hugetlb, somehow ...

Ok, let me see if we are on the same page.

You are basically saying that we should replace the existing pagewalk API
with something similar to what you described above.

I have to confess that when you first mentioned this back in July, when I
posted the RFC, I felt disheartened, because it implies an even bigger
surgery.

But having felt the mess that dealing with cont-{pmd,pud}s is, and the
inability of the existing API to do that in a clean way (without having to
teach each and every function about it where needed), maybe it is the only
way to do this 1) right and 2) cleanly.
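To check that I am reading the proposal right, here is a rough user-space
sketch of how I picture such an interface. It is not kernel code. Only the
names pt_range_walk_start(), pt_walk_continue() and pt_walk_done() come from
the description above; their signatures, the entry types, struct
pt_range_walk and its fields, and the toy table standing in for a page table
are all assumptions made up for illustration.

```c
#include <stddef.h>

/* Hypothetical entry kinds; the real set would be defined by the API. */
enum pt_entry_type { PT_NONE, PT_FOLIO, PT_MIGRATION, PT_SWAP };

/* Toy stand-in for a page table: one entry type per PAGE_SIZE slot. */
struct toy_table {
	const enum pt_entry_type *slots;
	size_t nr_slots;
};

/* Walk state, loosely modelled on the "structure like folio_walk" idea. */
struct pt_range_walk {
	const struct toy_table *table;
	size_t end;			/* one past the last slot to visit */
	size_t start;			/* first slot of the current batch */
	size_t nr;			/* number of slots in the current batch */
	enum pt_entry_type type;	/* type shared by the whole batch */
	int active;
};

/* Batch all consecutive slots that share the type found at 'from'. */
static int pt_walk_fill(struct pt_range_walk *w, size_t from)
{
	if (from >= w->end) {
		w->active = 0;
		w->nr = 0;
		return 0;
	}
	w->start = from;
	w->type = w->table->slots[from];
	w->nr = 1;
	while (from + w->nr < w->end &&
	       w->table->slots[from + w->nr] == w->type)
		w->nr++;
	return 1;
}

/* Start walking [start, end); returns 1 if a first batch was found. */
int pt_range_walk_start(struct pt_range_walk *w, const struct toy_table *t,
			size_t start, size_t end)
{
	w->table = t;
	w->end = end < t->nr_slots ? end : t->nr_slots;
	w->active = 1;
	return pt_walk_fill(w, start);
}

/* Advance to the next batch; returns 0 once the range is exhausted. */
int pt_walk_continue(struct pt_range_walk *w)
{
	if (!w->active)
		return 0;
	return pt_walk_fill(w, w->start + w->nr);
}

/* End or abort the walk; the real thing would drop locks here. */
void pt_walk_done(struct pt_range_walk *w)
{
	w->active = 0;
}

/* Example R/O consumer: count folio-backed slots, one batch at a time. */
size_t count_mapped(const struct toy_table *t, size_t start, size_t end,
		    size_t *nr_batches)
{
	struct pt_range_walk w;
	size_t mapped = 0;
	int more;

	*nr_batches = 0;
	for (more = pt_range_walk_start(&w, t, start, end); more;
	     more = pt_walk_continue(&w)) {
		(*nr_batches)++;
		if (w.type == PT_FOLIO)
			mapped += w.nr;
	}
	pt_walk_done(&w);
	return mapped;
}
```

The point of the sketch is that the consumer never sees individual entries:
a run of identical entries (which is what I would hope a cont-pte / cont-pmd
hugetlb mapping turns into) comes back as a single batch, with no callbacks
involved.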
I thought that maybe we could get away with doing the batching somehow
before calling into the callbacks, e.g. at the walk_{pud,pmd,pte}_range
level, but I am not sure 1) whether that is possible and 2) how ugly it
would look.

So, given that this is not a really urgent matter that needs to be fixed
asap, maybe the way to go is to create an API that can deal with all that,
abstracting away these details.

I am willing to take a shot at this, if we are clear that it makes sense to
pursue this road.

-- 
Oscar Salvador
SUSE Labs