Date: Tue, 7 Feb 2023 16:44:32 +0000
From: Matthew Wilcox
To: Zi Yan
Cc: Mike Kravetz, linux-mm@kvack.org, Vishal Moola, Hugh Dickins,
    Rik van Riel, David Hildenbrand, "Yin, Fengwei"
Subject: Re: Folio mapcount
References: <8325B718-5179-43F8-B211-26D8DEBF77C2@nvidia.com>
In-Reply-To: <8325B718-5179-43F8-B211-26D8DEBF77C2@nvidia.com>

On Tue, Feb 07, 2023 at 11:19:40AM -0500, Zi Yan wrote:
> On 2 Feb 2023, at 10:31, Matthew Wilcox wrote:
> > On Wed, Feb 01, 2023 at 07:45:17PM -0800, Mike Kravetz wrote:
> >> On 01/24/23 18:13, Matthew Wilcox wrote:
> >>> Once we get to the part of the folio journey where we have
> >>> one-pointer-per-page, we can't afford to maintain per-page state.
> >>> Currently we maintain a per-page mapcount, and that will have to go.
> >>> We can maintain extra state for a multi-page folio, but it has to be a
> >>> constant amount of extra state no matter how many pages are in the folio.
> >>>
> >>> My proposal is that we maintain a single mapcount per folio, and its
> >>> definition is the number of (vma, page table) tuples which have a
> >>> reference to any pages in this folio.
> >>
> >> Hi Matthew, finally took a look at this. Can you clarify your definition of
> >> 'page table' here? I think you are talking about all the entries within
> >> one page table page? Is that correct? It certainly makes sense in this
> >> context.
> >>
> >> I have always thought of page table as the entire tree structure starting at
> >> *pgd in the mm_struct. So, I was a bit confused. But, I now see elsewhere
> >> that 'page table' may refer to either.
> >
> > Yes, we're pretty sloppy about that. What I had in mind was:
> >
> > We have a large folio which is mapped at, say, (1.9MB - 2.1MB) in the
> > user address space. There are thus multiple PTEs which map it and some
> > of those PTEs belong to one PMD and the rest belong to a second PMD.
> > It has a mapcount of 2 due to being mapped by PTE entries belonging to
> > two PMD tables. If it were mapped at (2.1-2.3MB), it would have a
> > mapcount of 1 due to all its PTEs belonging to a single PMD table.
>
> What is the logic of using PMD as the basic counting unit? Why not use
> PTE or PUD? I just cannot understand the goal of doing this.

Locking and contiguity. If we try to map a folio across a PMD boundary,
we have to have the PTL on both PMDs at the same time (or all PMDs if we
support folios larger than PMD_SIZE). Then we have to make two (or more)
calls to set_ptes() to populate all the PTEs (so that arches don't have
to handle "Oh, I reached the end of the PMD, move to the next one").

Note that I've decided this approach doesn't work because it can't easily
tell us "Am I the only VMA which has this folio mapped?" But this was
the reason.
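
To make the counting rule and the per-PMD split concrete, here is a small
illustrative sketch in Python. It is not kernel code. It assumes 4KiB pages
and 2MiB of address space per PTE table (as on x86-64), and pmd_chunks() is
a hypothetical helper invented for this example. Under the (since abandoned)
definition discussed above, one VMA's mapping of a folio would contribute one
count per chunk, and each chunk corresponds to one PTL to hold and one
set_ptes() call.

```python
PAGE_SIZE = 4 * 1024            # assumption: 4KiB base pages
PMD_SIZE = 512 * PAGE_SIZE      # assumption: one PTE table covers 2MiB (x86-64)


def pmd_chunks(start, end):
    """Split the byte range [start, end) at PMD_SIZE boundaries.

    Hypothetical helper, not a kernel function. Each returned chunk lies
    within a single PTE table, so it would need only that table's PTL and
    could be populated with one set_ptes()-style call.
    """
    chunks = []
    while start < end:
        # End of the PTE table containing 'start', capped at the range end.
        boundary = min((start // PMD_SIZE + 1) * PMD_SIZE, end)
        chunks.append((start, boundary))
        start = boundary
    return chunks


KIB = 1024
# Page-aligned approximations of the ranges used in the example above.
straddling = pmd_chunks(1944 * KIB, 2152 * KIB)   # roughly 1.9MB - 2.1MB
contained = pmd_chunks(2152 * KIB, 2356 * KIB)    # roughly 2.1MB - 2.3MB

print(len(straddling))  # 2: the range crosses the 2MB boundary
print(len(contained))   # 1: the range sits inside one PTE table
```

The first range splits at the 2MB boundary into two chunks, matching the
mapcount of 2 in the example; the second stays within one PTE table and
gives 1.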