From: Alistair Popple
To: linux-mm@kvack.org
Cc: jhubbard@nvidia.com, rcampbell@nvidia.com, willy@infradead.org, jgg@nvidia.com, dan.j.williams@intel.com, david@fromorbit.com, linux-fsdevel@vger.kernel.org, jack@suse.cz, djwong@kernel.org, hch@lst.de, david@redhat.com
Subject: ZONE_DEVICE refcounting
Date: Fri, 08 Mar 2024 15:24:35 +1100
Message-ID: <87ttlhmj9p.fsf@nvdebian.thelocal>

Hi,

I have been looking at fixing up ZONE_DEVICE refcounting again. Specifically I have been looking at fixing the 1-based refcounts that are currently used for FS DAX pages (and p2pdma pages, but that's trivial).

This started with the simple idea of "just subtract one from the refcounts everywhere and that will fix the off by one". Unfortunately it's not that simple. For starters, doing a simple conversion like that requires allowing pages to be mapped with zero refcounts. That seems wrong. It also leads to problems detecting idle IO vs. page map pages.

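To make the off-by-one concrete, here is a toy user-space model. It assumes, as the paragraph above implies, that mapping an FS DAX page does not itself take a reference today, so a mapped but otherwise unused page sits at refcount 1. The toy_* names are hypothetical and this is not the kernel's struct page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct page; illustration only. */
struct toy_page {
	int refcount;	/* references held on the page */
	bool mapped;	/* is the page mapped into a process? */
};

/* 1-based convention: a page with no users has refcount 1. */
static bool toy_idle_one_based(const struct toy_page *p)
{
	return p->refcount == 1;
}

/* Conventional rule for normal pages: refcount 0 means free. */
static bool toy_looks_free_zero_based(const struct toy_page *p)
{
	return p->refcount == 0;
}

/*
 * The naive conversion: subtract one from a 1-based count and change
 * nothing else (assumption: mapping still takes no reference).
 */
static struct toy_page toy_subtract_one(struct toy_page p)
{
	assert(p.refcount >= 1);
	p.refcount -= 1;
	return p;
}
```

In this model a page that starts as { .refcount = 1, .mapped = true } ends up with refcount 0 after toy_subtract_one() while still being mapped, so the usual "zero means free" rule would misclassify it.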
So instead I'm thinking of doing something along the lines of the following:

1. Refcount FS DAX pages normally, i.e. map them with vm_insert_page(), increment the refcount in line with the mapcount, and decrement it when pages are unmapped.

2. As with normal pages, the pages are considered free when the refcount drops to zero.

3. Because these are treated as normal pages for refcounting, we no longer map them as pte_devmap() (possibly freeing up a PTE bit).

4. PMD-sized FS DAX pages get treated the same as normal compound pages.

5. This means we need to allow compound ZONE_DEVICE pages. Tail pages share the page->pgmap field with page->compound_head, but this isn't a problem because the LSB of page->pgmap is free and we can still get pgmap from compound_head(page)->pgmap.

6. When FS DAX pages are freed they notify filesystem drivers. This can be done from the pgmap->ops->page_free() callback.

7. We could probably get rid of the pgmap refcounting, because we can just scan pages, look for any with non-zero references, and wait for them to be freed whilst ensuring no new mappings can be created (some drivers do a similar thing for private pages today). This might be a follow-up change.

I have made good progress implementing the above, and am reasonably confident I can make it work (I have some tests that exercise these code paths working). However, my knowledge of the filesystem layer is a bit thin, so before going too much further down this path I was hoping to get some feedback on the overall direction, to see if there are any corner cases or other potential problems I have missed that may prevent the above being practical. If not, I will clean my series up and post it as an RFC.

Thanks.

 - Alistair
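
A rough user-space sketch of point 5, for illustration only. It relies on the kernel's convention that bit 0 of page->compound_head marks a tail page, and on the pgmap pointer being aligned so its bit 0 is always clear. The toy_* types and helpers are hypothetical; the real struct page layout is more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct dev_pagemap. */
struct toy_pgmap {
	int id;
};

/*
 * Hypothetical stand-in for struct page. One word is shared:
 *   head page: pointer to the toy_pgmap (bit 0 clear)
 *   tail page: pointer to the head page with bit 0 set
 */
struct toy_page {
	uintptr_t word;
};

static bool toy_is_tail(const struct toy_page *p)
{
	return (p->word & 1) != 0;
}

/* Strip the tag bit to find the head; a head page is its own head. */
static struct toy_page *toy_compound_head(struct toy_page *p)
{
	if (toy_is_tail(p))
		return (struct toy_page *)(p->word - 1);
	return p;
}

/* Works for head and tail pages alike by always going via the head. */
static struct toy_pgmap *toy_page_pgmap(struct toy_page *p)
{
	struct toy_page *head = toy_compound_head(p);

	assert(!toy_is_tail(head));
	return (struct toy_pgmap *)head->word;
}

/* pages[0] becomes the head; pages[1..n-1] become its tail pages. */
static void toy_init_compound(struct toy_page *pages, size_t n,
			      struct toy_pgmap *pgmap)
{
	/* The scheme only works if the pgmap pointer has bit 0 free. */
	assert(((uintptr_t)pgmap & 1) == 0);
	pages[0].word = (uintptr_t)pgmap;
	for (size_t i = 1; i < n; i++)
		pages[i].word = (uintptr_t)&pages[0] | 1;
}
```

After toy_init_compound(pages, 4, &pgmap), toy_page_pgmap() returns &pgmap for every page in the array, even though only the head stores that pointer directly.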