Date: Thu, 22 Sep 2022 19:01:56 -0700
From: Dan Williams
To: Dave Chinner, Dan Williams
CC: Jason Gunthorpe, Matthew Wilcox, Jan Kara, Darrick J. Wong,
 Christoph Hellwig, John Hubbard
Subject: Re: [PATCH v2 10/18] fsdax: Manage pgmap references at entry insertion and deletion
Message-ID: <632d13949e113_4a6742947c@dwillia2-xfh.jf.intel.com.notmuch>
References: <632b2b4edd803_66d1a2941a@dwillia2-xfh.jf.intel.com.notmuch>
 <632b8470d34a6_34962946d@dwillia2-xfh.jf.intel.com.notmuch>
 <632ba8eaa5aea_349629422@dwillia2-xfh.jf.intel.com.notmuch>
 <632bc5c4363e9_349629486@dwillia2-xfh.jf.intel.com.notmuch>
 <632cd9a2a023_3496294da@dwillia2-xfh.jf.intel.com.notmuch>
 <20220923013634.GY3600936@dread.disaster.area>
In-Reply-To: <20220923013634.GY3600936@dread.disaster.area>
Content-Type: text/plain; charset="us-ascii"

Dave Chinner wrote:
> On Thu, Sep 22, 2022 at 02:54:42PM -0700, Dan Williams wrote:
> > Jason Gunthorpe wrote:
> > > On Wed, Sep 21, 2022 at 07:17:40PM -0700, Dan Williams wrote:
> > > > Jason Gunthorpe wrote:
> > > > > On Wed, Sep 21, 2022 at 05:14:34PM -0700, Dan Williams wrote:
> > > > >
> > > > > > > Indeed, you could reasonably put such a liveness test at the moment
> > > > > > > every driver takes a 0 refcount struct page and turns it into a 1
> > > > > > > refcount struct page.
> > > > > >
> > > > > > I could do it with a flag, but the reason to have pgmap->ref managed at
> > > > > > the page->_refcount 0 -> 1 and 1 -> 0 transitions is so at the end of
> > > > > > time memunmap_pages() can look at the one counter rather than scanning
> > > > > > and rescanning all the pages to see when they go to final idle.
> > > > >
> > > > > That makes some sense too, but the logical way to do that is to put some
> > > > > counter along the page_free() path, and establish a 'make a page not
> > > > > free' path that does the other side.
> > > > >
> > > > > ie it should not be in DAX code, it should be all in common pgmap
> > > > > code. The pgmap should never be freed while any page->refcount != 0
> > > > > and that should be an intrinsic property of pgmap, not relying on
> > > > > external parties.
> > > >
> > > > I just do not know where to put such intrinsics since there is nothing
> > > > today that requires going through the pgmap object to discover the pfn
> > > > and 'allocate' the page.
> > >
> > > I think that is just a new API that wrappers the set refcount = 1,
> > > percpu refcount and maybe building appropriate compound pages too.
> > >
> > > Eg maybe something like:
> > >
> > >   struct folio *pgmap_alloc_folios(pgmap, start, length)
> > >
> > > And you get back maximally sized allocated folios with refcount = 1
> > > that span the requested range.
> > >
> > > > In other words make dax_direct_access() the 'allocation' event that pins
> > > > the pgmap? I might be speaking a foreign language if you're not familiar
> > > > with the relationship of 'struct dax_device' to 'struct dev_pagemap'
> > > > instances. This is not the first time I have considered making them one
> > > > in the same.
> > >
> > > I don't know enough about dax, so yes very foreign :)
> > >
> > > I'm thinking broadly about how to make pgmap usable to all the other
> > > drivers in a safe and robust way that makes some kind of logical sense.
> >
> > I think the API should be pgmap_folio_get() because, at least for DAX,
> > the memory is already allocated. The 'allocator' for fsdax is the
> > filesystem block allocator, and pgmap_folio_get() grants access to a
>
> No, the "allocator" for fsdax is the inode iomap interface, not the
> filesystem block allocator. The filesystem block allocator is only
> involved in iomapping if we have to allocate a new mapping for a
> given file offset.
>
> A better name for this is "arbiter", not allocator. To get an
> active mapping of the DAX pages backing a file, we need to ask the
> inode iomap subsystem to *map a file offset* and it will return
> kaddr and/or pfns for the backing store the file offset maps to.
>
> IOWs, for FSDAX, access to the backing store (i.e. the physical pages) is
> arbitrated by the *inode*, not the filesystem allocator or the dax
> device. Hence if a subsystem needs to pin the backing store for some
> use, it must first ensure that it holds an inode reference (direct
> or indirect) for that range of the backing store that spans the
> life of the pin. When the pin is done, it can tear down the mappings
> it was using and then the inode reference can be released.
>
> This ensures that any racing unlink of the inode will not result in
> the backing store being freed from under the application that has a
> pin. It will prevent the inode from being reclaimed and so
> potentially accessing stale or freed in-memory structures. And it
> will prevent the filesystem from being unmounted while the
> application using FSDAX access is still actively using that
> functionality even if it's already closed all its fds....

Sounds so simple when you put it that way. I'll give it a shot and stop
the gymnastics of trying to get in front of truncate_inode_pages_final()
with a 'dax break layouts', just hold it off until final unpin.
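
For illustration only, the ordering Dave describes (take an inode
reference, pin, unpin, then release the inode reference) can be sketched
as a small userspace model. This is not kernel code: struct model_inode
and every model_*() helper below are hypothetical names invented for the
sketch, and plain integer counters stand in for the real inode and page
references. The only point is that a racing unlink cannot free the
backing store until the final unpin drops the last inode reference.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only. Every name below is hypothetical and is NOT a
 * kernel API; the counters stand in for real inode and page references.
 */
struct model_inode {
	int refs;          /* inode references: open fds plus one per pin */
	int pins;          /* outstanding pins on the backing store */
	bool unlinked;     /* unlink has been requested */
	bool store_freed;  /* backing store handed back to the filesystem */
};

/* Final teardown runs only once the inode is unlinked and unreferenced. */
static void model_maybe_teardown(struct model_inode *ino)
{
	if (ino->unlinked && ino->refs == 0)
		ino->store_freed = true;
}

static void model_iget(struct model_inode *ino)
{
	ino->refs++;
}

static void model_iput(struct model_inode *ino)
{
	assert(ino->refs > 0);
	ino->refs--;
	model_maybe_teardown(ino);
}

/* Pin: take the inode reference first, then count the pin. */
static bool model_pin(struct model_inode *ino)
{
	if (ino->store_freed)
		return false;
	model_iget(ino);
	ino->pins++;
	return true;
}

/* Unpin: drop the pin, then release the inode reference it held. */
static void model_unpin(struct model_inode *ino)
{
	assert(ino->pins > 0);
	ino->pins--;
	model_iput(ino);
}

static void model_unlink(struct model_inode *ino)
{
	ino->unlinked = true;
	model_maybe_teardown(ino);
}

/* Returns true if the store stayed alive for the whole life of the pin. */
static bool model_race_unlink_with_pin(void)
{
	struct model_inode ino = { .refs = 1 };  /* one open fd */
	bool alive_while_pinned;

	if (!model_pin(&ino))
		return false;
	model_iput(&ino);     /* application closes its last fd */
	model_unlink(&ino);   /* racing unlink arrives while pinned */
	alive_while_pinned = !ino.store_freed;
	model_unpin(&ino);    /* final unpin triggers the teardown */
	return alive_while_pinned && ino.store_freed;
}
```

In this model, model_race_unlink_with_pin() returns true: the store
survives the close and the unlink, and is freed only by the last unpin.
A real implementation would also have to deal with locking, per-range
pins and the actual truncate path, none of which is modelled here.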