From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 2 Aug 2022 23:37:44 +0530
From: Srinivas Aji
To: linux-nvdimm, Linux MM
Cc: Dan Williams, David Hildenbrand, Vivek Goyal, David Woodhouse,
	"Gowans, James", Yue Li, Beau Beauchamp
Subject: [RFC PATCH 2/4] device-dax: Add framework for keeping persistent
 data in DAX KMEM
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

DAX memory treated as system RAM will be checked for persistent data
and passed to the appropriate plugin. The plugin is provided with
functions for accessing pages by device offset, and also for allocating
and freeing pages from the memory provided by the DAX device.

Add a config option, CONFIG_DEV_DAX_KMEM_PERSIST, which controls this
feature.

The plugin framework allows multiple formats for the persistent data.
The contents of the initial block determine which plugin is used. A
module parameter of the kmem module, persist_format_type, sets the
plugin type used to format any hotplugged DAX device that does not
have a recognized initial block.

Note: With just this change, and without the further patches which
implement plugins for persistence, adding a DAX device as KMEM will
only add it as fully allocated.

Limitation: Adding a DAX device as KMEM will succeed only if this
memory is the only memory in its NUMA node.

Signed-off-by: Srinivas Aji
---
 drivers/dax/Kconfig        |  13 ++
 drivers/dax/kmem.c         | 266 ++++++++++++++++++++++++++++++++++++-
 drivers/dax/kmem_persist.h |  43 ++++++
 3 files changed, 320 insertions(+), 2 deletions(-)
 create mode 100644 drivers/dax/kmem_persist.h

diff --git a/drivers/dax/Kconfig b/drivers/dax/Kconfig
index 5fdf269a822e..837178b841b6 100644
--- a/drivers/dax/Kconfig
+++ b/drivers/dax/Kconfig
@@ -66,4 +66,17 @@ config DEV_DAX_KMEM
 
 	  Say N if unsure.
 
+config DEV_DAX_KMEM_PERSIST
+	tristate "KMEM PERSIST: persistent storage together with kmem"
+	default DEV_DAX_KMEM
+	depends on DEV_DAX_KMEM
+	help
+	  Support using a DAX device as system memory while also allowing
+	  persistent data on it. This is done by treating all the
+	  persistent data as pre-allocated when we add the DAX device
+	  as system memory and further acquiring and releasing persistent
+	  data through memory allocate and free.
+
+	  Say N if unsure.
+
 endif
diff --git a/drivers/dax/kmem.c b/drivers/dax/kmem.c
index a37622060fff..df7cfc8ace78 100644
--- a/drivers/dax/kmem.c
+++ b/drivers/dax/kmem.c
@@ -13,6 +13,9 @@
 #include
 #include "dax-private.h"
 #include "bus.h"
+#ifdef CONFIG_DEV_DAX_KMEM_PERSIST
+#include "kmem_persist.h"
+#endif
 
 /* Memory resource name used for add_memory_driver_managed(). */
 static const char *kmem_name;
@@ -38,9 +41,23 @@ static int dax_kmem_range(struct dev_dax *dev_dax, int i, struct range *r)
 struct dax_kmem_data {
 	const char *res_name;
 	int mgid;
+#ifdef CONFIG_DEV_DAX_KMEM_PERSIST
+	unsigned long total_len;
+	struct kmem_persist_ops *persist_ops;
+	void *persist_data;
+#endif
 	struct resource *res[];
 };
 
+#ifdef CONFIG_DEV_DAX_KMEM_PERSIST
+static int kmem_persist_probe(struct dev_dax *dev_dax,
+			      struct kmem_persist_ops **persist_ops,
+			      void **persist_data);
+static int kmem_persist_cleanup(struct dev_dax *dev_dax,
+				struct kmem_persist_ops *persist_ops,
+				void *persist_data);
+#endif
+
 static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 {
 	struct device *dev = &dev_dax->dev;
@@ -48,6 +65,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 	struct dax_kmem_data *data;
 	int i, rc, mapped = 0;
 	int numa_node;
+	mhp_t mhp_flags;
 
 	/*
 	 * Ensure good NUMA information for the persistent memory.
@@ -62,6 +80,18 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 		return -EINVAL;
 	}
 
+#ifdef CONFIG_DEV_DAX_KMEM_PERSIST
+	/*
+	 * Check if NUMA node has any memory already
+	 */
+	if (node_online(numa_node) && node_present_pages(numa_node) != 0) {
+		dev_warn(dev,
+			 "rejecting DAX region on numa_node with existing memory: numa_node %d, existing pages %lu\n",
+			 numa_node, node_present_pages(numa_node));
+		return -EINVAL;
+	}
+#endif
+
 	for (i = 0; i < dev_dax->nr_range; i++) {
 		struct range range;
 
@@ -92,6 +122,15 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 	if (rc < 0)
 		goto err_reg_mgid;
 	data->mgid = rc;
+#ifdef CONFIG_DEV_DAX_KMEM_PERSIST
+	data->total_len = total_len;
+#endif
+
+	mhp_flags = MHP_NID_IS_MGID
+#ifdef CONFIG_DEV_DAX_KMEM_PERSIST
+		| MHP_ALLOCATE
+#endif
+		;
 
 	for (i = 0; i < dev_dax->nr_range; i++) {
 		struct resource *res;
@@ -130,8 +169,7 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 		 * this as RAM automatically.
 		 */
 		rc = add_memory_driver_managed(data->mgid, range.start,
-				range_len(&range), kmem_name, MHP_NID_IS_MGID);
-
+				range_len(&range), kmem_name, mhp_flags);
 		if (rc) {
 			dev_warn(dev, "mapping%d: %#llx-%#llx memory add failed\n",
 					i, range.start, range.end);
@@ -147,6 +185,14 @@ static int dev_dax_kmem_probe(struct dev_dax *dev_dax)
 
 	dev_set_drvdata(dev, data);
 
+#ifdef CONFIG_DEV_DAX_KMEM_PERSIST
+	rc = kmem_persist_probe(dev_dax,
+				&data->persist_ops,
+				&data->persist_data);
+	if (rc)
+		dev_err(dev, "Cannot setup kmem persistent data\n");
+#endif
+
 	return 0;
 
 err_request_mem:
@@ -165,6 +211,18 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
 	struct device *dev = &dev_dax->dev;
 	struct dax_kmem_data *data = dev_get_drvdata(dev);
 
+#ifdef CONFIG_DEV_DAX_KMEM_PERSIST
+	/*
+	 * TODO:This is probably the wrong place to call this. We need to
+	 * call this before the blocks are marked offline, but after
+	 * ensuring no new allocations.
+	 */
+	if (kmem_persist_cleanup(dev_dax, data->persist_ops,
+				 data->persist_data)) {
+		dev_err(dev, "Block device cannot be freed.\n");
+		return;
+	}
+#endif
 	/*
 	 * We have one shot for removing memory, if some memory blocks were not
 	 * offline prior to calling this function remove_memory() will fail, and
@@ -214,6 +272,210 @@ static void dev_dax_kmem_remove(struct dev_dax *dev_dax)
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
 
+#ifdef CONFIG_DEV_DAX_KMEM_PERSIST
+struct page *dax_kmem_index_to_page(unsigned long page_index,
+				    struct dev_dax *dev_dax)
+{
+	struct device *dev = &dev_dax->dev;
+	struct dax_kmem_data *data = dev_get_drvdata(dev);
+	int i;
+	unsigned long page_offset = 0;
+
+	for (i = 0; i < dev_dax->nr_range; i++) {
+		struct resource *r = data->res[i];
+		unsigned long page_len = (r->end + 1 - r->start) >> PAGE_SHIFT;
+
+		if (page_offset + page_len <= page_index) {
+			page_offset += page_len;
+			continue;
+		}
+		return pfn_to_page((r->start >> PAGE_SHIFT) +
+				   (page_index - page_offset));
+	}
+	return NULL;
+}
+
+unsigned long dax_kmem_num_pages(struct dev_dax *dev_dax)
+{
+	struct device *dev = &dev_dax->dev;
+	struct dax_kmem_data *data = dev_get_drvdata(dev);
+
+	return data->total_len >> PAGE_SHIFT;
+}
+
+struct page *dax_kmem_alloc_page(struct dev_dax *dev_dax,
+				 unsigned long *page_index)
+{
+	struct device *dev = &dev_dax->dev;
+	struct dax_kmem_data *data = dev_get_drvdata(dev);
+	int i;
+	unsigned long page_offset = 0;
+	u64 phys;
+	struct page *page =
+		alloc_pages_node(dev_dax->target_node,
+				 GFP_NOIO | __GFP_ZERO | __GFP_THISNODE,
+				 0);
+	if (!page)
+		return NULL;
+
+	phys = __pfn_to_phys(page_to_pfn(page));
+
+	for (i = 0; i < dev_dax->nr_range; i++) {
+		struct resource *r = data->res[i];
+		unsigned long page_len = (r->end + 1 - r->start) >> PAGE_SHIFT;
+
+		if (phys >= r->start && phys <= r->end) {
+			*page_index =
+				page_offset + ((phys - r->start) >> PAGE_SHIFT);
+			break;
+		}
+		page_offset += page_len;
+	}
+	if (i == dev_dax->nr_range) {
+		dev_err(dev, "Allocated page not in DAX range. Freeing.\n");
+		__free_page(page);
+		page = NULL;
+	}
+
+	return page;
+}
+
+static int persist_format_type = -1;
+module_param(persist_format_type, int, 0644);
+
+/*
+ * Forcibly format new KMEM with persist_format_type. This can cause loss
+ * of existing persistent data, so this should be replaced with some
+ * other mechanism for reformatting.
+ */
+static bool persist_format_force;
+module_param(persist_format_force, bool, 0644);
+
+static LIST_HEAD(persist_types);
+static DEFINE_MUTEX(persist_types_lock);
+
+int kmem_persist_type_register(struct kmem_persist_ops *ops)
+{
+	mutex_lock(&persist_types_lock);
+	ops->ref_count = 0;
+	list_add_tail(&ops->next, &persist_types);
+	mutex_unlock(&persist_types_lock);
+	return 0;
+}
+
+int kmem_persist_type_unregister(struct kmem_persist_ops *ops)
+{
+	mutex_lock(&persist_types_lock);
+	if (ops->ref_count != 0) {
+		mutex_unlock(&persist_types_lock);
+		return -1;
+	}
+	list_del(&ops->next);
+	mutex_unlock(&persist_types_lock);
+	return 0;
+}
+
+int kmem_persist_probe(struct dev_dax *dev_dax,
+		       struct kmem_persist_ops **persist_ops,
+		       void **persist_data)
+{
+	struct device *dev = &dev_dax->dev;
+	struct kmem_persist_superblock *super;
+	enum kmem_persist_type ptype;
+	bool format = false;
+	bool ptype_found = false;
+	struct kmem_persist_ops *ops;
+	void *data;
+	struct list_head *pos;
+	int rc;
+
+	super = kmap_local_page(dax_kmem_index_to_page(0, dev_dax));
+
+	if (super->magic != kmem_persist_magic) {
+		if (persist_format_type == -1) {
+			dev_err(dev, "kmem unformatted for persistence\n");
+			kunmap_local(super);
+			return -EINVAL;
+		}
+		ptype = persist_format_type;
+		format = true;
+	} else {
+		ptype = super->type;
+	}
+
+	mutex_lock(&persist_types_lock);
+	list_for_each(pos, &persist_types) {
+		ops = list_entry(pos, struct kmem_persist_ops, next);
+		if (ops->type == ptype) {
+			ops->ref_count++;
+			ptype_found = true;
+			break;
+		}
+	}
+	mutex_unlock(&persist_types_lock);
+
+	if (!ptype_found) {
+		dev_err(dev, "No persistence module with type %d\n", ptype);
+		kunmap_local(super);
+		return -EINVAL;
+	}
+
+	if (format) {
+		rc = ops->format(dev_dax);
+		if (rc ||
+		    super->magic != kmem_persist_magic ||
+		    super->type != persist_format_type
+		    ) {
+			dev_err(dev,
+				"Error formatting kmem persistence type %d\n",
+				ptype);
+			mutex_lock(&persist_types_lock);
+			ops->ref_count--;
+			mutex_unlock(&persist_types_lock);
+
+			kunmap_local(super);
+			return rc;
+		}
+	}
+
+	kunmap_local(super);
+	super = NULL;
+
+	rc = ops->probe(dev_dax, &data);
+	if (rc) {
+		dev_err(dev, "Error initializing kmem persistence type %d\n",
+			ptype);
+		return rc;
+	}
+	*persist_ops = ops;
+	*persist_data = data;
+	return 0;
+}
+
+int kmem_persist_cleanup(struct dev_dax *dev_dax,
+			 struct kmem_persist_ops *pops,
+			 void *persist_data)
+{
+	int rc;
+	struct device *dev = &dev_dax->dev;
+
+	rc = pops->cleanup(dev_dax, persist_data);
+
+	if (rc) {
+		dev_err(dev, "Error cleaning up kmem persistence type %d\n",
+			pops->type);
+		return rc;
+	}
+
+	mutex_lock(&persist_types_lock);
+	pops->ref_count--;
+	mutex_unlock(&persist_types_lock);
+
+	return 0;
+}
+
+#endif /* CONFIG_DEV_DAX_KMEM_PERSIST */
+
 static struct dax_device_driver device_dax_kmem_driver = {
 	.probe = dev_dax_kmem_probe,
 	.remove = dev_dax_kmem_remove,
diff --git a/drivers/dax/kmem_persist.h b/drivers/dax/kmem_persist.h
new file mode 100644
index 000000000000..dd651025f28c
--- /dev/null
+++ b/drivers/dax/kmem_persist.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright(c) 2022 MemVerge. All rights reserved.
+ */
+#ifndef __KMEM_PERSIST_H__
+#define __KMEM_PERSIST_H__
+
+struct page;
+struct dev_dax;
+
+enum kmem_persist_type {
+	KMEM_PERSIST_NONE = 0,
+};
+
+
+struct kmem_persist_ops {
+	enum kmem_persist_type type;
+	int (*format)(struct dev_dax *dev_dax);
+	int (*probe)(struct dev_dax *dev_dax, void **data);
+	int (*cleanup)(struct dev_dax *dev_dax, void *data);
+	int ref_count;
+	struct list_head next;
+};
+
+static const unsigned long kmem_persist_magic = 0x4b4d454d50455253L; // KMEMPERS
+
+struct kmem_persist_superblock {
+	unsigned long magic;
+	enum kmem_persist_type type;
+} __packed;
+
+int kmem_persist_type_register(struct kmem_persist_ops *ops);
+
+int kmem_persist_type_unregister(struct kmem_persist_ops *ops);
+
+
+struct page *dax_kmem_index_to_page(unsigned long page_index,
+				    struct dev_dax *dev_dax);
+unsigned long dax_kmem_num_pages(struct dev_dax *dev_dax);
+struct page *dax_kmem_alloc_page(struct dev_dax *dev_dax,
+				 unsigned long *page_index);
+
+#endif
-- 
2.30.2
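
Illustration only, not part of the patch: the page-index translation in
dax_kmem_index_to_page() and its inverse in dax_kmem_alloc_page() may be
easier to review as a standalone userspace model. The sketch below
mirrors the two loops over the device ranges under stated assumptions:
struct model_range, MODEL_PAGE_SHIFT, model_index_to_pfn() and
model_phys_to_index() are hypothetical names invented here, 4 KiB pages
are assumed, and page frame numbers stand in for struct page pointers.

```c
/*
 * Userspace sketch, not kernel code. All names here are hypothetical
 * stand-ins for the kernel structures used by the patch.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

struct model_range {
	uint64_t start;	/* first byte of the range */
	uint64_t end;	/* last byte of the range, inclusive */
};

/*
 * Forward lookup: device-relative page index -> page frame number.
 * Returns 0 on success, -1 if the index lies past the last range.
 */
static int model_index_to_pfn(const struct model_range *r, size_t nr,
			      uint64_t page_index, uint64_t *pfn)
{
	uint64_t page_offset = 0;	/* pages covered by earlier ranges */
	size_t i;

	assert(pfn != NULL);
	for (i = 0; i < nr; i++) {
		uint64_t page_len =
			(r[i].end + 1 - r[i].start) >> MODEL_PAGE_SHIFT;

		if (page_offset + page_len <= page_index) {
			/* Index is beyond this range, skip it. */
			page_offset += page_len;
			continue;
		}
		*pfn = (r[i].start >> MODEL_PAGE_SHIFT) +
		       (page_index - page_offset);
		return 0;
	}
	return -1;
}

/*
 * Reverse lookup: physical address -> device-relative page index.
 * Returns 0 on success, -1 if the address is in none of the ranges.
 */
static int model_phys_to_index(const struct model_range *r, size_t nr,
			       uint64_t phys, uint64_t *page_index)
{
	uint64_t page_offset = 0;
	size_t i;

	assert(page_index != NULL);
	for (i = 0; i < nr; i++) {
		uint64_t page_len =
			(r[i].end + 1 - r[i].start) >> MODEL_PAGE_SHIFT;

		if (phys >= r[i].start && phys <= r[i].end) {
			*page_index = page_offset +
				((phys - r[i].start) >> MODEL_PAGE_SHIFT);
			return 0;
		}
		page_offset += page_len;
	}
	return -1;
}
```

With two ranges of 2 and 4 pages, indexes 0-1 resolve into the first
range and 2-5 into the second, so a plugin sees one contiguous index
space even when the device is split across several resources. The
reverse lookup fails for an address outside every range, which is the
case the patch handles by freeing the page and returning NULL.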