From: David Hildenbrand
Organization: Red Hat
To: Srinivas Aji, linux-nvdimm, Linux MM
Cc: Dan Williams, Vivek Goyal, David Woodhouse, "Gowans, James",
    Yue Li, Beau Beauchamp
Subject: Re: [RFC PATCH 0/4] Allow persistent data on DAX device being used as KMEM
Date: Tue, 2 Aug 2022 20:03:10 +0200
Message-ID: <5b5266fa-87e0-2db4-5da6-6f8f299c7cdb@redhat.com>

On 02.08.22 19:57, Srinivas Aji wrote:
> Linux supports adding a DAX driver managed memory region as system
> memory using the KMEM driver (from version 5.1). We would like to use
> a persistent addressable memory segment as system memory and
> simultaneously for storing some persistent data.
>
> Motivation: It is already possible to partition an NVDIMM device for
> RAM and storage by creating separate regions on the device and using
> one of them with KMEM and another as fsdax. This patch set is a start
> to trying to get zero copy snapshots of processes which are using the
> DAX device as RAM. That requires dynamically sharing pages between
> process RAM and the storage within a single NVDIMM region.
>
> To do this, we add a layer for handling the persistent data which does
> the following:
>
> 1. When a DAX device is added as KMEM, mark all the memory as
>    allocated and pass it up to a module which is aware of the storage
>    layout.
>
> 2. This module scans the memory, identifies the unused parts, and
>    frees those memory pages.
>
> 3. Further memory from this device is allocated using the kernel
>    memory allocation API. The memory allocation API currently allows
>    the allocation to be limited only based on NUMA node. So this
>    feature works only when the DAX device used as KMEM is the only
>    memory from its NUMA node.
>
> 4. Discarding of blocks previously used for persistent data results in
>    those blocks being freed to system memory.
>
> As an example, we implement a simple persistence module using the
> above framework to provide a block device. A block device assumes all
> blocks are always available, but in this case we have to get the
> blocks through the memory allocation API, at an offset not under our
> control. To provide block device semantics, we maintain an array which
> maps the logical block number to the real physical page, if one
> exists. Block device Trim/Discard support is used to mark blocks as
> unused.
>
> While we have the block device here as an example, a memory filesystem
> might be a more useful implementation. I am not sure if any of the
> existing in-memory filesystem structures are suited for
> persistence. Any suggestions for this are appreciated.
>
> Srinivas Aji (4):
>   mm/memory_hotplug: Add MHP_ALLOCATE flag which treats hotplugged
>     memory as allocated

Without seeing the actual patches, I am very skeptical that this is the
right approach, especially regarding memory onlining/offlining.

virtio-mem achieves something similar (yet different) by hooking into
generic_online_page(). From there, you can control what should actually
happen with memory that is getting onlined (e.g., free them to the buddy
or treat them differently).

Did you evaluate going that path?

-- 
Thanks,

David / dhildenb
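
For readers following the thread, the block-device example in the quoted
cover letter (an array mapping each logical block number to whichever
page the allocator happened to return, with Trim/Discard releasing the
page again) can be modeled in user space roughly as follows. This is only
a sketch under stated assumptions, not code from the patch set: the class
name SparseBlockMap, the free_pages list standing in for the kernel page
allocator, and the store dict standing in for page contents are all
hypothetical.

```python
class SparseBlockMap:
    """User-space model of a logical-block -> backing-page table.

    Hypothetical illustration only; none of these names come from the
    patch set under discussion.
    """

    def __init__(self, num_blocks, free_pages):
        # One slot per logical block; None means no page backs the block.
        self.table = [None] * num_blocks
        # Stand-in for the kernel page allocator: a pool of page numbers.
        self.free_pages = list(free_pages)
        # Stand-in for page contents, keyed by page number.
        self.store = {}

    def write(self, lbn, data):
        # Allocate a backing page on first write; we do not control
        # which page number the "allocator" hands back.
        if self.table[lbn] is None:
            if not self.free_pages:
                raise MemoryError("no free pages left on this node")
            self.table[lbn] = self.free_pages.pop()
        self.store[self.table[lbn]] = data

    def read(self, lbn):
        # Unbacked blocks read as empty, so every block appears to exist.
        page = self.table[lbn]
        return b"" if page is None else self.store[page]

    def discard(self, lbn):
        # Trim/Discard: drop the mapping and return the page to the pool.
        page = self.table[lbn]
        if page is not None:
            del self.store[page]
            self.table[lbn] = None
            self.free_pages.append(page)
```

The point of the indirection is that the device can present a fixed
logical block range while only written blocks consume pages, and a
discarded block hands its page back to the pool.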