From: Dan Williams
Date: Mon, 7 Dec 2020 11:32:15 -0800
Subject: Re: [RFC V2 00/37] Enhance memory utilization with DMEMFS
To: David Hildenbrand
Cc: yulei zhang, Linux MM, Andrew Morton, linux-fsdevel, KVM list,
    Linux Kernel Mailing List, Naoya Horiguchi, Al Viro, Paolo Bonzini,
    Joao Martins, Randy Dunlap, Sean J Christopherson, Xiao Guangrong,
    Wanpeng Li, Haiwei Li, Yulei Zhang
In-Reply-To: <33a1c4ca-9f78-96ca-a774-3adea64aaed3@redhat.com>

On Mon, Dec 7, 2020 at 4:03 AM David Hildenbrand wrote:
>
> On 07.12.20 12:30, yulei.kernel@gmail.com wrote:
> > From: Yulei Zhang
> >
> > In current system each physical memory page is associated with
> > a page structure which is used to track the usage of this page.
> > But due to the memory usage rapidly growing in cloud environment,
> > we find the resource consuming for page structure storage becomes
> > more and more remarkable. So is it possible that we could reclaim
> > such memory and make it reusable?
> >
> > This patchset introduces an idea about how to save the extra
> > memory through a new virtual filesystem -- dmemfs.
> >
> > Dmemfs (Direct Memory filesystem) is device memory or reserved
> > memory based filesystem. This kind of memory is special as it
> > is not managed by kernel and most important it is without 'struct page'.
> > Therefore we can leverage the extra memory from the host system
> > to support more tenants in our cloud service.
>
> "is not managed by kernel" well, it obviously is managed by the
> kernel. It's not managed by the buddy ;)
>
> How is this different to using "mem=X" and mapping the relevant memory
> directly into applications? Is this "simply" a control instance on top
> that makes sure unprivileged process can access it and not step onto
> each others feet? Is that the reason why it's called a "file system"?
> (an example would have helped here, showing how it's used)
>
> It's worth noting that memory hotunplug, memory poisoning and probably
> more is currently fundamentally incompatible with this approach - which
> should better be pointed out in the cover letter.
>
> Also, I think something similar can be obtained by using dax/hmat
> infrastructure with "memmap=", at least I remember a talk where this was
> discussed (but not sure if they modified the firmware to expose selected
> memory as soft-reserved - we would only need a cmdline parameter to
> achieve the same - Dan might know more).

There is currently the efi_fake_mem parameter that can add the
"EFI_MEMORY_SP" attribute on EFI platforms:

    efi_fake_mem=4G@9G:0x40000

...this results in a /dev/dax instance that can be further partitioned
via the device-dax sub-division facility merged for 5.10. That could
be generalized to something else for non-EFI platforms, but there has
not been a justification to go that route yet.

Joao pointed this out in a previous posting of DMEMFS, and I have yet
to see an explanation of the incremental benefit the kernel gains from
having yet another parallel memory management interface.
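
For readers unfamiliar with the efi_fake_mem syntax used above, here is
a rough sketch that decodes one entry into the physical range it covers
and checks whether the attribute marks it as specific-purpose memory.
It assumes the nn[KMG]@ss[KMG]:aa layout (size, start address,
attribute mask) and that 0x40000 is the EFI_MEMORY_SP bit, as the
example above implies. The helper names parse_size and parse_fake_mem
are made up for illustration; this is not kernel code and only
approximates how the kernel parses the option.

```python
# Illustrative only: decode one efi_fake_mem= entry of the assumed form
# nn[KMG]@ss[KMG]:aa (size @ start : attribute mask).

EFI_MEMORY_SP = 0x40000  # attribute value taken from the example above

_SUFFIX = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Parse a number with an optional K/M/G suffix; base is auto-detected."""
    text = text.strip()
    scale = 1
    if text and text[-1].upper() in _SUFFIX:
        scale = _SUFFIX[text[-1].upper()]
        text = text[:-1]
    return int(text, 0) * scale


def parse_fake_mem(entry):
    """Split 'size@start:attr' and return the inclusive range it describes."""
    size_part, rest = entry.split("@", 1)
    start_part, attr_part = rest.split(":", 1)
    size = parse_size(size_part)
    start = parse_size(start_part)
    attr = int(attr_part, 0)
    return {
        "start": start,
        "end": start + size - 1,  # inclusive end address
        "attr": attr,
        "soft_reserved": bool(attr & EFI_MEMORY_SP),
    }


if __name__ == "__main__":
    r = parse_fake_mem("4G@9G:0x40000")
    print(f"0x{r['start']:x}-0x{r['end']:x} soft_reserved={r['soft_reserved']}")
```

Read this way, the entry in the example describes 4 GiB starting at the
9 GiB physical address, tagged so that it is kept out of the general
page allocator and handed to device-dax instead.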