From: Joao Martins <joao.m.martins@oracle.com>
To: lsf-pc@lists.linux-foundation.org
Cc: linux-mm@kvack.org
Subject: [LSF/MM TOPIC] Guest memory without struct page
Date: Fri, 14 Feb 2020 21:32:11 +0000
Message-ID: <1be38ae3-d51e-2661-d0ab-6ad8baefe804@oracle.com>

All system RAM is tracked by a metadata structure called 'struct page', which
takes 64 bytes for each page of a given granularity. On x86 (or any system
where PAGE_SIZE is 4K) this metadata amounts to roughly 1.5% of total memory
capacity.

For hypervisors -- especially those without vhost/PV-devices, using just VFs --
persistent/volatile memory is largely assigned to userspace without the kernel
taking part in any of its I/O paths, except for VFIO. 1.5% may not seem like
much, but it is still a total of 16G per TB just for struct page, which is a lot
considering the hypervisor won't need it; that memory should instead be used to
create more guests (=Happy Users).
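
As a quick sanity check of the figures above, here is a back-of-the-envelope
sketch. It assumes a 64-byte struct page, 4K base pages and binary units
(1T = 2^40 bytes); the helper name is made up for illustration and is not
taken from the kernel or from the RFC.

```python
# Rough check of the struct page overhead quoted in this mail.
# Assumptions (illustrative only): 64-byte struct page, 4 KiB base pages,
# binary units. struct_page_overhead() is a hypothetical helper.

STRUCT_PAGE_BYTES = 64
PAGE_SIZE = 4096
GIB = 1 << 30
TIB = 1 << 40


def struct_page_overhead(total_bytes, page_size=PAGE_SIZE,
                         struct_page_bytes=STRUCT_PAGE_BYTES):
    """Return the bytes of struct page metadata needed to track total_bytes."""
    nr_pages = total_bytes // page_size
    return nr_pages * struct_page_bytes


# Fraction of memory spent on metadata, independent of total capacity.
ratio = STRUCT_PAGE_BYTES / PAGE_SIZE
# Absolute cost of tracking 1 TiB of RAM with 4 KiB pages.
overhead = struct_page_overhead(TIB)

print(f"overhead ratio: {ratio:.4%}")
print(f"per TiB: {overhead // GIB} GiB")
```

With these assumptions the ratio comes out slightly above the rounded 1.5%,
and the per-terabyte cost matches the 16G mentioned above.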

The RFC patches submitted here [0] approach this through device-dax, given the
interface it already provides for VMMs, and also given that struct page is a
source of overhead for non-volatile memory assigned to guests too. Essentially,
it extends device-dax to create a PFNMAP vma with special pages (while adding
support for huge special pages). Host memory would be limited through some form
of mem=X, efi_fake_mem=Y@X:0x40000 or memmap=Y@X-1+0xefffffff, i.e. dedicating
Y amount of memory to guests.

Should vhost-{net,scsi,etc} be used, we copy from/to guest memory (which works
today for vhost-net, and is easily adjusted for vhost-scsi), or perhaps explore
dynamically creating/freeing struct pages on temporary GUP pinning.

This topic would be to brainstorm the idea/proposal and also discuss
alternatives/pitfalls/limitations/other-usecases(*).

Regards,
  Joao

(*) To some extent there might be a similarity to the '"Secret" memory userspace
APIs' subitem of this previously submitted topic [1], given that the guest memory
in the described topic isn't part of the direct map.

[0]
https://lore.kernel.org/linux-mm/20200110190313.17144-1-joao.m.martins@oracle.com/
[1] https://lore.kernel.org/linux-mm/20200206165900.GD17499@linux.ibm.com/


