linux-mm.kvack.org archive mirror
From: John Hubbard <jhubbard@nvidia.com>
To: <jglisse@redhat.com>, <lsf-pc@lists.linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	<linux-fsdevel@vger.kernel.org>, <linux-block@vger.kernel.org>,
	<linux-mm@kvack.org>, lsf-pc <lsf-pc@lists.linux-foundation.org>
Subject: Re: [Lsf-pc][LSF/MM/BPF TOPIC] Generic page write protection
Date: Wed, 22 Jan 2020 10:27:08 -0800
Message-ID: <174cd3a0-43e5-d8bd-5cc3-d562f5727283@nvidia.com>
In-Reply-To: <20200122023222.75347-1-jglisse@redhat.com>

Adding: lsf-pc

On 1/21/20 6:32 PM, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
> 
> 
> Provide a generic way to write-protect pages (à la KSM) to enable new mm
> optimizations:
>     - KSM (Kernel Samepage Merging) to deduplicate pages (for file-backed
>       pages too, not only anonymous memory as today)
>     - NUMA page duplication (read-only duplication) into multiple
>       different physical pages. For instance, shared library code
>       could have a copy on each NUMA node. Or, in cases like GPU/FPGA,
>       memory could be duplicated read-only inside the local device memory.
>     ...
> 
> Note that this write protection is intended to be breakable at any time,
> within a reasonable delay (as KSM does today), so that we never block
> anything that needs to write to the page for longer than necessary.
> 
> 
> The goal is to provide a mechanism that works for both anonymous and
> file-backed memory. For this we need a pointer inside struct page.
> For anonymous memory, KSM uses the anon_vma field, which corresponds
> to the mapping field for file-backed pages.
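As background for the paragraph above, here is a minimal user-space sketch of how page->mapping is overloaded today. The tag bit values mirror PAGE_MAPPING_ANON, PAGE_MAPPING_MOVABLE and PAGE_MAPPING_KSM from include/linux/page-flags.h (around v5.5); the enum and both helper names are hypothetical. Once a page's mapping word carries a KSM-style tag, the address_space pointer is no longer stored there, which is why the callers listed below need another source for it.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values mirror include/linux/page-flags.h (circa v5.5). */
#define PAGE_MAPPING_ANON    0x1UL
#define PAGE_MAPPING_MOVABLE 0x2UL
#define PAGE_MAPPING_KSM     (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)
#define PAGE_MAPPING_FLAGS   (PAGE_MAPPING_ANON | PAGE_MAPPING_MOVABLE)

/* Hypothetical classification used only in this sketch. */
enum mapping_kind {
	MAPPING_FILE,		/* plain struct address_space * */
	MAPPING_ANON,		/* struct anon_vma * | PAGE_MAPPING_ANON */
	MAPPING_MOVABLE,	/* non-LRU movable page */
	MAPPING_KSM,		/* KSM stable node | PAGE_MAPPING_KSM */
};

/* Decode which kind of pointer a tagged mapping word holds. */
static enum mapping_kind classify_mapping(uintptr_t mapping)
{
	switch (mapping & PAGE_MAPPING_FLAGS) {
	case PAGE_MAPPING_KSM:
		return MAPPING_KSM;
	case PAGE_MAPPING_ANON:
		return MAPPING_ANON;
	case PAGE_MAPPING_MOVABLE:
		return MAPPING_MOVABLE;
	default:
		return MAPPING_FILE;
	}
}

/* Strip the tag bits to recover the underlying pointer value. */
static uintptr_t mapping_pointer(uintptr_t mapping)
{
	return mapping & ~PAGE_MAPPING_FLAGS;
}
```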
> 
> So to allow generic write protection for file-backed pages, we need to
> avoid relying on the struct page mapping field in the various kernel
> code paths that use it today.
> 
> The page->mapping field is used in 5 different ways:
>  [1]- Functions operating on files: we can get the mapping from the file
>       (the issue here is that we might need to pass the file down the
>       call stack).
> 
>  [2]- Core/arch mm functions: those do not care about the file (if they
>       do, then they are vma-related and we can get the mapping from the
>       vma). Those functions only want to be able to walk all the ptes
>       pointing to the page (for instance memory compaction, memory
>       reclaim, ...). We can provide the exact same functionality for
>       write-protected pages (as KSM does today).
> 
>  [3]- Block layer when I/O fails. This depends on the fs; for instance,
>       for fs which use buffer_head we can update buffer_head to store the
>       mapping instead of the block_device, as we can get the block_device
>       from the mapping but not the mapping from the block_device.
> 
>       So solving this is mostly filesystem-specific, but I have not seen
>       any fs that could not be updated properly so that the block layer
>       can report I/O failures without relying on page->mapping.
> 
>  [4]- Debugging (mostly procfs/sysfs files that dump memory state). Those
>       do not need the mapping per se; we just need to report page state
>       (and thus write protection information if the page is write-protected).
> 
>  [5]- GUP (get_user_pages()): if something calls GUP in write mode, then
>       we need to break write protection (as KSM does today). GUPed pages
>       should not be write-protected, as we do not know what the GUP caller
>       is doing with the page.
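A toy model of [5], assuming a simplified page with a sharer count (struct toy_page and toy_gup_write() are hypothetical, not kernel code): a write-mode GUP caller is never handed a write-protected page; protection is broken first by giving the caller a private copy, the same copy-on-write idea KSM relies on for anonymous pages. Dropping protection in place when only one sharer is left is an assumption of this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 64

/* Hypothetical toy page; not the kernel's struct page. */
struct toy_page {
	bool write_protected;	/* shared read-only among several owners */
	int sharers;		/* number of owners mapping this page */
	char data[TOY_PAGE_SIZE];
};

/*
 * Model of GUP in write mode: never hand a write-protected page to a
 * caller that will write to it. Returns the page the caller may write,
 * or NULL if a private copy could not be allocated.
 */
static struct toy_page *toy_gup_write(struct toy_page *page)
{
	struct toy_page *copy;

	if (!page->write_protected)
		return page;

	/* Sole owner: drop protection in place (assumption of this model). */
	if (page->sharers == 1) {
		page->write_protected = false;
		return page;
	}

	/* Otherwise break protection by copy-on-write. */
	copy = malloc(sizeof(*copy));
	if (!copy)
		return NULL;
	memcpy(copy->data, page->data, sizeof(copy->data));
	copy->write_protected = false;
	copy->sharers = 1;
	page->sharers--;	/* the caller no longer shares the protected page */
	return copy;
}
```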
> 
> 
> Most of the patchset deals with [1], [2] and [3] ([4] and [5] are mostly
> trivial).
> 
> For [1] we only need to pass the mapping down to all fs and vfs callback
> functions (this is mostly achieved with Coccinelle). Roughly speaking, the
> patches are generated with the following pseudo code:
> 
> add_mapping_parameter(func)
> {
>     function_add_parameter(func, mapping);
> 
>     for_each_function_calling (caller, func) {
>         calling_add_parameter(caller, func, mapping);
> 
>         if (function_parameters_contains(caller, mapping|file))
>             continue;
> 
>         add_mapping_parameter(caller);
>     }
> }
> 
> passdown_mapping()
> {
>     for_each_function_in_fs (func, fs_functions) {
>         if (!function_body_contains(func, page->mapping))
>             continue;
> 
>         if (function_parameters_contains(func, mapping|file))
>             continue;
> 
>         add_mapping_parameter(func);
>     }
> }
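The pseudo code above can be modelled as a small C program over a toy call graph, assuming each function records whether it already takes a mapping or file parameter and whether its body dereferences page->mapping. struct toy_func and its fields are hypothetical; the real transformation is done by Coccinelle on source code. Marking a function before visiting its callers is what makes the recursion terminate on cyclic call graphs.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_CALLERS 8

/* Toy call-graph node standing in for what Coccinelle analyzes. */
struct toy_func {
	const char *name;
	bool has_mapping_or_file;	/* function_parameters_contains(mapping|file) */
	bool uses_page_mapping;		/* function_body_contains(page->mapping) */
	int patched_calls;		/* call sites in this function now passing mapping */
	int ncallers;
	int callers[TOY_MAX_CALLERS];	/* indices of functions calling this one */
};

static void add_mapping_parameter(struct toy_func *fns, int f)
{
	/* function_add_parameter(): mark first so cycles terminate. */
	fns[f].has_mapping_or_file = true;

	for (int i = 0; i < fns[f].ncallers; i++) {
		int caller = fns[f].callers[i];

		/* calling_add_parameter(): the call site now passes mapping. */
		fns[caller].patched_calls++;

		if (fns[caller].has_mapping_or_file)
			continue;

		add_mapping_parameter(fns, caller);
	}
}

static void passdown_mapping(struct toy_func *fns, int nfuncs)
{
	for (int f = 0; f < nfuncs; f++) {
		if (!fns[f].uses_page_mapping)
			continue;
		if (fns[f].has_mapping_or_file)
			continue;
		add_mapping_parameter(fns, f);
	}
}
```

For example, with vfs_op() (takes a file) calling fs_helper() calling fs_leaf() (uses page->mapping), passdown_mapping() adds the parameter to fs_leaf() and fs_helper(), patches both call sites, and stops at vfs_op().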
> 
> For [2], KSM is generalized and extended so that both anonymous and
> file-backed pages can be handled by a common write-protected page case.
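A sketch of the kind of rmap bookkeeping [2] implies, assuming a per-page descriptor akin to a KSM stable node that records every (mapping, index) pair sharing the protected page. All types and functions here are hypothetical; the idea is that reclaim or compaction would walk such a list instead of following page->mapping.

```c
#include <assert.h>

#define TOY_MAX_RMAP 4

/* Hypothetical stand-ins; not kernel structures. */
struct toy_mapping { int id; };

struct toy_rmap_item {
	struct toy_mapping *mapping;	/* file mapping (or anon owner) */
	unsigned long index;		/* page offset within that mapping */
};

/* One descriptor per write-protected page, akin to a KSM stable node. */
struct toy_wp_node {
	int nitems;
	struct toy_rmap_item items[TOY_MAX_RMAP];
};

typedef void (*toy_rmap_fn)(const struct toy_rmap_item *item, void *arg);

/* Record one more (mapping, index) sharing the page; returns 0 on success. */
static int toy_wp_add(struct toy_wp_node *node, struct toy_mapping *mapping,
		      unsigned long index)
{
	if (node->nitems == TOY_MAX_RMAP)
		return -1;
	node->items[node->nitems].mapping = mapping;
	node->items[node->nitems].index = index;
	node->nitems++;
	return 0;
}

/* Visit every sharer, as reclaim or compaction would when unmapping. */
static void toy_rmap_walk(const struct toy_wp_node *node, toy_rmap_fn fn,
			  void *arg)
{
	for (int i = 0; i < node->nitems; i++)
		fn(&node->items[i], arg);
}

/* Example callback: count the sharers. */
static void toy_count_sharer(const struct toy_rmap_item *item, void *arg)
{
	(void)item;
	(*(int *)arg)++;
}
```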
> 
> For [3], it depends on the filesystem (fs which use buffer_head are
> easily handled by storing the mapping in the buffer_head struct).
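A sketch of the buffer_head approach for [3], using hypothetical, heavily simplified stand-ins for the kernel structures (only the field names echo the kernel's, and wb_err is a plain int here rather than an errseq_t). If the buffer_head records the mapping, the block device is still reachable through mapping->host->i_sb->s_bdev, and an I/O completion path can record an error on the mapping without consulting page->mapping.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct toy_block_device { int dev; };
struct toy_super_block { struct toy_block_device *s_bdev; };
struct toy_inode { struct toy_super_block *i_sb; };
struct toy_address_space { struct toy_inode *host; int wb_err; };

/* The buffer_head records the mapping instead of the block device. */
struct toy_buffer_head {
	struct toy_address_space *b_mapping;
};

/* The block device is still reachable from the mapping... */
static struct toy_block_device *toy_bh_bdev(const struct toy_buffer_head *bh)
{
	return bh->b_mapping->host->i_sb->s_bdev;
}

/* ...and write completion can report errors without page->mapping. */
static void toy_end_write_io(struct toy_buffer_head *bh, int uptodate)
{
	if (!uptodate)
		bh->b_mapping->wb_err = -EIO;
}
```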
> 
> 
> To avoid any regression risk, the page->mapping field is left intact as
> today for non-write-protected pages. This means that if you do not use
> the page write protection mechanism, then it cannot regress. This is
> achieved by using a helper function that takes the mapping from the
> context (the current function parameter; see above for how functions are
> updated) and the struct page. If the page is not write-protected, then it
> uses the mapping from the struct page (just like today). The only
> difference between before and after the patchset is that all fs functions
> that need the mapping for a page now also get it as a parameter, but only
> use the parameter mapping pointer if the page is write-protected.
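The helper described above could look roughly like the following. The name toy_page_file_mapping() and the simplified struct toy_page are hypothetical, not the patchset's actual code. The unprotected path returns page->mapping exactly as today, so callers that never write-protect pages see no behavior change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_address_space { int id; };

/* Hypothetical simplified page; the real struct page differs. */
struct toy_page {
	bool write_protected;
	struct toy_address_space *mapping; /* not the file mapping once protected */
};

/*
 * Prefer the struct page mapping exactly as today; only trust the mapping
 * passed down the call stack when the page is write-protected (its mapping
 * field then belongs to the protection scheme).
 */
static struct toy_address_space *
toy_page_file_mapping(const struct toy_page *page,
		      struct toy_address_space *ctx_mapping)
{
	if (!page->write_protected)
		return page->mapping;
	return ctx_mapping;
}
```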
> 
> Note also that, even once confidence is high that we always pass the
> correct mapping down each call stack, I do not believe that means we
> will be able to get rid of the struct page mapping field.
> 
> I posted a patchset before [*1] and I intend to post an updated patchset
> before LSF/MM/BPF. I also talked about this at LSF/MM 2018. I still
> believe this will be a topic that warrants a discussion with FS/MM and
> block device folks.
> 
> 
> [*1] https://lwn.net/Articles/751050/
>      https://cgit.freedesktop.org/~glisse/linux/log/?h=generic-write-protection-rfc
> [*2] https://lwn.net/Articles/752564/
> 
> 
> To: lsf-pc@lists.linux-foundation.org
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-block@vger.kernel.org
> Cc: linux-mm@kvack.org
> 
> 

thanks,
-- 
John Hubbard
NVIDIA

