From: John Hubbard <jhubbard@nvidia.com>
To: <jglisse@redhat.com>, <lsf-pc@lists.linux-foundation.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
<linux-fsdevel@vger.kernel.org>, <linux-block@vger.kernel.org>,
<linux-mm@kvack.org>, lsf-pc <lsf-pc@lists.linux-foundation.org>
Subject: Re: [Lsf-pc][LSF/MM/BPF TOPIC] Generic page write protection
Date: Wed, 22 Jan 2020 10:27:08 -0800
Message-ID: <174cd3a0-43e5-d8bd-5cc3-d562f5727283@nvidia.com>
In-Reply-To: <20200122023222.75347-1-jglisse@redhat.com>
Adding: lsf-pc
On 1/21/20 6:32 PM, jglisse@redhat.com wrote:
> From: Jérôme Glisse <jglisse@redhat.com>
>
>
> Provide a generic way to write protect pages (à la KSM) to enable new mm
> optimizations:
>   - KSM (kernel samepage merging) to deduplicate pages (for file
>     backed pages too, not only anonymous memory like today)
>   - NUMA page duplication (read only duplication) across multiple
>     different physical pages. For instance shared library code
>     having a copy on each NUMA node. Or, in cases like GPU/FPGA,
>     duplicating memory read only inside the local device memory.
>   ...
>
> Note that this write protection is intended to be breakable at any time,
> within a reasonable delay (like KSM today), so that we never block
> anything that needs to write to the page for longer than necessary.
>
>
> The goal is to provide a mechanism that works for both anonymous and
> file backed memory. For this we need a pointer inside struct page.
> For anonymous memory KSM uses the anon_vma field, which corresponds
> to the mapping field for file backed pages.
>
> So to allow generic write protection for file backed pages we need to
> avoid relying on the struct page mapping field in the various kernel
> code paths that use it today.
>
> The page->mapping field is used in 5 different ways:
>  [1]- Functions operating on files, where we can get the mapping from
>       the file (the issue here is that we might need to pass the file
>       down the call stack)
>
>  [2]- Core/arch mm functions, which do not care about the file (if they
>       do then it means they are vma related and we can get the mapping
>       from the vma). Those functions only want to be able to walk all
>       the ptes pointing to the page (for instance memory compaction,
>       memory reclaim, ...). We can provide the exact same functionality
>       for write protected pages (like KSM does today).
>
>  [3]- Block layer when I/O fails. This depends on the fs. For instance,
>       for a fs which uses buffer_head we can update buffer_head to store
>       the mapping instead of the block_device, as we can get the
>       block_device from the mapping but not the mapping from the
>       block_device.
>
>       So solving this is mostly filesystem specific, but I have not seen
>       any fs that could not be updated properly so that the block layer
>       can report I/O failures without relying on page->mapping.
>
>  [4]- Debugging (mostly procfs/sysfs files to dump memory states). Those
>       do not need the mapping per se, we just need to report page states
>       (and thus write protection information if the page is write
>       protected).
>
>  [5]- GUP (get user pages): if something calls GUP in write mode then we
>       need to break write protection (like KSM today). A GUPed page
>       should not be write protected as we do not know what the GUPer is
>       doing with the page.
>
>
> Most of the patchset deals with [1], [2] and [3] ([4] and [5] are mostly
> trivial).
>
> For [1] we only need to pass down the mapping to all fs and vfs callback
> functions (this is mostly achieved with coccinelle). Roughly speaking the
> patches are generated with the following pseudo code:
>
>     add_mapping_parameter(func)
>     {
>         function_add_parameter(func, mapping);
>
>         for_each_function_calling (caller, func) {
>             calling_add_parameter(caller, func, mapping);
>
>             if (function_parameters_contains(caller, mapping|file))
>                 continue;
>
>             add_mapping_parameter(caller);
>         }
>     }
>
>     passdown_mapping()
>     {
>         for_each_function_in_fs (func, fs_functions) {
>             if (!function_body_contains(func, page->mapping))
>                 continue;
>
>             if (function_parameters_contains(func, mapping|file))
>                 continue;
>
>             add_mapping_parameter(func);
>         }
>     }
>
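
To make the propagation in the pseudo code above concrete, here is a rough
user-space C sketch of the same two routines running over a toy call graph.
Everything in it (struct func, its flags, MAX_CALLERS) is made up for
illustration. The real patches are generated with coccinelle, and nothing
here is taken from the patchset.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_CALLERS 4	/* arbitrary bound for this toy model */

/* Hypothetical model of one function in the call graph. */
struct func {
	const char *name;
	bool uses_page_mapping;	/* body dereferences page->mapping */
	bool has_mapping_param;	/* already takes a mapping argument */
	bool has_file_param;	/* already takes a file argument */
	struct func *callers[MAX_CALLERS];
	size_t nr_callers;
};

/*
 * Give func a mapping parameter, then walk up to every caller. A caller
 * that already has a mapping or a file can supply the argument itself,
 * so the walk stops there. Otherwise the caller needs the parameter too.
 */
static void add_mapping_parameter(struct func *func)
{
	func->has_mapping_param = true;

	for (size_t i = 0; i < func->nr_callers; i++) {
		struct func *caller = func->callers[i];

		/* The call site in caller now passes mapping (not modelled). */
		if (caller->has_mapping_param || caller->has_file_param)
			continue;

		add_mapping_parameter(caller);
	}
}

/* Start the propagation from every function that reads page->mapping. */
static void passdown_mapping(struct func **funcs, size_t nr_funcs)
{
	for (size_t i = 0; i < nr_funcs; i++) {
		struct func *func = funcs[i];

		if (!func->uses_page_mapping)
			continue;
		if (func->has_mapping_param || func->has_file_param)
			continue;

		add_mapping_parameter(func);
	}
}
```

Because has_mapping_param is set before the callers are visited, a
recursive or mutually recursive call chain terminates: the second visit
finds the flag already set and stops.
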
> For [2] KSM is generalized and extended so that both anonymous and file
> backed pages can be handled by a common write protected page case.
>
> For [3] it depends on the filesystem (filesystems which use buffer_head
> are easily handled by storing the mapping in the buffer_head struct).
>
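
As a rough illustration of the buffer_head point, the user-space mock below
stores the mapping in a buffer-head-like struct and derives the block device
from it. struct bh_sketch, its b_mapping field and both helpers are
hypothetical. The host, i_sb and s_bdev names mirror fields that exist in
the kernel, but the chain used here is only an assumption about one simple
case; the real lookup is filesystem specific.

```c
#include <stddef.h>

/* Minimal stand-ins for the kernel structures involved. */
struct block_device { int id; };
struct super_block { struct block_device *s_bdev; };
struct inode { struct super_block *i_sb; };
struct address_space { struct inode *host; };

/* Hypothetical buffer head that carries the mapping, not the bdev. */
struct bh_sketch {
	struct address_space *b_mapping;
};

/* The block device stays reachable: mapping -> inode -> sb -> bdev. */
static struct block_device *bh_sketch_bdev(const struct bh_sketch *bh)
{
	return bh->b_mapping->host->i_sb->s_bdev;
}

/*
 * On I/O failure the error can be attributed to the mapping without
 * looking at page->mapping at all.
 */
static struct address_space *bh_sketch_error_mapping(const struct bh_sketch *bh)
{
	return bh->b_mapping;
}
```

The reverse direction is the one that does not work: given only a
block_device there is no way to tell which file mapping a buffer belonged
to, which is why the mapping is the field worth storing.
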
>
> To avoid any regression risk the page->mapping field is left intact as
> today for non write protected pages. This means that if you do not use
> the page write protection mechanism then it cannot regress. This is
> achieved by using a helper function that takes the mapping from the
> context (current function parameter, see above on how functions are
> updated) and the struct page. If the page is not write protected then it
> uses the mapping from the struct page (just like today). The only
> difference between before and after the patchset is that all fs
> functions that need the mapping for a page now also get it as a
> parameter, but only use the parameter mapping pointer if the page is
> write protected.
>
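
A minimal sketch of the helper described above, as a user-space mock. The
function name, the write_protected flag and the struct layouts are all
assumptions made for illustration; the patchset may name and encode these
differently.

```c
#include <stdbool.h>
#include <stddef.h>

struct address_space { int id; };

/* Mock struct page with only the two fields this sketch needs. */
struct page {
	bool write_protected;		/* hypothetical flag */
	struct address_space *mapping;	/* trusted only when not protected */
};

/*
 * Pick the mapping for a page. A page that is not write protected keeps
 * using page->mapping exactly as today. Only a write protected page
 * falls back to the mapping handed down by the caller.
 */
static struct address_space *
page_mapping_from_context(struct address_space *ctx_mapping,
			  const struct page *page)
{
	if (page->write_protected)
		return ctx_mapping;

	return page->mapping;
}
```

With this shape, a wrong mapping passed down by mistake is ignored for every
page that is not write protected, which matches the "cannot regress"
argument in the quoted paragraph.
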
> Note also that even once confidence is high that we always pass the
> correct mapping down each call stack, I do not believe it means we
> will be able to get rid of the struct page mapping field.
>
> I posted a patchset before [*1] and I intend to post an updated patchset
> before LSF/MM/BPF. I also talked about this at LSF/MM 2018. I still
> believe this is a topic that warrants a discussion with FS/MM and
> block device folks.
>
>
> [*1] https://lwn.net/Articles/751050/
> https://cgit.freedesktop.org/~glisse/linux/log/?h=generic-write-protection-rfc
> [*2] https://lwn.net/Articles/752564/
>
>
> To: lsf-pc@lists.linux-foundation.org
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: linux-fsdevel@vger.kernel.org
> Cc: linux-block@vger.kernel.org
> Cc: linux-mm@kvack.org
>
>
thanks,
--
John Hubbard
NVIDIA