From: John Hubbard <jhubbard@nvidia.com>
To: Jan Kara <jack@suse.cz>, john.hubbard@gmail.com
Cc: Matthew Wilcox <willy@infradead.org>,
Michal Hocko <mhocko@kernel.org>,
Christopher Lameter <cl@linux.com>,
Jason Gunthorpe <jgg@ziepe.ca>,
Dan Williams <dan.j.williams@intel.com>,
linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
linux-rdma <linux-rdma@vger.kernel.org>,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v2 6/6] mm: page_mkclean, ttu: handle pinned pages
Date: Mon, 2 Jul 2018 14:07:44 -0700
Message-ID: <b64bda3d-903d-c3b9-f315-bf7a7302e425@nvidia.com>
In-Reply-To: <20180702101542.fi7ndfkg5fpzodey@quack2.suse.cz>
On 07/02/2018 03:15 AM, Jan Kara wrote:
> On Sun 01-07-18 17:56:54, john.hubbard@gmail.com wrote:
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 9d142b9b86dc..c4bc8d216746 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -931,6 +931,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>> int kill = 1, forcekill;
>> struct page *hpage = *hpagep;
>> bool mlocked = PageMlocked(hpage);
>> + bool skip_pinned_pages = false;
>
> I'm not sure we can afford to wait for page pins when handling page
> poisoning. In an ideal world we should but... But I guess this is for
> someone understanding memory poisoning better to judge.
OK, then until I hear otherwise, in the next version I'll set
skip_pinned_pages = true here, on the grounds that it's better to be sure
we don't hang while trying to remove a bad page. Perfection is hard to
achieve in the presence of a memory failure.
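A minimal userspace model of that policy follows. It assumes a plain integer pin count can stand in for the page's DMA-pin state. struct model_page, enum pin_action and classify_page() are hypothetical names used only for illustration. Only the page_mkclean_info fields come from the patch. With skip_pinned_pages set, a pinned page is counted as skipped and the caller never blocks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a struct page; not kernel code. */
struct model_page {
	int dma_pin_count;	/* assumed simplification of the DMA pin state */
};

/* Same fields as the page_mkclean_info struct in the patch. */
struct page_mkclean_info {
	int cleaned;
	int skipped;
	bool skip_pinned_pages;
};

enum pin_action {
	PIN_PROCEED,	/* not pinned: clean/unmap it now */
	PIN_SKIP,	/* pinned, and policy says do not wait */
	PIN_WAIT,	/* pinned, and the caller would have to sleep */
};

/* Decide what to do with one page, recording skips in mki. */
static enum pin_action classify_page(const struct model_page *page,
				     struct page_mkclean_info *mki)
{
	assert(page->dma_pin_count >= 0);
	if (page->dma_pin_count == 0)
		return PIN_PROCEED;
	if (mki->skip_pinned_pages) {
		mki->skipped++;	/* hwpoison path: never hang on a pin */
		return PIN_SKIP;
	}
	return PIN_WAIT;
}
```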
>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 6db729dc4c50..c137c43eb2ad 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -879,6 +879,26 @@ int page_referenced(struct page *page,
>> return pra.referenced;
>> }
>>
>> +/* Must be called with zone_gup_lock(page_zone(page)) held. */
>> +static void wait_for_dma_pinned_to_clear(struct page *page)
>> +{
>> + struct zone *zone = page_zone(page);
>> +
>> + while (PageDmaPinnedFlags(page)) {
>> + spin_unlock(zone_gup_lock(zone));
>> +
>> + schedule();
>> +
>> + spin_lock(zone_gup_lock(zone));
>> + }
>> +}
>
> Ouch, we definitely need something better here. Either reuse the
> page_waitqueue() mechanism or create at least a global wait queue for this
> (I don't expect too much contention on the waitqueue and even if there
> eventually is, we can switch to page_waitqueue() when we find it). But
> this is a no-go...
Yes, no problem. At one point I had a separate bit wait queue, which took
only a few lines of code, but I dropped it because I thought it might be
overkill. I'll put it back in.
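Here is a rough userspace analogue of the "one global wait queue" approach. A pthread mutex and condition variable stand in for a kernel wait queue head and its wait/wake calls. model_pin(), model_unpin() and unpin_thread() are hypothetical names. The pin count is an assumed simplification of the dma_pinned_* tracking. Unlike the schedule() loop in the patch, the waiter sleeps until the last unpin wakes it.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for a struct page; not kernel code. */
struct model_page {
	int dma_pin_count;
};

/* One queue shared by all pages, as a global wait queue would be. */
static pthread_mutex_t pin_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pin_waitq = PTHREAD_COND_INITIALIZER;

static void model_pin(struct model_page *page)
{
	pthread_mutex_lock(&pin_lock);
	page->dma_pin_count++;
	pthread_mutex_unlock(&pin_lock);
}

static void model_unpin(struct model_page *page)
{
	pthread_mutex_lock(&pin_lock);
	assert(page->dma_pin_count > 0);
	if (--page->dma_pin_count == 0)
		pthread_cond_broadcast(&pin_waitq);	/* wake all; each re-checks its page */
	pthread_mutex_unlock(&pin_lock);
}

/* Sleep (no busy loop) until the page has no DMA pins left. */
static void wait_for_dma_pinned_to_clear(struct model_page *page)
{
	pthread_mutex_lock(&pin_lock);
	while (page->dma_pin_count > 0)
		pthread_cond_wait(&pin_waitq, &pin_lock);
	pthread_mutex_unlock(&pin_lock);
}

/* Demo helper: drop one pin from another thread. */
static void *unpin_thread(void *arg)
{
	model_unpin(arg);
	return NULL;
}
```

Because the queue is shared, the final unpin of any page wakes every waiter. Each waiter re-checks its own page and goes back to sleep if that page is still pinned. If this ever shows up as contention, the same structure can move to hashed per-page queues, which is roughly what page_waitqueue() provides.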
>
>> +
>> +struct page_mkclean_info {
>> + int cleaned;
>> + int skipped;
>> + bool skip_pinned_pages;
>> +};
>> +
>> static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
>> unsigned long address, void *arg)
>> {
>> @@ -889,7 +909,24 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
>> .flags = PVMW_SYNC,
>> };
>> unsigned long start = address, end;
>> - int *cleaned = arg;
>> + struct page_mkclean_info *mki = (struct page_mkclean_info *)arg;
>> + bool is_dma_pinned;
>> + struct zone *zone = page_zone(page);
>> +
>> + /* Serialize with get_user_pages: */
>> + spin_lock(zone_gup_lock(zone));
>> + is_dma_pinned = PageDmaPinned(page);
>
> Hum, why do you do this for each page table this is mapped in? Also the
> locking is IMHO going to hurt a lot and we need to avoid it.
>
> What I think needs to happen is that in page_mkclean(), after you've
> cleared all the page tables, you check PageDmaPinned() and wait if needed.
> Page cannot be faulted in again as we hold page lock and so races with
> concurrent GUP are fairly limited. So with some careful ordering & memory
> barriers you should be able to get away without any locking. Ditto for the
> unmap path...
>
I guess I was thinking about this backwards. It works much better to
write-protect or unmap first, let things drain, and only then wait for
the pins to clear. Very nice!
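Below is a sketch of the lock-free ordering described above, modelled with C11 atomics in userspace. The writable flag is an assumed stand-in for a writable page table entry. model_gup_pin(), model_mkclean() and model_unpin() are hypothetical names.

Each side does its store, then a full fence, then its load. This is the classic store-buffering pattern, so at least one side is guaranteed to see the other: either GUP sees the write protection and backs off to refault, or page_mkclean() sees the pin and waits. In the kernel, holding the page lock is what keeps the page from being faulted writable again while the cleaner waits. That part is assumed here, not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a struct page plus its mapping; not kernel code. */
struct model_page {
	atomic_bool writable;		/* stand-in for a writable PTE */
	atomic_int dma_pin_count;
};

/* GUP side: true if pinned, false if the caller must fault and retry. */
static bool model_gup_pin(struct model_page *page)
{
	atomic_fetch_add_explicit(&page->dma_pin_count, 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* pin store before mapping load */
	if (!atomic_load_explicit(&page->writable, memory_order_relaxed)) {
		/* Raced with page_mkclean(): drop the pin and refault. */
		atomic_fetch_sub_explicit(&page->dma_pin_count, 1, memory_order_relaxed);
		return false;
	}
	return true;
}

/* page_mkclean() side: true if the caller must wait for pins to drain. */
static bool model_mkclean(struct model_page *page)
{
	atomic_store_explicit(&page->writable, false, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* mapping store before pin load */
	return atomic_load_explicit(&page->dma_pin_count, memory_order_relaxed) > 0;
}

static void model_unpin(struct model_page *page)
{
	int old = atomic_fetch_sub_explicit(&page->dma_pin_count, 1,
					    memory_order_release);
	assert(old > 0);
	(void)old;
}
```

A GUP caller that backs off can leave a transient pin count for page_mkclean() to observe. That only causes an unnecessary wait, which ends as soon as the pin is dropped.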
thanks,
--
John Hubbard
NVIDIA