Subject: Re: [PATCH v2 6/6] mm: page_mkclean, ttu: handle pinned pages
From: John Hubbard
To: Jan Kara, john.hubbard@gmail.com
Cc: Matthew Wilcox, Michal Hocko, Christopher Lameter, Jason Gunthorpe,
 Dan Williams, linux-mm@kvack.org, LKML, linux-rdma,
 linux-fsdevel@vger.kernel.org
Date: Mon, 2 Jul 2018 14:07:44 -0700
In-Reply-To: <20180702101542.fi7ndfkg5fpzodey@quack2.suse.cz>
References: <20180702005654.20369-1-jhubbard@nvidia.com>
 <20180702005654.20369-7-jhubbard@nvidia.com>
 <20180702101542.fi7ndfkg5fpzodey@quack2.suse.cz>

On 07/02/2018 03:15 AM, Jan Kara wrote:
> On Sun 01-07-18 17:56:54, john.hubbard@gmail.com wrote:
>> diff --git a/mm/memory-failure.c b/mm/memory-failure.c
>> index 9d142b9b86dc..c4bc8d216746 100644
>> --- a/mm/memory-failure.c
>> +++ b/mm/memory-failure.c
>> @@ -931,6 +931,7 @@ static bool hwpoison_user_mappings(struct page *p, unsigned long pfn,
>>  	int kill = 1, forcekill;
>>  	struct page *hpage = *hpagep;
>>  	bool mlocked = PageMlocked(hpage);
>> +	bool skip_pinned_pages = false;
>
> I'm not sure we can afford to wait for page pins when handling page
> poisoning. In an ideal world we should but... But I guess this is for
> someone understanding memory poisoning better to judge.

OK, then until I hear otherwise, in the next version I'll set
skip_pinned_pages = true here, based on the idea that it's probably
better to be sure we don't hang while trying to remove a bad page.
It's hard to achieve perfection in the presence of a memory failure.

>
>> diff --git a/mm/rmap.c b/mm/rmap.c
>> index 6db729dc4c50..c137c43eb2ad 100644
>> --- a/mm/rmap.c
>> +++ b/mm/rmap.c
>> @@ -879,6 +879,26 @@ int page_referenced(struct page *page,
>>  	return pra.referenced;
>>  }
>>
>> +/* Must be called with pinned_dma_lock held. */
>> +static void wait_for_dma_pinned_to_clear(struct page *page)
>> +{
>> +	struct zone *zone = page_zone(page);
>> +
>> +	while (PageDmaPinnedFlags(page)) {
>> +		spin_unlock(zone_gup_lock(zone));
>> +
>> +		schedule();
>> +
>> +		spin_lock(zone_gup_lock(zone));
>> +	}
>> +}
>
> Ouch, we definitely need something better here. Either reuse the
> page_waitqueue() mechanism or create at least a global wait queue for this
> (I don't expect too much contention on the waitqueue and even if there
> eventually is, we can switch to page_waitqueue() when we find it). But
> this is a no-go...

Yes, no problem. At one point I had a separate bit waitqueue, which was
only a few lines of code, but I dropped it because I thought it might be
overkill. I'll put it back in.
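
Something like the following is what I have in mind (untested sketch, just
to show the shape of it; dma_pinned_wq and release_dma_pinned_page() are
made-up names, and it assumes the PageDmaPinned flag from this series):

#include <linux/wait.h>

/* One global wait queue is probably enough to start with: */
static DECLARE_WAIT_QUEUE_HEAD(dma_pinned_wq);

static void wait_for_dma_pinned_to_clear(struct page *page)
{
	/* Sleep instead of spinning; no zone gup lock held here. */
	wait_event(dma_pinned_wq, !PageDmaPinned(page));
}

/*
 * The release side (whatever ends up dropping the last gup reference)
 * would then wake the queue after clearing the flag:
 */
static void release_dma_pinned_page(struct page *page)
{
	ClearPageDmaPinned(page);
	/* Make the cleared flag visible before waking any waiters. */
	smp_mb__after_atomic();
	wake_up(&dma_pinned_wq);
}

If contention ever shows up, switching wait_event()/wake_up() over to the
page_waitqueue() machinery should be a local change to these two helpers.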
>
>> +
>> +struct page_mkclean_info {
>> +	int cleaned;
>> +	int skipped;
>> +	bool skip_pinned_pages;
>> +};
>> +
>>  static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
>>  			     unsigned long address, void *arg)
>>  {
>> @@ -889,7 +909,24 @@ static bool page_mkclean_one(struct page *page, struct vm_area_struct *vma,
>>  		.flags = PVMW_SYNC,
>>  	};
>>  	unsigned long start = address, end;
>> -	int *cleaned = arg;
>> +	struct page_mkclean_info *mki = (struct page_mkclean_info *)arg;
>> +	bool is_dma_pinned;
>> +	struct zone *zone = page_zone(page);
>> +
>> +	/* Serialize with get_user_pages: */
>> +	spin_lock(zone_gup_lock(zone));
>> +	is_dma_pinned = PageDmaPinned(page);
>
> Hum, why do you do this for each page table this is mapped in? Also the
> locking is IMHO going to hurt a lot and we need to avoid it.
>
> What I think needs to happen is that in page_mkclean(), after you've
> cleared all the page tables, you check PageDmaPinned() and wait if needed.
> Page cannot be faulted in again as we hold page lock and so races with
> concurrent GUP are fairly limited. So with some careful ordering & memory
> barriers you should be able to get away without any locking. Ditto for the
> unmap path...

I guess I was thinking about this backwards. It would work much better to
go ahead and write protect (or unmap) everything first, and only then wait
for the pins to drain. Very nice!

thanks,
-- 
John Hubbard
NVIDIA
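
P.S. Just to be sure I'm reading the suggestion correctly, here is an
untested sketch of the ordering I now have in mind for page_mkclean()
(wait_for_dma_pinned_to_clear() being the waitqueue-based version, called
without any gup lock held, and with the cleaned/skipped bookkeeping from
the patch left out for brevity):

int page_mkclean(struct page *page)
{
	int cleaned = 0;
	struct address_space *mapping;
	struct rmap_walk_control rwc = {
		.arg = (void *)&cleaned,
		.rmap_one = page_mkclean_one,
		.invalid_vma = invalid_mkclean_vma,
	};

	BUG_ON(!PageLocked(page));

	if (!page_mapped(page))
		return 0;

	mapping = page_mapping(page);
	if (!mapping)
		return 0;

	/* 1) Write protect all mappings first: no gup lock, no waiting. */
	rmap_walk(page, &rwc);

	/*
	 * 2) The page lock keeps new writable faults out, so wait once
	 *    (rather than per mapping) for outstanding pins to drain.
	 *    Order the pte updates above against this flag test.
	 */
	smp_mb();
	if (PageDmaPinned(page))
		wait_for_dma_pinned_to_clear(page);

	return cleaned;
}

And then try_to_unmap() would get the same treatment: unmap everything
first, then a single wait at the end.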