From mboxrd@z Thu Jan 1 00:00:00 1970
References: <1572269624-60283-1-git-send-email-zhongjiang@huawei.com> <5DB7A96B.8090104@huawei.com>
 <5DB7E74E.6060502@huawei.com>
In-Reply-To: <5DB7E74E.6060502@huawei.com>
From: Yang Shi
Date: Tue, 29 Oct 2019 10:13:06 -0700
Subject: Re: [PATCH] mm: put the page into the correct list when shrink_page_list fails to reclaim
To: zhong jiang
Cc: Andrew Morton, ira.weiny@intel.com, John Hubbard, "Kirill A. Shutemov", rppt@linux.ibm.com, Linux MM

On Tue, Oct 29, 2019 at 12:16 AM zhong jiang wrote:
>
> On 2019/10/29 12:12, Yang Shi wrote:
> > On Mon, Oct 28, 2019 at 7:52 PM zhong jiang wrote:
> >> On 2019/10/29 2:47, Yang Shi wrote:
> >>> On Mon, Oct 28, 2019 at 6:37 AM zhong jiang wrote:
> >>>> Recently, I noticed a race between the mlock syscall and shrink_page_list.
> >>>>
> >>>> One CPU runs the mlock syscall to lock a range of a vma in memory, and
> >>>> the specified pages are moved from the evictable list to the unevictable
> >>>> one. Meanwhile, another CPU scans and isolates those same pages for
> >>>> reclaim. shrink_page_list holds the page lock while shrinking the page,
> >>>> so follow_page_pte fails to take the page lock, and hence we fail to
> >>>> mlock the page and mark it unevictable.
> >>>>
> >>>> If shrink_page_list then fails to reclaim the page for some reason, it
> >>>> puts the page back on the evictable lru, even though the page actually
> >>>> belongs to a locked range of the vma. That is unreasonable; it is better
> >>>> to put the page on the unevictable lru.
> >>> Yes, there is definitely a race between mlock() and vmscan, and in the
> >>> above case the page might stay on the evictable LRUs one more round,
> >>> but that should not be harmful since try_to_unmap() will move the page
> >>> to the unevictable list eventually.
> >> The key is how to make sure try_to_unmap() will always be called before the page is freed.
> >> It is possible that page_mapped(page) is false due to some condition.
> > Is it a problem? The gup just needs to refault the page in.
> Hi, Yang
>
> If a page of the vma is not mapped, mlock will make sure it is refaulted into memory.
>
> But I mean the case where the page is already on the evictable lru, and meanwhile mlock
> fails to move the page from the evictable lru to the unevictable lru:
>
> cpu 0                                cpu 1
>                                      isolate_lru_pages
>                                        (start .. end) pages exist on the evictable lru
> mlock                                shrink_page_list
>                                        lock_page(page)
> follow_page_pte
>   --> fails to take the page lock, goto out;
>                                        try_to_unmap
> return page.

If gup can still return a legitimate page, I suppose that means gup
happened before try_to_unmap(). If so, try_to_unmap() would see the
VMA is VM_LOCKED and should just set the Mlocked flag on the page,
and move_pages_to_lru() would then put the page on the unevictable
lru instead of the evictable lru.

>                                      move_pages_to_lru --> putback to evictable lru
> put_page && munmap --> page is unmapped and on the evictable lru
>
> The page of the vma became a clean page, hence we can free the page easily. There is no need for try_to_unmap.
>
> Did I miss something? :-)
>
> Thanks,
> zhong jiang
> >> Thanks,
> >> zhong jiang
> >>>> The patch sets PageMlocked when mlock fails to take the page lock. When shrink_page_list
> >>>> fails to reclaim the page, it will then put it back on the correct list. If it succeeds in
> >>>> reclaiming the page, we ClearPageMlocked in time to prevent the warning from free_pages_prepare.
> >>>>
> >>>> Signed-off-by: zhong jiang
> >>>> ---
> >>>>  mm/gup.c    | 28 ++++++++++++++++++----------
> >>>>  mm/vmscan.c |  9 ++++++++-
> >>>>  2 files changed, 26 insertions(+), 11 deletions(-)
> >>>>
> >>>> diff --git a/mm/gup.c b/mm/gup.c
> >>>> index c2b3e11..c26d28c 100644
> >>>> --- a/mm/gup.c
> >>>> +++ b/mm/gup.c
> >>>> @@ -283,16 +283,24 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
> >>>>                  * handle it now - vmscan will handle it later if and
> >>>>                  * when it attempts to reclaim the page.
> >>>>                  */
> >>>> -               if (page->mapping && trylock_page(page)) {
> >>>> -                       lru_add_drain();  /* push cached pages to LRU */
> >>>> -                       /*
> >>>> -                        * Because we lock page here, and migration is
> >>>> -                        * blocked by the pte's page reference, and we
> >>>> -                        * know the page is still mapped, we don't even
> >>>> -                        * need to check for file-cache page truncation.
> >>>> -                        */
> >>>> -                       mlock_vma_page(page);
> >>>> -                       unlock_page(page);
> >>>> +               if (page->mapping) {
> >>>> +                       if (trylock_page(page)) {
> >>>> +                               lru_add_drain();  /* push cached pages to LRU */
> >>>> +                               /*
> >>>> +                                * Because we lock page here, and migration is
> >>>> +                                * blocked by the pte's page reference, and we
> >>>> +                                * know the page is still mapped, we don't even
> >>>> +                                * need to check for file-cache page truncation.
> >>>> +                                */
> >>>> +                               mlock_vma_page(page);
> >>>> +                               unlock_page(page);
> >>>> +                       } else {
> >>>> +                               /*
> >>>> +                                * Avoid putting the page back on the evictable
> >>>> +                                * list when the page is in a locked vma.
> >>>> +                                */
> >>>> +                               SetPageMlocked(page);
> >>>> +                       }
> >>>>                 }
> >>>>         }
> >>>>  out:
> >>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
> >>>> index 1154b3a..f7d1301 100644
> >>>> --- a/mm/vmscan.c
> >>>> +++ b/mm/vmscan.c
> >>>> @@ -1488,8 +1488,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
> >>>>                  */
> >>>>                 if (unlikely(PageTransHuge(page)))
> >>>>                         (*get_compound_page_dtor(page))(page);
> >>>> -               else
> >>>> +               else {
> >>>> +                       /*
> >>>> +                        * There is a race between mlock and shrink_page_list
> >>>> +                        * when mlock fails to take the page lock.
> >>>> +                        */
> >>>> +                       if (unlikely(PageMlocked(page)))
> >>>> +                               ClearPageMlocked(page);
> >>>>                         list_add(&page->lru, &free_pages);
> >>>> +               }
> >>>>                 continue;
> >>>>
> >>>> activate_locked_split:
> >>>> --
> >>>> 1.7.12.4