From: zhong jiang
Date: Tue, 29 Oct 2019 15:16:30 +0800
To: Yang Shi
Cc: Andrew Morton, John Hubbard, "Kirill A. Shutemov", Linux MM
Subject: Re: [PATCH] mm: put the page into the correct list when shrink_page_list fails to reclaim.
Message-ID: <5DB7E74E.6060502@huawei.com>
References: <1572269624-60283-1-git-send-email-zhongjiang@huawei.com> <5DB7A96B.8090104@huawei.com>

On 2019/10/29 12:12, Yang Shi wrote:
> On Mon, Oct 28, 2019 at 7:52 PM zhong jiang wrote:
>> On 2019/10/29 2:47, Yang Shi wrote:
>>> On Mon, Oct 28, 2019 at 6:37 AM zhong jiang wrote:
>>>> Recently, I noticed a race between the mlock syscall and shrink_page_list.
>>>>
>>>> One CPU runs the mlock syscall to lock a range of a vma in memory, so the
>>>> affected pages should move from the evictable list to the unevictable one.
>>>> Meanwhile, another CPU scans and isolates the same pages for reclaim.
>>>> shrink_page_list holds the page lock while shrinking the page, so
>>>> follow_page_pte fails to take the page lock and we fail to mlock the page,
>>>> i.e. to mark it unevictable.
>>>>
>>>> If shrink_page_list then fails to reclaim the page for some reason, it
>>>> puts the page back on the evictable LRU even though the page belongs to a
>>>> locked range of the vma. That is unreasonable; it is better to put the
>>>> page on the unevictable LRU.
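As a side note, here is a minimal userspace sketch of the setup described in
the quoted changelog: a process faults in an anonymous range and then
mlock()s it while the rest of the system may be under memory pressure. It is
only an illustration of the scenario, not a deterministic reproducer of the
race, and the mapping size is arbitrary.

/* sketch: mlock() a freshly faulted-in anonymous range */
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 64UL << 20;	/* 64 MiB, arbitrary */
	char *buf;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* fault the pages in; they start out on the evictable LRU */
	memset(buf, 0x5a, len);

	/*
	 * mlock() walks the range via the gup/follow_page_pte path discussed
	 * above. If vmscan happens to hold the page lock on one of these
	 * pages at that moment, trylock_page() in follow_page_pte() fails and
	 * that page is not moved to the unevictable LRU, which is the window
	 * the patch is about.
	 */
	if (mlock(buf, len)) {
		perror("mlock");
		return 1;
	}

	puts("range mlocked");
	munlock(buf, len);
	munmap(buf, len);
	return 0;
}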
>>> Yes, there is definitely a race between mlock() and vmscan, and in the
>>> above case the page might stay on the evictable LRUs for one more round,
>>> but that should not be harmful since try_to_unmap() would move the page
>>> to the unevictable list eventually.
>> The key is how to make sure try_to_unmap() will always be called before
>> the page is freed. It is possible that page_mapped(page) is false under
>> some conditions.
> Is it a problem? The gup just needs to refault the page in.

Hi, Yang

If a page of the vma is not mapped, mlock will make sure it is refaulted
into memory. But what I mean is the case where the page is already on the
evictable LRU and mlock fails to move it from the evictable LRU to the
unevictable one:

cpu 0                                     cpu 1

isolate_lru_pages
  (start .. end) pages are on the
  evictable LRU
                                          mlock
shrink_page_list
  lock_page(page)
                                          follow_page_pte
                                            --> fails to take the page lock,
                                                goto out
  try_to_unmap
  return page
move_pages_to_lru
  --> putback to the evictable LRU
                                          put_page && munmap
                                            --> page is unmapped and still on
                                                the evictable LRU

The page of the vma has become a clean page, so it can be freed easily and
there is no need for try_to_unmap at all. Am I missing something? :-)
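To make that interleaving concrete, here is a toy pthread model of it (my own
illustration, not kernel code): the "page lock" is an ordinary mutex and the
mlocked/evictable state is reduced to two flags. It only shows why the
trylock failure on the mlock side leaves the page looking evictable when the
vmscan side puts it back.

/* build with: cc -pthread race_model.c */
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t page_lock = PTHREAD_MUTEX_INITIALIZER;
static bool page_mlocked;		/* models PG_mlocked */
static bool on_evictable_lru = true;	/* models which LRU the page sits on */

/* models mlock() -> follow_page_pte(): only a trylock is allowed here */
static void *mlock_side(void *arg)
{
	(void)arg;
	if (pthread_mutex_trylock(&page_lock) == 0) {
		page_mlocked = true;		/* mlock_vma_page() */
		pthread_mutex_unlock(&page_lock);
	} else {
		/* trylock failed: the page is never marked mlocked */
		printf("mlock side: trylock failed, PG_mlocked not set\n");
	}
	return NULL;
}

/* models shrink_page_list(): holds the page lock, then puts the page back */
static void *vmscan_side(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&page_lock);		/* lock_page() */
	sleep(1);				/* "reclaim work" while mlock races */
	/* reclaim failed for some reason: putback via move_pages_to_lru() */
	on_evictable_lru = !page_mlocked;
	pthread_mutex_unlock(&page_lock);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, vmscan_side, NULL);
	usleep(100 * 1000);	/* let the vmscan side take the lock first */
	pthread_create(&b, NULL, mlock_side, NULL);

	pthread_join(a, NULL);
	pthread_join(b, NULL);

	printf("page ends up on the %s LRU\n",
	       on_evictable_lru ? "evictable" : "unevictable");
	return 0;
}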
Thanks,
zhong jiang

>> Thanks,
>> zhong jiang
>>>> The patch sets PageMlocked when mlock fails to take the page lock, so
>>>> that shrink_page_list puts the page back on the correct list when it
>>>> fails to reclaim it. If it does succeed in reclaiming the page, we
>>>> ClearPageMlocked in time to prevent the warning from free_pages_prepare.
>>>>
>>>> Signed-off-by: zhong jiang
>>>> ---
>>>>  mm/gup.c    | 28 ++++++++++++++++++----------
>>>>  mm/vmscan.c |  9 ++++++++-
>>>>  2 files changed, 26 insertions(+), 11 deletions(-)
>>>>
>>>> diff --git a/mm/gup.c b/mm/gup.c
>>>> index c2b3e11..c26d28c 100644
>>>> --- a/mm/gup.c
>>>> +++ b/mm/gup.c
>>>> @@ -283,16 +283,24 @@ static struct page *follow_page_pte(struct vm_area_struct *vma,
>>>>                  * handle it now - vmscan will handle it later if and
>>>>                  * when it attempts to reclaim the page.
>>>>                  */
>>>> -               if (page->mapping && trylock_page(page)) {
>>>> -                       lru_add_drain();  /* push cached pages to LRU */
>>>> -                       /*
>>>> -                        * Because we lock page here, and migration is
>>>> -                        * blocked by the pte's page reference, and we
>>>> -                        * know the page is still mapped, we don't even
>>>> -                        * need to check for file-cache page truncation.
>>>> -                        */
>>>> -                       mlock_vma_page(page);
>>>> -                       unlock_page(page);
>>>> +               if (page->mapping) {
>>>> +                       if (trylock_page(page)) {
>>>> +                               lru_add_drain(); /* push cached pages to LRU */
>>>> +                               /*
>>>> +                                * Because we lock page here, and migration is
>>>> +                                * blocked by the pte's page reference, and we
>>>> +                                * know the page is still mapped, we don't even
>>>> +                                * need to check for file-cache page truncation.
>>>> +                                */
>>>> +                               mlock_vma_page(page);
>>>> +                               unlock_page(page);
>>>> +                       } else {
>>>> +                               /*
>>>> +                                * Avoid putting the page back on the evictable
>>>> +                                * list when the page is in a locked vma.
>>>> +                                */
>>>> +                               SetPageMlocked(page);
>>>> +                       }
>>>>                 }
>>>>         }
>>>>  out:
>>>> diff --git a/mm/vmscan.c b/mm/vmscan.c
>>>> index 1154b3a..f7d1301 100644
>>>> --- a/mm/vmscan.c
>>>> +++ b/mm/vmscan.c
>>>> @@ -1488,8 +1488,15 @@ static unsigned long shrink_page_list(struct list_head *page_list,
>>>>                  */
>>>>                 if (unlikely(PageTransHuge(page)))
>>>>                         (*get_compound_page_dtor(page))(page);
>>>> -               else
>>>> +               else {
>>>> +                       /*
>>>> +                        * There is a race between mlock and shrink_page_list
>>>> +                        * when mlock fails to take the page lock.
>>>> +                        */
>>>> +                       if (unlikely(PageMlocked(page)))
>>>> +                               ClearPageMlocked(page);
>>>>                         list_add(&page->lru, &free_pages);
>>>> +               }
>>>>                 continue;
>>>>
>>>> activate_locked_split:
>>>> --
>>>> 1.7.12.4
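One more note on the ClearPageMlocked() in the vmscan.c hunk: as far as I
recall, the page allocator's free path (free_pages_prepare() and its flag
check) treats a still-set PG_mlocked as a bad-page condition on MMU kernels,
which is the warning mentioned in the changelog. Below is a toy userspace
model of that kind of check; every name in it is made up for illustration and
none of it is the kernel's actual code.

#include <stdbool.h>
#include <stdio.h>

/* stand-ins for a few page flag bits */
enum { PG_LRU = 1 << 0, PG_MLOCKED = 1 << 1, PG_UNEVICTABLE = 1 << 2 };

/* stand-in for the set of flags that must be clear at free time */
static const unsigned long flags_check_at_free =
	PG_LRU | PG_MLOCKED | PG_UNEVICTABLE;

struct toy_page {
	unsigned long flags;
};

/* stand-in for the free-time sanity check */
static bool toy_free_page(struct toy_page *page)
{
	if (page->flags & flags_check_at_free) {
		fprintf(stderr, "bad page state, flags=%#lx\n", page->flags);
		return false;
	}
	return true;
}

int main(void)
{
	struct toy_page page = { .flags = PG_MLOCKED };

	/* freeing with PG_mlocked still set: the check complains */
	toy_free_page(&page);

	/* clearing the flag first, as the patch does, keeps the free clean */
	page.flags &= ~PG_MLOCKED;
	printf("after clearing: %s\n", toy_free_page(&page) ? "ok" : "bad");
	return 0;
}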