Subject: Re: [PATCH 1/5] mm: migrate: Move the page count validation to the proper place
To: Matthew Wilcox
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <1f7e1d083864fbb17a20a9c8349d2e8b427e20a3.1628174413.git.baolin.wang@linux.alibaba.com> <36956352-246a-b3c2-3ade-2a6c22e2cd5a@linux.alibaba.com>
From: Baolin Wang
Message-ID: <4f25b4e9-0069-1749-32cf-d4644f13be4e@linux.alibaba.com>
Date: Sun, 8 Aug 2021 23:13:28 +0800

On 2021/8/8 18:26, Matthew Wilcox wrote:
> On Sun, Aug 08, 2021 at 10:55:30AM +0800, Baolin Wang wrote:
>> Hi,
>>
>>> On Fri, Aug 06, 2021 at 11:07:18AM +0800, Baolin Wang wrote:
>>>> Hi Matthew,
>>>>
>>>>> On Thu, Aug 05, 2021 at 11:05:56PM +0800, Baolin Wang wrote:
>>>>>> We've got the expected count for anonymous page or file page by
>>>>>> expected_page_refs() at the beginning of migrate_page_move_mapping(),
>>>>>> thus we should move the page count validation a little forward to
>>>>>> reduce duplicated code.
>>>>>
>>>>> Please add an explanation to the changelog for why it's safe to pull
>>>>> this out from under the i_pages lock.
>>>>
>>>> Sure. In folio_migrate_mapping(), we are sure that the migration page was
>>>> isolated from the lru list and locked, so I think there is no race in
>>>> getting the page count without the i_pages lock. Please correct me if I
>>>> missed something else. Thanks.
>>>
>>> Unless the page has been removed from i_pages, this isn't a correct
>>> explanation.  Even if it has been removed from i_pages, unless an
>>> RCU grace period has passed, another CPU may still be able to inc the
>>> refcount on it (temporarily).  The same is true for the page tables,
>>> by the way; if someone is using get_user_pages_fast(), they may still
>>> be able to see the page.
>>
>> I don't think this is an issue, because we've now established a migration
>> pte for this migration page under the page lock. If the user wants to get
>> the page by get_user_pages_fast(), it will wait for the page migration to
>> finish in migration_entry_wait(). So I still think there is no need to
>> check the migration page count under the i_pages lock.
>
> I don't know whether the patch is correct or not, but you aren't nearly
> paranoid enough.  Consider this sequence of events:

Thanks for describing this scenario.

>
> CPU 0:                                  CPU 1:
> get_user_pages_fast()
> lockless_pages_from_mm()
> local_irq_save()
> gup_pgd_range()
> gup_p4d_range()
> gup_pud_range()
> gup_pmd_range()
> gup_pte_range()
> pte_t pte = ptep_get_lockless(ptep);
>                                         migrate_vma_collect_pmd()
>                                         ptep = pte_offset_map_lock(mm, pmdp, addr, &ptl)
>                                         ptep_get_and_clear(mm, addr, ptep);
> page = pte_page(pte);
>                                         set_pte_at(mm, addr, ptep, swp_pte);
>                                         migrate_page_move_mapping()
> head = try_grab_compound_head(page, 1, flags);

On CPU 0, after grabbing the page reference, gup_pte_range() validates
the PTE again. If a swap PTE has been established for this page, it
drops the reference and goes to the slow path:

    if (unlikely(pte_val(pte) != pte_val(*ptep))) {
            put_compound_head(head, 1, flags);
            goto pte_unmap;
    }

So CPU 1 cannot observe the abnormally high refcount in this case, if I
have not missed anything.

> ... now page's refcount is temporarily higher than it should be.  CPU 0
> will notice the PTE is no longer the PTE that it used to be and drop
> the reference, but in the meantime, CPU 1 can observe the higher refcount.
>
> None of this has anything to do with the i_pages lock.  Holding it does

Yes, the i_pages lock cannot guarantee anything about reading the page
count, so I think we can move this check out from under the i_pages lock.

> not protect from this race, but you need to know this kind of thing to
> decide if changing how we test a page's refcount is safe or not.

Yes, I will continue to check whether there are any races when
validating the page count. Any suggestions are welcome.
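
For reference, the ordering in the diagram can be replayed step by step
in userspace. This is only a sketch under stated assumptions, not kernel
code: struct fake_page, replay_interleaving() and the FAKE_* values are
hypothetical stand-ins, and the two CPUs are modelled as one fixed
sequence of steps rather than real threads. It shows what each side
would read if the steps happened in exactly the order Matthew lists;
whether the real code paths can hit that ordering is the open question.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins; these are not kernel types or kernel APIs. */
struct fake_page {
	atomic_int refcount;		/* models the page refcount */
	_Atomic unsigned long pte;	/* models the PTE slot mapping the page */
};

struct replay_result {
	int seen_by_cpu1;	/* refcount CPU 1 reads inside the window */
	int final_refcount;	/* refcount after CPU 0 finishes */
	int gup_succeeded;	/* 1 if the fast path kept its reference */
};

#define FAKE_PRESENT_PTE   0x1000UL	/* assumed "present" PTE value */
#define FAKE_MIGRATION_PTE 0x2UL	/* assumed migration entry value */

/* Replays the interleaving from the diagram as one fixed sequence. */
struct replay_result replay_interleaving(int expected_refs)
{
	struct fake_page page;
	struct replay_result res;

	atomic_init(&page.refcount, expected_refs);
	atomic_init(&page.pte, FAKE_PRESENT_PTE);

	/* CPU 0: lockless PTE read (ptep_get_lockless() in the diagram). */
	unsigned long pte = atomic_load(&page.pte);

	/* CPU 1: replace the PTE with a migration entry (set_pte_at()). */
	atomic_store(&page.pte, FAKE_MIGRATION_PTE);

	/* CPU 0: speculative reference (try_grab_compound_head()). */
	atomic_fetch_add(&page.refcount, 1);

	/* CPU 1: reads the refcount before CPU 0 has rechecked the PTE. */
	res.seen_by_cpu1 = atomic_load(&page.refcount);

	/* CPU 0: recheck the PTE and drop the reference on mismatch. */
	if (pte != atomic_load(&page.pte)) {
		atomic_fetch_sub(&page.refcount, 1);
		res.gup_succeeded = 0;
	} else {
		res.gup_succeeded = 1;
	}

	res.final_refcount = atomic_load(&page.refcount);
	return res;
}
```

Calling replay_interleaving(2) gives seen_by_cpu1 == 3, final_refcount
== 2 and gup_succeeded == 0: the PTE recheck restores the count, but a
read that lands between the grab and the recheck sees the extra
reference.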