From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 90142C433E7 for ; Wed, 2 Sep 2020 22:26:54 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 3FD0D2071B for ; Wed, 2 Sep 2020 22:26:53 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="d69AuGUt" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 3FD0D2071B Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 87F198E0005; Wed, 2 Sep 2020 18:26:53 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 830588E0001; Wed, 2 Sep 2020 18:26:53 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 71E498E0005; Wed, 2 Sep 2020 18:26:53 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0187.hostedemail.com [216.40.44.187]) by kanga.kvack.org (Postfix) with ESMTP id 58C768E0001 for ; Wed, 2 Sep 2020 18:26:53 -0400 (EDT) Received: from smtpin05.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 233311EFF for ; Wed, 2 Sep 2020 22:26:53 +0000 (UTC) X-FDA: 77219557506.05.ray92_2d041ea270a4 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin05.hostedemail.com (Postfix) with ESMTP id F0D0518027A8E for ; Wed, 2 Sep 2020 22:26:52 +0000 (UTC) X-HE-Tag: ray92_2d041ea270a4 X-Filterd-Recvd-Size: 8007 Received: from mail-ej1-f66.google.com (mail-ej1-f66.google.com [209.85.218.66]) by imf42.hostedemail.com (Postfix) with ESMTP for ; Wed, 2 Sep 2020 22:26:52 +0000 (UTC) Received: by mail-ej1-f66.google.com with SMTP id j11so987531ejk.0 for ; Wed, 02 Sep 2020 15:26:52 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=lFbEpXSwQ0gQwin3BbbJm3X4fcFvRS+F/Gcn21Qcgz0=; b=d69AuGUtlt4O9iYY+vi9Bt/Bi3OWipWDoohXc8yFVtZLCZLGgrtBHLVAa5hq83DJFT 6LdIiwxCbg6I5bRv2Jf5CBom5hkCXSsv1rg8zZnB5anHKmXIIillWruYoepP5X9SJ/UU 47saA5ohSPcEG+8ytoPvO8i22G2D2PO+5cCH2GQyQvwAOmDqbiJ1SJVSPQcLLPtitBB1 DcCSbuQlcr1hhiEwrSBzu/OlAA5G/p29v63KMNMG7l9cGpsvDtQeZQkwC/Gqz/P83b1R b3fa267e/639pVzoSPjoUVImt9pURE0DSIVBRRlaWQnv3DK6lDb4FF/3prx9j6muviG/ a3VA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=lFbEpXSwQ0gQwin3BbbJm3X4fcFvRS+F/Gcn21Qcgz0=; b=FKvnJHeTgdbNBVsskP7fHFu/PwmV0SucGC/ZjVCGZFvNTryAskKoM+3Rd0s0Fzgo9e iFL84utzKSx2WBSCLDSO9gNdHGeXbzSIESwwT3Sw0TLjFPLTu2eVQi0iOTE3IMiaS9/b 2eaWq0SfGcPRLNxitAtRexDwwrcRHSPUyUnccw6V0Ql+sVN4lViIZP67hcehTFxscnr3 hMGp+bayHdtTS8uow9KKUEEkHXUGjrM0NkW8LFACb/yYVSv9FrW2g6S+8n6Lr+2c/D3o XbF2vV21L/xepTRATqBd0jK1HZg8syqXzIxhtvGsLhJXLRleLt/fjG89ubBUfXCmxUkS s7LQ== X-Gm-Message-State: AOAM531PSOMltwAgIaY+NRDUuCNWkbNGWGVTUBdEvSDDq81ZcZoZRIYG 0dJZ7n4y5J6lc/Rmror0xs79KqnnU4kY2OCICig= X-Google-Smtp-Source: ABdhPJypzWoqt0InlS0Nq3PYgvaY0SvE4INVqwUep9/FS9e0EZYzF8Fn+8S7FI/0SjVZNjygvmWpsLoLmfBL4tqEei4= X-Received: by 2002:a17:906:a1d8:: with SMTP id bx24mr281865ejb.161.1599085611213; Wed, 02 Sep 2020 15:26:51 -0700 (PDT) MIME-Version: 1.0 References: <1578737885-8890-1-git-send-email-lixinhai.lxh@gmail.com> <20200114092545.GF19428@dhcp22.suse.cz> <202001142147485028116@gmail.com> <20200115093148.GZ19428@dhcp22.suse.cz> <986d0f55-ae54-f3af-0d50-3e3b6621a863@oracle.com> <20200116102944.GQ19428@dhcp22.suse.cz> In-Reply-To: <20200116102944.GQ19428@dhcp22.suse.cz> From: Yang Shi Date: Wed, 2 Sep 2020 15:26:39 -0700 Message-ID: Subject: Re: [PATCH v4] mm/page_vma_mapped.c: Explicitly compare pfn for normal, hugetlbfs and THP page To: Michal Hocko Cc: Mike Kravetz , Li Xinhai , "linux-mm@kvack.org" , akpm , "kirill.shutemov" Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: F0D0518027A8E X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Jan 16, 2020 at 2:29 AM Michal Hocko wrote: > > On Wed 15-01-20 16:40:26, Mike Kravetz wrote: > > On 1/15/20 1:31 AM, Michal Hocko wrote: > > > On Tue 14-01-20 21:47:51, Li Xinhai wrote: > > >> On 2020-01-14 at 17:25 Michal Hocko wrote: > > >>> On Sat 11-01-20 10:18:05, Li Xinhai wrote: > > >>>> When check_pte, pfn of normal, hugetlbfs and THP page need be compared. > > >>>> The current implementation apply comparison as > > >>>> - normal 4K page: page_pfn <= pfn < page_pfn + 1 > > >>>> - hugetlbfs page: page_pfn <= pfn < page_pfn + HPAGE_PMD_NR > > >>>> - THP page: page_pfn <= pfn < page_pfn + HPAGE_PMD_NR > > >>>> in pfn_in_hpage. For hugetlbfs page, it should be > > >>>> page_pfn == pfn > > >>>> > > >>>> Now, change pfn_in_hpage to pfn_is_match to highlight that comparison > > >>>> is not only for THP and explicitly compare for these cases. > > >>> > > >>> Why is this important to do. I have asked and Mike had the same feeling > > >>> that the patch is missing any real justification. Why do we need this > > >>> change? It is great that you have dropped VM_BUG_ON btw. > > >>> > > >> I think it is important to make the code clear, as said, comparing hugetlbfs page > > >> in range page_pfn <= pfn < page_pfn + HPAGE_PMD_NR is confusion. > > > > > > I am sorry but I do not really see a big confusion here. It is probably > > > a matter of taste. If others consider this an improvement then I will > > > not stand in the way but I consider the justification insuficient for > > > merging. > > > > Perhaps I am confused, but the patch does update the check done for > > hugetlb pages. IIUC, previously there was no distinction between THP > > pages and hugetlb pages in the check. It is valid to pass in a sub- > > page of a THP page, but not that of a hugetlb page. > > > > I do not believe it is possible for existing code to pass in a sub-page > > of a hugetlb page. And, no one has ever seen this as an issue. This > > is why I was questioning the need for the patch. > > Exactly and that is the reason I fail the see a point. > > > With all that said, the new code/check is better (more complete) than > > the original. It may not do anything for the current code base, but > > it 'could' catch potential errors in future code. Because of this, I > > do consider the code an improvement. > > It adds a branch for something that doesn't happen and also to a code > path which is quite internal to the MM AFAICS. That being said, if you > believe this is an improvement then I will not stand in the way. But > there are so many other places which could add checks that are not > exercised... I just saw a bad page bug on one our host with 4.14 kernel: backtrace: :BUG: Bad page map in process python3.6 pte:1036e24025 pmd:1dd743d067 :page:fffff62840db8900 count:105 mapcount:-35 mapping:ffff97d432e97ad0 index:0x1 :flags: 0xbfffe000001006c(referenced|uptodate|lru|active|mappedtodisk) :raw: 0bfffe000001006c ffff97d432e97ad0 0000000000000001 00000069ffffffdc :raw: dead000000000100 dead000000000200 0000000000000000 ffff97c58fc0e000 :page dumped because: bad pte :page->mem_cgroup:ffff97c58fc0e000 :addr:00007f2ddffcc000 vm_flags:00000075 anon_vma: (null) mapping:ffff97d432e97ad0 index:7f :file:libc-2.17.so fault:xfs_filemap_fault [xfs] mmap:xfs_file_mmap [xfs] readpage:xfs_vm_readpage [xfs] :do_exit+0x563/0xbb0 :? vfs_write+0x162/0x1a0 :do_group_exit+0x3a/0xa0 :file:libc-2.17.so fault:xfs_filemap_fault [xfs] mmap:xfs_file_mmap [xfs] readpage:xfs_vm_readpage [xfs] :SyS_exit_group+0x10/0x10 :do_syscall_64+0x60/0x110 :entry_SYSCALL_64_after_hwframe+0x3d/0xa2 :RIP: 0033:0x7fb2504091d9 :RSP: 002b:00007ffcf035db88 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 :RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb2504091d9 :RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 :RBP: 00007fb250706838 R08: 000000000000003c R09: 00000000000000e7 :R10: ffffffffffffff70 R11: 0000000000000246 R12: 00007fb250706838 :R13: 00007fb25070be80 R14: 0000000000000000 R15: 0000000000000000 :CPU: 42 PID: 454567 Comm: python3.6 Tainted: G B W 4.14.98-t5.el7.twitter.x86_64 #1 I just saw it once and it didn't happen on the newer kernel (maybe just not happen yet). I can't tell why the mapcount could reach -35 since all page_remove_rmap() is protected by page table lock. Then I looked into pvmw code, and suspected the PTE might be changed after acquiring ptl. With the old check it was fine as long as "page_pfn <= pfn < page_pfn + 1", but for the base page case, it means we might be dec'ing mapcount for another page. Then I came across this commit. I guess this should be able to solve the problem, but unfortunately the problem is very rare so basically I can't prove this commit could solve it. If you agree my analysis, we may consider to backport this to stable tree. > > Acked-by: Mike Kravetz > > -- > Michal Hocko > SUSE Labs >