From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-3.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0192DC433E9 for ; Thu, 3 Sep 2020 16:08:32 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 94AC220716 for ; Thu, 3 Sep 2020 16:08:31 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="PbvTaDwq" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 94AC220716 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=gmail.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 07E406B005C; Thu, 3 Sep 2020 12:08:31 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 02DBE6B005D; Thu, 3 Sep 2020 12:08:30 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E5FC76B0062; Thu, 3 Sep 2020 12:08:30 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0099.hostedemail.com [216.40.44.99]) by kanga.kvack.org (Postfix) with ESMTP id CAD216B005C for ; Thu, 3 Sep 2020 12:08:30 -0400 (EDT) Received: from smtpin08.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 78C51824556B for ; Thu, 3 Sep 2020 16:08:30 +0000 (UTC) X-FDA: 77222232780.08.drop69_5614767270aa Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin08.hostedemail.com (Postfix) with ESMTP id 43C491819E766 for ; Thu, 3 Sep 2020 16:08:30 +0000 (UTC) X-HE-Tag: drop69_5614767270aa X-Filterd-Recvd-Size: 9689 Received: from mail-ed1-f66.google.com (mail-ed1-f66.google.com [209.85.208.66]) by imf41.hostedemail.com (Postfix) with ESMTP for ; Thu, 3 Sep 2020 16:08:29 +0000 (UTC) Received: by mail-ed1-f66.google.com with SMTP id c10so3224562edk.6 for ; Thu, 03 Sep 2020 09:08:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=m2/cqmpFzJwiCZjZYmNIBw3im8kd+fSo3OYLCzsFW40=; b=PbvTaDwqdWR+Fr9MoFkvQqC3bwJyTgyM2trNJ537c4F+kYmJHdb550aQuhJHIJp998 gbSm7Lx6ZKbydLJ+9DT6zCMh5l68/oigGm1bjanhqy1xP/WXXGs0LJUd5bYEn/7SnWA/ 16vXC/yz3rS4IVOOiYpxIwLzcPwgKdpburPyrJfpSraEfjklbhabU6EztnpstQaYC5dl +F7PLwet12oFYHMmAY2gY0HUqfa1g/frRFa++wWHXcY/dtPp8JclGC2tscABhE5z2rc3 GWbFnd1B0A+INXcDTQ9rE6MVzkfjnuhSZSAr0ldBCj4tCfJ1S0Ms0zTadRn2HFVNIicI OdaA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=m2/cqmpFzJwiCZjZYmNIBw3im8kd+fSo3OYLCzsFW40=; b=qisQqKdPyxCtPvrvp/59s/9yZ6tw+wTSDturuXhz2fH3GjwEEVYu4TpiuZF5gRnFzN o07KxOE1hmsKBCIz2fy44vxVue9Hn3r/QLIhL8Zx1en6jNlv4WDJHo3u73Hr6d5AFImB 3YdNBTRde0CVqTXKXib5ddyO2jxOQOZlsUuD303+ncwcRUosOJe2NJvJVATBz3Z8841k XbbuzhKwzAIMa6yDujRrXjYyqCewF11tLHtiXbN/ibn6hAaPbQFx6dtpUu2U+OPrXdNQ ulK32x5g7S9iLiEJMBvFzxX/iOcqUu7R+53Xu9XUHG6qbIQIJvOyJKrM/VCMgw7Rq4u0 NTeg== X-Gm-Message-State: AOAM533Siutt04iAUxU7yjBH+nx/8bB9IiHSQR4ru83P03BYiCQ18S8M Az6RW8SC20xUxZ9kwbcI3cnK26SE7ww1WcA/0WU= X-Google-Smtp-Source: ABdhPJwr5QcGXJiKQl68b2Jy64A3LOdUcqXyAMD5OHseM0M7MvTyIytBlF96zA5lCibDQ/cVgCRDxuMnQJ7Bl6R+YNE= X-Received: by 2002:aa7:dc05:: with SMTP id b5mr3990374edu.137.1599149308285; Thu, 03 Sep 2020 09:08:28 -0700 (PDT) MIME-Version: 1.0 References: <1578737885-8890-1-git-send-email-lixinhai.lxh@gmail.com> <20200114092545.GF19428@dhcp22.suse.cz> <202001142147485028116@gmail.com> <20200115093148.GZ19428@dhcp22.suse.cz> <986d0f55-ae54-f3af-0d50-3e3b6621a863@oracle.com> <20200116102944.GQ19428@dhcp22.suse.cz> <20200903080231.GQ4617@dhcp22.suse.cz> In-Reply-To: <20200903080231.GQ4617@dhcp22.suse.cz> From: Yang Shi Date: Thu, 3 Sep 2020 09:08:15 -0700 Message-ID: Subject: Re: [PATCH v4] mm/page_vma_mapped.c: Explicitly compare pfn for normal, hugetlbfs and THP page To: Michal Hocko Cc: Mike Kravetz , Li Xinhai , "linux-mm@kvack.org" , akpm , "kirill.shutemov" Content-Type: text/plain; charset="UTF-8" X-Rspamd-Queue-Id: 43C491819E766 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam03 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000031, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Sep 3, 2020 at 1:02 AM Michal Hocko wrote: > > On Wed 02-09-20 15:26:39, Yang Shi wrote: > > On Thu, Jan 16, 2020 at 2:29 AM Michal Hocko wrote: > > > > > > On Wed 15-01-20 16:40:26, Mike Kravetz wrote: > > > > On 1/15/20 1:31 AM, Michal Hocko wrote: > > > > > On Tue 14-01-20 21:47:51, Li Xinhai wrote: > > > > >> On 2020-01-14 at 17:25 Michal Hocko wrote: > > > > >>> On Sat 11-01-20 10:18:05, Li Xinhai wrote: > > > > >>>> When check_pte, pfn of normal, hugetlbfs and THP page need be compared. > > > > >>>> The current implementation apply comparison as > > > > >>>> - normal 4K page: page_pfn <= pfn < page_pfn + 1 > > > > >>>> - hugetlbfs page: page_pfn <= pfn < page_pfn + HPAGE_PMD_NR > > > > >>>> - THP page: page_pfn <= pfn < page_pfn + HPAGE_PMD_NR > > > > >>>> in pfn_in_hpage. For hugetlbfs page, it should be > > > > >>>> page_pfn == pfn > > > > >>>> > > > > >>>> Now, change pfn_in_hpage to pfn_is_match to highlight that comparison > > > > >>>> is not only for THP and explicitly compare for these cases. > > > > >>> > > > > >>> Why is this important to do. I have asked and Mike had the same feeling > > > > >>> that the patch is missing any real justification. Why do we need this > > > > >>> change? It is great that you have dropped VM_BUG_ON btw. > > > > >>> > > > > >> I think it is important to make the code clear, as said, comparing hugetlbfs page > > > > >> in range page_pfn <= pfn < page_pfn + HPAGE_PMD_NR is confusion. > > > > > > > > > > I am sorry but I do not really see a big confusion here. It is probably > > > > > a matter of taste. If others consider this an improvement then I will > > > > > not stand in the way but I consider the justification insuficient for > > > > > merging. > > > > > > > > Perhaps I am confused, but the patch does update the check done for > > > > hugetlb pages. IIUC, previously there was no distinction between THP > > > > pages and hugetlb pages in the check. It is valid to pass in a sub- > > > > page of a THP page, but not that of a hugetlb page. > > > > > > > > I do not believe it is possible for existing code to pass in a sub-page > > > > of a hugetlb page. And, no one has ever seen this as an issue. This > > > > is why I was questioning the need for the patch. > > > > > > Exactly and that is the reason I fail the see a point. > > > > > > > With all that said, the new code/check is better (more complete) than > > > > the original. It may not do anything for the current code base, but > > > > it 'could' catch potential errors in future code. Because of this, I > > > > do consider the code an improvement. > > > > > > It adds a branch for something that doesn't happen and also to a code > > > path which is quite internal to the MM AFAICS. That being said, if you > > > believe this is an improvement then I will not stand in the way. But > > > there are so many other places which could add checks that are not > > > exercised... > > > > I just saw a bad page bug on one our host with 4.14 kernel: > > > > backtrace: > > :BUG: Bad page map in process python3.6 pte:1036e24025 pmd:1dd743d067 > > :page:fffff62840db8900 count:105 mapcount:-35 mapping:ffff97d432e97ad0 index:0x1 > > :flags: 0xbfffe000001006c(referenced|uptodate|lru|active|mappedtodisk) > > :raw: 0bfffe000001006c ffff97d432e97ad0 0000000000000001 00000069ffffffdc > > :raw: dead000000000100 dead000000000200 0000000000000000 ffff97c58fc0e000 > > :page dumped because: bad pte > > :page->mem_cgroup:ffff97c58fc0e000 > > :addr:00007f2ddffcc000 vm_flags:00000075 anon_vma: (null) > > mapping:ffff97d432e97ad0 index:7f > > :file:libc-2.17.so fault:xfs_filemap_fault [xfs] mmap:xfs_file_mmap > > [xfs] readpage:xfs_vm_readpage [xfs] > > :do_exit+0x563/0xbb0 > > :? vfs_write+0x162/0x1a0 > > :do_group_exit+0x3a/0xa0 > > :file:libc-2.17.so fault:xfs_filemap_fault [xfs] mmap:xfs_file_mmap > > [xfs] readpage:xfs_vm_readpage [xfs] > > :SyS_exit_group+0x10/0x10 > > :do_syscall_64+0x60/0x110 > > :entry_SYSCALL_64_after_hwframe+0x3d/0xa2 > > :RIP: 0033:0x7fb2504091d9 > > :RSP: 002b:00007ffcf035db88 EFLAGS: 00000246 ORIG_RAX: 00000000000000e7 > > :RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007fb2504091d9 > > :RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 > > :RBP: 00007fb250706838 R08: 000000000000003c R09: 00000000000000e7 > > :R10: ffffffffffffff70 R11: 0000000000000246 R12: 00007fb250706838 > > :R13: 00007fb25070be80 R14: 0000000000000000 R15: 0000000000000000 > > :CPU: 42 PID: 454567 Comm: python3.6 Tainted: G B W > > 4.14.98-t5.el7.twitter.x86_64 #1 > > > > I just saw it once and it didn't happen on the newer kernel (maybe > > just not happen yet). I can't tell why the mapcount could reach -35 > > since all page_remove_rmap() is protected by page table lock. Then I > > looked into pvmw code, and suspected the PTE might be changed after > > acquiring ptl. > > > > With the old check it was fine as long as "page_pfn <= pfn < page_pfn > > + 1", but for the base page case, it means we might be dec'ing > > mapcount for another page. > > > > Then I came across this commit. I guess this should be able to solve > > the problem, but unfortunately the problem is very rare so basically I > > can't prove this commit could solve it. > > > > If you agree my analysis, we may consider to backport this to stable tree. > > I am not sure I follow. Your page looks like a normal page cache and the > patch you are referring to is only clarifying hugetlb pages. I do not > remember details but it shouldn't have any functional effect. Are those > even used in your environemnt? Not only does the code touch hugetlbfs page, please see the below code snippet: + /* normal page and hugetlbfs page */ + if (!PageTransCompound(page) || PageHuge(page)) + return page_pfn == pfn; The potential problem per my understanding is pvmw does: 1. read pte 2. lock ptl 3. check pfn During step #1 and #2 the PTE might be changed, it is not surprising we typically have pte_same in page fault path to check this after acquiring ptl, but for pvmw path a full pte_same check might be unnecessary, since we just care if the pfn is intact or not. But before the patch as long as "old_pfn < new_pfn < old_pfn + 512" is true the check would pass for both normal base page and hugetlbfs page even though the new pfn is changed, that commit tightened the check. For the normal base page it must be "old_pfn == new_pfn". > -- > Michal Hocko > SUSE Labs