Date: Fri, 10 Apr 2020 11:32:34 -0400
From: Peter Xu
To: Hillf Danton
Cc: kernel test robot, Andrew Morton, Linux Memory Management List, linux-kernel@vger.kernel.org
Subject: Re: f45ec5ff16 ("userfaultfd: wp: support swap and page migration"): [ 140.777858] BUG: Bad rss-counter state mm:b278fc66 type:MM_ANONPAGES val:1
Message-ID: <20200410153234.GB3172@xz-x1>
References: <20200410002518.GG8179@shao2-debian> <20200410073209.11164-1-hdanton@sina.com>
In-Reply-To: <20200410073209.11164-1-hdanton@sina.com>

Hi, Hillf,

On Fri, Apr 10, 2020 at 03:32:09PM +0800, Hillf Danton wrote:
>
> On Fri, 10 Apr 2020 08:25:18 +0800
> > Greetings,
> >
> > 0day kernel testing robot got the below dmesg and the first bad commit is
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> >
> > commit f45ec5ff16a75f96dac8c89862d75f1d8739efd4
> > Author: Peter Xu
> > AuthorDate: Mon Apr 6 20:06:01 2020 -0700
> > Commit: Linus Torvalds
> > CommitDate: Tue Apr 7 10:43:39 2020 -0700
> >
> >     userfaultfd: wp: support swap and page migration
> >
> >     For either swap and page migration, we all use the bit 2 of the entry to
> >     identify whether this entry is uffd write-protected.
> >     It plays a similar
> >     role as the existing soft dirty bit in swap entries but only for keeping
> >     the uffd-wp tracking for a specific PTE/PMD.
> >
> >     Something special here is that when we want to recover the uffd-wp bit
> >     from a swap/migration entry to the PTE bit we'll also need to take care of
> >     the _PAGE_RW bit and make sure it's cleared, otherwise even with the
> >     _PAGE_UFFD_WP bit we can't trap it at all.
> >
> >     In change_pte_range() we do nothing for uffd if the PTE is a swap entry.
> >     That can lead to data mismatch if the page that we are going to write
> >     protect is swapped out when sending the UFFDIO_WRITEPROTECT.  This patch
> >     also applies/removes the uffd-wp bit even for the swap entries.
> >
> Have trouble understanding the last sentence in the paragraph above and
> particularly linking it to the first one.

This patch handles the swap entry tracking for userfault-wp.  I wanted to
express the fact that we didn't do that before.  Sorry if my wording is
confusing.

>
> > If you fix the issue, kindly add following tag
> > Reported-by: kernel test robot
> >
> > [child3:925] eventfd (323) returned ENOSYS, marking as inactive.
> > [ 132.014801] can: request_module (can-proto-2) failed.
> > [ 132.063717] can: request_module (can-proto-2) failed.
> > [ 137.186037] trinity-c2 (943) used greatest stack depth: 5804 bytes left
> > [ 140.771486] MCE: Killing trinity-c2:956 due to hardware memory corruption fault at 8bd2060
> > [ 140.777858] BUG: Bad rss-counter state mm:b278fc66 type:MM_ANONPAGES val:1
> > [ 140.778736] BUG: Bad rss-counter state mm:b278fc66 type:MM_SHMEMPAGES val:2
> > [ 141.589424] MCE: Killing trinity-c3:940 due to hardware memory corruption fault at 8a8c860
> > [ 141.590730] swap_info_get: Bad swap file entry 700b8216

Should be a 32bit guest, swap type == 11100b.  I guess this also means
MAX_SWAPFILES is bigger than 11100b.

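For reference, the arithmetic behind that reading can be sketched in
userspace C.  This is only a sketch: it assumes the generic 32-bit
swp_entry_t layout (BITS_PER_XA_VALUE = 31 and MAX_SWAPFILES_SHIFT = 5, so
the type sits above bit 26), and the toy_* names are made up for the
example; the real helpers are swp_type() and swp_offset().

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 32-bit kernel, type field stored above bit 26 (31 - 5). */
#define TOY_MAX_SWAPFILES_SHIFT	5
#define TOY_SWP_TYPE_SHIFT	(31 - TOY_MAX_SWAPFILES_SHIFT)
#define TOY_SWP_OFFSET_MASK	((UINT32_C(1) << TOY_SWP_TYPE_SHIFT) - 1)

/* Hypothetical stand-in for swp_type(): the bits above the offset field. */
static unsigned int toy_swp_type(uint32_t val)
{
	return val >> TOY_SWP_TYPE_SHIFT;
}

/* Hypothetical stand-in for swp_offset(): the low 26 bits. */
static uint32_t toy_swp_offset(uint32_t val)
{
	return val & TOY_SWP_OFFSET_MASK;
}
```

Under those assumptions, toy_swp_type(0x700b8216) is 28 (11100b) and
toy_swp_offset(0x700b8216) is 0xb8216.
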
> > [ 141.591400] BUG: Bad page map in process trinity-c3 pte:17042c3c pmd:b1809067
> > [ 141.592304] addr:08bcf000 vm_flags:00100073 anon_vma:f1f29528 mapping:00000000 index:8bcf
> > [ 141.593399] file:(null) fault:0x0 mmap:0x0 readpage:0x0
> > [ 141.594065] CPU: 0 PID: 940 Comm: trinity-c3 Not tainted 5.6.0-11490-gf45ec5ff16a75 #1
> > [ 141.595055] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.12.0-1 04/01/2014
> > [ 141.596093] Call Trace:
> > [ 141.596443] dump_stack+0x16/0x18
> > [ 141.596868] print_bad_pte+0x13f/0x159
> > [ 141.597367] unmap_page_range+0x2a7/0x3e7
> > [ 141.597893] unmap_single_vma+0x53/0x5d
> > [ 141.598383] unmap_vmas+0x2c/0x3b
> > [ 141.598811] exit_mmap+0x81/0xfc
> > [ 141.599238] __mmput+0x25/0x8d
> > [ 141.599633] mmput+0x28/0x2b
> > [ 141.600007] do_exit+0x2f0/0x84a
> > [ 141.600449] ? ___might_sleep+0x3f/0x11f
> > [ 141.600949] do_group_exit+0x86/0x86
> > [ 141.601421] __ia32_sys_exit_group+0x15/0x15
> > [ 141.601965] do_fast_syscall_32+0x86/0xbf
> > [ 141.602481] entry_SYSENTER_32+0xaf/0x101
>
>
> Because is_swap_pte(oldpte) != IS_ENABLED(CONFIG_MIGRATION)), restore the
> old behavior by modifying uffd_wp only for pte that is __not__ swap entry,
> as the commit log says.
>
> --- b/mm/mprotect.c
> +++ c/mm/mprotect.c
> @@ -139,7 +139,7 @@ static unsigned long change_pte_range(st
>  			}
>  			ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
>  			pages++;
> -		} else if (is_swap_pte(oldpte)) {
> +		} else if (IS_ENABLED(CONFIG_MIGRATION)) {
>  			swp_entry_t entry = pte_to_swp_entry(oldpte);
>  			pte_t newpte;
>  
> @@ -154,7 +154,9 @@ static unsigned long change_pte_range(st
>  					newpte = pte_swp_mksoft_dirty(newpte);
>  				if (pte_swp_uffd_wp(oldpte))
>  					newpte = pte_swp_mkuffd_wp(newpte);
> -			} else if (is_write_device_private_entry(entry)) {
> +			}
> +
> +			if (is_write_device_private_entry(entry)) {
>  				/*
>  				 * We do not preserve soft-dirtiness. See
>  				 * copy_one_pte() for explanation.
> @@ -163,11 +165,18 @@ static unsigned long change_pte_range(st
>  				newpte = swp_entry_to_pte(entry);
>  				if (pte_swp_uffd_wp(oldpte))
>  					newpte = pte_swp_mkuffd_wp(newpte);
> -			} else {
> -				newpte = oldpte;
>  			}
>  
> -			if (uffd_wp)
> +			/*
> +			 * do nothing for changing uffd_wp if oldpte is a
> +			 * swap entry.
> +			 * That can lead to data mismatch if the page we
> +			 * are going to write protect is swapped out when
> +			 * sending the UFFDIO_WRITEPROTECT.
> +			 */
> +			if (is_swap_pte(oldpte))
> +				newpte = oldpte;
> +			else if (uffd_wp)
>  				newpte = pte_swp_mkuffd_wp(newpte);
>  			else if (uffd_wp_resolve)
>  				newpte = pte_swp_clear_uffd_wp(newpte);

I'm not sure this is correct.  As I mentioned, the commit wanted to apply
the uffd-wp bit even for the swap entries, so that even after the swap
entries get swapped in, the page will still be write protected.  So IIUC
we can't remove that.

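As a toy illustration of that point (userspace C, not kernel code; every
name below is made up for the sketch, and swap-in is simplified to "restore
the mapping and honour the carried bit"): if the write-protect request
skips swap entries, nothing is left to trap the write once the page comes
back in.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a PTE: either a present mapping or a swap entry. */
struct toy_pte {
	bool present;	/* page is mapped */
	bool writable;	/* stands in for _PAGE_RW */
	bool uffd_wp;	/* stands in for the uffd-wp bit (PTE or swap PTE) */
};

/* Model of a write-protect request hitting one entry. */
static struct toy_pte toy_writeprotect(struct toy_pte pte, bool handle_swap)
{
	if (pte.present) {
		pte.uffd_wp = true;
		pte.writable = false;
	} else if (handle_swap) {
		pte.uffd_wp = true;	/* carry the mark in the swap entry */
	}
	return pte;
}

/* Model of swap-in: map the page again, clearing RW if the mark is set. */
static struct toy_pte toy_swap_in(struct toy_pte pte)
{
	pte.present = true;
	pte.writable = !pte.uffd_wp;
	return pte;
}

/* Would a write through this entry be trapped by uffd-wp? */
static bool toy_write_traps(struct toy_pte pte)
{
	return pte.present && pte.uffd_wp && !pte.writable;
}
```

In this model, a swapped-out entry that goes through
toy_writeprotect(pte, true) and then toy_swap_in() still traps writes,
while the same sequence with handle_swap == false ends up writable and
untracked.
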
The above report happened at __swap_info_get() where we should have
triggered "type >= READ_ONCE(nr_swapfiles)" in swap_type_to_swap_info(),
which, iiuc, means the type is considered wrong.

The mystery to me is how the swap type went wrong.  E.g., iiuc the
"is_swap_pte(oldpte)" replacement with "IS_ENABLED(CONFIG_MIGRATION)"
shouldn't affect this, because even if MIGRATION is not enabled, both
is_write_migration_entry() and is_write_device_private_entry() should
return constant zero after all, so they shouldn't be touching the swap
type.

I'm still trying to digest what's happened...  It would also be good if
more information on the test could be given, e.g., what is the behavior of
trinity-c2.  A reproducer is of course even better.

Thanks,

-- 
Peter Xu