From: Axel Rasmussen
Date: Fri, 24 Sep 2021 10:21:30 -0700
Subject: Re: [PATCH] mm/userfaultfd: selftests: Fix memory corruption with thp enabled
To: Peter Xu
Cc: Linux MM, LKML, Andrew Morton, Andrea Arcangeli, Nadav Amit, Li Wang
References: <20210923232512.210092-1-peterx@redhat.com>
In-Reply-To: <20210923232512.210092-1-peterx@redhat.com>

On Thu, Sep 23, 2021 at 4:25 PM Peter Xu wrote:
>
> In RHEL's gating selftests we've encountered memory corruption in the uffd
> event test even with upstream kernel:
>
>   # ./userfaultfd anon 128 4
>   nr_pages: 32768, nr_pages_per_cpu: 32768
>   bounces: 3, mode: rnd racing read, userfaults: 6240 missing (6240) 14729 wp (14729)
>   bounces: 2, mode: racing read, userfaults: 1444 missing (1444) 28877 wp (28877)
>   bounces: 1, mode: rnd read, userfaults: 6055 missing (6055) 14699 wp (14699)
>   bounces: 0, mode: read, userfaults: 82 missing (82) 25196 wp (25196)
>   testing uffd-wp with pagemap (pgsize=4096): done
>   testing uffd-wp with pagemap (pgsize=2097152): done
>   testing events (fork, remap, remove): ERROR: nr 32427 memory corruption 0 1 (errno=0, line=963)
>   ERROR: faulting process failed (errno=0, line=1117)
>
> It can be easily reproduced when global thp is enabled, which is the default
> for RHEL.
>
> It's also known as a side effect of commit 0db282ba2c12 ("selftest: use mmap
> instead of posix_memalign to allocate memory", 2021-07-23), which is imho right
> itself on using mmap() to make sure the addresses will be untagged even on arm.
>
> The problem is, for each test we allocate buffers using two allocate_area()
> calls.  We assumed these two buffers won't affect each other, however they
> could, because mmap() could have found that the two buffers are near each other
> and have the same VMA flags, so they got merged into one VMA.
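
As an aside, the adjacency described here is easy to observe from
userspace. Below is a rough sketch, assuming the anon allocate_area()
amounts to a plain mmap(MAP_PRIVATE | MAP_ANONYMOUS); that is my reading
of the selftest, not something checked here. The helper name
areas_adjacent() is made up for illustration. It only reports whether
two back-to-back mappings ended up virtually contiguous, which is the
precondition for the kernel merging them into one VMA; it does not
prove that a merge happened.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not part of the selftest): map two anonymous
 * areas of `len` bytes back to back and report whether they are
 * virtually contiguous. `len` is expected to be a multiple of the
 * page size.
 *
 * Returns 1 if the areas are adjacent, 0 if not, -1 if mmap() failed.
 */
int areas_adjacent(size_t len)
{
	void *a, *b;
	uintptr_t ua, ub;
	int ret;

	a = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (a == MAP_FAILED)
		return -1;

	b = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (b == MAP_FAILED) {
		munmap(a, len);
		return -1;
	}

	/* Same flags and protection; adjacency makes them merge candidates. */
	ua = (uintptr_t)a;
	ub = (uintptr_t)b;
	ret = (ua + len == ub) || (ub + len == ua);

	munmap(a, len);
	munmap(b, len);
	return ret;
}
```

Whether the two areas come out adjacent depends on the address space
layout at the time of the call, so a result of 0 on one run says
nothing about the next one.
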
>
> It won't be a big problem if thp is not enabled, but when thp is aggressively
> enabled it means when initializing the src buffer it could accidentally set up
> part of the dest buffer too when there's a shared THP that overlaps the two
> regions.  Then some of the dest buffer won't be able to be trapped by
> userfaultfd missing mode, then it'll cause memory corruption as described.
>
> To fix it, do release_pages() after initializing the src buffer.

But, if I understand correctly, release_pages() will just free the
physical pages, but not touch the VMA(s). So, with the right
max_ptes_none setting, why couldn't khugepaged just decide to
re-collapse (with zero pages) immediately after we release the pages,
causing the same problem? It seems to me this change just
significantly narrows the race window (which explains why we see less
of the issue), but doesn't fix it fundamentally.

>
> Since the previous two release_pages() calls are after uffd_test_ctx_clear()
> which will unmap all the buffers anyway (which is stronger than release pages;
> as unmap() also tears down pgtables), drop them as they shouldn't really be
> anything useful.
>
> We can mark the Fixes tag upon 0db282ba2c12 as it's reported to only happen
> there, however the real "Fixes" IMHO should be 8ba6e8640844, as before that
> commit we'll always do explicit release_pages() before registration of uffd,
> and 8ba6e8640844 changed that logic by adding extra unmap/map and we didn't
> release the pages at the right place.  Meanwhile I don't have a solid clue
> anyway on whether posix_memalign() could always avoid triggering this bug,
> hence it's safer to attach this fix to commit 8ba6e8640844.
>
> Cc: Andrea Arcangeli
> Cc: Axel Rasmussen
> Cc: Nadav Amit
> Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1994931
> Fixes: 8ba6e8640844 ("userfaultfd/selftests: reinitialize test context in each test")
> Reported-by: Li Wang
> Signed-off-by: Peter Xu
> ---
>  tools/testing/selftests/vm/userfaultfd.c | 23 ++++++++++++++++++++---
>  1 file changed, 20 insertions(+), 3 deletions(-)
>
> diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
> index 10ab56c2484a..60aa1a4fc69b 100644
> --- a/tools/testing/selftests/vm/userfaultfd.c
> +++ b/tools/testing/selftests/vm/userfaultfd.c
> @@ -414,9 +414,6 @@ static void uffd_test_ctx_init_ext(uint64_t *features)
>  	uffd_test_ops->allocate_area((void **)&area_src);
>  	uffd_test_ops->allocate_area((void **)&area_dst);
>
> -	uffd_test_ops->release_pages(area_src);
> -	uffd_test_ops->release_pages(area_dst);
> -
>  	userfaultfd_open(features);
>
>  	count_verify = malloc(nr_pages * sizeof(unsigned long long));
> @@ -437,6 +434,26 @@ static void uffd_test_ctx_init_ext(uint64_t *features)
>  		*(area_count(area_src, nr) + 1) = 1;
>  	}
>
> +	/*
> +	 * After initialization of area_src, we must explicitly release pages
> +	 * for area_dst to make sure it's fully empty.  Otherwise we could have
> +	 * some area_dst pages be erroneously initialized with zero pages,
> +	 * hence we could hit memory corruption later in the test.
> +	 *
> +	 * One example is when THP is globally enabled, above allocate_area()
> +	 * calls could have the two areas merged into a single VMA (as they
> +	 * will have the same VMA flags so they're mergeable).  When we
> +	 * initialize the area_src above, it's possible that some part of
> +	 * area_dst could have been faulted in via one huge THP that will be
> +	 * shared between area_src and area_dst.  It could cause some of the
> +	 * area_dst won't be trapped by missing userfaults.
> +	 *
> +	 * This release_pages() will guarantee even if that happened, we'll
> +	 * proactively split the thp and drop any accidentally initialized
> +	 * pages within area_dst.
> +	 */
> +	uffd_test_ops->release_pages(area_dst);
> +
>  	pipefd = malloc(sizeof(int) * nr_cpus * 2);
>  	if (!pipefd)
>  		err("pipefd");
> --
> 2.31.1
>
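
To make the release_pages() concern above a bit more concrete, here is
a minimal sketch, assuming the anon release_pages() boils down to
madvise(MADV_DONTNEED) on the area (my reading of the selftest, not
verified here). The helper name release_and_check() is hypothetical.
It shows that the pages are dropped and read back as zeroes while the
mapping itself stays in place and remains accessible, so nothing at
the VMA level prevents the range from being populated again afterwards.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Hypothetical helper (not part of the selftest): map `len` bytes of
 * anonymous memory, dirty it, release the pages with MADV_DONTNEED and
 * then read the range back through the same, still existing, mapping.
 *
 * Returns 0 if the whole range reads back as zeroes, 1 if stale data
 * survived, -1 if a syscall failed.
 */
int release_and_check(size_t len)
{
	unsigned char *area;
	size_t i;
	int ret = 0;

	area = mmap(NULL, len, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (area == MAP_FAILED)
		return -1;

	/* Fault the pages in with a recognizable pattern. */
	memset(area, 0xaa, len);

	/* Drop the physical pages; the VMA is left untouched. */
	if (madvise(area, len, MADV_DONTNEED)) {
		munmap(area, len);
		return -1;
	}

	/* The mapping is still valid: reads fault in fresh zero pages. */
	for (i = 0; i < len; i++) {
		if (area[i]) {
			ret = 1;
			break;
		}
	}

	munmap(area, len);
	return ret;
}
```

This does not demonstrate the khugepaged re-collapse itself; that
would depend on the max_ptes_none value under
/sys/kernel/mm/transparent_hugepage/khugepaged/ and on timing.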