From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Andrew Morton, peterx@redhat.com, Hugh Dickins, Andrea Arcangeli,
 Mike Rapoport, Matthew Wilcox
Subject: [PATCH v2] mm: Don't fault around userfaultfd-registered regions on reads
Date: Mon, 30 Nov 2020 18:06:03 -0500
Message-Id: <20201130230603.46187-1-peterx@redhat.com>

Faulting around for reads is in most cases helpful for performance, so that
contiguous memory accesses can avoid another page fault.  However, it does
not always work as expected.

For example, userfaultfd-registered regions may not be the best candidates
for pre-faulting around reads.

For missing mode uffds, fault-around does not help: if the page cache
exists, then the page should be there already.  If the page cache is not
there, there is nothing else we can do either.
If the fault-around code is destined to be useless for userfault-missing
vmas, then ideally we can skip it.

For wr-protected mode uffds, erroneously faulting in the surrounding pages
could lead to threads accessing those pages without the uffd server's
awareness.  For example, when punching holes in uffd-wp registered shmem
regions, we first try to unmap all the pages before evicting the page
cache, but without locking the page (please refer to shmem_fallocate(),
where unmap_mapping_range() is called before shmem_truncate_range()).
When fault-around happens near a hole being punched, we might erroneously
fault in the "hole" pages right before they are punched.  That leaves a
small window, after the fault-in and before the page cache is finally
dropped, in which the page is writable again (NOTE: the uffd-wp protect
information is totally lost due to the pre-unmap in shmem_fallocate(), so
the page can be writable within the small window).  That's severe data
loss.

Let's grant userspace full control of the uffd-registered ranges, rather
than trying to play tricks.

Cc: Hugh Dickins
Cc: Andrea Arcangeli
Cc: Andrew Morton
Cc: Mike Rapoport
Cc: Matthew Wilcox
Reviewed-by: Mike Rapoport
Signed-off-by: Peter Xu
---
v2:
- use userfaultfd_armed() directly [Mike]

Note that since there is no file-backed uffd-wp support upstream yet, the
uffd-wp check is not actually functioning.  However, since all the
necessary uffd-wp concepts are already upstream, maybe it's better to do it
once and for all.

This patch comes from debugging a data loss issue when working on uffd-wp
support for shmem/hugetlbfs.  I am posting it for early review and
comments, but also because it should already benefit missing mode
userfaultfd by avoiding fault-around on reads.
---
 mm/memory.c | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/mm/memory.c b/mm/memory.c
index eeae590e526a..59b2be22565e 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3933,6 +3933,23 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
 	int off;
 	vm_fault_t ret = 0;
 
+	/*
+	 * Be extremely careful with uffd-armed regions.
+	 *
+	 * For missing mode uffds, fault around does not help because if the
+	 * page cache existed, then the page should be there already.  If the
+	 * page cache is not there, nothing else we can do either.
+	 *
+	 * For wr-protected mode uffds, erroneously fault in those pages around
+	 * could lead to threads accessing the pages without uffd server's
+	 * awareness, finally it could cause ghostly data corruption.
+	 *
+	 * The idea is that, every single page of uffd regions should be
+	 * governed by the userspace on which page to fault in.
+	 */
+	if (unlikely(userfaultfd_armed(vmf->vma)))
+		return 0;
+
 	nr_pages = READ_ONCE(fault_around_bytes) >> PAGE_SHIFT;
 	mask = ~(nr_pages * PAGE_SIZE - 1) & PAGE_MASK;
 
-- 
2.26.2