From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 456FBC433F5 for ; Mon, 3 Oct 2022 21:28:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id C2A526B0071; Mon, 3 Oct 2022 17:28:01 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id BB2B96B0073; Mon, 3 Oct 2022 17:28:01 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A2C5C6B0074; Mon, 3 Oct 2022 17:28:01 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 82F3C6B0071 for ; Mon, 3 Oct 2022 17:28:01 -0400 (EDT) Received: from smtpin28.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 4C991C0847 for ; Mon, 3 Oct 2022 21:28:01 +0000 (UTC) X-FDA: 79980925962.28.45911F1 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf23.hostedemail.com (Postfix) with ESMTP id AF0EB14001A for ; Mon, 3 Oct 2022 21:28:00 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1664832480; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=rRZrqABbAJl6nJtSQCzOUEo0gY6dCRHrx7ZyLqZHrnQ=; b=g+roi/AkpOwHg/4QcfQekDt9o6ZnRiYQcGIXyEp1dtVlDR3xqRXI37bQCii3bujCWNc+wN oweskZFb/WiOhe5PjxTODrBGVTbYi6wJl9c+mBdpVi4VItm4RuTsZoS3sYOzrNT+Rz83Z+ Z/yFbBptJ2vcQMXGi5lJno958t1jMao= Received: from mail-qv1-f71.google.com (mail-qv1-f71.google.com [209.85.219.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_128_GCM_SHA256) id us-mta-114-GbYuzsTCP8aLx_K_Pcmt9w-1; Mon, 03 Oct 2022 17:27:57 -0400 X-MC-Unique: GbYuzsTCP8aLx_K_Pcmt9w-1 Received: by mail-qv1-f71.google.com with SMTP id mo5-20020a056214330500b004ad711537a6so7792558qvb.10 for ; Mon, 03 Oct 2022 14:27:57 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date; bh=rRZrqABbAJl6nJtSQCzOUEo0gY6dCRHrx7ZyLqZHrnQ=; b=RlJtw96j6pOIb7q8g/kxYwe3YRuoo3h5vixSJtEBJfgzbQU45WjNy1GqM0B0yll40h WrKerXxsx1ITMHcfCNTJgq/anJH2yTamTtAjt8qQ0QICBrOMRyEnlYo+gq8kVb0xAhGc P5Ry7UtMgQ2QHq9/XFoVd8fWmthWLv3HtVngliBoZPSKttpDMitOW1AQ6y5PpPC1mERF Rkr42U96OHWBU9HIj6hLyvbF7l8sxv11ZWneQwyskqREoAWTsihc1VmHhFAWuNpVtQDI l7ym556YPlI/9VGUqvQ5mkvxAh4SReXjcgYfaTEq+Mi4M7vJLpmqQ0W3uinzLKT+3jCy LY6A== X-Gm-Message-State: ACrzQf3nMleWT1f28tcB4KMgfdeTmIrEIwi3A6Ki7AfFjg73qRgwrXMS OjQanTZThMOSYbF6GUzEx5I4S7tYja43H55iozuAQOrZ6vH5HnC9kb5voJbVUYyvPsXh0ydiUDx kvQoeXIGpplQ= X-Received: by 2002:ad4:5944:0:b0:4ad:7802:c35a with SMTP id eo4-20020ad45944000000b004ad7802c35amr17752443qvb.84.1664832476528; Mon, 03 Oct 2022 14:27:56 -0700 (PDT) X-Google-Smtp-Source: AMsMyM4dpYFsadgte02ajtCQnVHbVKrHCOF2nTI51FBcoDopBFsDBe096LRY636BcKyBenxRodSlUg== X-Received: by 2002:ad4:5944:0:b0:4ad:7802:c35a with SMTP id eo4-20020ad45944000000b004ad7802c35amr17752431qvb.84.1664832476333; Mon, 03 Oct 2022 14:27:56 -0700 (PDT) Received: from x1n (bras-base-aurron9127w-grc-46-70-31-27-79.dsl.bell.ca. [70.31.27.79]) by smtp.gmail.com with ESMTPSA id 85-20020a370758000000b006ceb8f36302sm11494156qkh.71.2022.10.03.14.27.55 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 03 Oct 2022 14:27:55 -0700 (PDT) Date: Mon, 3 Oct 2022 17:27:52 -0400 From: Peter Xu To: Mike Kravetz Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Andrea Arcangeli , Mike Rapoport , David Hildenbrand , Andrew Morton , Axel Rasmussen , Nadav Amit Subject: Re: [PATCH 1/3] mm/hugetlb: Fix race condition of uffd missing/minor handling Message-ID: References: <20221003155630.469263-1-peterx@redhat.com> <20221003155630.469263-2-peterx@redhat.com> MIME-Version: 1.0 In-Reply-To: X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline ARC-Authentication-Results: i=1; imf23.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b="g+roi/Ak"; spf=pass (imf23.hostedemail.com: domain of peterx@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1664832480; a=rsa-sha256; cv=none; b=g3vcP2B54AB2P62xrCQ8QrDnFNaxom9XR+Zn1u8/uszAndF9jb2zCyTUWqTpbVaNw3Vkrt qeqQOSbjliRvayLfoAW8S1g8Nm5zxwoM0uzrJ7AWRLEXmFpOAFZ5unjsddhz6rygQxAToo Dy8NB3zLBMIgIblmKlom7sIXZyd6spA= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1664832480; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=rRZrqABbAJl6nJtSQCzOUEo0gY6dCRHrx7ZyLqZHrnQ=; b=2jFgQq1daXit9/yWlNim1MhfSJhIpEK8qOKajwp3xVXqlDBe/7a4UCvwAEh5SnbdJRAiHT 9hqTgqiey2IDkAhlPcQenRAqmJ2eEtJiaZ7pV7qkh04voLFhehKXFcPMlg2dWha/jS/SYK Ux4sUk/RVyrafdThUHbnMQwtSfKkZEc= X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: AF0EB14001A X-Rspam-User: Authentication-Results: imf23.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b="g+roi/Ak"; spf=pass (imf23.hostedemail.com: domain of peterx@redhat.com designates 170.10.133.124 as permitted sender) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-Stat-Signature: 49rmpfwuck3y4qh4gbbxq3wct9gdyyq3 X-HE-Tag: 1664832480-889627 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, Oct 03, 2022 at 10:20:29AM -0700, Mike Kravetz wrote: > On 10/03/22 11:56, Peter Xu wrote: > > After the recent rework patchset of hugetlb locking on pmd sharing, > > kselftest for userfaultfd sometimes fails on hugetlb private tests with > > unexpected write fault checks. > > > > It turns out there's nothing wrong within the locking series regarding this > > matter, but it could have changed the timing of threads so it can trigger > > an old bug. > > > > The real bug is when we call hugetlb_no_page() we're not with the pgtable > > lock. It means we're reading the pte values lockless. It's perfectly fine > > in most cases because before we do normal page allocations we'll take the > > lock and check pte_same() again. However before that, there are actually > > two paths on userfaultfd missing/minor handling that may directly move on > > with the fault process without checking the pte values. > > > > It means for these two paths we may be generating an uffd message based on > > an unstable pte, while an unstable pte can legally be anything as long as > > the modifier holds the pgtable lock. > > > > One example, which is also what happened in the failing kselftest and > > caused the test failure, is that for private mappings CoW can happen on one > > page. CoW requires pte being cleared before being replaced with a new page > > for TLB coherency, but then there can be a race condition: > > > > thread 1 thread 2 > > -------- -------- > > > > hugetlb_fault hugetlb_fault > > private pte RO > > hugetlb_wp > > pgtable_lock() > > huge_ptep_clear_flush > > pte=NULL > > hugetlb_no_page > > generate uffd missing event > > even if page existed!! > > set_huge_pte_at > > pgtable_unlock() > > Thanks for working on this Peter! > > I agree with this patch, but I suspect the above race is not possible. Why? > In both cases, we take the hugetlb fault mutex when processing a huegtlb > page fault. This means only one thread can execute the fault code for > a specific mapping/index at a time. This is why I was so confused, and may > remain a bit confused about how we end up with userfault processing a write > fault. It was part of the reason for my 'unclear' wording about this being > more about cpus not seeing updated values. Note that we do drop the fault > mutex before calling handle_usefault, but by then we have already made the > 'missing' determination. > > Thoughts? Perhaps, I am still confused. It's my fault to have the commit message wrong, sorry. :) And thanks for raising this question, I could have overlooked that. It turns out it's not the CoW that's clearing the pte... it's the wr-protect with huge_ptep_modify_prot_start(). So the race is with UFFDIO_WRITEPROTECT, not CoW. Obviously when I was tracking the hpte changes I overlooked huge_ptep_get_and_clear(), only seeing the CoW path and I'm pretty sure that's already a bug which was obvious enough. I didn't prove they happened at the same time during the MISSING event. Then after I further looked at the CoW code I start to question myself on why CoW would trigger at all even with an available fault mutex, since for private mappings mapcount should be 1: if (page_mapcount(old_page) == 1 && PageAnon(old_page)) { if (!PageAnonExclusive(old_page)) page_move_anon_rmap(old_page, vma); if (likely(!unshare)) set_huge_ptep_writable(vma, haddr, ptep); delayacct_wpcopy_end(); return 0; } There could still be something else I didn't catch, even though that may not be relevant to this patchset anymore. I'll wait a few more hours for other reviewers, then prepare a new version with the corrected commit message. Thanks, -- Peter Xu