Date: Wed, 30 Nov 2022 11:23:39 -0500
From: Peter Xu
To: David Hildenbrand
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton, Jann Horn, Andrew Morton, Andrea Arcangeli, Rik van Riel, Nadav Amit, Miaohe Lin, Muchun Song, Mike Kravetz
Subject: Re: [PATCH 00/10] mm/hugetlb: Make huge_pte_offset() thread-safe for pmd unshare
In-Reply-To: <1eff312b-1aca-6afb-3587-f65e698b3f8c@redhat.com>
References: <20221129193526.3588187-1-peterx@redhat.com> <1eff312b-1aca-6afb-3587-f65e698b3f8c@redhat.com>

On Wed, Nov 30, 2022 at 10:46:24AM +0100, David Hildenbrand wrote:
> > huge_pte_offset() is always called with the mmap lock held, in either
> > read or write mode. It was assumed to be safe, but it's actually not.
> > One race condition can easily be triggered by: (1) first triggering pmd
> > sharing on a memory range, (2) calling huge_pte_offset() on the range,
> > then, in the meantime, (3) having another thread unshare the pmd range;
> > the pgtable page is then prone to being lost if the other sharing
> > process wants to free it completely (by either munmap or exiting the mm).
>
> So just that I understand correctly:
>
> Two processes, #A and #B, share a page table. Process #A runs two threads,
> #A1 and #A2.
>
> #A1 walks that shared page table (using huge_pte_offset()), for example, to
> resolve a page fault. Concurrently, #A2 triggers unsharing of that page
> table (replacing it by a private page table),

Not replacing it yet, just unsharing. If the replacement had already
happened, we wouldn't hit a bug either, because huge_pte_offset() would
return the private pgtable page instead.

> for example, using munmap().

munmap() may not work, because it needs the mmap lock for write, so it
will wait until #A1 has finished its huge_pte_offset() walk and released
the mmap read lock. Many other things can trigger an unshare, though.
In the reproducer I used MADV_DONTNEED, which only takes the mmap lock
for read and can therefore run concurrently with #A1.

> So #A1 will eventually read/write the shared page table while we're placing
> a private page table. Which would be fine (assuming no unsharing would be
> required by #A1), however, if #B also concurrently drops the reference to
> the shared page table (), the shared page table could essentially get freed
> while #A1 is still walking it.
>
> I suspect, looking at the reproducer, that the page table deconstructor was
> called. Will the page table also actually get freed already? IOW, could #A1
> be reading/writing a freed page?

With the existing code base, I think it could. With the RCU read lock
held, it couldn't; but since the pgtable lock would be freed even if the
page is not, we'd still hit weird issues when accessing the lock. With
the vma lock held, it should all be safe.

-- 
Peter Xu
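The interleaving discussed above can be followed step by step with the toy model below. It is written in Python and is not kernel code. It replays the sequence (#A1 walks, #A2 unshares, #B drops its reference) in a single thread. The model assumes, as the last reply suggests, that unsharing #A's range has to take a per-VMA reader/writer lock for write while the walker holds it for read. SharedPgtable, VmaLock and replay() are hypothetical names used only for illustration. The model covers reference-count ordering only. It does not capture the RCU variant, where the page survives but its pgtable lock is freed.

```python
"""Toy, single-threaded replay of the hugetlb pmd-unshare race.

All names here are hypothetical stand-ins for kernel objects, not kernel APIs.
"""


class SharedPgtable:
    """A pmd page shared by several processes, freed when the last user drops it."""

    def __init__(self, sharers: int) -> None:
        self.refcount = sharers
        self.freed = False

    def put(self) -> None:
        self.refcount -= 1
        if self.refcount == 0:
            # Stands in for freeing the pgtable page (and its lock).
            self.freed = True


class VmaLock:
    """Minimal non-blocking reader/writer lock standing in for the per-VMA lock."""

    def __init__(self) -> None:
        self.readers = 0
        self.writer = False

    def try_read(self) -> bool:
        if self.writer:
            return False
        self.readers += 1
        return True

    def read_unlock(self) -> None:
        self.readers -= 1

    def try_write(self) -> bool:
        if self.writer or self.readers:
            return False
        self.writer = True
        return True

    def write_unlock(self) -> None:
        self.writer = False


def replay(use_vma_lock: bool) -> tuple[bool, bool]:
    """Return (walker touched a freed table, table freed by the end)."""
    table = SharedPgtable(sharers=2)  # shared by processes #A and #B
    lock_a = VmaLock()                # protects #A's mapping only

    # #A1 starts the walk; huge_pte_offset() hands back a pointer into the table.
    if use_vma_lock and not lock_a.try_read():
        raise RuntimeError("walker could not take the read lock")
    walker_view = table

    # #A2 tries to unshare (e.g. via MADV_DONTNEED), dropping #A's reference.
    unshare_pending = True
    if not use_vma_lock or lock_a.try_write():
        table.put()
        unshare_pending = False
        if use_vma_lock:
            lock_a.write_unlock()

    # #B drops its reference concurrently (munmap or exit).
    table.put()

    # #A1 now reads/writes the entry it found.
    use_after_free = walker_view.freed
    if use_vma_lock:
        lock_a.read_unlock()

    # Once #A1 is done, a deferred unshare can finally take the write lock.
    if unshare_pending:
        if not lock_a.try_write():
            raise RuntimeError("unshare could not take the write lock")
        table.put()
        lock_a.write_unlock()

    return use_after_free, table.freed


if __name__ == "__main__":
    print("without vma lock:", replay(use_vma_lock=False))
    print("with vma lock:   ", replay(use_vma_lock=True))
```

Without the lock, #B's put drops the last reference while #A1 still holds a pointer into the table. With the lock, #A keeps its reference until #A1 has finished, so the table is only freed after the walk.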