From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id D3121C6FD1F for ; Tue, 26 Mar 2024 17:48:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 64B326B0082; Tue, 26 Mar 2024 13:48:57 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 5FB146B008C; Tue, 26 Mar 2024 13:48:57 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 4C2C26B0092; Tue, 26 Mar 2024 13:48:57 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 3AA726B0082 for ; Tue, 26 Mar 2024 13:48:57 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 072E7A0788 for ; Tue, 26 Mar 2024 17:48:57 +0000 (UTC) X-FDA: 81939925914.14.20C6269 Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf10.hostedemail.com (Postfix) with ESMTP id 19EECC0006 for ; Tue, 26 Mar 2024 17:48:54 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=none; spf=pass (imf10.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1711475335; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=CvpisOiIq20x9lhw6zGo9vMt2Jp7iikVwGfYc3NGP6c=; b=BT94DjZnEsX0r3hwzGJSYlJBaDdopfjxUScK71gIaGxqHWKwvQnHPOunM99sEaHsONDUEP +VISjOs5GopH0wjFCKXUD+lPYe0Sw1vAnBKTpd2iOEqeVWtecx7cQH885k5ux+6rNBi8Lu DaCARScwo3LUlCfJu0O4kov0fJfy5vg= ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1711475335; a=rsa-sha256; cv=none; b=v4Mgl+xMm4kKzygCqQm7nbiTz9UHfSDGcNBk548npz5Iz066cu9Z3Eu+2SvT847WE0VrEy GDzD1xN56tHPiTBmU13XLrikpEaEqMDMyqC4xNTfFzcdA/OLTrSWTU87YCX41qYTxNQkFi mRb+eAxdhrKLsH8b/Fo7AC04K3M0o0M= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=none; spf=pass (imf10.hostedemail.com: domain of ryan.roberts@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=ryan.roberts@arm.com; dmarc=pass (policy=none) header.from=arm.com Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id F39BC2F4; Tue, 26 Mar 2024 10:49:27 -0700 (PDT) Received: from [10.1.29.179] (XHFQ2J9959.cambridge.arm.com [10.1.29.179]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id 55DAE3F64C; Tue, 26 Mar 2024 10:48:52 -0700 (PDT) Message-ID: Date: Tue, 26 Mar 2024 17:48:52 +0000 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [RFC PATCH v1 3/4] mm/memory: Use ptep_get_lockless_norecency() for orig_pte Content-Language: en-GB To: David Hildenbrand , Mark Rutland , Catalin Marinas , Will Deacon , Alexander Shishkin , Jiri Olsa , Ian Rogers , Adrian Hunter , Andrew Morton , Muchun Song Cc: linux-arm-kernel@lists.infradead.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <20240215121756.2734131-1-ryan.roberts@arm.com> <20240215121756.2734131-4-ryan.roberts@arm.com> From: Ryan Roberts In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Rspamd-Queue-Id: 19EECC0006 X-Rspam-User: X-Stat-Signature: 4xonc69smecz4dc9bj9te8pq5rym7bqq X-Rspamd-Server: rspam03 X-HE-Tag: 1711475334-727879 X-HE-Meta: U2FsdGVkX192vCC142krsVQYmgYP9AogFpP0MphGaTetZQHnF08M1zBZzRqq8v1X1DzZwMeI6veLox9FTWGkM4nO4FptbNJJJwq79qbfS0y5uJ8H6v/TqGTvOBLdhCH0OMWTEOruJ1O+i28WHd4Yfy0+pEqrdLq1nWyMlqOBZKpZs9FO9a1zv/YgwYHIfv9mT2E3kQhgZ+XSXGU23yR6RvM/LLHYC3hqOJaMTRgeXhGxiVTSv+jtl172pgYZsqlsUX0xMIJpU8NvchHlLlynWgwhLwuNtRNyi7D28UfQ6sCqYIml+cnLUSTC/PrEEyArPVY3fD8ILwTW/wwMA9eD1xiXmCnybCzOi5ADo/x3Lg1T5RTqr+fGGhR4l6/JAY1sFXMMl3dn3mlAs5K+WKMPn9GyectRU5ich5ZMkmHYUBdWQLIUB1O4ukt+hEY/VTrhX5tfQIzgfypgiuFosN5n2dg0EqSIx/Ovoljb9ho30aFQ7xHPKkUInqHHVJ3bkfqQIzSo39CW33HjYhi9gfJnG04qItZ5s8Yyuz5FngHFZwCsxBzzPwSRyo/8TtsN1vP40yxCYny9utpemddmtcMwywM4u/8W4m2wRbDV9XffqvldhGknC0muSwuOgAXYKHHbCs+1RgNhloRkaHL5YvXgAS750ZhiKuo/60cIo85DwCrWIb7wMPps3ANdmNU0cXYivB0OwBnXmhLa9KFBDCjuGT7jDgQVFL+rNVd2EyYtlz7lg9FfdSYfc2NHBoKEGbMIQcjQzNs1WnqokEvEUSSFiLn/iWEC49VDMkdLvHpN2aLtsKcPx0e/hP6G3UnnOz9WWNKXBoiuKCNZbQpe0oRKjYDZih/phQo6kb1Vmic8qOvWom2ff6U7PyFSI/dyVRIDy4s+iJ2LImp9ACehBr5LM9Vkrk4ZKG4ve1OTlJnCxqFLgDRE+bBc8jN4UgJ6UKXm2FYMswXZDE6PeOjgOmV u+Ih0JrJ K0S6raRZpPrK+f9iS6zyeRybBqt/4qejj8mQaWX4rlYZ/kzQujYXAHCbjd+zhi1D+zXQdRvaJEAHNzmS778yBe/a7gUGdOT2a9SGrRnLZbHCUQOzdFwFlWtPS7ZhBfYMM05yx0QUQjcUiL4lH8VoDpCvpu3bp/gi8u82adKk4C8vpiUNxsKBeu/yKCfPVyMTirfqFQiUt0ShSD6o8ULpiA0SszvBGObVGcG+z X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 26/03/2024 17:38, David Hildenbrand wrote: > On 26.03.24 18:27, Ryan Roberts wrote: >> On 26/03/2024 17:02, David Hildenbrand wrote: >>> On 15.02.24 13:17, Ryan Roberts wrote: >>>> Let's convert handle_pte_fault()'s use of ptep_get_lockless() to >>>> ptep_get_lockless_norecency() to save orig_pte. >>>> >>>> There are a number of places that follow this model: >>>> >>>>       orig_pte = ptep_get_lockless(ptep) >>>>       ... >>>>       >>>>       if (!pte_same(orig_pte, ptep_get(ptep))) >>>>               // RACE! >>>>       ... >>>>       >>>> >>>> So we need to be careful to convert all of those to use >>>> pte_same_norecency() so that the access and dirty bits are excluded from >>>> the comparison. >>>> >>>> Additionally there are a couple of places that genuinely rely on the >>>> access and dirty bits of orig_pte, but with some careful refactoring, we >>>> can use ptep_get() once we are holding the lock to achieve equivalent >>>> logic. >>> >>> We really should document that changed behavior somewhere where it can be easily >>> found: that orig_pte might have incomplete/stale accessed/dirty information. >> >> I could add it to the orig_pte definition in the `struct vm_fault`? >> >>> >>> >>>> @@ -5343,7 +5356,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf) >>>>                             vmf->address, &vmf->ptl); >>>>            if (unlikely(!vmf->pte)) >>>>                return 0; >>>> -        vmf->orig_pte = ptep_get_lockless(vmf->pte); >>>> +        vmf->orig_pte = ptep_get_lockless_norecency(vmf->pte); >>>>            vmf->flags |= FAULT_FLAG_ORIG_PTE_VALID; >>>> >>>>            if (pte_none(vmf->orig_pte)) { >>>> @@ -5363,7 +5376,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf) >>>> >>>>        spin_lock(vmf->ptl); >>>>        entry = vmf->orig_pte; >>>> -    if (unlikely(!pte_same(ptep_get(vmf->pte), entry))) { >>>> +    if (unlikely(!pte_same_norecency(ptep_get(vmf->pte), entry))) { >>>>            update_mmu_tlb(vmf->vma, vmf->address, vmf->pte); >>>>            goto unlock; >>> >>> I was wondering about the following: >>> >>> Assume the PTE is not dirty. >>> >>> Thread 1 does >> >> Sorry not sure what threads have to do with this? How is the vmf shared between >> threads? What have I misunderstood... > > Assume we have a HW that does not have HW-managed access/dirty bits. One that > ends up using generic ptep_set_access_flags(). Access/dirty bits are always > updated under PT lock. > > Then, imagine two threads. One is the the fault path here. another thread > performs some other magic that sets the PTE dirty under PTL. > >> >>> >>> vmf->orig_pte = ptep_get_lockless_norecency(vmf->pte) >>> /* not dirty */ >>> >>> /* Now, thread 2 ends up setting the PTE dirty under PT lock. */ Ahh, this comment about thread 2 is not referring to the code immediately below it. It all makes much more sense now. :) >>> >>> spin_lock(vmf->ptl); >>> entry = vmf->orig_pte; >>> if (unlikely(!pte_same(ptep_get(vmf->pte), entry))) { >>>      ... >>> } >>> ... >>> entry = pte_mkyoung(entry); >> >> Do you mean pte_mkdirty() here? You're talking about dirty everywhere else. > > No, that is just thread 1 seeing "oh, nothing to do" and then goes ahead and > unconditionally does that in handle_pte_fault(). > >> >>> if (ptep_set_access_flags(vmf->vma, ...) >>> ... >>> pte_unmap_unlock(vmf->pte, vmf->ptl); >>> >>> >>> Generic ptep_set_access_flags() will do another pte_same() check and realize >>> "hey, there was a change!" let's update the PTE! >>> >>> set_pte_at(vma->vm_mm, address, ptep, entry); >> >> This is called from the generic ptep_set_access_flags() in your example, right? >> > > Yes. > >>> >>> would overwrite the dirty bit set by thread 2. >> >> I'm not really sure what you are getting at... Is your concern that there is a >> race where the page could become dirty in the meantime and it now gets lost? I >> think that's why arm64 overrides ptep_set_access_flags(); since the hw can >> update access/dirty we have to deal with the races. > > My concern is that your patch can in subtle ways lead to use losing PTE dirty > bits on architectures that don't have the HW-managed dirty bit. They do exist ;) But I think the example you give can already happen today? Thread 1 reads orig_pte = ptep_get_lockless(). So that's already racy, if thread 2 is going to set dirty just after the get, then thread 1 is going to set the PTE back to (a modified version of) orig_pte. Isn't it already broken? > > Arm64 should be fine in that regard. > There is plenty of arm64 HW that doesn't do HW access/dirty update. But our ptep_set_access_flags() can always deal with a racing update, even if that update originates from SW. Why do I have the feeling you're about to explain (very patiently) exactly why I'm wrong?... :)