Message-ID: <99cf9e14-7608-8e72-0c8e-0dd9b0047319@redhat.com>
Date: Mon, 4 Apr 2022 15:04:28 +0200
From: David Hildenbrand <david@redhat.com>
Organization: Red Hat
To: Nadav Amit
Cc: LKML, linux-mm, Linus Torvalds, Andrew Morton, Dave Hansen,
 Andrea Arcangeli, Peter Xu, Yang Shi, Hugh Dickins, Mel Gorman,
 "Edgecombe, Rick P"
Subject: Re: [PATCH v1 mmotm] mm/mprotect: try avoiding write faults for
 exclusive anonymous pages when changing protection
In-Reply-To: <8A6AF878-D5D7-4D88-A736-0FEF71439D44@gmail.com>
References: <20220401101334.68859-1-david@redhat.com>
 <8A6AF878-D5D7-4D88-A736-0FEF71439D44@gmail.com>

On 01.04.22 21:15, Nadav Amit wrote:
> [ +Rick ]
> 
>> On Apr 1, 2022, at 3:13 AM, David Hildenbrand wrote:
>>
>> Similar to our MM_CP_DIRTY_ACCT handling for shared, writable mappings, we
>> can try mapping anonymous pages writable if they are
>> exclusive, the PTE is already dirty, and no special handling applies.
>> Mapping the PTE writable is essentially the same thing the write fault
>> handler would do in this case.
> 
> In general I am all supportive of such a change.
> 
> I do have some mostly-minor concerns.

Hi Nadav,

thanks a lot for your review!

> 
>>
>> +static inline bool can_change_pte_writable(struct vm_area_struct *vma,
>> +					   unsigned long addr, pte_t pte,
>> +					   unsigned long cp_flags)
>> +{
>> +	struct page *page;
>> +
>> +	if ((vma->vm_flags & VM_SHARED) && !(cp_flags & MM_CP_DIRTY_ACCT))
>> +		/*
>> +		 * MM_CP_DIRTY_ACCT is only expressive for shared mappings;
>> +		 * without MM_CP_DIRTY_ACCT, there is nothing to do.
>> +		 */
>> +		return false;
>> +
>> +	if (!(vma->vm_flags & VM_WRITE))
>> +		return false;
>> +
>> +	if (pte_write(pte) || pte_protnone(pte) || !pte_dirty(pte))
>> +		return false;
> 
> If pte_write() is already true then return false? I understand you want
> to do so because the page is already writable, but it is confusing.

I thought about just doing, outside of the function,

if ((vma->vm_flags & VM_WRITE) && !pte_write(pte) &&
    can_change_pte_writable()...

I refrained from doing so because the sequence of checks might be
sub-optimal. But most probably we don't really care about that and it
might make the code easier to grasp. Would that make it clearer?

> 
> In addition, I am not sure the pte_dirty() check is really robust.
> I mean I think it is ok, but is there any issue with shadow-stack?

Judging that it's already used that way for VMAs with dirty tracking, I
assume it's ok. Without checking that the PTE is dirty, we'd have to do a:

pte_mkdirty(pte_mkwrite(ptent));

Which would set the PTE, and consequently the page, dirty, although there
might not even be a write access. That's what we want to avoid here.
> 
> And this also assumes the kernel does not clear the dirty bit without
> clearing the present bit, as otherwise the note in Intel SDM section 4.8
> ("Accessed and Dirty Flags") will be relevant and the dirty bit might be
> set unnecessarily. I think it is ok.

Yeah, I think so as well.

> 
>> +
>> +	/* Do we need write faults for softdirty tracking? */
>> +	if (IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) && !pte_soft_dirty(pte) &&
>> +	    (vma->vm_flags & VM_SOFTDIRTY))
> 
> If !IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) then VM_SOFTDIRTY == 0. So I do not
> think the IS_ENABLED() is necessary (unless you think it is clearer this
> way).

Right, we can just do

if ((vma->vm_flags & VM_SOFTDIRTY) && !pte_soft_dirty(pte))

and it should get fully optimized out. Thanks!

> 
>> +		return false;
>> +
>> +	/* Do we need write faults for uffd-wp tracking? */
>> +	if (userfaultfd_pte_wp(vma, pte))
>> +		return false;
>> +
>> +	if (!(vma->vm_flags & VM_SHARED)) {
>> +		/*
>> +		 * We can only special-case on exclusive anonymous pages,
>> +		 * because we know that our write-fault handler similarly would
>> +		 * map them writable without any additional checks while holding
>> +		 * the PT lock.
>> +		 */
>> +		page = vm_normal_page(vma, addr, pte);
> 
> I guess we cannot call vm_normal_page() twice, once for prot_numa and once
> here, in practice...

I guess we could, but it doesn't necessarily make the code easier to
read :) And we want to skip protnone either way.

> 
>> +		if (!page || !PageAnon(page) || !PageAnonExclusive(page))
>> +			return false;
>> +	}
>> +
>> +	return true;
>> +}
> 
> Note that there is a small downside to all of that. Assume you mprotect()
> a single page from RO to RW and you have many threads.
> 
> With my pending patch you would avoid the TLB shootdown (and get a PF).
> With this patch you would get a TLB shootdown and save the PF. IOW, I
> think it is worthy to skip the shootdown as well in such a case and
> instead flush the TLB on spurious page-faults.
> But I guess that's for another patch.

Just so I understand correctly: your optimization avoids the flush when
effectively nothing changed (R/O -> R/O). And the optimization for this
case here would be to avoid the TLB flush when similarly not required
(R/O -> R/W). Correct?

-- 
Thanks,

David / dhildenb