From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-13.2 required=3.0 tests=BAYES_00,DKIMWL_WL_MED, DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE45AC433DB for ; Tue, 29 Dec 2020 02:09:42 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2D5D520771 for ; Tue, 29 Dec 2020 02:09:41 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2D5D520771 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 38C9C6B00BC; Mon, 28 Dec 2020 21:09:41 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 33D606B00BD; Mon, 28 Dec 2020 21:09:41 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 1DD816B00BE; Mon, 28 Dec 2020 21:09:41 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0037.hostedemail.com [216.40.44.37]) by kanga.kvack.org (Postfix) with ESMTP id 06BEC6B00BC for ; Mon, 28 Dec 2020 21:09:41 -0500 (EST) Received: from smtpin05.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id BCE3A8249980 for ; Tue, 29 Dec 2020 02:09:40 +0000 (UTC) X-FDA: 77644688520.05.nerve28_3607f0527498 Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin05.hostedemail.com (Postfix) with ESMTP id A5B0B180206C6 for ; Tue, 29 Dec 2020 02:09:40 +0000 (UTC) X-HE-Tag: nerve28_3607f0527498 X-Filterd-Recvd-Size: 7067 Received: from mail-io1-f44.google.com (mail-io1-f44.google.com [209.85.166.44]) by imf32.hostedemail.com (Postfix) with ESMTP for ; Tue, 29 Dec 2020 02:09:40 +0000 (UTC) Received: by mail-io1-f44.google.com with SMTP id t8so10967642iov.8 for ; Mon, 28 Dec 2020 18:09:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=ZyqPO1ygk+Kgh5PBjPG9OOPdRGdHfoYxQ9cjVCRFiaw=; b=r1WWTHu1UCN4RFJ6oKxSYwBkNN34X13VM8GUUWYqqCkqGWl44iTOSPYCetZvXltVXx SyV+t8ugRgYImifL5lFzDIw0PAXuIuU0cic0PVhb37Uh3askNIPp9R38EI+o1o0MRLcf +5sV5B4gjeiY61HnO3PPFeHcEyAFYMj2EEX23VmhFbqteQcX6psrpu7GIBiOmDZM7UGQ DRgzhotCURGdMBJUwFXKmcuLsCN3LQTLohHP6oCjpg4xdRDexCDk7FNxkvcJzugosElG p0g50Biv+SOeVSjXSCBrcC+6fILx7kBIC0Cu6mNmwz9zJLKlfdlv5QBgcUkoInSDWvox aNxw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=ZyqPO1ygk+Kgh5PBjPG9OOPdRGdHfoYxQ9cjVCRFiaw=; b=kJjfSU8o5MEMlNyhiNUa3//UpeTNEKFbbVainTFfVbUqgyK50ybfsOLLrGVnbQ36AQ 28KVqH3Bmm68JfqpepT8iQQ0VMfybznQ+VcuyukfC1RR2piX1rgicM8xdFWQQ5txPiCo RGgQ5RaatHXV2DEXW10NazsEChewkpVE4+rpqXtnjKghOlS/TfL7q7FF1N3SN8MD34al R9MFEjph62VQc2NXFWDi+4mQJfopy1J1Ptp5zhCm4E8KYnL5RimMRJTvVyUeluEB1gMQ SxpqdTgttTyUPoW0UskNqQsAqkxoXHFdKyNMn0BH/2NK/ZAPsPp0C0m09nmCi+ZE5Pr1 wTbQ== X-Gm-Message-State: AOAM533IcHQDjM2WSjtzZe4SxBYfOKvzcg/vBhtO0JM1VRjfYJZHGZpj CDgCYxcUfNlkMjni4xh8HB1EsTE4GwVgRr4kiqhfTA== X-Google-Smtp-Source: ABdhPJy4JgsPFpUXkMdOtsdyEdvuo0sXQJ/Mmmann3btsIGzSSV+TcBY90ise9OUJcQWOKhYaItfmYrn2TR6RnccNKI= X-Received: by 2002:a02:a304:: with SMTP id q4mr40348971jai.97.1609207779482; Mon, 28 Dec 2020 18:09:39 -0800 (PST) MIME-Version: 1.0 References: <20201212053152.3783250-1-pcc@google.com> <20201223143411.e889bcad32e5a1c0252c745c@linux-foundation.org> In-Reply-To: <20201223143411.e889bcad32e5a1c0252c745c@linux-foundation.org> From: Peter Collingbourne Date: Mon, 28 Dec 2020 18:09:28 -0800 Message-ID: Subject: Re: [PATCH] mm: improve mprotect(R|W) efficiency on pages referenced once To: Andrew Morton Cc: Kostya Kortchinsky , linux-mm@kvack.org Content-Type: text/plain; charset="UTF-8" X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed, Dec 23, 2020 at 2:34 PM Andrew Morton wrote: > > On Fri, 11 Dec 2020 21:31:52 -0800 Peter Collingbourne wrote: > > > In the Scudo memory allocator [1] we would like to be able to > > detect use-after-free vulnerabilities involving large allocations > > by issuing mprotect(PROT_NONE) on the memory region used for the > > allocation when it is deallocated. Later on, after the memory > > region has been "quarantined" for a sufficient period of time we > > would like to be able to use it for another allocation by issuing > > mprotect(PROT_READ|PROT_WRITE). > > > > Before this patch, after removing the write protection, any writes > > to the memory region would result in page faults and entering > > the copy-on-write code path, even in the usual case where the > > pages are only referenced by a single PTE, harming performance > > unnecessarily. Make it so that any pages in anonymous mappings that > > are only referenced by a single PTE are immediately made writable > > during the mprotect so that we can avoid the page faults. > > > > I worry that some other application out there does a similar thing, but > only expects to very sparsely write to the area. It will see a big increase > in mprotect() latency. > > Would it be better to implement this as a separate operation, rather > than unconditionally tying it into mprotect()? Say, a new madvise() > operation? So the case that you're concerned about would be highlighted by this program, correct? #include #include enum { kSize = 131072 }; int main(int argc, char **argv) { char *addr = (char *)mmap(0, kSize, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); memset(addr, 1, kSize); for (int i = 0; i != 100000; ++i) { mprotect((void *)addr, kSize, PROT_NONE); mprotect((void *)addr, kSize, PROT_READ | PROT_WRITE); } } I measured the before/after real time execution time of this program (median of 100 runs) on the DragonBoard and the results were: Before: 0.325928s After: 0.365493s So there's an increase of around 12%. It doesn't seem very large to me when compared to the 400-500% improvement that we get in the case where every page is touched. I also tried this program to measure the impact in the case where a single page is touched after the mprotect: #include #include enum { kSize = 131072 }; int main(int argc, char **argv) { char *addr = (char *)mmap(0, kSize, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE, -1, 0); memset(addr, 1, kSize); for (int i = 0; i != 100000; ++i) { memset(addr + (i * 4096) % kSize, i, 4096); mprotect((void *)addr, kSize, PROT_NONE); mprotect((void *)addr, kSize, PROT_READ | PROT_WRITE); } } With this program the numbers were: Before: 0.441516s After: 0.380251s So it seems that even with a single page fault the new approach is faster. I saw similar results if I adjusted the programs to use a larger mapping size. With kSize = 1048576 I get these numbers for the first program: Before: 1.563078s After: 1.607476s i.e. around 3%. And these for the second: Before: 1.684663s After: 1.683272s i.e. about the same. What I think we may conclude from these results is that for smaller mappings the advantage of the previous approach, although measurable, is wiped out by a single page fault. I think we may expect that there should be at least one access resulting in a page fault (under the previous approach) after making the pages writable, since the program presumably made the pages writable for a reason. For larger mappings we may guesstimate that the new approach wins if the density of future page faults is > 0.4%. But for the mappings that are large enough for density to matter (not just the absolute number of page faults) it doesn't seem like the increase in mprotect latency would be very large relative to the total mprotect execution time. So my inclination is that I don't really see a strong need for it to be toggleable, but if we do, the behavior implemented by this patch should be the default. Perhaps something like madvise(MADV_COLD) could be used to get the previous behavior but I wouldn't implement that straight away. Peter