From: Linus Torvalds
Date: Fri, 11 Jan 2019 08:26:14 -0800
Subject: Re: [PATCH] mm/mincore: allow for making sys_mincore() privileged
To: Dave Chinner
Cc: Dominique Martinet, Jiri Kosina, Matthew Wilcox, Jann Horn,
    Andrew Morton, Greg KH, Peter Zijlstra, Michal Hocko, Linux-MM,
    kernel list, Linux API
In-Reply-To: <20190111073606.GP27534@dastard>

On Thu, Jan 10, 2019 at 11:36 PM Dave Chinner wrote:
>
> > It's only that single page that *matters*. That's the page that the
> > probe reveals the status of - but it's also the page that the probe
> > then *changes* the status of.
>
> It changes the state of it /after/ we've already got the information
> we need from it. It's not up to date, it has to come from disk, we
> return EAGAIN, which means it was not in the cache.

Oh, I see the confusion.

Yes, you get the information about whether something was in the cache
or not, so the side channel does exist to some degree.

But it's actually hugely reduced for a rather important reason: the
_primary_ reason for needing to know whether some page is in the cache
or not is not actually to see if it was ever accessed - it's to see
that the cache has been scrubbed (and to _guide_ the scrubbing), and
*when* it was accessed.

Think of it this way: the buffer cache residency is actually a
horribly bad signal on its own mainly because you generally have a
very high hit-rate. In most normal non-streaming situations with
sufficient amounts of memory you have pretty much everything cached.

So in order to use it as a signal, you first have to scrub the cache
(because if the page was already there, there's no signal at all), and
then for the signal to be as useful as possible, you're also going to
want to try to get out more than one bit of information: you are going
to try to see the patterns and the timings of how it gets filled.

And that's actually quite painful. You don't know the initial cache
state, and you're not (in general) controlling the machine entirely,
because there's also that actual other entity that you're trying to
attack and see what it does.

So what you want to do is basically to first make sure the cache is
scrubbed (only for the pages you're interested in!), then trigger
whatever behavior you are looking for, and then look how that affected
the cache.

In other words, you want *multiple* residency status checks - first to
see what the cache state is (because you're going to want that for
scrubbing), then to see that "yes, it's gone" when doing the
scrubbing, and then to see the *pattern* and timings of how things are
brought in. And then you're likely to want to do this over and over
again, so that you can get real data out of the signal.

This is why something that doesn't perturb what you measure is really
important.
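
The scrub / check / trigger / check loop above can be sketched as a
toy model. This is only an illustration under stated assumptions, not
a real attack or a real kernel interface: `ToyPageCache`,
`probe_passive`, `probe_faulting` and `run_rounds` are all
hypothetical names, the "cache" is just a set of page numbers, and the
"faulting" probe simply assumes that a miss brings the page in.

```python
class ToyPageCache:
    """Toy stand-in for a page cache: a set of resident page numbers."""

    def __init__(self):
        self.resident = set()

    def touch(self, page):
        # The target (or anyone else) reading the page makes it resident.
        self.resident.add(page)

    def evict(self, page):
        # Stand-in for scrubbing, which is the expensive part in reality.
        self.resident.discard(page)

    def probe_passive(self, page):
        # Assumption: reports residency and changes nothing.
        return page in self.resident

    def probe_faulting(self, page):
        # Assumption: reports residency, but a miss brings the page in.
        was_resident = page in self.resident
        self.resident.add(page)
        return was_resident


def run_rounds(probe_name, victim_touches, page=7):
    """Scrub, confirm the scrub, let the target run, then measure."""
    cache = ToyPageCache()
    probe = getattr(cache, probe_name)
    readings = []
    for touched in victim_touches:
        cache.evict(page)
        scrub_confirmed = not probe(page)   # "yes, it's gone" check
        if touched:
            cache.touch(page)               # target activity, if any
        readings.append((scrub_confirmed, probe(page)))
    return readings


victim = [True, False, True]
passive = run_rounds("probe_passive", victim)
faulting = run_rounds("probe_faulting", victim)
print("passive :", [seen for _, seen in passive])
print("faulting:", [seen for _, seen in faulting])
```

In this model the passive probe recovers the target's pattern exactly,
while the faulting probe's own "is it scrubbed?" check repopulates the
page, so every later reading says "resident" no matter what the target
did.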

If the act of measurement brings the page in, then you can't use it
for that "did I successfully scrub it" phase at all, and you can't use
it for measurement but once, so your view into patterns and timings is
going to be *much* worse.

And notice that this is true even if the act of measurement only
affects the *one* page you're measuring. Sure, any additional noise
around it would likely be annoying too, but it's not really necessary
to make the attack much harder to carry out. In fact, it's almost
irrelevant, since the signal you're trying to *see* is going to be
affected by prefetching etc too, so the patterns and timings you need
to look at are in bigger chunks than the readahead thing.

So yes, you as an attacker can remove the prefetching from *your*
load, but you can't remove it from the target load anyway, so you'll
just have to live with it.

Can you brute-force scrubbing? Yes. For something like an L1 cache,
that's easy (well, QoS domains make it harder). For something like a
disk cache, it's much harder, and makes any attempt to read out state
a lot slower. The paper that started this all uses mincore() not just
to see "is the page now scrubbed", but also to guide the scrubbing
itself (working set estimation etc).

And note that in many ways, the *scrubbing* is really the harder part.
Populating the cache is really easy: just read the data you want to
populate.

So if you are looking for a particular signal, say "did this error
case trigger so that it faulted in *that* piece of information", you'd
want to scrub the target, populate everything else, and then try to
measure "did I trigger that target".

Except you wouldn't want to do it one page at a time but see as much
pattern of "they were touched in this order" as you can, and you'd
like to get timing information of how the pages you are interested in
were populated too.

And you'd generally do this over and over and over again because
you're trying to read out some signal.

Notice what the expensive operation was? It's the scrubbing. The "did
the target do IO" you might actually even see other ways for the
trivial cases, like even just looking at iostat: just pre-populate
everything but the part you care about, then try to trigger whatever
you're searching for, and see if it caused IO or not.

So it's a bit like a chalkboard: in order to read out the result, you
need to erase it first, and doing that blindly is nasty. And you want
to look at timings, which is also really nasty if every time you look,
you smudge the very place you looked at. It makes it hard to see what
somebody else is writing on the board if you're always overwriting
what you just looked at. Did you get some new information? If not, now
you have to go back and do that scrubbing again, and you'll likely be
missing what *else* the person wrote.

And as always: there is no "black and white". There is no "absolute
security", and similarly, there is no "absolute leak proof". It's all
about making it inconvenient enough that it's not really practical.

                Linus