From: Mateusz Guzik <mjguzik@gmail.com>
Date: Sun, 20 Aug 2023 14:46:26 +0200
Subject: Re: [PATCH] mm: remove unintentional voluntary preemption in get_mmap_lock_carefully
To: Matthew Wilcox
Cc: torvalds@linux-foundation.org, akpm@linux-foundation.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org
References: <20230820104303.2083444-1-mjguzik@gmail.com>

On 8/20/23, Mateusz Guzik wrote:
> On 8/20/23, Matthew Wilcox wrote:
>> On Sun, Aug 20, 2023 at 12:43:03PM +0200, Mateusz Guzik wrote:
>>> Should the trylock succeed (and thus blocking was avoided), the routine
>>> wants to ensure blocking was still legal to do. However, the method
>>> used ends up calling __cond_resched, injecting a voluntary preemption
>>> point with the freshly acquired lock.
>>>
>>> One can hack around it using __might_sleep instead of mere might_sleep,
>>> but since threads keep going off CPU here, I figured it is better to
>>> accommodate it.
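
For context: with voluntary preemption enabled, might_sleep() is not a
pure assertion. Simplified from include/linux/kernel.h -- the
PREEMPT_DYNAMIC variants are omitted, so treat this as a sketch rather
than the exact source -- it expands to roughly:

#ifdef CONFIG_PREEMPT_VOLUNTARY
# define might_resched() _cond_resched()	/* voluntary preemption point */
#else
# define might_resched() do { } while (0)
#endif

#ifdef CONFIG_DEBUG_ATOMIC_SLEEP
# define might_sleep() \
	do { __might_sleep(__FILE__, __LINE__); might_resched(); } while (0)
#else
# define might_sleep() do { might_resched(); } while (0)
#endif

so the trylock-success path being discussed does the debug check and then
goes through _cond_resched() with mmap_lock freshly acquired.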
>>
>> Except now we search the exception tables every time we call it.
>> The now-deleted comment (c2508ec5a58d) suggests this is slow:
>>
>
> I completely agree it is a rather unfortunate side-effect.
>
>> -	/*
>> -	 * Kernel-mode access to the user address space should only occur
>> -	 * on well-defined single instructions listed in the exception
>> -	 * tables.  But, an erroneous kernel fault occurring outside one of
>> -	 * those areas which also holds mmap_lock might deadlock attempting
>> -	 * to validate the fault against the address space.
>> -	 *
>> -	 * Only do the expensive exception table search when we might be at
>> -	 * risk of a deadlock.  This happens if we
>> -	 * 1. Failed to acquire mmap_lock, and
>> -	 * 2. The access did not originate in userspace.
>> -	 */
>>
>> Now, this doesn't mean we're doing it on every page fault.  We skip
>> all of this if we're able to handle the fault under the VMA lock,
>> so the effect is probably much smaller than it was.  But I'm surprised
>> not to see you send any data quantifying the effect of this change!
>>
>
> Going off CPU *after* taking the lock sounds like an obviously bad
> thing to happen and as such I did not think it warrants any
> measurements.
>
> My first patch looked like this:
>
> diff --git a/mm/memory.c b/mm/memory.c
> index 1ec1ef3418bf..8662fd69eae8 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5259,7 +5259,9 @@ static inline bool get_mmap_lock_carefully(struct mm_struct *mm, struct pt_regs
>  {
>  	/* Even if this succeeds, make it clear we *might* have slept */
>  	if (likely(mmap_read_trylock(mm))) {
> -		might_sleep();
> +#if defined(CONFIG_DEBUG_ATOMIC_SLEEP)
> +		__might_sleep(__FILE__, __LINE__);
> +#endif
>  		return true;
>  	}
>
> This keeps assertions while dodging __cond_resched.
>
> But then I figured someone may complain about scheduling latency which
> was not there prior to the patch.
>
> Between the 2 not so great choices I rolled with what I considered the
> safer one.
>
> However, now that I said it, I wonder if perhaps the search could be
> circumvented on x86-64? The idea would be to check if SMAP got
> disabled (and assuming the CPU supports it) -- if so and the faulting
> address belongs to userspace, assume it's all good. This is less
> precise in that SMAP can be disabled over the intended users access
> but would be fine as far as I'm concerned if the search is such a big
> deal.
>

Oof, hit send too fast. This is less precise in that SMAP can be
disabled over A LARGER AREA THAN the intended user access, but would be
fine as far as I'm concerned if the search is such a big deal.

there :)

Anyhow, I don't feel strongly about any of this; I was mostly interested
in what happens with VFS on the off-CPU front and this one is just a
random thing I needed to check.

Now that I elaborated on my $0,03 I'm happy to respin with the
__might_sleep variant. If someone wants a different fix altogether they
are welcome to ignore these patches.

I do claim the current state *is* a problem though -- it can block
down_write for no good reason.

-- 
Mateusz Guzik
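
Purely as an illustration of the SMAP idea above -- not a tested or
proposed patch. It assumes x86-64 with SMAP, and that the faulting
address is available at this point, which it currently is not in
get_mmap_lock_carefully(); the helper name is made up, the other
identifiers are existing x86/kernel ones:

/*
 * Hypothetical replacement for the exception table search: if the
 * kernel faulted while SMAP enforcement was lifted (stac executed,
 * EFLAGS.AC set) and the fault hit a user address, the user access was
 * intentional and blocking on mmap_lock should be deadlock-free.
 */
static bool fault_in_sanctioned_user_access(struct pt_regs *regs,
					    unsigned long address)
{
	if (cpu_feature_enabled(X86_FEATURE_SMAP) &&
	    (regs->flags & X86_EFLAGS_AC) &&
	    address < TASK_SIZE_MAX)
		return true;

	/* Fall back to the existing, more expensive check. */
	return search_exception_tables(instruction_pointer(regs)) != NULL;
}

As noted in the correction above, this is coarser than the exact extable
lookup, since AC may be set over a larger region than the single user
access.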