Date: Thu, 15 Aug 2024 20:24:15 +0200
From: Mateusz Guzik
To: Suren Baghdasaryan
Cc: Andrii Nakryiko, Andrii Nakryiko, linux-trace-kernel@vger.kernel.org,
	peterz@infradead.org, oleg@redhat.com, rostedt@goodmis.org,
	mhiramat@kernel.org, bpf@vger.kernel.org, linux-kernel@vger.kernel.org,
	jolsa@kernel.org, paulmck@kernel.org, willy@infradead.org,
	akpm@linux-foundation.org, linux-mm@kvack.org, Jann Horn
Subject: Re: [PATCH RFC v3 13/13] uprobes: add speculative lockless VMA to inode resolution
References: <20240813042917.506057-1-andrii@kernel.org> <20240813042917.506057-14-andrii@kernel.org> <7byqni7pmnufzjj73eqee2hvpk47tzgwot32gez3lb2u5lucs2@5m7dvjrvtmv2>

On Thu, Aug 15, 2024 at 10:45:45AM -0700, Suren Baghdasaryan wrote:
> From all the above, my understanding of your objection is that
> checking mmap_lock during our speculation is too coarse-grained and
> you would prefer to use the VMA seq counter to check that the VMA
> we are working on is unchanged. I agree, that would be ideal. I had a
> quick chat with Jann about this and the conclusion we came to is that
> we would need to add an additional smp_wmb() barrier inside
> vma_start_write() and a smp_rmb() in the speculation code:
>
> static inline void vma_start_write(struct vm_area_struct *vma)
> {
> 	int mm_lock_seq;
>
> 	if (__is_vma_write_locked(vma, &mm_lock_seq))
> 		return;
>
> 	down_write(&vma->vm_lock->lock);
> 	/*
> 	 * We should use WRITE_ONCE() here because we can have concurrent reads
> 	 * from the early lockless pessimistic check in vma_start_read().
> 	 * We don't really care about the correctness of that early check, but
> 	 * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
> 	 */
> 	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
> +	smp_wmb();
> 	up_write(&vma->vm_lock->lock);
> }
>
> Note: up_write(&vma->vm_lock->lock) in the vma_start_write() is not
> enough because it's one-way permeable (it's a "RELEASE operation") and
> later vma->vm_file store (or any other VMA modification) can move
> before our vma->vm_lock_seq store.
>
> This makes vma_start_write() heavier but again, it's write-locking, so
> should not be considered a fast path.
> With this change we can use the code suggested by Andrii in
> https://lore.kernel.org/all/CAEf4BzZeLg0WsYw2M7KFy0+APrPaPVBY7FbawB9vjcA2+6k69Q@mail.gmail.com/
> with an additional smp_rmb():
>
> rcu_read_lock()
> vma = find_vma(...)
> if (!vma) /* bail */
>
> vm_lock_seq = smp_load_acquire(&vma->vm_lock_seq);
> mm_lock_seq = smp_load_acquire(&vma->mm->mm_lock_seq);
> /* I think vm_lock has to be acquired first to avoid the race */
> if (mm_lock_seq == vm_lock_seq)
> 	/* bail, vma is write-locked */
> ... perform uprobe lookup logic based on vma->vm_file->f_inode ...
> smp_rmb();
> if (vma->vm_lock_seq != vm_lock_seq)
> 	/* bail, VMA might have changed */
>
> The smp_rmb() is needed so that vma->vm_lock_seq load does not get
> reordered and moved up before speculation.
>
> I'm CC'ing Jann since he understands memory barriers way better than
> me and will keep me honest.
>

So I briefly noted that maybe down_read on the vma would do it, but per
Andrii parallel lookups on the same vma on multiple CPUs are expected,
which whacks that out.

When I initially mentioned per-vma sequence counters I blindly assumed
they worked the usual way. I don't believe any fancy rework here is
warranted, especially given that the per-mm counter thing is expected to
have other uses.

However, chances are decent this can still be worked out with per-vma
granularity, all while avoiding any stores on lookup and without
invasive (or complicated) changes. The lockless uprobe code claims to
guarantee only false negatives, and a miss always falls back to the
mmap semaphore lookup. There may be something here, I'm going to chew on
it.

That said, thank you both for the writeup so far.
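
For reference, the "usual way" for a sequence counter is roughly the
classic seqcount pattern: the writer makes the counter odd for the
duration of the update, and a reader accepts its result only if it saw
the same even value before and after. Below is a minimal userspace
sketch of that pattern, assuming C11 atomics in place of the kernel
primitives. All names here (seq_obj, seq_write, seq_try_read) are
hypothetical, and this is not how vm_lock_seq behaves today. The reader
performs no stores and can only fail spuriously, in which case the
caller would fall back to a locked lookup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace model of a seqcount; not kernel code. */
struct seq_obj {
	atomic_uint seq;	/* even: stable, odd: write in progress */
	atomic_int payload;	/* stands in for a field such as vm_file */
};

/* Writer side; assumes writers are already serialized by some lock. */
static void seq_write(struct seq_obj *o, int value)
{
	unsigned int s = atomic_load_explicit(&o->seq, memory_order_relaxed);

	assert((s & 1) == 0);	/* no other writer may be active */
	atomic_store_explicit(&o->seq, s + 1, memory_order_relaxed);
	/* Order the odd seq store before the payload store (cf. smp_wmb()). */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&o->payload, value, memory_order_relaxed);
	/* Release store: payload is visible before seq turns even again. */
	atomic_store_explicit(&o->seq, s + 2, memory_order_release);
}

/*
 * Reader side: one speculative attempt that performs no stores.
 * Returns false (a "false negative") whenever a writer may have
 * interfered; the caller is then expected to retry under a lock.
 */
static bool seq_try_read(struct seq_obj *o, int *out)
{
	unsigned int s = atomic_load_explicit(&o->seq, memory_order_acquire);

	if (s & 1)
		return false;	/* write in progress */
	*out = atomic_load_explicit(&o->payload, memory_order_relaxed);
	/* Order the payload load before the seq re-check (cf. smp_rmb()). */
	atomic_thread_fence(memory_order_acquire);
	return atomic_load_explicit(&o->seq, memory_order_relaxed) == s;
}
```

A caller would treat a false return exactly like a miss in the lockless
uprobe path: discard the speculative result and redo the lookup with
the lock held.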