From: Linus Torvalds
Date: Wed, 10 Mar 2021 13:14:24 -0800
Subject: Re: [PATCH v8 3/8] Use atomic_t for ucounts reference counting
To: Alexey Gladkov
Cc: LKML, io-uring, Kernel Hardening, Linux Containers, Linux-MM,
 Alexey Gladkov, Andrew Morton, Christian Brauner, "Eric W . Biederman",
 Jann Horn, Jens Axboe, Kees Cook, Oleg Nesterov
References: <59ee3289194cd97d70085cce701bc494bfcb4fd2.1615372955.git.gladkov.alexey@gmail.com>
In-Reply-To: <59ee3289194cd97d70085cce701bc494bfcb4fd2.1615372955.git.gladkov.alexey@gmail.com>
Content-Type: text/plain; charset="UTF-8"

On Wed, Mar 10, 2021 at 4:01 AM Alexey Gladkov wrote:
>
>
> +/* 127: arbitrary random number, small enough to assemble well */
> +#define refcount_zero_or_close_to_overflow(ucounts) \
> +	((unsigned int) atomic_read(&ucounts->count) + 127u <= 127u)
> +
> +struct ucounts *get_ucounts(struct ucounts *ucounts)
> +{
> +	if (ucounts) {
> +		if (refcount_zero_or_close_to_overflow(ucounts)) {
> +			WARN_ONCE(1, "ucounts: counter
has reached its maximum value");
> +			return NULL;
> +		}
> +		atomic_inc(&ucounts->count);
> +	}
> +	return ucounts;

Side note: you probably should just make the limit be the "oh, the
count overflows into the sign bit".

The reason the page cache did that tighter thing is that it actually
has _two_ limits:

 - the "try_get_page()" thing uses the sign bit as a "uhhuh, I've now
   used up half of the available reference counting bits, and I will
   refuse to use any more".

   This is basically your "get_ucounts()" function. It's a "I want a
   refcount, but I'm willing to deal with failures".

 - the page cache has a _different_ set of "I need to unconditionally
   get a refcount, and I can *not* deal with failures".

   This is basically the traditional "get_page()", which is only used
   in fairly controlled places, and should never be something that can
   overflow.

   And *that* special code then uses that
   "zero_or_close_to_overflow()" case as a "doing a get_page() in this
   situation is very very wrong". This is purely a debugging feature
   used for a VM_BUG_ON() (that has never triggered, as far as I
   know).

For your ucounts situation, you don't have that second case at all, so
you have no reason to ever allow the count to even get remotely close
to overflowing.

A reference count being within 128 counts of overflow (when we're
talking a 32-bit count) is basically never a good idea. It means that
you are way too close to the limit, and there's a risk that lots of
concurrent people all first see an ok value, and then *all* decide to
do the increment, and then you're toast.

In contrast, if you use the sign bit as a "ok, let's stop
incrementing", the fact that your "overflow" test and the increment
aren't atomic really isn't a big deal.

(And yes, you could use a cmpxchg to *make* the overflow test atomic,
but it's often much much more expensive, so..)

              Linus
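
The two options contrasted above can be sketched in userspace C as
follows. This is only an illustration under stated assumptions, not the
patch's code: it uses C11 <stdatomic.h> instead of the kernel's
atomic_t, and struct ucounts_sketch, get_ucounts_sketch() and
get_ucounts_cmpxchg_sketch() are hypothetical names. Refusing a zero
count as well as a negative one is an assumption carried over from the
"zero_or" half of the quoted macro.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct ucounts. */
struct ucounts_sketch {
	atomic_int count;
};

/*
 * Sign-bit limit: refuse a new reference once the count is zero or has
 * overflowed into the sign bit. The test and the increment are two
 * separate steps, so racing callers can all pass the test and then all
 * increment. That is tolerable here: once the count is negative, about
 * 2^31 further increments would be needed before it wraps back to zero.
 * C11 defines atomic_fetch_add() on signed types to wrap silently.
 */
static struct ucounts_sketch *get_ucounts_sketch(struct ucounts_sketch *uc)
{
	if (uc) {
		if (atomic_load(&uc->count) <= 0)
			return NULL;
		atomic_fetch_add(&uc->count, 1);
	}
	return uc;
}

/*
 * cmpxchg variant: the limit test and the increment form one atomic
 * step, at the cost of a compare-and-swap retry loop. INT_MAX is
 * refused up front so that "old + 1" never overflows a plain int.
 */
static struct ucounts_sketch *
get_ucounts_cmpxchg_sketch(struct ucounts_sketch *uc)
{
	int old;

	if (!uc)
		return NULL;

	old = atomic_load(&uc->count);
	do {
		if (old <= 0 || old == INT_MAX)
			return NULL;
		/* On failure, "old" is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&uc->count, &old, old + 1));

	return uc;
}
```

In the first variant a count of INT_MAX still passes the test, and that
increment is the one that moves the count into the sign bit; every
later call then fails. In the second variant the count never goes
negative at all.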