Re: [PATCH v1 1/1] xarray: fix the data-race in xas_find_chunk() by using READ_ONCE()

From: Mirsad Todorovac <mirsad.todorovac@alu.unizg.hr>
To: Jan Kara <jack@suse.cz>, Mirsad Todorovac <mirsad.todorovac@alu.hr>
Cc: Matthew Wilcox <willy@infradead.org>,
	Yury Norov <yury.norov@gmail.com>,
	Philipp Stanner <pstanner@redhat.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Chris Mason <clm@fb.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Josef Bacik <josef@toxicpanda.com>,
	David Sterba <dsterba@suse.com>,
	linux-btrfs@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v1 1/1] xarray: fix the data-race in xas_find_chunk() by using READ_ONCE()
Date: Thu, 12 Oct 2023 00:09:27 +0200
Message-ID: <1b030e7d-1d8d-4c77-a6a0-870794090661@alu.unizg.hr>
In-Reply-To: <20231009101550.pqnkrp5cp5zbr3lr@quack3>



On 10/9/23 12:15, Jan Kara wrote:
> On Fri 06-10-23 16:39:54, Mirsad Todorovac wrote:
>> On 9/19/2023 6:20 AM, Matthew Wilcox wrote:
>>> On Mon, Sep 18, 2023 at 11:56:36AM -0700, Yury Norov wrote:
>>>> Guys, I lost the track of the conversation. In the other email Mirsad
>>>> said:
>>>>           Which was the basic reason in the first place for all this, because something changed
>>>>           data from underneath our fingers ..
>>>>
>>>> It sounds clearly to me that this is a bug in xarray, *revealed* by
>>>> find_next_bit() function. But later in discussion you're trying to 'fix'
>>>> find_*_bit(), like if find_bit() corrupted the bitmap, but it's not.
>>>
>>> No, you're really confused.  That happens.
>>>
>>> KCSAN is looking for concurrency bugs.  That is, does another thread
>>> mutate the data "while" we're reading it.  It does that by reading
>>> the data, delaying for a few instructions and reading it again.  If it
>>> changed, clearly there's a race.  That does not mean there's a bug!
>>>
>>> Some races are innocuous.  Many races are innocuous!  The problem is
>>> that compilers sometimes get overly clever and don't do the obvious
>>> thing you ask them to do.  READ_ONCE() serves two functions here;
>>> one is that it tells the compiler not to try anything fancy, and
>>> the other is that it tells KCSAN to not bother instrumenting this
>>> load; no load-delay-reload.
>>>
>>>> In previous email Jan said:
>>>>           for any sane compiler the generated assembly with & without READ_ONCE()
>>>>           will be exactly the same.
>>>>
>>>> If the code generated with and without READ_ONCE() is the same, the
>>>> behavior would be the same, right? If you see the difference, the code
>>>> should differ.
>>>
>>> Hopefully now you understand why this argument is wrong ...
>>>
>>>> You say that READ_ONCE() in find_bit() 'fixes' 200 KCSAN BUG warnings. To
>>>> me it sounds like hiding the problems instead of fixing. If there's a race
>>>> between writing and reading bitmaps, it should be fixed properly by
>>>> adding an appropriate serialization mechanism. Shutting Kcsan up with
>>>> READ_ONCE() here and there is exactly the opposite path to the right direction.
>>>
>>> Counterpoint: generally bitmaps are modified with set_bit() which
>>> actually is atomic.  We define so many bitmap things as being atomic
>>> already, it doesn't feel like making find_bit() "must be protected"
>>> as a useful use of time.
>>>
>>> But hey, maybe I'm wrong.  Mirsad, can you send Yury the bug reports
>>> for find_bit and friends, and Yury can take the time to dig through them
>>> and see if there are any real races in that mess?
>>>
>>>> Every READ_ONCE must be paired with WRITE_ONCE, just like atomic
>>>> reads/writes or spin locks/unlocks. Having that in mind, adding
>>>> READ_ONCE() in find_bit() requires adding it to every bitmap function
>>>> out there. And this is, as I said before, would be an overhead for
>>>> most users.
>>>
>>> I don't believe you.  Telling the compiler to stop trying to be clever
>>> rarely results in a performance loss.
>>
>> Hi Mr. Wilcox,
>>
>> Do you think we should submit a formal patch for this data-race?
> 
> So I did some benchmarking with various GCC versions and the truth is that
> READ_ONCE() does affect code generation a bit (although the original code
> does not refetch the value from memory). As a result my benchmarks show the
> bit searching functions are about 2% slower. This is not much but it is
> stupid to cause a performance regression due to non-issue. I'm trying to
> get some compiler guys look into this whether we can improve it somehow...
> 
> 								Honza

Dear Jan,

First, I am not an expert or an authority on the subject; this is only
my opinion.

IMHO, 2% slower code is acceptable if it gives us data integrity. If a
16-core system manages to tear a load without READ_ONCE(), the 2% speedup
buys us nothing: another core may change half of the location in the
middle of the load, just because the optimiser did some "funny stuff".
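
To make the tearing concern concrete, here is a rough user-space sketch.
It is only an analogy, not kernel code: set_bit_atomic() and
lowest_set_bit() are hypothetical names, C11 relaxed atomics stand in for
the kernel's set_bit() and READ_ONCE(), and __builtin_ctzl() assumes GCC
or Clang. The point is that a relaxed atomic load is indivisible, so the
reader works on one whole snapshot of the word even if a writer updates
it concurrently.

```c
#include <stdatomic.h>

/* Hypothetical analog of an atomic bit set: an indivisible
 * read-modify-write with no ordering guarantees beyond atomicity. */
static void set_bit_atomic(_Atomic unsigned long *word, unsigned int bit)
{
	atomic_fetch_or_explicit(word, 1UL << bit, memory_order_relaxed);
}

/* Hypothetical reader: takes ONE untorn snapshot of the word and does
 * all further work on the local copy. Returns -1 if no bit is set. */
static int lowest_set_bit(_Atomic unsigned long *word)
{
	unsigned long snap = atomic_load_explicit(word, memory_order_relaxed);

	return snap ? __builtin_ctzl(snap) : -1;
}
```

Neither helper takes a lock or issues a fence; the cost is limited to
what the compiler is no longer allowed to do with those two accesses.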

If I had a pacemaker running the Linux kernel, I would probably choose
the 2% slower but race-free code.

Please allow me to point out that this is not a spin lock, a memory bus
lock, or a memory barrier that would affect the other cores - it only
keeps the compiler from tearing or reordering this one read.
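
As a minimal sketch of why the cost stays local: the READ_ONCE() below is
a simplified user-space stand-in (a volatile cast), not the kernel's
actual definition, and find_next_bit_once() is a hypothetical helper, not
the kernel's find_next_bit(). It assumes GCC or Clang for __typeof__ and
__builtin_ctzl(). Each bitmap word is loaded exactly once, and the test
and the bit search both use that local copy.

```c
#include <limits.h>
#include <stddef.h>

/* Simplified stand-in for the kernel macro (assumption: GCC/Clang). */
#define READ_ONCE(x) (*(const volatile __typeof__(x) *)&(x))

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical helper: index of the first set bit at or after 'start',
 * or 'nbits' if there is none.
 */
static size_t find_next_bit_once(const unsigned long *map, size_t nbits,
				 size_t start)
{
	while (start < nbits) {
		size_t word = start / BITS_PER_LONG;
		/* One load per word; no lock, no barrier instruction. */
		unsigned long val = READ_ONCE(map[word]);

		/* Ignore bits below 'start' within this word. */
		val &= ~0UL << (start % BITS_PER_LONG);
		if (val) {
			size_t bit = word * BITS_PER_LONG +
				     (size_t)__builtin_ctzl(val);

			return bit < nbits ? bit : nbits;
		}
		start = (word + 1) * BITS_PER_LONG;
	}
	return nbits;
}
```

The volatile access emits an ordinary load instruction; what changes is
only that the compiler may not split, repeat, or drop it.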

I think you are on the right track, and that this patch is a good thing.

Low-level functions have to be safe first, then fast.

A faster algorithm, like replacing spinlocks with RCU, can certainly more
than make up for that ...

Sorry for a digression.

Best regards,
Mirsad

Thread overview: 17+ messages
2023-09-18  4:47 Mirsad Goran Todorovac
2023-09-18  9:41 ` Jan Kara
2023-09-18 10:20   ` Mirsad Todorovac
2023-09-18 11:38     ` Jan Kara
2023-09-18 12:46       ` Mirsad Todorovac
2023-09-18 13:18         ` Jan Kara
2023-09-18 13:34           ` Mirsad Todorovac
2023-09-18 14:17             ` Jan Kara
2023-09-18 14:59         ` Yury Norov
2023-09-18 15:33           ` Mirsad Todorovac
2023-09-18 15:54           ` Jan Kara
2023-09-18 16:28             ` Mirsad Todorovac
2023-09-18 18:56               ` Yury Norov
2023-09-19  4:20                 ` Matthew Wilcox
2023-10-06 14:39                   ` Mirsad Todorovac
2023-10-09 10:15                     ` Jan Kara
2023-10-11 22:09                       ` Mirsad Todorovac [this message]
