From: Linus Torvalds
Date: Wed, 9 Mar 2022 11:08:02 -0800
Subject: Re: Buffered I/O broken on s390x with page faults disabled (gfs2)
To: Andreas Gruenbacher
Cc: Catalin Marinas, David Hildenbrand, Alexander Viro, linux-s390, Linux-MM, linux-fsdevel, linux-btrfs
In-Reply-To: <20220309184238.1583093-1-agruenba@redhat.com>

On Wed, Mar 9, 2022 at 10:42 AM Andreas Gruenbacher wrote:
>
> From what I took from the previous discussion, probing at a sub-page
> granularity won't be necessary for bytewise copying: when the address
> we're trying to access is poisoned, fault_in_*() will fail; when we get
> a short result, that will take us to the poisoned address in the next
> iteration.

Sadly, that isn't actually the case.

It's not the case for GUP (that page aligns things), and it's not the
case for fault_in_writeable() itself (that also page aligns things).

But more importantly, it's not actually the case for the *users*
either. Not all of the users are byte-stream oriented, and I think it
was btrfs that had a case of "copy a struct at the beginning of the
stream". And if that copy failed, it wouldn't advance by as many bytes
as it got - it would require that struct to be all fetched, and start
from the beginning.

So we do need to probe at least a minimum set of bytes. Probably a
fairly small minimum, but still...

> With a large enough buffer, a simple malloc() will return unmapped
> pages, and reading into such a buffer will result in fault-in. So page
> faults during read() are actually pretty normal, and it's not the user's
> fault.

Agreed. But that wasn't the case here:

> In my test case, the buffer was pre-initialized with memset() to avoid
> those kinds of page faults, which meant that the page faults in
> gfs2_file_read_iter() only started to happen when we were out of memory.
> But that's not the common case.

Exactly.
I do not think this is a case that we should - or need to - optimize
for. And doing too much pre-faulting is actually counter-productive.

> * Get rid of max_size: it really makes no sense to second-guess what the
>   caller needs.

It's not about "what caller needs". It's literally about latency
issues. If you can force a busy loop in kernel space by having one
unmapped page and then do a 2GB read(), that's a *PROBLEM*.

Now, we can try this thing, because I think we end up having other
size limitations in the IO subsystem that means that the filesystem
won't actually do that, but the moment I hear somebody talk about
latencies, that max_size goes back.

              Linus
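
The point about consumers that are not byte-stream oriented can be
sketched as a toy user-space model in C. This is only an illustration
under stated assumptions, not kernel code: user_buf, copy_nofault(),
probe() and fetch_header() are hypothetical names, and the "poisoned"
address is simulated with a plain offset. It shows an all-or-nothing
consumer that restarts from the beginning after a short copy, and how
the size of the probe decides whether it spins or fails cleanly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a user buffer: bytes at offsets >= bad_off fault. */
struct user_buf {
    const char *data;
    size_t len;
    size_t bad_off;   /* first inaccessible byte; may fall mid-page */
};

/*
 * Models a copy done with page faults disabled: copies what it can and
 * returns the number of bytes NOT copied. Assumes off + n <= u->len.
 */
static size_t copy_nofault(char *dst, const struct user_buf *u,
                           size_t off, size_t n)
{
    size_t ok = 0;

    if (off < u->bad_off) {
        ok = u->bad_off - off;
        if (ok > n)
            ok = n;
    }
    memcpy(dst, u->data + off, ok);
    return n - ok;
}

/* Probe [off, off + min_bytes): 0 if all of it is accessible, else -1. */
static int probe(const struct user_buf *u, size_t off, size_t min_bytes)
{
    return (off + min_bytes <= u->bad_off) ? 0 : -1;
}

/*
 * All-or-nothing consumer: needs hdr_len bytes starting at offset 0 and
 * restarts from the beginning after any short copy.
 * Returns the number of attempts on success, -1 if the probe reports a
 * fault, or -2 if max_tries runs out (a stand-in for a busy loop).
 */
static int fetch_header(char *dst, const struct user_buf *u, size_t hdr_len,
                        size_t probe_bytes, int max_tries)
{
    for (int tries = 1; tries <= max_tries; tries++) {
        if (copy_nofault(dst, u, 0, hdr_len) == 0)
            return tries;
        if (probe(u, 0, probe_bytes) != 0)
            return -1;
    }
    return -2;
}
```

With a 32-byte header and bad_off = 20, probing a single byte at the
start keeps succeeding while the copy keeps coming up short, so the
loop only ends when max_tries is exhausted. Probing the full 32 bytes
reports the fault on the first retry.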