From: Linus Torvalds
Date: Tue, 26 Apr 2022 18:29:16 -0700
Subject: Re: [patch 02/14] tmpfs: fix regressions from wider use of ZERO_PAGE
To: Borislav Petkov
Cc: Mark Hemment, Andrew Morton, "the arch/x86 maintainers", Peter Zijlstra, patrice.chotard@foss.st.com, Mikulas Patocka, Lukas Czerner, Christoph Hellwig, "Darrick J. Wong", Chuck Lever, Hugh Dickins, patches@lists.linux.dev, Linux-MM, mm-commits@vger.kernel.org

On Tue, Apr 26, 2022 at 5:14 PM Borislav Petkov wrote:
>
> So when we enter the function, we shift %rcx to get the number of
> qword-sized quantities to zero:
>
> SYM_FUNC_START(clear_user_original)
>         mov %rcx,%rax
>         shr $3,%rcx # qwords <---

Yes. But that's what we do for "rep stosq" too, for all the same reasons.

> but when we encounter the fault here, we return *%rcx* - not %rcx << 3
> - latter being the *bytes* leftover which we *actually* need to return
> when we encounter the #PF.

Yes, but:

> So, we need to shift back when we fault during the qword-sized zeroing,
> i.e., full function below, see label 3 there.

No.

The problem is that you're using the wrong exception type.

Thanks for posting the whole thing, because that makes it much more
obvious.

You have the exception table entries switched.

You should have

        _ASM_EXTABLE_TYPE_REG(0b, 3b, EX_TYPE_UCOPY_LEN8, %rax)
        _ASM_EXTABLE_UA(2b, 3b)

and not need that label '4' at all.

Note how that "_ASM_EXTABLE_TYPE_REG" thing is literally designed to do

        %rcx = 8*%rcx+%rax

in the exception handler.

Of course, to get that regular

        _ASM_EXTABLE_UA(2b, 3b)

to work, you need to have the final byte count in %rcx, not in %rax,
so that means that the "now do the rest of the bytes" case should have
done something like

        movl %eax,%ecx
2:      movb $0,(%rdi)
        inc %rdi
        decl %ecx
        jnz 2b

instead.

Yeah, yeah, you could also use that _ASM_EXTABLE_TYPE_REG thing for
the second exception point, and keep %rcx as zero, and keep it in
%eax, and depend on that whole "%rcx = 8*%rcx+%rax" fixing it up, and
knowing that if an exception does *not* happen, %rcx will be zero from
the word-size loop.

But that really seems much too subtle for me - why not just keep
things in %rcx, and try to make this look as much as possible like the
"rep stosq + rep stosb" case?

And finally: I still think that those fancy exception table things are
*much* too fancy, and much too subtle, and much too complicated.

So I'd actually prefer to get rid of them entirely, and make the code
use the "no register changes" exception, and make the exception
handler do a proper site-specific fixup. At that point, you can get
rid of all the "mask bits early" logic, get rid of all the extraneous
'test' instructions, and make it all look something like below.

NOTE! I've intentionally kept the %eax thing using 32-bit instructions
- smaller encoding, and only the low three bits matter, so why
move/mask full 64 bits?

NOTE2! Entirely untested. But I tried to make the normal code do
minimal work, and then fix things up in the exception case more. So it
just keeps the original count in the 32 bits in %eax until it wants to
test it, and then uses the 'andl' to both mask and test. And the
exception case knows that, so it masks there too.

I dunno. But I really think that whole new _ASM_EXTABLE_TYPE_REG and
EX_TYPE_UCOPY_LEN8 was unnecessary.

               Linus

SYM_FUNC_START(clear_user_original)
        movl %ecx,%eax
        shrq $3,%rcx            # qwords
        jz .Lrest_bytes

        # do the qwords first
        .p2align 4
.Lqwords:
        movq $0,(%rdi)
        lea 8(%rdi),%rdi
        dec %rcx
        jnz .Lqwords

.Lrest_bytes:
        andl $7,%eax            # rest bytes
        jz .Lexit

        # now do the rest bytes
.Lbytes:
        movb $0,(%rdi)
        inc %rdi
        decl %eax
        jnz .Lbytes

.Lexit:
        xorl %eax,%eax
        RET

.Lqwords_exception:
        # convert qwords back into bytes to return to caller
        shlq $3, %rcx
        andl $7, %eax
        addq %rax,%rcx
        jmp .Lexit

.Lbytes_exception:
        movl %eax,%ecx
        jmp .Lexit

        _ASM_EXTABLE_UA(.Lqwords, .Lqwords_exception)
        _ASM_EXTABLE_UA(.Lbytes, .Lbytes_exception)
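
As a rough cross-check of the two site-specific fixup paths, the register arithmetic can be modelled in plain C. This is only a sketch under stated assumptions, not kernel code: `clear_user_model`, its `fault_at` parameter, and `self_check` are hypothetical names, and the model assumes that a store which touches any byte at or beyond `fault_at` faults without writing anything. The return value plays the role of %rcx, the number of bytes left uncleared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical user-space model of the assembly above.
 * buf      - stands in for the user buffer that %rdi walks over
 * len      - byte count passed in %rcx
 * fault_at - assumed offset of the first inaccessible byte
 * Returns the value left in %rcx: bytes that were NOT cleared.
 */
size_t clear_user_model(unsigned char *buf, size_t len, size_t fault_at)
{
    uint64_t rcx = len;
    uint32_t eax = (uint32_t)len;      /* movl %ecx,%eax */
    size_t rdi = 0;

    rcx >>= 3;                         /* shrq $3,%rcx: qword count */
    while (rcx) {                      /* .Lqwords */
        if (rdi + 8 > fault_at) {
            /* .Lqwords_exception: qwords back to bytes, plus the tail */
            rcx <<= 3;
            eax &= 7;
            rcx += eax;
            return (size_t)rcx;
        }
        memset(buf + rdi, 0, 8);       /* movq $0,(%rdi) */
        rdi += 8;
        rcx--;
    }

    eax &= 7;                          /* .Lrest_bytes: mask and test */
    while (eax) {                      /* .Lbytes */
        if (rdi + 1 > fault_at) {
            /* .Lbytes_exception: remaining tail bytes go to %rcx */
            rcx = eax;
            return (size_t)rcx;
        }
        buf[rdi] = 0;                  /* movb $0,(%rdi) */
        rdi++;
        eax--;
    }
    return (size_t)rcx;                /* 0: %rcx is zero after the qword loop */
}

/* A few hand-checked cases for a 20-byte clear (2 qwords + 4 tail bytes). */
void self_check(void)
{
    unsigned char buf[32];

    memset(buf, 0xff, sizeof buf);
    assert(clear_user_model(buf, 20, 32) == 0);   /* no fault */
    assert(clear_user_model(buf, 20, 8) == 12);   /* fault on 2nd qword: 8*1 + 4 */
    assert(clear_user_model(buf, 20, 18) == 2);   /* fault in the byte loop */
}
```

The qword case is the same `8*%rcx + (%rax & 7)` computation that the EX_TYPE_UCOPY_LEN8 handler would otherwise perform, just done at the fault site; the byte case only has to move the remaining tail count into the return register.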