From: David Laight
To: 'David Howells'
CC: Linus Torvalds, Al Viro, Jens Axboe, Christoph Hellwig,
	Christian Brauner, Matthew Wilcox, Jeff Layton,
	"linux-fsdevel@vger.kernel.org", "linux-block@vger.kernel.org",
	"linux-mm@kvack.org", "linux-kernel@vger.kernel.org"
Subject: RE: [PATCH v3 2/2] iov_iter: Don't deal with iter->copy_mc in memcpy_from_iter_mc()
Date: Fri, 18 Aug 2023 21:39:20 +0000
Message-ID: <04ee44bc6c2d4c5bb1c143bcb6803b7b@AcuMS.aculab.com>
References: <03730b50cebb4a349ad8667373bb8127@AcuMS.aculab.com>
	<20230816120741.534415-1-dhowells@redhat.com>
	<20230816120741.534415-3-dhowells@redhat.com>
	<608853.1692190847@warthog.procyon.org.uk>
	<3dabec5643b24534a1c1c51894798047@AcuMS.aculab.com>
	<665724.1692218114@warthog.procyon.org.uk>
	<2058762.1692371971@warthog.procyon.org.uk>
	<2093413.1692377320@warthog.procyon.org.uk>
In-Reply-To: <2093413.1692377320@warthog.procyon.org.uk>

From: David Howells
> Sent: Friday, August 18, 2023 5:49 PM
>
> David Laight wrote:
>
> > > iov_iter_init                            inc 0x27 -> 0x31 +0xa
> >
> > Are you hitting the gcc bug that loads the constant from memory?
>
> I'm not sure what that looks like.  For your perusal, here's a disassembly of
> the use-switch-on-enum variant:
>
>    0xffffffff8177726c <+0>:     cmp    $0x1,%esi
>    0xffffffff8177726f <+3>:     jbe    0xffffffff81777273
>    0xffffffff81777271 <+5>:     ud2
>    0xffffffff81777273 <+7>:     test   %esi,%esi
>    0xffffffff81777275 <+9>:     movw   $0x1,(%rdi)
>    0xffffffff8177727a <+14>:    setne  0x3(%rdi)
>    0xffffffff8177727e <+18>:    xor    %eax,%eax
>    0xffffffff81777280 <+20>:    movb   $0x0,0x2(%rdi)
>    0xffffffff81777284 <+24>:    movb   $0x1,0x4(%rdi)
>    0xffffffff81777288 <+28>:    mov    %rax,0x8(%rdi)
>    0xffffffff8177728c <+32>:    mov    %rdx,0x10(%rdi)
>    0xffffffff81777290 <+36>:    mov    %r8,0x18(%rdi)
>    0xffffffff81777294 <+40>:    mov    %rcx,0x20(%rdi)
>    0xffffffff81777298 <+44>:    jmp    0xffffffff81d728a0 <__x86_return_thunk>
>
> versus the use-bitmap variant:
>
>    0xffffffff81777311 <+0>:     cmp    $0x1,%esi
>    0xffffffff81777314 <+3>:     jbe    0xffffffff81777318
>    0xffffffff81777316 <+5>:     ud2
>    0xffffffff81777318 <+7>:     test   %esi,%esi
>    0xffffffff8177731a <+9>:     movb   $0x2,(%rdi)
>    0xffffffff8177731d <+12>:    setne  0x1(%rdi)
>    0xffffffff81777321 <+16>:    xor    %eax,%eax
>    0xffffffff81777323 <+18>:    mov    %rdx,0x10(%rdi)
>    0xffffffff81777327 <+22>:    mov    %rax,0x8(%rdi)
>    0xffffffff8177732b <+26>:    mov    %r8,0x18(%rdi)
>    0xffffffff8177732f <+30>:    mov    %rcx,0x20(%rdi)
>    0xffffffff81777333 <+34>:    jmp    0xffffffff81d72960 <__x86_return_thunk>
>
> It seems to be that the former is loading byte constants individually, whereas
> Linus combined all those fields into a single byte and eliminated one of them.

I think you need to re-order the structure.
The top set writes to bytes 0..4 with:

>    0xffffffff81777275 <+9>:     movw   $0x1,(%rdi)
>    0xffffffff8177727a <+14>:    setne  0x3(%rdi)
>    0xffffffff81777280 <+20>:    movb   $0x0,0x2(%rdi)
>    0xffffffff81777284 <+24>:    movb   $0x1,0x4(%rdi)

Note that the 'setne' writes into the middle of the constants.

The lower writes bytes 0..1 with:

>    0xffffffff8177731a <+9>:     movb   $0x2,(%rdi)
>    0xffffffff8177731d <+12>:    setne  0x1(%rdi)

I think that if you move the 'conditional' value to offset 4
you'll get fewer writes.
Probably a 32bit load into %eax and then a write.

I don't think gcc likes generating 16bit immediates.
In some tests I did it loaded a 32bit value into %eax
and then wrote the low bits.
So the code is much the same (on x86) for 2 or 4 bytes of constants.

I'm sure you can use the 'data-16' prefix with an immediate.

I'm not sure why you have two non-zero values when Linus only had one though.

OTOH you don't want to be writing 3 bytes of constants.

Also gcc won't generate:
	movl $0xaabbccdd,%eax
	setne %al   // overwriting the dd
	movl %eax,(%rdi)
and I suspect the partial write (to %al) will be a stall.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
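The re-ordering suggested above can be sketched in C. This is only an illustration: the struct and field names (hdr_interleaved, hdr_reordered, kind, opt, dir, backed) are hypothetical, chosen to match the byte offsets visible in the first listing, and are not the real iov_iter fields. With the run-time flag moved to offset 4, bytes 0..3 hold only constants, so a compiler is free to merge them into a single 32-bit store; whether gcc actually does so is not verified here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout matching the first listing: the run-time flag
 * (written by setne) sits at offset 3, between constant bytes. */
struct hdr_interleaved {
	uint16_t kind;		/* bytes 0..1, constant 1 */
	uint8_t  opt;		/* byte 2, constant 0 */
	uint8_t  dir;		/* byte 3, depends on the argument */
	uint8_t  backed;	/* byte 4, constant 1 */
};

/* Hypothetical re-ordered layout: constants fill bytes 0..3 and the
 * run-time flag moves to offset 4. */
struct hdr_reordered {
	uint16_t kind;		/* bytes 0..1, constant 1 */
	uint8_t  opt;		/* byte 2, constant 0 */
	uint8_t  backed;	/* byte 3, constant 1 */
	uint8_t  dir;		/* byte 4, depends on the argument */
};

static_assert(offsetof(struct hdr_interleaved, dir) == 3,
	      "flag splits the constants");
static_assert(offsetof(struct hdr_reordered, dir) == 4,
	      "flag follows the constants");

static void init_interleaved(struct hdr_interleaved *h, unsigned int direction)
{
	h->kind = 1;
	h->opt = 0;
	h->dir = direction != 0;	/* lands in the middle of the constants */
	h->backed = 1;
}

static void init_reordered(struct hdr_reordered *h, unsigned int direction)
{
	/* Four contiguous constant bytes: eligible for one 32-bit store. */
	h->kind = 1;
	h->opt = 0;
	h->backed = 1;
	h->dir = direction != 0;	/* separate byte store at offset 4 */
}
```

Comparing the first four bytes of two initialised structures (one with direction 0, one non-zero) shows the difference: they are identical for hdr_reordered and differ for hdr_interleaved.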