From: David Laight
To: 'Jiri Olsa', Oleg Nesterov
CC: "linux-mm@kvack.org", Peter Zijlstra, Andrii Nakryiko,
    "bpf@vger.kernel.org", Song Liu, Yonghong Song, John Fastabend,
    Hao Luo, Steven Rostedt, Masami Hiramatsu, Alan Maguire,
    "linux-kernel@vger.kernel.org", "linux-trace-kernel@vger.kernel.org"
Subject: RE: [PATCH bpf-next 08/13] uprobes/x86: Add support to optimize uprobes
Date: Mon, 16 Dec 2024 15:08:14 +0000
References: <20241211133403.208920-1-jolsa@kernel.org>
    <20241211133403.208920-9-jolsa@kernel.org>
    <1521ff93bc0649b0aade9cfc444929ca@AcuMS.aculab.com>
    <20241215141412.GA13580@redhat.com>
    <20241216101258.GA374@redhat.com>
    <0916e24539ba4bae9fb729198b033bd7@AcuMS.aculab.com>
    <20241216122204.GB374@redhat.com>

From: Jiri Olsa
> Sent: 16 December 2024 12:50
>
> On Mon, Dec 16, 2024 at 01:22:05PM +0100, Oleg Nesterov wrote:
> > OK, thanks, I am starting to share your concerns...
> >
> > Oleg.
> >
> > On 12/16, David Laight wrote:
> > >
> > > From: Oleg Nesterov
> > > > Sent: 16 December 2024 10:13
> > > >
> > > > David,
> > > >
> > > > let me say first that my understanding of this magic is very limited,
> > > > please correct me.
> > >
> > > I only (half) understand what the 'magic' has to accomplish and
> > > some of the pitfalls.
> > >
> > > I've copied linux-mm - someone there might know more.
> > >
> > > > On 12/16, David Laight wrote:
> > > > >
> > > > > It all depends on how hard __replace_page() tries to be atomic.
> > > > > The page has to change from one backed by the executable to a private
> > > > > one backed by swap - otherwise you can't write to it.
> > > >
> > > > This is what uprobe_write_opcode() does,
> > >
> > > And will be enough for single byte changes - they'll be picked up
> > > at some point after the change.
> > >
> > > > > But the problems arise when the instruction prefetch unit has read
> > > > > part of the 5-byte instruction (it might even only read half a cache
> > > > > line at a time).
> > > > > I'm not sure how long the pipeline can sit in that state - but I
> > > > > can do a memory read of a PCIe address that takes ~3000 clocks.
> > > > > (And a misaligned AVX-512 read is probably eight 8-byte transfers.)
> > > > >
> > > > > So I think you need to force an interrupt while the PTE is invalid.
> > > > > And that need to be simultaneous on all cpu running that process.
> > > >
> > > > __replace_page() does ptep_get_and_clear(old_pte) + flush_tlb_page().
> > > >
> > > > That's not enough?
> > >
> > > I doubt it. As I understand it.
> > > The hardware page tables will be shared by all the threads of a process.
> > > So unless you hard synchronise all the cpu (and flush the TLB) while the
> > > PTE is being changed there is always the possibility of a cpu picking up
> > > the new PTE before the IPI that (I presume) flush_tlb_page() generates
> > > is processed.
> > > If that happens when the instruction you are patching is part-read into
> > > the instruction decode buffer then you'll execute a mismatch of the two
> > > instructions.
>
> if 5 byte update would be a problem, I guess we could workaround that through
> partial updates using int3 like we do in text_poke_bp_batch?
>
> - changing nop5 instruction to 'call xxx'
> - write int3 to first byte of nop5 instruction
> - have poke_int3_handler to emulate nop5 if int3 is triggered
> - write rest of the call instruction to nop5 last 4 bytes
> - overwrite first byte of nop5 with call opcode

That might work provided there is an IPI (to flush the decode pipeline)
after the write of the 'int3' and another before the write of the 'call'.
You'll need to ensure the I-cache gets invalidated as well.

And if the sequence crosses a page boundary....

	David

>
> similar update from 'call xxx' -> 'nop5'
>
> thanks,
> jirka
>
> > >
> > > I can't remember the outcome of discussions about live-patching kernel
> > > code - and I'm sure that was aligned 32bit writes.
> > >
> > > >
> > > > > Stopping the process using ptrace would do it.
> > > >
> > > > Not an option :/
> > >
> > > Thought you'd say that.
> > >
> > > 	David
> > >
> > > >
> > > > Oleg.
> > >
> > > -
> > > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > > Registration No: 1397386 (Wales)
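
The int3-based update order proposed in this thread can be sketched as a
small user-space simulation. This is only an illustration in Python on a
plain bytearray, not kernel code: patch_nop5_to_call() and sync_cores() are
hypothetical names, and sync_cores() merely records where the IPIs mentioned
in the reply would go. It does not model the I-cache, the TLB, the int3
handler, or an instruction that crosses a page boundary.

```python
# Simulation of the byte-level write order only. Nothing here patches real code.

INT3 = 0xCC                                   # one-byte breakpoint opcode
CALL_OPCODE = 0xE8                            # opcode of 'call rel32'
NOP5 = bytes([0x0F, 0x1F, 0x44, 0x00, 0x00])  # common 5-byte nop encoding


def sync_cores(log, label):
    """Hypothetical stand-in for 'IPI every CPU running the process'.

    It only records that a synchronisation point was reached.
    """
    log.append(label)


def patch_nop5_to_call(text, offset, rel32):
    """Rewrite text[offset:offset+5] from nop5 to 'call rel32' in three writes.

    Returns (states, log): the 5-byte window after each write, and the
    synchronisation points in the order they were reached.
    """
    window = slice(offset, offset + 5)
    if bytes(text[window]) != NOP5:
        raise ValueError("expected nop5 at offset")
    call = bytes([CALL_OPCODE]) + rel32.to_bytes(4, "little", signed=True)
    states, log = [], []

    # Write 1: first byte becomes int3, the old tail bytes are still in place.
    text[offset] = INT3
    states.append(bytes(text[window]))
    sync_cores(log, "ipi after int3")

    # Write 2: the four displacement bytes go in behind the int3.
    text[offset + 1:offset + 5] = call[1:]
    states.append(bytes(text[window]))
    sync_cores(log, "ipi before call opcode")

    # Write 3: the int3 is replaced by the call opcode.
    text[offset] = CALL_OPCODE
    states.append(bytes(text[window]))
    return states, log


if __name__ == "__main__":
    text = bytearray(NOP5)
    states, log = patch_nop5_to_call(text, 0, 0x100)
    for state in states:
        print(state.hex(" "))
    print(log)
```

In this model every intermediate state of the 5-byte window starts with
either int3 or the complete new instruction. Whether that, together with the
two synchronisation points, is sufficient for uprobes is the open question
in the thread.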