From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DB6D8C02188 for ; Mon, 27 Jan 2025 19:27:18 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 6BDD428019C; Mon, 27 Jan 2025 14:27:18 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 66DFC280191; Mon, 27 Jan 2025 14:27:18 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5361928019C; Mon, 27 Jan 2025 14:27:18 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 34C36280191 for ; Mon, 27 Jan 2025 14:27:18 -0500 (EST) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id A70C1A0618 for ; Mon, 27 Jan 2025 19:27:17 +0000 (UTC) X-FDA: 83054215314.02.F63E2BE Received: from mail-qv1-f43.google.com (mail-qv1-f43.google.com [209.85.219.43]) by imf04.hostedemail.com (Postfix) with ESMTP id BF15F40011 for ; Mon, 27 Jan 2025 19:27:15 +0000 (UTC) Authentication-Results: imf04.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=DExrZBPt; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf04.hostedemail.com: domain of andrii.nakryiko@gmail.com designates 209.85.219.43 as permitted sender) smtp.mailfrom=andrii.nakryiko@gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1738006035; a=rsa-sha256; cv=none; b=c+jXSBxHPi0JTrRyYfySpFPAhnToGa+2nB05gphtq5LoeIUIgSdOPCgGHmoLfv4UXOsUTp iABLbT3QKpKNV+ZLGEyr2G/qVU2OAM7F0a3Gg64orI7x65F6t9oeBPX/QtU0DI8zFUbDmm ioD1AIfEFqcFyR+ONVR9/dNIa+vkuX0= ARC-Authentication-Results: i=1; imf04.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=DExrZBPt; dmarc=pass (policy=none) header.from=gmail.com; spf=pass (imf04.hostedemail.com: domain of andrii.nakryiko@gmail.com designates 209.85.219.43 as permitted sender) smtp.mailfrom=andrii.nakryiko@gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1738006035; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=CNopMD8bxFrUq1IsZsiT99nre+ljuZTk7nAPSQpUuPI=; b=OrmQkgX1Z55utp2nVJW+0/0f1DGVqNhcYIiJotMBng8l7aEZk6rl4kEU1CLCyZSTreHyPm b9l3lP3K8jm/6VpR65kU3ip2CF9JEouFOtjMcvSNEl6XRa+ND1CQzXa6TBPB7fryjxCFa6 MdDoRXjmP3sFnwWCBRMncyW2fiojcCk= Received: by mail-qv1-f43.google.com with SMTP id 6a1803df08f44-6e1a41935c3so72959136d6.3 for ; Mon, 27 Jan 2025 11:27:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1738006035; x=1738610835; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=CNopMD8bxFrUq1IsZsiT99nre+ljuZTk7nAPSQpUuPI=; b=DExrZBPt5uy1YsF8Cf2z7WFvJCg48cvkGHFZZhaZVPkUpivUZm3UKxsFGknZOwW1Ww hrLNP0WQpo57bDQw8ABsRChTwB9PSliyBqXOCNi4oSo8FXQOVaTswMbFyDJFCLoKHviV /ZuIuEmY+idUPGffAFypXsMgVpmcCpNGQwde1OXBei2JjvB0/kpJRHiulHq+ZSPslW9k iIz+NkbE42NlDWod949oPq28OOI5D3VhmstSLfD8hnUIKBrs/XDGIaG2G8z7/hldwPSu 2QdIO0cXbkPmKRq4gC7VjDbkvC22KMIOGugoqvWWJomGQ4dHKhjsseOnud+NLr+ZZHV2 VXlw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1738006035; x=1738610835; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=CNopMD8bxFrUq1IsZsiT99nre+ljuZTk7nAPSQpUuPI=; b=RRl9gigPrqtZY7d1rBoSFJEr52Qcva1hzd+6+48Ny0AXlHi/ktHypz/lsSrJQW8O7r eG3RlZ1S1VOLLS4hv5QzYoQe1h7OTVeeJ56nnm9x/CeogDpTKEVgws0sL+7OzRY1TYxb CGVs8PXxEm/qeOZ7We28LoSReaOvvguqpWFUyRbnXwenntyhQ4iPWbPHVIR3H3Ff2JFp 2xl1dUaiGX0TbXW/nD45dirEt91QmzZH4sqE3kMe0jPekderTexBwtBQ3uMmF7TO4WKa 8UJCr1wMVobCqcl2lRW2IJNTeWSmo+TpgRgDDH8sigmwv4GbyaTzqQxRsoB1o1t/9N3X BmJg== X-Forwarded-Encrypted: i=1; AJvYcCWd3OE8UKoEuqwL07o4W9vtZ6WPwLE78DzHXp581f8mIM7d0Iob3/uojgvXdQ0tVqukvyPBlYYfPg==@kvack.org X-Gm-Message-State: AOJu0YyTRYsstEmlliMTFqdESQb2bj0QhiYTPAHVw8qUXfZF+2yR2erk 5SypZlPD1pysGkYaZign92dW3L4/tIQdgBCkTY2ignBwUrbkC7QPBi/jDgz6vGwvOpZkv1zqoxG Guz2qZClsTMGL8Z6GR+YQNlfH9AVjfw== X-Gm-Gg: ASbGnctA8NuHrINwAesH2rPYT04mLzbAIi8qk1rUxTHqAJwjiHQNRoaEUftk0puO0lT CO7oqVdg9V9bibHXNM5/Sk1XDuPIz3fk3Qg/t7L5qBMa1t68gFESTsdOVoc/CcwgZ8TtncOdZII lN5A== X-Google-Smtp-Source: AGHT+IGRCr/wIjb/0gdZpCqJWXQ1D02zXueEkzLpFPZAUdLCZ0Q2U5abQO9U7aXNEJY2hi0z7CKA2pOjbXbJx8L31wQ= X-Received: by 2002:a05:6a21:3388:b0:1e1:a9dd:5a68 with SMTP id adf61e73a8af0-1eb2145db73mr67058622637.1.1738005673604; Mon, 27 Jan 2025 11:21:13 -0800 (PST) MIME-Version: 1.0 References: <20250124181602.1872142-1-linux@jordanrome.com> In-Reply-To: From: Andrii Nakryiko Date: Mon, 27 Jan 2025 11:21:01 -0800 X-Gm-Features: AWEUYZnvyfPel3phqu1es4xEAOyK4NuRwd4Gnqr0Bf8_-ahK2n9mBPDT-6n5-cM Message-ID: Subject: Re: [bpf-next v3 1/3] mm: add copy_remote_vm_str To: Jordan Rome Cc: bpf@vger.kernel.org, linux-mm@kvack.org, Alexei Starovoitov , Daniel Borkmann , Andrii Nakryiko , Kernel Team , Andrew Morton , Shakeel Butt , Alexander Potapenko Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspam-User: X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: BF15F40011 X-Stat-Signature: dg3zwiatog98pufw7ibdexzg3kgzghi8 X-HE-Tag: 1738006035-879127 X-HE-Meta: U2FsdGVkX1/0Tmx1+MjUmU+2IvPM5dst2iPl8V73lx1irmhUsef2IRMhbExVv6J5Qra2HHc7k0zhAOv0oRCL4BAaPj4b0nbUfl2RMC1QgfbfhKE0Pndq/qZrEDPu1a94zbk8bmcVhVGPjU5GlosATF0cEyJwW/BEtWb8yq7J+R6wpXylmlAs9ID0uXztmM8jMbjJPg39lnTu3iMAnqHxsdswYJlyVWu5Qj/3/gQf0xnLIxlDlZwkBvPJP+sL0Z4q9sJg1CVslz5YfQg09zIuBODO1Hx/7NlDD/9kaCcRAvBphuV4FlYBv/tqk40UZYsCY9+hfEJ5sO02oLPBcOGbDcqmurNIYPhBXEBObNr2/gxz3dV6dCkezFJc7PcjF7ouUlSiRPc9eD1bOp53+xUgDcRe2Wy0qo83JuYNxSGZBZ9y63yzPrZ5YV2B39ym9qHPrbTlyx7ewTyI4lRi/pcuegYZ3V58DR4yrGzzS0ZGG1IxnSLKSqYOLy/WzEzmG94Klu9CtT7pGk7S8rzqbJ+Dhy33QjRP5t2smfXUK+getmYrCA8iw6zeeki8TklJ8YJJTro82qMKe8UG70H6QO+cbW6PAE4dRNMtUUASebT0ECGxhhTkX7XVVax1KufxlnAynCDvhWMumwUAukwWA4jDD0SOfoQw3O8UaBLhtEbkRJ61I3lAj8GS8oVIMb0z34OpOe93sVbeZnoXRNASvfcf3a09qVINFdXafdJ7vV/jk1WwRJyMqH0nEcW9GMaQo11zMLnIT4UpUXhk6Fw7PzrC8413PWQX5uYsN/U+QxscVRQPmtr0zuPbuMi3UyU4oRIkDfcwZQnXC4j5fWCP7aJp7CFax1pWDUFIjEZiLHoGq6ApqEvGTobAORpxte8rrahXsMreozqpHXZ+aM/7PujeV9AdBPfyXhZaIX+wAv5xHR6FKZA3ezfmEL5t88O1UyU26LahafGwAGxxsEQWWSi leSPS84n RPCIyc2XZk8Wv0V/9cQMK+A2HXgE5Enb92I/Asa1JRIL+waTMcC6JbJx4eZB4Ht2bWjuwuZFvqYRusBCbY8ewiDNK0WmExgk+2df5VZw1P1hA9Z08GQciz+Lloe+gT3aooQiZVCu5P3AczcgeCtEnBh1Qj/ahUsRPZYtef/WodmjfTUtl7a8bAho4N1NPzbU9RwIudZOSLguqznUmSR5pQ4hBDe3RpTq/PlABSt7qBbqYn3FUyjo2R8+lHOFUkizJnL3XENfbBtLgW5P7Gqb72VwL+ZqWEZcBAI4UFg8q7NsH0BnNyBgOCh7DId2FmjgasA3t5gjI9tuF5OX5v1bHg2qRKsl+rrY7IGkg16eiMjwhX6Q2LVW/NbHhaFNEbBW4LgX7Az5kpxVDOeeEDS7J1X0T+1zX/qXmZa5TYAPMUTyNdYyHDBvDhjHEQx2euJHdd+1h3x+iWnpIqDc4hdWRgM+PlKClKKWFldhx X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Sat, Jan 25, 2025 at 6:02=E2=80=AFAM Jordan Rome = wrote: > > On Fri, Jan 24, 2025 at 7:09=E2=80=AFPM Andrii Nakryiko > wrote: > > > > On Fri, Jan 24, 2025 at 10:22=E2=80=AFAM Jordan Rome wrote: > > > > > > Similar to `access_process_vm` but specific to strings. > > > Also chunks reads by page and utilizes `strscpy` > > > for handling null termination. > > > > > > Signed-off-by: Jordan Rome > > > --- > > > include/linux/mm.h | 3 ++ > > > mm/memory.c | 119 +++++++++++++++++++++++++++++++++++++++++++= ++ > > > mm/nommu.c | 68 ++++++++++++++++++++++++++ > > > 3 files changed, 190 insertions(+) > > > > > > > [...] > > > > > + maddr =3D kmap_local_page(page); > > > + retval =3D strscpy(buf, maddr + offset, bytes); > > > + unmap_and_put_page(page, maddr); > > > + > > > + if (retval > -1 && retval < bytes) { > > > + /* found the end of the string */ > > > + buf +=3D retval; > > > + goto out; > > > + } > > > + > > > + if (retval =3D=3D -E2BIG) { > > > > nit: strscpy() can't return any other error, so I'd structure result > > handling as: > > > > if (retval < 0) { > > /* that annoying last byte copy */ > > retval =3D bytes; > > } > > if (retval < bytes) { > > /* "we are done" handling */ > > } > > > > /* common len, buf, addr adjustment logic stays here */ > > > > Ack. Actually, I thought of a way to make this cleaner > and correct. > > > > > but also here's the question. If we get E2BIG, while bytes is exactly > > how many bytes we have left in the buffer, the last byte should be > > zero, no? So this should be cleanly handled, right? Or do we have a > > test for that and it works already? > > > > Ok, I found an inconsistency that gets handled in the BPF helper > `bpf_copy_from_user_task_str`, which I'm going to fix in the next > version of this patch. > > Let me explain how this function is SUPPOSED to work > and enumerate some different test cases (a lot of which are in commit 3). > > Note1: all of the target strings > are null terminated (if you try to copy a string that's not null terminat= ed > you may accidentally copy junk). > > Note2: strscopy returns E2BIG if the len requested isn't as long > as the string INCLUDING the nul terminator. So if you want to copy > "test_data\0" you need to request 10 bytes not 9. And if you request > 10 or anything greater it returns 9. yeah, that's actually smart for strscpy() to have this protocol, it allows to read full string and know that it's full (if the string fits exactly into the dst buffer). But I think on the BPF side we have a convention to return the size *including* the NUL byte, so we get a bit of inefficiency. But oh well, I think consistency is better to be maintained. > > Note3: This function, as opposed to the bpf helper calling > this function, returns the number copied NOT including the nul terminator= . > > So... the target string is "test_data". > > Request 10 bytes, return 9. > Request 2 bytes, return 1. > Request 20 bytes, return 9. > > Now, let's say this string falls across a page boundary > where "_" is the last character in the page. > > Request 10 bytes, which becomes a request > for 5 bytes for the first page. strscopy returns E2BIG > and copies "test\0" into the buffer. We copy the last > bytes of the page into the buffer, which is the "_", > and then request 5 more bytes on the next page, > copying "data\0" and strscopy returns 4. Return 9. > > Now let's say the last "a" is the last character on the page. > Request 10 bytes, which becomes a request > for 9 bytes. strscopy returns E2BIG and copies "test_dat\0" > into the buffer. Once again we copy the last byte > of the page into the buffer, which is "a" > and we request 1 more byte of the next page, which > is the nul terminator. strscopy returns 0 and this > function returns 9. so I was worried about the situation where the string is test_data|\0, like you describe (where | is denoting page boundary, NUL is right after the page end), and we request 9 bytes (not 10 like in your example). By manually copying that "a" (last byte of the page), we might not return zero terminated dst buf (that was my theory). Would be good to have a test proving otherwise. But I'll go and read the latest version, maybe I'm overthinking this. > > Again note, that this version of the code has a bug > that is "handled" by the bpf helper and I'm going to fix. > > > > + retval =3D bytes; > > > + /* > > > + * Because strscpy always null terminates we = need to > > > + * copy the last byte in the page if we are g= oing to > > > + * load more pages > > > + */ > > > + if (bytes < len) { > > > + end =3D bytes - 1; > > > + copy_from_user_page(vma, > > > + page, > > > + addr + end, > > > + buf + end, [...]