From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 28C7AC47DDB for ; Thu, 1 Feb 2024 04:06:32 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id B834C6B0085; Wed, 31 Jan 2024 23:06:31 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B32CF6B0087; Wed, 31 Jan 2024 23:06:31 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9FC176B0088; Wed, 31 Jan 2024 23:06:31 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 8ABB46B0085 for ; Wed, 31 Jan 2024 23:06:31 -0500 (EST) Received: from smtpin25.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 295448069C for ; Thu, 1 Feb 2024 04:06:31 +0000 (UTC) X-FDA: 81741898182.25.5964C84 Received: from mail-yw1-f180.google.com (mail-yw1-f180.google.com [209.85.128.180]) by imf29.hostedemail.com (Postfix) with ESMTP id 2D190120007 for ; Thu, 1 Feb 2024 04:06:28 +0000 (UTC) Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=umich.edu header.s=google-2016-06-03 header.b=FOWrtkFc; spf=pass (imf29.hostedemail.com: domain of tmgross@umich.edu designates 209.85.128.180 as permitted sender) smtp.mailfrom=tmgross@umich.edu; dmarc=pass (policy=none) header.from=umich.edu ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1706760389; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=A9cS7nVgUKdaXy/fPao/HO1z3bVHBL7mwNKzJyJGoTI=; b=qxm9omRAe8gSiWh9QGT+JN4WRbamjpQoiyWMdmqrvKjjdFiW6WdGJTxhRFoJBAPTsikwhW fjAZfrH7OEzSFpTC3Zm5CaDDudeezASYh0uDBlpSB+UTEcaXDiM7EDXyVK+1AqH25Um86X CdL5G3o/fccNYof1paB8iO417XSSeeU= ARC-Authentication-Results: i=1; imf29.hostedemail.com; dkim=pass header.d=umich.edu header.s=google-2016-06-03 header.b=FOWrtkFc; spf=pass (imf29.hostedemail.com: domain of tmgross@umich.edu designates 209.85.128.180 as permitted sender) smtp.mailfrom=tmgross@umich.edu; dmarc=pass (policy=none) header.from=umich.edu ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1706760389; a=rsa-sha256; cv=none; b=nv+sZ4p6Y4tNpxSUb38bZ5HuG8s7jLSgQDp5Ie0FztC3ygy7QN00grPa3tBwE2vq9n1dEA bbmWUptU01iZD+SLI3qvjYpt3n0nKMD0kNbg2v0iZiWcBk2Sfv4xQ1KUfxjHukjFjP8mxj huHdkczDqKi2sB9tdyAHapWoLkRSnxc= Received: by mail-yw1-f180.google.com with SMTP id 00721157ae682-6040d33380cso5410997b3.1 for ; Wed, 31 Jan 2024 20:06:28 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=umich.edu; s=google-2016-06-03; t=1706760388; x=1707365188; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=A9cS7nVgUKdaXy/fPao/HO1z3bVHBL7mwNKzJyJGoTI=; b=FOWrtkFc3otlsKIM+NHa5O6ilodf/izAvKL8fg3f1QYa94SbmLcahof/zqazkCkh+Z fyFy58jgJDQuSGL2GvjNg7ZvSZtZJ314tzADFJgexv/d1UGrm5uI0RFjXu6HZ42YV8hj 62jR5CPSUiRVR6MmDk6fwtLIBiTD2h03EVjZPOYmJJV3Q4pRYfUeKD9cDLDAAiPUb5sl PwR8OVBa9w2q2nRMCDgSNGGSM6Wwr51AVF4Ir8OPMEy5uaqZpJWROiu7Qe4q9kyct3JQ NB6qkzLLVQOyl1ZqXQ/ZJrrwE/l5Mq7IVFAe3gGOdQy5ynVg+iAPI7rYGMAh5+ahxp99 XsNg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1706760388; x=1707365188; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=A9cS7nVgUKdaXy/fPao/HO1z3bVHBL7mwNKzJyJGoTI=; b=Qzk1r1vPuCcllaR1wovQeUbUxe+oOa5KHUfgeMvrOA9A7SUmzZB2C6UIQvp1dgIg9N koFRh9xQPcrGm/WFSm30oNYXHlZGBQ1LxEX4zm/psEi6fH8AZl0u/LpEsh09V+uv6TTO 2pohxoxc4lUFNkqafiEqU98wrOFQzIHcuCOKknmI/XCf5ll5DwKbPkCVf7FviIom933W B7J0rV6qX60aW5GbEwXh4X45Q5+eTxYVj2Z+NaV9YT4qgAX+7vpJZmlCrm+2qTeWy1qJ kqF5492fruclT25m3DFehUAXThn3otAQndtX+Uo29VzyFFuWdIvkntbNYrNK0NTCtSwm 6BRQ== X-Gm-Message-State: AOJu0YyVQBAmsF4B+HFRkQUMoaQgiKmhVEmePHtwMGalCWga4/p9oyIW a/DcRJCCeMakNdOHpP3YdVWUfIONp9pNpoZ2Ud0N8ToXbmpQFYIsamzMWjQUtnDlKplpGEBhI26 htnX/YOn+9gWsekqXss5bfxZkAAg+/A59sKn6Xg== X-Google-Smtp-Source: AGHT+IGfqAEiO8Q4ie3oM3T9CvkiCmr5jcU9wT9zIy6Qq8snln0yc7tE2UthbmMTAGRW+fDaKPTDB7UfSJLu8wFfjCA= X-Received: by 2002:a81:ae64:0:b0:603:cb7e:8df with SMTP id g36-20020a81ae64000000b00603cb7e08dfmr3812460ywk.0.1706760388236; Wed, 31 Jan 2024 20:06:28 -0800 (PST) MIME-Version: 1.0 References: <20240124-alice-mm-v1-0-d1abcec83c44@google.com> <20240124-alice-mm-v1-1-d1abcec83c44@google.com> In-Reply-To: <20240124-alice-mm-v1-1-d1abcec83c44@google.com> From: Trevor Gross Date: Wed, 31 Jan 2024 23:06:17 -0500 Message-ID: Subject: Re: [PATCH 1/3] rust: add userspace pointers To: Alice Ryhl Cc: Miguel Ojeda , Alex Gaynor , Wedson Almeida Filho , Boqun Feng , Gary Guo , =?UTF-8?Q?Bj=C3=B6rn_Roy_Baron?= , Benno Lossin , Andreas Hindborg , Kees Cook , Al Viro , Andrew Morton , Greg Kroah-Hartman , =?UTF-8?B?QXJ2ZSBIasO4bm5ldsOlZw==?= , Todd Kjos , Martijn Coenen , Joel Fernandes , Carlos Llamas , Suren Baghdasaryan , Arnd Bergmann , linux-mm@kvack.org, linux-kernel@vger.kernel.org, rust-for-linux@vger.kernel.org, Christian Brauner Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: 2D190120007 X-Rspam-User: X-Stat-Signature: 5qrehj9rfi3re38hwp146xqy7ycuk7g1 X-Rspamd-Server: rspam01 X-HE-Tag: 1706760388-559100 X-HE-Meta: U2FsdGVkX18Sw5sye74a+39am7w7hs8KZjRCox5IW49Cb1PmWI/FMgGCORGs9Y26DqAOE6mNFcrQZDNVnLKL+L4C4khi17R0e4GpFudCINPLdblq7CPmZfPi98SG5x9nYTuNachPFmDgRMCoA8r2OiaE9My4lA6XS/FcUvwYSgSeQTGv/2pIQGmQOvmsY1PINZLfNWl21e2qk0qgUcf6OvViRaxSy15cda88DU5CRYECNxIPHYet+OvhMPCxQ37ANiT5o2cef2tJcVw/8oxmxgBRMrMq1MVzFfL2yBNZEvgayY40wQAbYhcsvAbMSLx2VdsunrYNs6pA1GrxnZZamym9L+q27nThavzDmCx4uHKepSlUS3WKvR0ml5++1yOr9KUf2k4UQANkSTXQsRMIXseEdJoFdAj6QRTyN3D6EhOzeLcCT51tkjghfQNZvl35YppakAsZy7Fzijris75WlsdyK9fb7RJcCsq5ZOTWoKQQWLR5gOQy3afN2kIYZRBb0d/NGAzPNA2c+VNvsFhuoNP59SnPCPhjgJmFWkPZe5BirI3TYHC1KYo3xZfAY0MRPrP6tTBmBhZF9b9HCE56v/4KzCrr7cfzy0m0kYSUESkd45CkoYH1TNxVt7DtsXWIChvp/S7KuynNUcuglS8cnX8oMGpIFTwkkecURsAEnEid61vyze2TvD4C440SM73BFwsofbvtJKj4wgwEGjpg9jRUNRSltsFdPUjP/qzgJ8bk4dg4qBu8OV4iODXySNXoEzpNNw45r/XU81/6uLGZJLPMbCJDBRtBbovHEHpkeHTIv+rF1UKz4kFKrtnCXxzdgc+jD5la2BEldt5LKVYA/yfyoTTvXvxBmnshkqdPYar3fl2ELSg22NJL+lDGfYRp+6bHr89+1pIdW4pZzhkIHtAtUECStAmUu4O2Au9xj4n/fPEAeNqK0aK4Qu4LqIyu95gUi9qSOvOdBLvyjuL LNcN6ZuY 0E4dwBTFxyRhCVaI9EsDnZ5OX1GYor+ZLF6/9vED8PHLJjcyWk+zADrElJyluu64x00gTtyAEVOPDcFMSoZl7xh42IJotACEfVAUDbcNmb+5B001F61LcmDbBGtuzst2+3uW9dN333YngRhzv5MuQiSaM2gFUBnPLBTiojO6OdmYjtwGTrycnaYlmVD2iZVLxpcizmeW0cZ1zQQ+yKarj7ot78VnURYH7KHVlcNPuZsPqOtIKGZC65g/ZW+sqv30EkeANvnRyZVjIVOEXqL7U+rIinSSjzTN27RXWabE2av1qH+8j1L9Gl7dTsUgsn/O/ULCPOd3Dfrk2qIeKbI7wo6ajqiq8EypMEYNbfcUfg5G/7FBsxvRtf9UoVoLvjboIdnghvct9KLAID2Os32KV87URoEgiitwcyyKQoCkd6iroNr1kSiCAfgOe4WPbyNa6J33aJblp2Q6QHQHLWe8VdOezXg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Wed, Jan 24, 2024 at 6:21=E2=80=AFAM Alice Ryhl w= rote: > --- /dev/null > +++ b/rust/kernel/user_ptr.rs > @@ -0,0 +1,222 @@ > +// SPDX-License-Identifier: GPL-2.0 > + > +//! User pointers. > +//! > +//! C header: [`include/linux/uaccess.h`](../../../../include/linux/uacc= ess.h) > + > +// Comparison with MAX_USER_OP_LEN triggers this lint on platforms > +// where `c_ulong =3D=3D usize`. > +#![allow(clippy::absurd_extreme_comparisons)] > + > +use crate::{bindings, error::code::*, error::Result}; > +use alloc::vec::Vec; > +use core::ffi::{c_ulong, c_void}; > + > +/// The maximum length of a operation using `copy_[from|to]_user`. > +/// > +/// If a usize is not greater than this constant, then casting it to `c_= ulong` > +/// is guaranteed to be lossless. > +const MAX_USER_OP_LEN: usize =3D c_ulong::MAX as usize; > + > +/// A pointer to an area in userspace memory, which can be either read-o= nly or > +/// read-write. > +/// > +/// All methods on this struct are safe: invalid pointers return `EFAULT= `. Maybe ... attempting to read or write invalid pointers will return `EFAULT`. > +/// Concurrent access, *including data races to/from userspace memory*, = is > +/// permitted, because fundamentally another userspace thread/process co= uld > +/// always be modifying memory at the same time (in the same way that us= erspace > +/// Rust's [`std::io`] permits data races with the contents of files on = disk). > +/// In the presence of a race, the exact byte values read/written are > +/// unspecified but the operation is well-defined. Kernelspace code shou= ld > +/// validate its copy of data after completing a read, and not expect th= at > +/// multiple reads of the same address will return the same value. > +/// > +/// These APIs are designed to make it difficult to accidentally write T= OCTOU > +/// bugs. Every time you read from a memory location, the pointer is adv= anced by > +/// the length so that you cannot use that reader to read the same memor= y > +/// location twice. Preventing double-fetches avoids TOCTOU bugs. This i= s > +/// accomplished by taking `self` by value to prevent obtaining multiple= readers > +/// on a given [`UserSlicePtr`], and the readers only permitting forward= reads. > +/// If double-fetching a memory location is necessary for some reason, t= hen that > +/// is done by creating multiple readers to the same memory location, e.= g. using > +/// [`clone_reader`]. Maybe something like Every time a memory location is read, the reader's position is advanced= by the read length and the next read will start from there. This helps pre= vent accidentally reading the same location twice and causing a TOCTOU bug. Creating a [`UserSlicePtrReader`] and/or [`UserSlicePtrWriter`] consumes the `UserSlicePtr`, helping ensure that there aren't multiple readers or writers to the same location. If double fetching a memory location... > +/// > +/// Constructing a [`UserSlicePtr`] performs no checks on the provided a= ddress > +/// and length, it can safely be constructed inside a kernel thread with= no > +/// current userspace process. Reads and writes wrap the kernel APIs Maybe some of this is better documented on `new` rather than the type? > +/// `copy_from_user` and `copy_to_user`, which check the memory map of t= he > +/// current process and enforce that the address range is within the use= r range > +/// (no additional calls to `access_ok` are needed). > +/// > +/// [`std::io`]: https://doc.rust-lang.org/std/io/index.html > +/// [`clone_reader`]: UserSlicePtrReader::clone_reader > +pub struct UserSlicePtr(*mut c_void, usize); I think just `UserSlice` could work for the name here, since `SlicePtr` is a bit redundant (slices automatically containing a pointer). Also makes the Reader/Writer types a bit less wordy. Though I understand wanting to make it discoverable for C users. Is some sort of `Debug` implementation useful? > +impl UserSlicePtr { > + /// Constructs a user slice from a raw pointer and a length in bytes= . > + /// > + /// Callers must be careful to avoid time-of-check-time-of-use > + /// (TOCTOU) issues. The simplest way is to create a single instance= of > + /// [`UserSlicePtr`] per user memory block as it reads each byte at > + /// most once. > + pub fn new(ptr: *mut c_void, length: usize) -> Self { > + UserSlicePtr(ptr, length) > + } > + > + /// Reads the entirety of the user slice. > + /// > + /// Returns `EFAULT` if the address does not currently point to > + /// mapped, readable memory. > + pub fn read_all(self) -> Result> { > + self.reader().read_all() > + } std::io uses the signature fn read_to_end(&mut self, buf: &mut Vec) -> Result { ... } Maybe this could be better here too, since it allows reusing the vector without going through `read_raw`. https://doc.rust-lang.org/std/io/trait.Read.html#method.read_to_end > + /// Constructs a [`UserSlicePtrReader`]. > + pub fn reader(self) -> UserSlicePtrReader { > + UserSlicePtrReader(self.0, self.1) > + } I wonder if this should be called `into_reader` to note that it is a consuming method. Indifferent here, since there aren't any non-`self` variants. > + > + /// Constructs a [`UserSlicePtrWriter`]. > + pub fn writer(self) -> UserSlicePtrWriter { > + UserSlicePtrWriter(self.0, self.1) > + } > + > + /// Constructs both a [`UserSlicePtrReader`] and a [`UserSlicePtrWri= ter`]. Could you add a brief about what is or isn't allowed when you have a reader and a writer to the same location? > + pub fn reader_writer(self) -> (UserSlicePtrReader, UserSlicePtrWrite= r) { > + ( > + UserSlicePtrReader(self.0, self.1), > + UserSlicePtrWriter(self.0, self.1), > + ) > + } > +} > + > +/// A reader for [`UserSlicePtr`]. > +/// > +/// Used to incrementally read from the user slice. > +pub struct UserSlicePtrReader(*mut c_void, usize); > + > +impl UserSlicePtrReader { > + /// Skip the provided number of bytes. > + /// > + /// Returns an error if skipping more than the length of the buffer. > + pub fn skip(&mut self, num_skip: usize) -> Result { > + // Update `self.1` first since that's the fallible one. > + self.1 =3D self.1.checked_sub(num_skip).ok_or(EFAULT)?; > + self.0 =3D self.0.wrapping_add(num_skip); I think it would be better to change to named structs to make this more cle= ar. > + Ok(()) > + } > + > + /// Create a reader that can access the same range of data. > + /// > + /// Reading from the clone does not advance the current reader. > + /// > + /// The caller should take care to not introduce TOCTOU issues. Could you add an example of what should be avoided? > + pub fn clone_reader(&self) -> UserSlicePtrReader { > + UserSlicePtrReader(self.0, self.1) > + } > + > + /// Returns the number of bytes left to be read from this. Remove "from this" or change to "from this reader". > + /// > + /// Note that even reading less than this number of bytes may fail.>= + pub fn len(&self) -> usize { > + self.1 > + } > + > + /// Returns `true` if no data is available in the io buffer. > + pub fn is_empty(&self) -> bool { > + self.1 =3D=3D 0 > + } > + > + /// Reads raw data from the user slice into a raw kernel buffer. > + /// > + /// Fails with `EFAULT` if the read encounters a page fault. This is raised if the address is invalid right, not just page faults? Or, are page faults just the only mode of indication that a pointer isn't valid. I know this is the same as copy_from_user, but I don't understand that too well myself. I'm also a bit curious what happens if you pass a kernel pointer to these methods. > + /// # Safety > + /// > + /// The `out` pointer must be valid for writing `len` bytes. > + pub unsafe fn read_raw(&mut self, out: *mut u8, len: usize) -> Resul= t { What is the motivation for taking raw pointers rather than a slice here? In which case it could just be called `read`. > + if len > self.1 || len > MAX_USER_OP_LEN { > + return Err(EFAULT); > + } > + // SAFETY: The caller promises that `out` is valid for writing `= len` bytes. > + let res =3D unsafe { bindings::copy_from_user(out.cast::= (), self.0, len as c_ulong) }; To me, it would be more clear to `c_ulong::try_from(len).map_err(|_| EFAULT)?` rather than using MAX_USER_OP_LEN with an `as` cast, since it makes it immediately clear that the conversion is ok (and is doing that same comparison) > + if res !=3D 0 { > + return Err(EFAULT); > + } > + // Since this is not a pointer to a valid object in our program, > + // we cannot use `add`, which has C-style rules for defined > + // behavior. > + self.0 =3D self.0.wrapping_add(len); > + self.1 -=3D len; > + Ok(()) > + } > + > + /// Reads all remaining data in the buffer into a vector. > + /// > + /// Fails with `EFAULT` if the read encounters a page fault. > + pub fn read_all(&mut self) -> Result> { Same as the above note about read_to_end > + let len =3D self.len(); > + let mut data =3D Vec::::try_with_capacity(len)?; > + > + // SAFETY: The output buffer is valid for `len` bytes as we just= allocated that much space. > + unsafe { self.read_raw(data.as_mut_ptr(), len)? }; If making the change above (possibly even if not), going through https://doc.rust-lang.org/std/vec/struct.Vec.html#method.spare_capacity_mut can be more clear about uninitialized data. > + > + // SAFETY: Since the call to `read_raw` was successful, the firs= t `len` bytes of the vector > + // have been initialized. > + unsafe { data.set_len(len) }; > + Ok(data) > + } > +} > + > +/// A writer for [`UserSlicePtr`]. > +/// > +/// Used to incrementally write into the user slice. > +pub struct UserSlicePtrWriter(*mut c_void, usize); > + > +impl UserSlicePtrWriter { > + /// Returns the amount of space remaining in this buffer. > + /// > + /// Note that even writing less than this number of bytes may fail. > + pub fn len(&self) -> usize { > + self.1 > + } > + > + /// Returns `true` if no more data can be written to this buffer. > + pub fn is_empty(&self) -> bool { > + self.1 =3D=3D 0 > + } > + > + /// Writes raw data to this user pointer from a raw kernel buffer. > + /// > + /// Fails with `EFAULT` if the write encounters a page fault. > + /// > + /// # Safety > + /// > + /// The `data` pointer must be valid for reading `len` bytes. > + pub unsafe fn write_raw(&mut self, data: *const u8, len: usize) -> R= esult { Same as for Reader regarding signature > + if len > self.1 || len > MAX_USER_OP_LEN { > + return Err(EFAULT); > + } > + let res =3D unsafe { bindings::copy_to_user(self.0, data.cast::<= c_void>(), len as c_ulong) }; Same as for Reader regarding `as` casts. > + if res !=3D 0 { > + return Err(EFAULT); > + } > + // Since this is not a pointer to a valid object in our program, > + // we cannot use `add`, which has C-style rules for defined > + // behavior. > + self.0 =3D self.0.wrapping_add(len); > + self.1 -=3D len; > + Ok(()) > + } > + > + /// Writes the provided slice to this user pointer. > + /// > + /// Fails with `EFAULT` if the write encounters a page fault. > + pub fn write_slice(&mut self, data: &[u8]) -> Result { This could probably just be called `write`, which would be consistent with std::io::Write. > + let len =3D data.len(); > + let ptr =3D data.as_ptr(); > + // SAFETY: The pointer originates from a reference to a slice of= length > + // `len`, so the pointer is valid for reading `len` bytes. > + unsafe { self.write_raw(ptr, len) } > + } > +} > > -- > 2.43.0.429.g432eaa2c6b-goog The docs could use usage examples, but I am sure you are planning on that already :) - Trevor