Date: Wed, 22 Mar 2017 09:54:26 -0400
From: Mike Rapoport
Subject: Re: Review request: draft ioctl_userfaultfd(2) manual page
In-Reply-To: <9af29fc6-dce2-f729-0f07-a0bfcc6c3587@gmail.com>
Message-Id: <20170322135423.GB27789@rapoport-lnx>
To: "Michael Kerrisk (man-pages)", Andrea Arcangeli
Cc: lkml, "linux-mm@kvack.org", linux-man

Hello Michael,

On Mon, Mar 20, 2017 at 09:11:07PM +0100, Michael Kerrisk (man-pages) wrote:
> Hello Andrea, Mike, and all,
>
> Mike: here's the split out page that describes the
> userfaultfd ioctl() operations.
>
> I'd like to get review input, especially from you and
> Andrea, but also anyone else, for the current version
> of this page, which includes quite a few FIXMEs to be
> sorted.
>
> I've shown the rendered version of the page below.
> The groff source is attached, and can also be found
> at the branch here:
>
> https://git.kernel.org/pub/scm/docs/man-pages/man-pages.git/log/?h=draft_userfaultfd
>
> The new ioctl_userfaultfd(2) page follows this mail.
>
> Cheers,
>
> Michael
>
> NAME
>        userfaultfd - create a file descriptor for handling page faults in user
>        space
>
> SYNOPSIS
>        #include <sys/ioctl.h>
>
>        int ioctl(int fd, int cmd, ...);
>
> DESCRIPTION
>        Various ioctl(2) operations can be performed on a userfaultfd object
>        (created by a call to userfaultfd(2)) using calls of the form:
>
>            ioctl(fd, cmd, argp);
>
>        In the above, fd is a file descriptor referring to a userfaultfd
>        object, cmd is one of the commands listed below, and argp is a pointer
>        to a data structure that is specific to cmd.
>
>        The various ioctl(2) operations are described below.
>        The UFFDIO_API, UFFDIO_REGISTER, and UFFDIO_UNREGISTER operations are
>        used to configure userfaultfd behavior. These operations allow the
>        caller to choose what features will be enabled and what kinds of
>        events will be delivered to the application. The remaining operations
>        are range operations. These operations enable the calling application
>        to resolve page-fault events in a consistent way.
>
>
>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │Above: What does "consistent" mean?                  │
>        │                                                     │
>        └─────────────────────────────────────────────────────┘

Andrea, can you please help with this one?

>    UFFDIO_API
>        (Since Linux 4.3.) Enable operation of the userfaultfd and perform API
>        handshake. The argp argument is a pointer to a uffdio_api structure,
>        defined as:
>
>            struct uffdio_api {
>                __u64 api;        /* Requested API version (input) */
>                __u64 features;   /* Must be zero */
>                __u64 ioctls;     /* Available ioctl() operations (output) */
>            };
>
>        The api field denotes the API version requested by the application.
>        Before the call, the features field must be initialized to zero.
>
>
>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │Above: Why must the 'features' field be initialized  │
>        │to zero?                                             │
>        └─────────────────────────────────────────────────────┘

Until 4.11 the only supported feature is delegation of missing page
faults, and the UFFD_API_FEATURES bitmask is 0. There's a check in the
uffdio_api call that the user is not trying to enable any other
functionality: it asserts that uffdio_api.features is zero [1].
Starting from 4.11 the features negotiation is different. Now the
uffdio_api call verifies that it can support the features the
application requested [2].

>        The kernel verifies that it can support the requested API version, and
>        sets the features and ioctls fields to bit masks representing all the
>        available features and the generic ioctl(2) operations available.
>        Currently, zero (i.e., no feature bits) is placed in the features
>        field. The returned ioctls field can contain the following bits:
>
>
>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │This user-space API seems not fully polished. Why    │
>        │are there not constants defined for each of the      │
>        │bit-mask values listed below?                        │
>        └─────────────────────────────────────────────────────┘
>
>        1 << _UFFDIO_API
>               The UFFDIO_API operation is supported.
>
>        1 << _UFFDIO_REGISTER
>               The UFFDIO_REGISTER operation is supported.
>
>        1 << _UFFDIO_UNREGISTER
>               The UFFDIO_UNREGISTER operation is supported.

Well, I tend to agree. I believe the original intention was to use the
OR'ed mask, like UFFD_API_IOCTLS. Andrea, can you add something?
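
To make the handshake and the ioctls bit mask discussed above a bit more
concrete, here is a minimal sketch in C. It is an illustration only, not
text proposed for the page: the helper names uffd_handshake() and
uffd_ioctl_supported() are made up for this example, and it assumes the
<linux/userfaultfd.h> header from a 4.3 or later kernel. The features
field is left at zero, which is what kernels before 4.11 require.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: perform the UFFDIO_API handshake on uffd.
 * The structure is cleared first, so features stays zero.
 * Returns 0 on success, or -1 with errno set by ioctl(2).
 */
static int uffd_handshake(int uffd, struct uffdio_api *api)
{
	assert(api != NULL);
	memset(api, 0, sizeof(*api));
	api->api = UFFD_API;
	return ioctl(uffd, UFFDIO_API, api);
}

/*
 * Hypothetical helper: test one bit of a returned ioctls mask.
 * nr is an ioctl number such as _UFFDIO_REGISTER. The shift is done
 * in 64 bits because _UFFDIO_API is 0x3F, which does not fit in the
 * plain int that "1 << nr" would use.
 */
static bool uffd_ioctl_supported(__u64 ioctls, unsigned int nr)
{
	return nr < 64 && (ioctls & ((__u64)1 << nr)) != 0;
}
```

After a successful uffd_handshake(uffd, &api), a caller could check
uffd_ioctl_supported(api.ioctls, _UFFDIO_REGISTER) before trying to
register a range. Whether per-bit constants should exist in the header is
the open question above; the sketch only shows how the current numbers
can be used.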

>
>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │Is the above description of the 'ioctls' field       │
>        │correct? Does more need to be said?                  │
>        │                                                     │
>        └─────────────────────────────────────────────────────┘

This is correct. I wouldn't add anything else.

>        This ioctl(2) operation returns 0 on success. On error, -1 is returned
>        and errno is set to indicate the cause of the error. Possible errors
>        include:
>
>
>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │Is the following error list correct?                 │
>        │                                                     │
>        └─────────────────────────────────────────────────────┘

There's also -EFAULT in case copy_{from,to}_user fails.

>
>        EINVAL The userfaultfd has already been enabled by a previous
>               UFFDIO_API operation.
>
>        EINVAL The API version requested in the api field is not supported by
>               this kernel, or the features field was not zero.
>
>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │In the above error case, the returned 'uffdio_api'   │
>        │structure is zeroed out. Why is this done? This      │
>        │should be explained in the manual page.              │
>        │                                                     │
>        └─────────────────────────────────────────────────────┘

In my understanding the uffdio_api structure is zeroed to allow the
caller to distinguish the reasons for -EINVAL.

>    UFFDIO_REGISTER
>        (Since Linux 4.3.) Register a memory address range with the
>        userfaultfd object. The argp argument is a pointer to a
>        uffdio_register structure, defined as:
>
>            struct uffdio_range {
>                __u64 start;    /* Start of range */
>                __u64 len;      /* Length of range (bytes) */
>            };
>
>            struct uffdio_register {
>                struct uffdio_range range;
>                __u64 mode;     /* Desired mode of operation (input) */
>                __u64 ioctls;   /* Available ioctl() operations (output) */
>            };
>
>
>        The range field defines a memory range starting at start and continuing
>        for len bytes that should be handled by the userfaultfd.
>
>        The mode field defines the mode of operation desired for this memory
>        region. The following values may be bitwise ORed to set the
>        userfaultfd mode for the specified range:
>
>        UFFDIO_REGISTER_MODE_MISSING
>               Track page faults on missing pages.
>
>        UFFDIO_REGISTER_MODE_WP
>               Track page faults on write-protected pages.
>
>        Currently, the only supported mode is UFFDIO_REGISTER_MODE_MISSING.
>
>        If the operation is successful, the kernel modifies the ioctls bit-mask
>        field to indicate which ioctl(2) operations are available for the
>        specified range. This returned bit mask is as for UFFDIO_API.
>
>        This ioctl(2) operation returns 0 on success. On error, -1 is returned
>        and errno is set to indicate the cause of the error.
>        Possible errors include:
>
>
>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │Is the following error list correct?                 │
>        │                                                     │
>        └─────────────────────────────────────────────────────┘

Here again it may be -EFAULT to indicate copy_{from,to}_user failure.
And UFFDIO_REGISTER may return -ENOMEM if the process is exiting and the
mm_struct is gone by the time userfault grabs it.

>        EBUSY  A mapping in the specified range is registered with another
>               userfaultfd object.
>
>        EINVAL An invalid or unsupported bit was specified in the mode field;
>               or the mode field was zero.
>
>        EINVAL There is no mapping in the specified address range.
>
>        EINVAL range.start or range.len is not a multiple of the system page
>               size; or, range.len is zero; or these fields are otherwise
>               invalid.
>
>        EINVAL There is an incompatible mapping in the specified address range.
>
>
>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │Above: What does "incompatible" mean?                │
>        │                                                     │
>        └─────────────────────────────────────────────────────┘

Up to 4.10 a userfault context may be registered only for
MAP_ANONYMOUS | MAP_PRIVATE mappings.

>    UFFDIO_UNREGISTER
>        (Since Linux 4.3.)
>        Unregister a memory address range from userfaultfd. The address range
>        to unregister is specified in the uffdio_range structure pointed to
>        by argp.
>
>        This ioctl(2) operation returns 0 on success. On error, -1 is returned
>        and errno is set to indicate the cause of the error. Possible errors
>        include:
>
>        EINVAL Either the start or the len field of the uffdio_range structure
>               was not a multiple of the system page size; or the len field was
>               zero; or these fields were otherwise invalid.
>
>        EINVAL There is an incompatible mapping in the specified address range.
>
>
>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │Above: What does "incompatible" mean?                │
>        └─────────────────────────────────────────────────────┘

The same comments as for UFFDIO_REGISTER apply here as well.

>        EINVAL There was no mapping in the specified address range.
>
>    UFFDIO_COPY
>        (Since Linux 4.3.) Atomically copy a continuous memory chunk into the
>        userfault registered range and optionally wake up the blocked thread.
>        The source and destination addresses and the number of bytes to copy
>        are specified by the src, dst, and len fields of the uffdio_copy
>        structure pointed to by argp:
>
>            struct uffdio_copy {
>                __u64 dst;    /* Destination of copy */
>                __u64 src;    /* Source of copy */
>                __u64 len;    /* Number of bytes to copy */
>                __u64 mode;   /* Flags controlling behavior of copy */
>                __s64 copy;   /* Number of bytes copied, or negated error */
>            };
>
>        The following value may be bitwise ORed in mode to change the behavior
>        of the UFFDIO_COPY operation:
>
>        UFFDIO_COPY_MODE_DONTWAKE
>               Do not wake up the thread that waits for page-fault resolution.
>
>        The copy field is used by the kernel to return the number of bytes that
>        was actually copied, or an error (a negated errno-style value).
>
>
>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │Above: Why is the 'copy' field used to return error  │
>        │values? This should be explained in the manual       │
>        │page.                                                │
>        └─────────────────────────────────────────────────────┘

Andrea, can you help with this one, please?

>        If the value returned in copy doesn't match the value that was
>        specified in len, the operation fails with the error EAGAIN. The copy
>        field is output-only; it is not read by the UFFDIO_COPY operation.
>
>        This ioctl(2) operation returns 0 on success. In this case, the entire
>        area was copied. On error, -1 is returned and errno is set to indicate
>        the cause of the error. Possible errors include:
>
>        EAGAIN The number of bytes copied (i.e., the value returned in the copy
>               field) does not equal the value that was specified in the len
>               field.
>
>        EINVAL Either dst or len was not a multiple of the system page size, or
>               the range specified by src and len or dst and len was invalid.
>
>        EINVAL An invalid bit was specified in the mode field.
>
>    UFFDIO_ZEROPAGE
>        (Since Linux 4.3.) Zero out a memory range registered with
>        userfaultfd. The requested range is specified by the range field of
>        the uffdio_zeropage structure pointed to by argp:
>
>            struct uffdio_zeropage {
>                struct uffdio_range range;
>                __u64 mode;       /* Flags controlling behavior of copy */
>                __s64 zeropage;   /* Number of bytes zeroed, or negated error */
>            };
>
>        The following value may be bitwise ORed in mode to change the behavior
>        of the UFFDIO_ZEROPAGE operation:
>
>        UFFDIO_ZEROPAGE_MODE_DONTWAKE
>               Do not wake up the thread that waits for page-fault resolution.
>
>        The zeropage field is used by the kernel to return the number of bytes
>        that was actually zeroed, or an error in the same manner as
>        UFFDIO_COPY.
>
>
>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │Why is the 'zeropage' field used to return error     │
>        │values? This should be explained in the manual       │
>        │page.                                                │
>        └─────────────────────────────────────────────────────┘
>        If the value returned in the zeropage field doesn't match the value
>        that was specified in range.len, the operation fails with the error
>        EAGAIN. The zeropage field is output-only; it is not read by the
>        UFFDIO_ZEROPAGE operation.
>
>        This ioctl(2) operation returns 0 on success. In this case, the entire
>        area was zeroed. On error, -1 is returned and errno is set to indicate
>        the cause of the error. Possible errors include:
>
>        EAGAIN The number of bytes zeroed (i.e., the value returned in the
>               zeropage field) does not equal the value that was specified in
>               the range.len field.
>
>        EINVAL Either range.start or range.len was not a multiple of the system
>               page size; or range.len was zero; or the range specified was
>               invalid.
>
>        EINVAL An invalid bit was specified in the mode field.
>
>    UFFDIO_WAKE
>        (Since Linux 4.3.) Wake up the thread waiting for page-fault
>        resolution on a specified memory address range. The argp argument is
>        a pointer to a uffdio_range structure (shown above) that specifies the
>        address range.
>
>
>        ┌─────────────────────────────────────────────────────┐
>        │FIXME                                                │
>        ├─────────────────────────────────────────────────────┤
>        │Need more detail here. How is the UFFDIO_WAKE        │
>        │operation used?                                      │
>        └─────────────────────────────────────────────────────┘

The UFFDIO_WAKE operation is used in conjunction with
UFFDIO_{COPY,ZEROPAGE} operations that have the
UFFDIO_{COPY,ZEROPAGE}_MODE_DONTWAKE bit set in the mode field. The
userfault monitor can perform several UFFDIO_{COPY,ZEROPAGE} calls in a
batch and then explicitly wake up the faulting thread using UFFDIO_WAKE.

>        This ioctl(2) operation returns 0 on success. On error, -1 is returned
>        and errno is set to indicate the cause of the error. Possible errors
>        include:
>
>        EINVAL The start or the len field of the uffdio_range structure was not
>               a multiple of the system page size; or len was zero; or the
>               specified range was otherwise invalid.
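
The batch-then-wake pattern described above for UFFDIO_WAKE could look
roughly like the following. This is a sketch under stated assumptions
rather than code from any existing monitor: uffd_range_ok() and
uffd_copy_batch() are hypothetical names, the range check only mirrors
the page-size conditions listed under EINVAL, and a short copy (EAGAIN)
is simply reported to the caller instead of being retried.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/userfaultfd.h>

/*
 * Hypothetical helper: start and len must be multiples of the page
 * size, and len must be non-zero.
 */
static bool uffd_range_ok(__u64 start, __u64 len, __u64 page_size)
{
	assert(page_size != 0);
	return len != 0 && start % page_size == 0 && len % page_size == 0;
}

/*
 * Hypothetical helper: fill npages pages at dst from src with one
 * UFFDIO_COPY per page, each with UFFDIO_COPY_MODE_DONTWAKE set, and
 * then wake the waiters once for the whole range with UFFDIO_WAKE.
 * Returns 0 on success, or -1 with errno set.
 */
static int uffd_copy_batch(int uffd, __u64 dst, const void *src,
			   size_t npages, size_t page_size)
{
	struct uffdio_range range = {
		.start = dst,
		.len = (__u64)npages * page_size,
	};
	size_t i;

	if (!uffd_range_ok(range.start, range.len, page_size)) {
		errno = EINVAL;
		return -1;
	}

	for (i = 0; i < npages; i++) {
		struct uffdio_copy copy;

		memset(&copy, 0, sizeof(copy));
		copy.dst = dst + (__u64)i * page_size;
		copy.src = (__u64)(uintptr_t)src + (__u64)i * page_size;
		copy.len = page_size;
		copy.mode = UFFDIO_COPY_MODE_DONTWAKE;
		if (ioctl(uffd, UFFDIO_COPY, &copy) == -1)
			return -1;	/* errno set by ioctl(2) */
	}

	/* All pages are in place; wake the faulting thread(s) once. */
	return ioctl(uffd, UFFDIO_WAKE, &range);
}
```

The point of deferring the wakeup is that the faulting thread resumes
only after every page in the batch is populated, instead of being woken
once per UFFDIO_COPY call.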
>
> RETURN VALUE
>        See descriptions of the individual operations, above.
>
> ERRORS
>        See descriptions of the individual operations, above. In addition, the
>        following general errors can occur for all of the operations described
>        above:
>
>        EFAULT argp does not point to a valid memory address.
>
>        EINVAL (For all operations except UFFDIO_API.) The userfaultfd object
>               has not yet been enabled (via the UFFDIO_API operation).
>
> CONFORMING TO
>        These ioctl(2) operations are Linux-specific.
>
> EXAMPLE
>        See userfaultfd(2).
>
> SEE ALSO
>        ioctl(2), mmap(2), userfaultfd(2)
>
>        Documentation/vm/userfaultfd.txt in the Linux kernel source tree
>

[1] http://lxr.free-electrons.com/source/fs/userfaultfd.c#L1199
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/userfaultfd.c#n1680