linux-mm.kvack.org archive mirror
From: Simon Richter <Simon.Richter@hogyros.de>
To: linux-crypto@vger.kernel.org
Cc: linux-mm@kvack.org
Subject: Shoveling data into and out of the crypto subsystem
Date: Mon, 18 Oct 2021 16:22:22 +0200
Message-ID: <8d718ae1-06a9-72c2-a3c0-71fd3f7af7b4@hogyros.de>


Hi,

I'm building a small accelerator card that should provide crypto 
primitives, and I'm wondering how large data transfers from and to 
userspace are supposed to work -- especially if these are file backed 
and larger than available memory.

For testing, I've created an 8GB random file, and used kcapi-dgst on it:

     $ strace kcapi-dgst -c sha256 -i test8G.bin --hex
     [...]
     openat(AT_FDCWD, 0x7ffc7e4b5896, O_RDONLY|O_CLOEXEC) = 6
     fstat(6, 0x7ffc7e4a5da0)                = 0
     mmap(NULL, 8589934592, PROT_READ, MAP_SHARED, 6, 0) = 0x7f8d911cf000
     accept(3, NULL, NULL)                   = 7
     sendmsg(7, 0x7ffc7e4a5ca0, MSG_MORE)    = 2147479552
     vmsplice(5, 0x7ffc7e4a5d00, 1, SPLICE_F_MORE|SPLICE_F_GIFT) = 4095
     splice(4, NULL, 7, NULL, 4095, SPLICE_F_MORE) = 4095
     sendmsg(7, 0x7ffc7e4a5ca0, MSG_MORE)    = 2147479552
     vmsplice(5, 0x7ffc7e4a5d00, 1, SPLICE_F_MORE|SPLICE_F_GIFT) = 4095
     splice(4, NULL, 7, NULL, 4095, SPLICE_F_MORE) = 4095
     sendmsg(7, 0x7ffc7e4a5ca0, MSG_MORE)    = 2147479552
     vmsplice(5, 0x7ffc7e4a5d00, 1, SPLICE_F_MORE|SPLICE_F_GIFT) = 4095
     splice(4, NULL, 7, NULL, 4095, SPLICE_F_MORE) = 4095
     sendmsg(7, 0x7ffc7e4a5ca0, MSG_MORE)    = 2147479552
     vmsplice(5, 0x7ffc7e4a5d00, 1, SPLICE_F_MORE|SPLICE_F_GIFT) = 4095
     splice(4, NULL, 7, NULL, 4095, SPLICE_F_MORE) = 4095
     sendto(7, 0x7f8f911ceffc, 4, MSG_MORE, NULL, 0) = 4
     recvmsg(7, 0x7ffc7e4a5cd0, 0)           = 32
     fstat(1, 0x7ffc7e4a5bc0)                = 0
     munmap(0x7f8d911cf000, 0)               = -1 EINVAL (Invalid argument)

This seems wrong to me:

  - Every sendmsg() call transfers 2GB - 4kB. That probably makes sense
when trying to keep every transfer page aligned.
  - Each vmsplice()/splice() pair then transfers 4095 bytes -- that would
likely trigger a copy and leave the position in the file unaligned
afterwards.
  - The last sendto() call then cleans up the remaining four bytes and
still uses MSG_MORE.
  - The munmap() call is passed a length of 0 and fails with EINVAL.
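
As a sanity check on the trace above, the byte counts can be added up. 
This is a minimal sketch that uses only numbers copied from the strace 
output; the variable names are made up for illustration, and the 4096 
byte page size is an assumption.

```python
# Byte counts and addresses copied from the strace output above.
PAGE = 4096                     # assumed page size
FILE_SIZE = 8589934592          # length passed to mmap()
SENDMSG_BYTES = 2147479552      # return value of each sendmsg()
SPLICE_BYTES = 4095             # return value of each vmsplice()/splice()
TAIL_BYTES = 4                  # return value of the final sendto()
ROUNDS = 4                      # sendmsg + vmsplice + splice appears 4 times

MAP_BASE = 0x7f8d911cf000       # address returned by mmap()
TAIL_ADDR = 0x7f8f911ceffc      # buffer address passed to sendto()

total = ROUNDS * (SENDMSG_BYTES + SPLICE_BYTES) + TAIL_BYTES

# Each sendmsg() moves exactly 2 GiB minus one page.
print(SENDMSG_BYTES == 2**31 - PAGE)
# One sendmsg + splice round moves 2 GiB minus one byte.
print(SENDMSG_BYTES + SPLICE_BYTES == 2**31 - 1)
# All transfers together cover the whole mapping.
print(total == FILE_SIZE)
# The final sendto() sends the last four bytes of the mapping.
print(TAIL_ADDR == MAP_BASE + FILE_SIZE - TAIL_BYTES)
```

All four comparisons hold: each round moves 2^31 - 1 bytes, so four 
rounds leave exactly four bytes for the final sendto().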

Is that the optimal way to transfer data from disk to an ahash?
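
For comparison, a plain copying path through AF_ALG that keeps every 
transfer except the last one page aligned could look roughly like the 
sketch below. It is not what kcapi-dgst does and it is not zerocopy: 
it uses read() and send(), so the data is copied into the kernel. It 
assumes Python's socket.AF_ALG support and a kernel that provides the 
"hash" type with "sha256"; CHUNK, iter_chunks() and af_alg_sha256() are 
hypothetical names chosen for this example.

```python
import socket

CHUNK = 1 << 20  # hypothetical chunk size: 1 MiB, a multiple of 4 KiB


def iter_chunks(f, chunk=CHUNK):
    """Yield successive blocks from f; only the last one may be short."""
    while True:
        data = f.read(chunk)
        if not data:
            return
        yield data


def af_alg_sha256(path):
    """Hash a file through an AF_ALG "hash" socket (copying path)."""
    with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
        alg.bind(("hash", "sha256"))
        op, _ = alg.accept()
        with op, open(path, "rb") as f:
            for data in iter_chunks(f):
                # MSG_MORE tells the kernel that more data will follow.
                op.sendall(data, socket.MSG_MORE)
            # Reading the result finalises the digest, as the recvmsg()
            # in the trace does after a MSG_MORE send.
            return op.recv(32)


if __name__ == "__main__":
    import sys
    print(af_alg_sha256(sys.argv[1]).hex())
```

With a 1 MiB chunk size, every send except possibly the last is a 
multiple of the page size, so no odd-sized transfer is needed in the 
middle of the stream.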

Now my PCIe device can operate directly on DMA memory. As I understand
the crypto API, the "src" scatterlist can be mapped using dma_map_sg(),
so the data must somehow already be in DMA-capable memory at that point.
That makes me suspect that the data was copied several times in between,
as the result of mmap() is unsuitable for DMA.

Crypto+mm questions so far:

  - How does flow control work for a 2GB sendmsg() from an mmap()ed
region if the data needs to be made available for DMA -- presumably I
can't dma_map_sg() all of the pages if I have 4GB of physical memory?
  - Is there a zerocopy path for disk->crypto that can be used with 
large data blobs?
  - Are there suitable paths for crypto->disk (for encryption and 
compression)?
  - If the device implements PCIe Address Translation and Page Request
Interface, can I use the IOMMU to pin pages instead of doing that in a
driver? That is, can a crypto driver indicate that the scatterlist may
refer to virtual memory that need not be pinned or even present yet, and
can this be used to avoid copies or partial mappings?

Crypto only questions so far:

  - The ahash interface still seems to expect the result to be filled
in on return, whereas I expected it to wait until I signal completion
through a callback. Am I missing something, or do I need to suspend the
current thread and wake it up from an interrupt? Can I somehow report
completion from an interrupt handler? Does it make sense to make
interrupts CPU affine?
  - The result pointer for ahash points to vmalloc()ed memory -- is
there a way to get a DMA buffer instead? (Not that there's a performance
difference here, but space in the result DMA buffer is another resource
I would otherwise need to track.)
  - The POWER9 NX driver has a separate interface for gzip
compression/decompression of large blobs. Is there a technical reason
why it cannot implement the crypto API?

Basically my goal is to have fast gzip compression and decompression 
support with the same interface on both of my workstations, one of which 
has an FPGA card, and the other has two POWER9 CPUs with NX. :)

    Simon

