let paddedK: [Float] = pad(sequence: kernel, other: x)
Now we can compute the convolution of paddedX and paddedK:
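Here is a minimal pure-Swift sketch of this step. It assumes a hypothetical `pad(sequence:other:)` that appends `other.count - 1` zeros (the real helper is defined elsewhere and may differ), so both padded arrays have length T = M + N - 1. It then evaluates y[n] = Σ paddedX[k]·paddedK[n − k] directly, in O(T²):

```swift
// Hypothetical helper: appends other.count - 1 zeros to `sequence`,
// giving length sequence.count + other.count - 1.
func pad(sequence: [Float], other: [Float]) -> [Float] {
    return sequence + [Float](repeating: 0, count: other.count - 1)
}

// Direct linear convolution of two sequences already zero-padded to the same length T:
// y[n] = sum over k in 0...n of paddedX[k] * paddedK[n - k]
func convolve(_ paddedX: [Float], _ paddedK: [Float]) -> [Float] {
    precondition(paddedX.count == paddedK.count, "both sequences must be padded to the same length")
    let T = paddedX.count
    var y = [Float](repeating: 0, count: T)
    for n in 0..<T {
        for k in 0...n {
            y[n] += paddedX[k] * paddedK[n - k]
        }
    }
    return y
}

let x: [Float] = [1, 2, 3, 4, 5]
let kernel: [Float] = [1, 2, 3]
let paddedX = pad(sequence: x, other: kernel)   // [1, 2, 3, 4, 5, 0, 0]
let paddedK = pad(sequence: kernel, other: x)   // [1, 2, 3, 0, 0, 0, 0]
let y = convolve(paddedX, paddedK)
print(y)  // [1.0, 4.0, 10.0, 16.0, 22.0, 22.0, 15.0]
```

Because both arrays are padded to the full output length, the index n − k never leaves the array, so the loop needs no boundary checks.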

Finally, the result of the convolution is:
// y = [1, 4, 10, 16, 22, 22, 15]
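Convolution with Accelerate
If you want to speed up the convolution, you can use the vDSP_conv function provided by the Accelerate framework. As before, I need to handle the boundary conditions and the kernel flip. This time, however, I zero-pad differently: the input array gets N-1 zeros on both sides, and the kernel is not padded at all. I also need to reverse the kernel (as explained in the documentation); otherwise, I get the correlation of the two sequences instead of their convolution.
Here is the implementation using Accelerate: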
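import Accelerate
let x: [Float] = [1, 2, 3, 4, 5]
let kernel: [Float] = [1, 2, 3]
let M = x.count, N = kernel.count
let T = N + M - 1  // length of the full convolution
// Pad the input with N-1 zeros on both sides: newXin.count == T + N - 1
let zeros = [Float](repeating: 0, count: N - 1)
let newXin = zeros + x + zeros
// vDSP_conv computes a correlation; reversing the kernel turns it into a convolution
let reversedKernel = [Float](kernel.reversed())
var res = [Float](repeating: 0, count: T)
vDSP_conv(newXin, 1, reversedKernel, 1, &res, 1, vDSP_Length(T), vDSP_Length(N))
// res = [1, 4, 10, 16, 22, 22, 15]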
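To see why the kernel has to be reversed, you can model in plain Swift the sliding dot product that vDSP_conv evaluates with positive unit strides, C[n] = Σₚ A[n + p]·F[p]. This is only a sketch for checking the arithmetic, not a replacement for Accelerate, and `slidingDotProduct` is a hypothetical name. Passing the kernel as-is yields the correlation. Passing it reversed yields the convolution:

```swift
// Plain-Swift model of a sliding dot product (a correlation):
// c[n] = sum over p in 0..<f.count of a[n + p] * f[p]
func slidingDotProduct(_ a: [Float], _ f: [Float], outputCount: Int) -> [Float] {
    precondition(a.count >= outputCount + f.count - 1, "input must hold outputCount + f.count - 1 samples")
    var c = [Float](repeating: 0, count: outputCount)
    for n in 0..<outputCount {
        for p in 0..<f.count {
            c[n] += a[n + p] * f[p]
        }
    }
    return c
}

let x: [Float] = [1, 2, 3, 4, 5]
let kernel: [Float] = [1, 2, 3]
let M = x.count
let N = kernel.count
let T = M + N - 1
// Same padding as the Accelerate version: N-1 zeros on both sides
let zeros = [Float](repeating: 0, count: N - 1)
let newXin = zeros + x + zeros

let correlation = slidingDotProduct(newXin, kernel, outputCount: T)
let convolution = slidingDotProduct(newXin, Array(kernel.reversed()), outputCount: T)
print(correlation)  // [3.0, 8.0, 14.0, 20.0, 26.0, 14.0, 5.0]
print(convolution)  // [1.0, 4.0, 10.0, 16.0, 22.0, 22.0, 15.0]
```

Both calls read the same padded input. Only the order of the filter taps differs, and that order decides whether you get a correlation or a convolution.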
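For such a short input sequence, you won't notice the speedup that the Accelerate framework brings. But suppose I create an input array of 100,000 elements and convolve it with the same kernel w as in the previous example. On my MacBook Pro, the Swift implementation takes 318 ms, while Accelerate's vDSP_conv takes only 159 ns.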
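You can sketch a measurement of this kind with a monotonic clock. The harness below is an assumption, not the setup behind the numbers above. It times a straightforward O(M·N) Swift convolution (`directConvolution`, a hypothetical name) of 100,000 ones with a 3-tap kernel assumed to be w = [1, 2, 3]. The printed time depends on the machine and on the optimization level, so build with `-O` before comparing:

```swift
import Dispatch

// Direct linear convolution, O(M * N): each input sample x[i] contributes x[i] * h[k] to y[i + k]
func directConvolution(_ x: [Float], _ h: [Float]) -> [Float] {
    guard !x.isEmpty, !h.isEmpty else { return [] }
    var y = [Float](repeating: 0, count: x.count + h.count - 1)
    for i in 0..<x.count {
        for k in 0..<h.count {
            y[i + k] += x[i] * h[k]
        }
    }
    return y
}

// Assumed benchmark input: 100,000 ones and the kernel w = [1, 2, 3]
let longInput = [Float](repeating: 1, count: 100_000)
let w: [Float] = [1, 2, 3]

// Read a monotonic clock before and after the call
let start = DispatchTime.now().uptimeNanoseconds
let result = directConvolution(longInput, w)
let elapsed = DispatchTime.now().uptimeNanoseconds - start
print("direct convolution: \(Double(elapsed) / 1_000_000) ms for \(result.count) outputs")
```

On macOS or iOS, wrapping the vDSP_conv call in the same two clock reads gives a like-for-like comparison.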
Convolution with Metal
Let's look at how to implement the same example with Metal. See this article to learn how to set up a Metal project for GPU computing.
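In this particular example, we need to create three Metal textures (objects conforming to the MTLTexture protocol): the first stores the input sequence, the second stores the kernel, and the third stores the final result.
Here is the source code that creates these textures: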
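The compute kernel that reads these textures does not appear in this section. If it follows the usual one-thread-per-output pattern, each thread with index gid reads the padded input and kernel textures and writes one texel of the output texture. The following is a CPU model of that work split, with plain [Float] arrays standing in for the textures (`convolutionThread` is a hypothetical name):

```swift
// CPU model of the assumed per-thread GPU work: thread `gid` computes one output sample.
// The three arrays stand in for inTexture, kernelTexture and outTexture.
func convolutionThread(gid: Int, inTexture: [Float], kernelTexture: [Float], outTexture: inout [Float]) {
    var sum: Float = 0
    for k in 0...gid {
        sum += inTexture[k] * kernelTexture[gid - k]
    }
    outTexture[gid] = sum
}

let paddedX: [Float] = [1, 2, 3, 4, 5, 0, 0]   // input + N-1 zeros
let paddedK: [Float] = [1, 2, 3, 0, 0, 0, 0]   // kernel + M-1 zeros
let T = paddedX.count
var outTexture = [Float](repeating: 0, count: T)
// Stand-in for dispatching T threadgroups of one thread each
for gid in 0..<T {
    convolutionThread(gid: gid, inTexture: paddedX, kernelTexture: paddedK, outTexture: &outTexture)
}
print(outTexture)  // [1.0, 4.0, 10.0, 16.0, 22.0, 22.0, 15.0]
```

On the GPU, the loop over gid disappears: each iteration runs as its own thread. That works because no output sample depends on another.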
import Metal
// input: the input sequence (length M); kernel: the kernel (length N)
// Zero-pad both sequences to length T = M + N - 1
let paddedX: [Float] = input + [Float](repeating: 0, count: N - 1)
let paddedK: [Float] = kernel + [Float](repeating: 0, count: M - 1)
// Input texture: one row of single-channel 32-bit floats, read by the shader
let inputTextureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .r32Float, width: paddedX.count, height: 1, mipmapped: false)
inputTextureDescriptor.usage = .shaderRead
inTexture = metalContext.device.makeTexture(descriptor: inputTextureDescriptor)
let region = MTLRegionMake2D(0, 0, paddedX.count, 1)
inTexture?.replace(region: region, mipmapLevel: 0, withBytes: paddedX, bytesPerRow: paddedX.count * MemoryLayout<Float32>.stride)
// Kernel texture, filled the same way
let kernelTextureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .r32Float, width: paddedK.count, height: 1, mipmapped: false)
kernelTextureDescriptor.usage = .shaderRead
kernelTexture = metalContext.device.makeTexture(descriptor: kernelTextureDescriptor)
let kernelRegion = MTLRegionMake2D(0, 0, paddedK.count, 1)
kernelTexture?.replace(region: kernelRegion, mipmapLevel: 0, withBytes: paddedK, bytesPerRow: paddedK.count * MemoryLayout<Float32>.stride)
// Output texture: written by the shader, one float per convolution sample
let outputTextureDescriptor = MTLTextureDescriptor.texture2DDescriptor(pixelFormat: .r32Float, width: paddedX.count, height: 1, mipmapped: false)
outputTextureDescriptor.usage = .shaderWrite
outTexture = metalContext.device.makeTexture(descriptor: outputTextureDescriptor)
executeConvolution()
In the previous source code, metalContext is an instance of the following class:
import Foundation
import Metal
final class MetalContext: NSObject {
    let device: MTLDevice
    let commandQueue: MTLCommandQueue
    let library: MTLLibrary
    override init() {
        // Get the default GPU
        let device = MTLCreateSystemDefaultDevice()!
        self.device = device
        // Create a command queue for submitting command buffers
        self.commandQueue = device.makeCommandQueue()!
        // Load the default library (the .metal files compiled into the app)
        self.library = device.makeDefaultLibrary()!
        super.init()
    }
}
This is just a helper class that I usually use to set up the main objects of a Metal stack.
Finally, the executeConvolution() method encodes the GPU commands:
func executeConvolution() {
    guard let outTexture = self.outTexture,
          let pipelineState = computePipelineState,
          let commandBuffer = metalContext.commandQueue.makeCommandBuffer(),
          let computeCommandEncoder = commandBuffer.makeComputeCommandEncoder() else { return }
    computeCommandEncoder.setComputePipelineState(pipelineState)
    // Bind the textures to the indices the compute kernel reads from
    computeCommandEncoder.setTexture(inTexture, index: 0)      // input sequence
    computeCommandEncoder.setTexture(kernelTexture, index: 1)  // kernel
    computeCommandEncoder.setTexture(outTexture, index: 2)     // result
    // T threadgroups of a single thread each: one thread per output sample
    computeCommandEncoder.dispatchThreadgroups(MTLSizeMake(T, 1, 1), threadsPerThreadgroup: MTLSizeMake(1, 1, 1))
    computeCommandEncoder.endEncoding()
    commandBuffer.commit()
    // Wait for the GPU to finish before reading the output texture back
    commandBuffer.waitUntilCompleted()
    let region = MTLRegionMake1D(0, T)
    var buffer = [Float32](repeating: 0, count: T)