Swift: TrueDepth 맛보기 - 실시간으로 Depth data 받아오기

7 분 소요

참고

샘플 코드(Streaming Depth Data from the TrueDepth Camera)
AVCaptureDepthDataOutput(documentation)
Guide to KVO in Swift 5 with code examples(Blog-nalexn)
Best practices for context parameter in addObserver (KVO)(StackOverflow)

악보 뷰어 앱을 만들다가, 제스처로 악보를 넘기는 기능을 추가하기로 했다. 고개를 숙여서 카메라에 가까이 다가갔다가 멀어지는 방식을 해보기로 함
그걸 위해 TrueDepth 카메라를 사용해 실시간으로 depth 데이터를 받아와 보자

샘플 코드 보면서 공부했는데 처음 볼 땐 좀 복잡해서, 리팩토링하면서 필요한 부분만 골라 쓴 거 정리하기로 함

Property

private enum SessionSetupResult {
    case success
    case notAuthorized
    case configurationFailed
}

private var setupResult: SessionSetupResult = .success

setupResult
- 세션을 설정한 결과다. 유저의 카메라 사용 허용 여부, 실제 카메라에서 데이터를 가져오기 위한 configure 중의 에러 등을 저장하도록 한다.

private let session = AVCaptureSession()
private var isSessionRunning = false
private var sessionObservation: NSKeyValueObservation?

session
- 캡처 세션이다. 비디오 디바이스(TrueDepth 카메라)로부터의 인풋을 depth data 아웃풋으로 반환해줄 것이다.
isSessionRunning
- 현재 세션의 러닝 상태를 저장해둔다.
sessionObservation
- 현재 세션의 상태를 관찰한다.

private let sessionQueue = DispatchQueue(label: "session queue", attributes: [], autoreleaseFrequency: .workItem)
private let dataOutputQueue = DispatchQueue(label: "data queue", qos: .userInitiated, attributes: [], autoreleaseFrequency: .workItem)

sessionQueue
- 세션 작업은 꽤 무거울 수 있으므로 메인 큐에서 실행하면 UI 응답성이 떨어질 수 있다. 따라서 세션 관리나 관련 작업은 해당 큐에서 하도록 한다.
dataOutputQueue
- 세션을 통해 얻은 depthData 아웃풋의 콜백 큐다. 따라서 처리 결과를 UI에 표시할 일이 있다면 메인 큐에서 해 줘야 함

private let videoDeviceDiscoverySession = AVCaptureDevice.DiscoverySession(deviceTypes: [.builtInTrueDepthCamera],
                                                                           mediaType: .video,
                                                                           position: .front)
private var videoDeviceInput: AVCaptureDeviceInput!
private let depthDataOutput = AVCaptureDepthDataOutput()

videoDeviceDiscoverySession
- 조건에 맞는 비디오 디바이스를 찾기 위한 세션이다. 지금은 내장된 TrueDepth 카메라이고, 연속적으로 계속 값을 얻어야 하므로 .video, 포지션은 전면 카메라로 한다.
videoDeviceInput
- 비디오 디바이스로부터 얻은 데이터, session의 인풋이다.
depthDataOutput
- session의 depth data 아웃풋이다.

샘플 코드에서는 추가적으로

private let videoDataOutput = AVCaptureVideoDataOutput()
private var outputSynchronizer: AVCaptureDataOutputSynchronizer?

가 있는데,
session에 videoDataOutput을 아웃풋으로 추가하고,

outputSynchronizer = AVCaptureDataOutputSynchronizer(dataOutputs: [videoDataOutput, depthDataOutput])
outputSynchronizer!.setDelegate(self, queue: dataOutputQueue)

outputSynchronizer로 비디오 아웃풋과 깊이 데이터 아웃풋의 싱크를 맞춰주는 작업이 있다.
하지만 나는 비디오 결과를 화면에 보여주진 않을 거기 때문에 사용하지 않았다.

세션 준비하기

func configure() {
    if AVCaptureDevice.authorizationStatus(for: .video) != .authorized {
        setupResult = .notAuthorized
        // Show some alerts, etc...
        return
    }
    
    sessionQueue.async {
        self.configureSession()
    }
}

일단 카메라 사용을 유저가 허용 했는지 확인하고 setupResult를 설정한다.
그리고 본격적으로 세션 configure 시작(sessionQueue에서 해야 함)

private func configureSession() {
    if setupResult != .success { return }
    
    session.beginConfiguration()
    
    session.sessionPreset = AVCaptureSession.Preset.vga640x480
    
    do {
        let videoDevice = try prepareVideoDevice()
        try prepareVideoDeviceInput(videoDevice)
        try addInput()
        try addOutput()
        try setFormat(videoDevice)
    } catch {
        logger.log(error.localizedDescription, .error)
        setupResult = .configurationFailed
        session.commitConfiguration()
        return
    }
    
    depthDataOutput.setDelegate(self, callbackQueue: dataOutputQueue)
    
    session.commitConfiguration()
}

setupResult가 success일 때에만(권한이 있을 때만) configure하도록 한다.
세션은 원자적으로 업데이트하기 위해 시작과 끝에 beginConfiguration, commitConfiguration을 꼭 수행하도록 하자

순서는

video device 준비
거기서 비디오 인풋 얻기
세션에 인풋 추가하기
세션에 depth data 아웃풋 추가하기
비디오 디바이스에 depth data 포맷 정하기
depth data 아웃풋을 받아오기 위해 델리게이트 설정

이다.

1. video device 준비

private func prepareVideoDevice() throws -> AVCaptureDevice {
    guard let videoDevice = videoDeviceDiscoverySession.devices.first else {
        throw TrueDepthError.cannotFindVideoDevice
    }
    return videoDevice
}

(TrueDepthError는 임의로 추가한 거)
위에서 정의했던 videoDeviceDiscoverySession에서 비디오 디바이스를 찾아온다.

2. 비디오 인풋 얻기

private func prepareVideoDeviceInput(_ videoDevice: AVCaptureDevice) throws {
    do {
        videoDeviceInput = try AVCaptureDeviceInput(device: videoDevice)
    } catch {
        throw TrueDepthError.cannotCreateVideoInput(error)
    }
}

비디오 디바이스에서부터 캡처 인풋을 얻어온다.

3. 세션에 인풋 추가하기

private func addInput() throws {
    guard session.canAddInput(videoDeviceInput) else {
        throw TrueDepthError.cannotAddVideoInputToSession
    }
    session.addInput(videoDeviceInput)
}

얻어온 인풋을 세션에 추가해준다.
이 때 주의할 점은 반드시 canAddInput으로 해당 인풋을 추가할 수 있는지 확인 후에 추가해야 한다고 함
만약 인풋을 반복해서 세션에 넣으려고 하거나, 이미 다른 캡처 세션에 인풋으로 있는 경우 추가할 수 없다

4. 세션에 depth data 아웃풋 추가하기

private func addOutput() throws {
    if session.canAddOutput(depthDataOutput) {
        session.addOutput(depthDataOutput)
        depthDataOutput.isFilteringEnabled = false
        if let connection = depthDataOutput.connection(with: .depthData) {
            connection.isEnabled = true
        } else {
            print("No AVCaptureConnection")
        }
    } else {
        throw TrueDepthError.cannotAddDepthDataOutputToSession
    }
}

얘도 아웃풋을 추가할 수 있는지 확인 후에 추가해주도록 한다.
isFilteringEnabled는 기본적으로 true 값이므로 원하는 걸 선택하도록 한다. 필터링을 사용할 경우 노이즈를 줄이고 저조도 문제 등으로 누락된 값을 채워 넣어준다.

음 근데

if let connection = depthDataOutput.connection(with: .depthData) {
    connection.isEnabled = true
} else {
    print("No AVCaptureConnection")
}

여기를 보면 원래 코드에서도 connection이 없을 때에 그냥 출력만 하고 딱히 에러 처리를 안 하던데
그래도 되나 싶어서 AVCaptureConnection 문서를 찾아보니,
세션에 addInput(_:) 또는 addOutput(_:) 메서드를 사용할 때 호환되는 모든 입력과 출력 사이에 자동으로 연결을 만든다고 함

그럼 저 코드 없어도 되나? 해서 없애고도 돌렸는데 잘 되네여

5. 비디오 디바이스에 depth data 포맷 정하기

private func setFormat(_ videoDevice: AVCaptureDevice) throws {
    let format = videoDevice.activeFormat.supportedDepthDataFormats
        .filter {
            CMFormatDescriptionGetMediaSubType($0.formatDescription) == kCVPixelFormatType_DepthFloat16
        }
        .max(by: { first, second in
            CMVideoFormatDescriptionGetDimensions(first.formatDescription).width < CMVideoFormatDescriptionGetDimensions(second.formatDescription).width
        })

    do {
        try videoDevice.lockForConfiguration()
        videoDevice.activeDepthDataFormat = format
        videoDevice.unlockForConfiguration()
    } catch {
        throw TrueDepthError.cannotLockDeviceForConfiguration(error)
    }
}

마지막으로 포맷 정하기
지원되는 포맷들 중, 포맷타입이 depth인 애들을 찾아내고, 가장 높은 해상도를 뽑아서 걔로 설정한다.
비디오 디바이스의 configuration도 꼭 원자적으로 수행해주도록 한다.

세션 시작

func startSession() {
    sessionQueue.async {
        if self.setupResult != .success {
            // 시작 실패
            return
        }
        self.addObservers()
        self.session.startRunning()
        self.isSessionRunning = self.session.isRunning
    }
}

이제 세션을 시작해보자!!(마찬가지로 sessionQueue에서 해야 함)
setupResult가 success일 때만(카메라 권한이 있고, configuration 에러가 없을 때) 시작하도록 한다.
옵저버를 추가하고, 세션을 시작하고, isSessionRunning에 세션 상태도 저장해주도록 한다.

옵저버 추가하기

A session can only run when the app is full screen. It will be interrupted in a multi-app layout, introduced in iOS 9, see also the documentation of AVCaptureSessionInterruptionReason. Add observers to handle these session interruptions and show a preview is paused message. See the documentation of AVCaptureSessionWasInterruptedNotification for other interruption reasons.

앱이 풀 스크린을 떠나게 되면 세션에 인터럽션이 생기므로, 이를 관찰해준다.

NotificationCenter.default.addObserver(self, selector: #selector(sessionWasInterrupted),
                                       name: NSNotification.Name.AVCaptureSessionWasInterrupted,
                                       object: session)

@objc
func sessionWasInterrupted(notification: NSNotification) {
    if let userInfoValue = notification.userInfo?[AVCaptureSessionInterruptionReasonKey] as AnyObject?,
        let reasonIntegerValue = userInfoValue.integerValue,
        let reason = AVCaptureSession.InterruptionReason(rawValue: reasonIntegerValue) {
        print("Capture session was interrupted with reason \(reason)")
    }
}

iOS9부터 userInfo에 인터럽트의 이유가 담겨서 오므로 이를 확인해준다.
.videoDeviceInUseByAnotherClient, .videoDeviceNotAvailableWithMultipleForegroundApps 등 다양한 이유가 있다

NotificationCenter.default.addObserver(self, selector: #selector(sessionRuntimeError),
                                       name: NSNotification.Name.AVCaptureSessionRuntimeError, object: session)

@objc
func sessionRuntimeError(notification: NSNotification) {
    guard let errorValue = notification.userInfo?[AVCaptureSessionErrorKey] as? NSError else {
        return
    }
    
    let error = AVError(_nsError: errorValue)
    print("Capture session runtime error: \(error)")
    
    if error.code == .mediaServicesWereReset {
        sessionQueue.async {
            if self.isSessionRunning {
                self.session.startRunning()
                self.isSessionRunning = self.session.isRunning
            }
        }
    }
}

에러 코드가 .mediaServicesWereReset으로 미디어 서비스가 리셋되었고 이전에 성공했었다면, 큰 문제가 없는 것이므로 알아서 재시작해준다.

KVO(Key-Value Observing)

특정 키패스의 값을 관찰할 수 있다.
현재 코드를 보면

private var sessionRunningContext = 0

이렇게 세션 상태 관련 context를 하나 정의하고,

session.addObserver(self,
                    forKeyPath: "running",
                    options: NSKeyValueObservingOptions.new,
                    context: &sessionRunningContext)

세션에다가 running 키패스의 값이 새로 바뀌는 지를 관찰하도록 추가한다.

override func observeValue(forKeyPath keyPath: String?, of object: Any?, change: [NSKeyValueChangeKey: Any]?, context: UnsafeMutableRawPointer?) {
    if context != &sessionRunningContext {
        super.observeValue(forKeyPath: keyPath, of: object, change: change, context: context)
    }
}

그리고 observeValue()를 오버라이드해서, context가 세션 context가 아닐 경우 기존의 함수를 호출한다.

context?

어떤 클래스에서 어떤 키패스를 관찰했는데, 그 하위 클래스나 다른 개체에서 똑같은 키패스를 관찰할 경우, 위 메소드가 여러 번 호출 되고 context가 없다면 keyPath만으로는 그 호출들을 구분할 수 없으므로 사용된다.

그래서 보통 context 변수는 클래스 내에 고유하게 존재할 수 있는 정적 변수로 사용한다고 함

근데 딱히 관찰해서 뭔가 하는 코드는 여기 없는 듯

deinit {
    session.removeObserver(self, forKeyPath: "running", context: &sessionRunningContext)
}

deinit 시에는 옵저버를 없애서 메모리 관리를 해준다.

KVO 2

그런데 해당 방식은 KVO를 쓰는 약간 옛날 방식으로, Objective-C 스타일이라 할 수 있다. context 파라미터는 UnsafeMutableRawPointer다. 그래서 Swift4에서 추가된 녀석으로 바꿔 주면

private var sessionObservation: NSKeyValueObservation?

이렇게 Observation인 NSKeyValueObservation를 가지도록 하고

sessionObservation = session.observe(\.isRunning, options: .new) { session, change in
    guard let isRunning = change.newValue else { return }
    print("Session running: \(isRunning)")
}

이런 식으로 changeHandler를 넘겨주면 됨

deinit {
    sessionObservation?.invalidate()
}

deinit 시에 이렇게 관찰을 invalidate하면 된다.

세션 멈추기

func stopSession() {
    sessionQueue.async {
        if self.setupResult == .success {
            self.session.stopRunning()
            self.isSessionRunning = self.session.isRunning
        }
    }
}

이번엔 세션을 멈춰보자!!(마찬가지로!! sessionQueue에서 해야 함)
간단함

Depth data 아웃풋 사용하기

세션 구성에서 depthDataOutput.setDelegate(self, callbackQueue: dataOutputQueue)으로 델리게이트를 설정했었다
AVCaptureDepthDataOutputDelegate로

func depthDataOutput(_ output: AVCaptureDepthDataOutput,
                     didOutput depthData: AVDepthData,
                     timestamp: CMTime,
                     connection: AVCaptureConnection)

을 정의하면 결과를 받아올 수 있다

func depthDataOutput(_ output: AVCaptureDepthDataOutput,
                     didOutput depthData: AVDepthData,
                     timestamp: CMTime,
                     connection: AVCaptureConnection) {
    let depthFrame = depthData.depthDataMap
    let width = CVPixelBufferGetWidth(depthFrame)
    let height = CVPixelBufferGetHeight(depthFrame)
    assert(kCVPixelFormatType_DepthFloat16 == CVPixelBufferGetPixelFormatType(depthFrame))
    CVPixelBufferLockBaseAddress(depthFrame, .readOnly)
    for row in (0..<height) {
        let rowData = (CVPixelBufferGetBaseAddress(depthFrame)!
                       + row * CVPixelBufferGetBytesPerRow(depthFrame))
            .assumingMemoryBound(to: Float16.self)
        for col in (0..<width) {
            let f = Float(rowData[col])
            // 결과 사용하기
        }
    }
    CVPixelBufferUnlockBaseAddress(depthFrame, .readOnly)
}

depthFrame
- CVPixelBuffer다. 픽셀 버퍼는 메인 메모리에 이미지를 저장한다.
width, height
- 버퍼의 가로, 세로를 구해준다.
assert(...)
- 포맷을 정할 때 kCVPixelFormatType_DepthFloat16인 것들 중 골랐으니까 당연히 같겠지만 확인 한 번 해주자
CVPixelBufferLockBaseAddress(), CVPixelBufferUnlockBaseAddress()
- CPU에서 픽셀 데이터에 액세스하기 전에 락을 걸어줘야 한다. lockFlag는 lock과 unlock 시 같아야 함. 지금은 값을 변경하지 않으니 readOnly를 사용한다
- GPU에서 접근할 때는 락을 할 필요가 없고, 오히려 성능 저하가 있을 수 있다고 함
데이터 접근
- 이제 베이스 주소값, row 당 바이트를 사용하면 원하는 인덱스의 데이터에 접근할 수 있다.
- 우선 (베이스) + (row 인덱스) * (row 당 바이트)로 rowData UnsafeMutableRawPointer에 접근한다
- 이제 assumingMemoryBound()로 Float16 타입인 포인터를 얻을 수 있다.
  - 메모리가 이미 해당 타입에 바인딩되어 있다고 가정하고, 해당 raw 포인터가 참조하는 메모리에 대한 해당 타입 포인터를 반환한다.
  - 예제 코드에서는 swift에 Float16 타입이 없어 UInt16으로 캐스팅했다가 변환하는 코드가 있었으나, swift 5.3부터 Float16이 추가되었으므로 바로 이거 사용해도 됨
- 이제 원하는 col 인덱스에 접근하여 사용 가능.

전체 코드는 여기 → cyj893/MyFlow/TrueDepthHelper.swift

trueDepth

저는 그냥 합 구해서 임계값 따라서 처리하니까 잘 되네여
재밌다 굿

Twitter Facebook LinkedIn

cyj893