How do I stream audio to the Alexa Voice Service API (v20160207) on iOS?

Problem description:

I've been working on an Alexa app for iOS for a while, but I'm stuck on sending microphone audio to the AVS API as a stream.

I have successfully pre-recorded an audio sample, sent it as a single whole request, and received a response.

I just want to know how to stream data to AVS over an NSURLSession HTTP/2 connection.

The snippet below is exactly what I'm doing right now:

func sendData() {
    let request = NSMutableURLRequest(URL: NSURL(string: "https://avs-alexa-na.amazon.com/v20160207/events")!)
    request.setValue("Bearer \(Settings.Credentials.TOKEN)", forHTTPHeaderField: "authorization")
    request.HTTPMethod = "POST"

    let boundry = NSUUID().UUIDString
    let contentType = "multipart/form-data; boundary=\(boundry)"
    request.setValue(contentType, forHTTPHeaderField: "content-type")

    let bodyData = NSMutableData()

    let jsonData = "{\"context\":[{\"header\":{\"namespace\":\"Alerts\",\"name\":\"AlertsState\"},\"payload\":{\"allAlerts\":[],\"activeAlerts\":[]}},{\"header\":{\"namespace\":\"AudioPlayer\",\"name\":\"PlaybackState\"},\"payload\":{\"token\":\"\",\"offsetInMilliseconds\":0,\"playerActivity\":\"IDLE\"}},{\"header\":{\"namespace\":\"Speaker\",\"name\":\"VolumeState\"},\"payload\":{\"volume\":25,\"muted\":false}},{\"header\":{\"namespace\":\"SpeechSynthesizer\",\"name\":\"SpeechState\"},\"payload\":{\"token\":\"\",\"offsetInMilliseconds\":0,\"playerActivity\":\"FINISHED\"}}],\"event\":{\"header\":{\"namespace\":\"SpeechRecognizer\",\"name\":\"Recognize\",\"messageId\":\"messageId-123\",\"dialogRequestId\":\"dialogRequestId-321\"},\"payload\":{\"profile\":\"CLOSE_TALK\",\"format\":\"AUDIO_L16_RATE_16000_CHANNELS_1\"}}}"

    // Metadata part
    bodyData.appendData("--\(boundry)\r\n".dataUsingEncoding(NSUTF8StringEncoding)!)
    bodyData.appendData("Content-Disposition: form-data; name=\"metadata\"\r\n".dataUsingEncoding(NSUTF8StringEncoding)!)
    bodyData.appendData("Content-Type: application/json; charset=UTF-8\r\n\r\n".dataUsingEncoding(NSUTF8StringEncoding)!)
    bodyData.appendData(jsonData.dataUsingEncoding(NSUTF8StringEncoding)!)
    bodyData.appendData("\r\n".dataUsingEncoding(NSUTF8StringEncoding)!)

    // Audio part
    bodyData.appendData("--\(boundry)\r\n".dataUsingEncoding(NSUTF8StringEncoding)!)
    bodyData.appendData("Content-Disposition: form-data; name=\"audio\"\r\n".dataUsingEncoding(NSUTF8StringEncoding)!)
    //  bodyData.appendData("Content-Type: audio/L16; rate=16000; channels=1\r\n\r\n".dataUsingEncoding(NSUTF8StringEncoding)!)
    bodyData.appendData("Content-Type: application/octet-stream\r\n\r\n".dataUsingEncoding(NSUTF8StringEncoding)!)
    bodyData.appendData(audioData!)
    bodyData.appendData("\r\n".dataUsingEncoding(NSUTF8StringEncoding)!)

    // The closing multipart delimiter needs a trailing "--"
    bodyData.appendData("--\(boundry)--\r\n".dataUsingEncoding(NSUTF8StringEncoding)!)

    // Mutating sharedSession()'s configuration is a no-op; build the session
    // from a configuration instead (timeout intervals are in seconds)
    let configuration = NSURLSessionConfiguration.defaultSessionConfiguration()
    configuration.timeoutIntervalForResource = 60000
    configuration.timeoutIntervalForRequest = 60000
    session = NSURLSession(configuration: configuration)

    let upload = session.uploadTaskWithRequest(request, fromData: bodyData) { (data, response, error) in
        print("done")
        if data?.length > 0 {
            print("break")
        }
        if let httpResponse = response as? NSHTTPURLResponse {
            if let responseData = data, let contentTypeHeader = httpResponse.allHeaderFields["Content-Type"] as? NSString {

                // Pull the multipart boundary out of the response's Content-Type header
                var boundry: String?
                let ctbRange = contentTypeHeader.rangeOfString("boundary=.*?;", options: .RegularExpressionSearch)
                if ctbRange.location != NSNotFound {
                    let boundryNSS = contentTypeHeader.substringWithRange(ctbRange) as NSString
                    boundry = boundryNSS.substringWithRange(NSRange(location: 9, length: boundryNSS.length - 10))
                }

                if let b = boundry {
                    let parts = self.parseResponse(responseData, boundry: b)
                    print("got parts")
                    // self.sendSynchronize()
                    self.successHandler?(data: responseData, parts: parts)
                } else {
                    print("something went wrong")
                    self.errorHandler?(error: NSError(domain: Settings.Error.ErrorDomain, code: Settings.Error.AVSResponseBorderParseErrorCode, userInfo: [NSLocalizedDescriptionKey : "Could not find boundry in AVS response"]))
                }
            }
        }
    }

    upload.resume()
}

This function gets called with every 320 bytes of audio data, because that's the chunk size Amazon recommends for streaming :) (320 bytes is 10 ms of 16 kHz, 16-bit, mono PCM.)
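For context, the capture side looks roughly like this; a minimal sketch where microphoneDidCapture(_:), pendingAudio, and sendChunk(_:) are simplified stand-ins for my actual capture callback and upload entry point:

let chunkSize = 320                 // 10 ms of 16 kHz / 16-bit / mono PCM
var pendingAudio = NSMutableData()  // bytes captured but not yet uploaded

// Called from the audio capture callback with whatever the mic delivered
func microphoneDidCapture(buffer: NSData) {
    pendingAudio.appendData(buffer)
    while pendingAudio.length >= chunkSize {
        let chunk = pendingAudio.subdataWithRange(NSRange(location: 0, length: chunkSize))
        // Passing nil bytes with length 0 deletes the consumed range
        pendingAudio.replaceBytesInRange(NSRange(location: 0, length: chunkSize), withBytes: nil, length: 0)
        sendChunk(chunk)            // hands one 320-byte chunk to the uploader
    }
}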

Thanks in advance!


Any luck @tomwyckhuys? I've run into the same problem. I've also tried removing the closing boundary delimiter. –

You should send the JSON metadata headers only once per dialog request, at the start (i.e., the moment the microphone opens and starts recording).
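In other words, something like this shape; a minimal sketch in your Swift 2 style, where boundry and jsonData are the same values as in your code, the output stream is wired up in the next sketch, and writeData ignores partial writes for brevity (real code must loop on the count returned by write(_:maxLength:)):

func writeData(data: NSData, to stream: NSOutputStream) {
    stream.write(UnsafePointer<UInt8>(data.bytes), maxLength: data.length)
}

// Written exactly once, when the mic opens: metadata part plus the
// opening of the audio part
func beginDialogRequest(stream: NSOutputStream) {
    let head = "--\(boundry)\r\n"
        + "Content-Disposition: form-data; name=\"metadata\"\r\n"
        + "Content-Type: application/json; charset=UTF-8\r\n\r\n"
        + jsonData + "\r\n"
        + "--\(boundry)\r\n"
        + "Content-Disposition: form-data; name=\"audio\"\r\n"
        + "Content-Type: application/octet-stream\r\n\r\n"
    writeData(head.dataUsingEncoding(NSUTF8StringEncoding)!, to: stream)
}

// Called once per 320-byte chunk; raw audio bytes only, no headers
func appendAudioChunk(chunk: NSData, stream: NSOutputStream) {
    writeData(chunk, to: stream)
}

// Written exactly once, when the mic closes: the closing delimiter
func endDialogRequest(stream: NSOutputStream) {
    writeData("\r\n--\(boundry)--\r\n".dataUsingEncoding(NSUTF8StringEncoding)!, to: stream)
    stream.close()
}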

You also need to use the same boundary value every time you call your sendData method for the same stream, and use the same HTTP/2 stream for the whole request, which means you'll need to restructure your sendData method to fit that model. An example that uses uploadTaskWithStreamedRequest(_:) may help (you'll probably need to use it).
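Here is a rough sketch of that wiring; the class name AVSStreamUploader, the 4096-byte buffer, and the lack of error handling are my assumptions, not a tested implementation. The request carries the same content-type header with the boundary as before but no HTTPBody; NSURLSession pulls the body from the bound input stream while the functions above write into the output side:

class AVSStreamUploader: NSObject, NSURLSessionTaskDelegate {
    var boundInput: NSInputStream?
    var boundOutput: NSOutputStream?

    func start(request: NSMutableURLRequest) {
        // Pair an input and an output stream over a shared buffer;
        // real code should also honor hasSpaceAvailable before writing
        NSStream.getBoundStreamsWithBufferSize(4096,
            inputStream: &boundInput, outputStream: &boundOutput)
        boundOutput?.open()

        let session = NSURLSession(
            configuration: NSURLSessionConfiguration.defaultSessionConfiguration(),
            delegate: self, delegateQueue: nil)
        session.uploadTaskWithStreamedRequest(request).resume()
    }

    // NSURLSession asks for the body stream instead of taking a fixed NSData
    func URLSession(session: NSURLSession, task: NSURLSessionTask,
                    needNewBodyStream completionHandler: (NSInputStream?) -> Void) {
        completionHandler(boundInput)
    }
}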

I'm not familiar with Swift's HTTP/2 APIs, so I don't know whether continuation frames are handled for you or whether you need to manage them yourself, so that's something to watch out for. Good luck, I hope this helps.