How do I extract audio embeddings (features) from Google's AudioSet?

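Question:

I am talking about the audio features dataset available at https://research.google.com/audioset/download.html as a tar.gz archive consisting of frame-level audio tfrecord files.

Reading everything else from the tfrecord files works fine (I can extract the keys video_id, start_time_seconds, end_time_seconds and labels), but the actual embeddings needed for training do not seem to be there at all. When I iterate over the contents of any tfrecord file from the dataset, only the four keys video_id, start_time_seconds, end_time_seconds and labels are printed.

This is the code I am using: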

import tensorflow as tf
import numpy as np

def readTfRecordSamples(tfrecords_filename):
    record_iterator = tf.python_io.tf_record_iterator(path=tfrecords_filename)

    for string_record in record_iterator:
        example = tf.train.Example()
        example.ParseFromString(string_record)
        print(example)  # this prints the above-mentioned 4 keys but NOT audio_embedding

        # the first label can then be parsed like this:
        label = example.features.feature['labels'].int64_list.value[0]
        print('label 1: ' + str(label))

        # this, however, does not work:
        # audio_embedding = example.features.feature['audio_embedding'].bytes_list.value[0]

readTfRecordSamples('embeddings/01.tfrecord')

Is there some trick to extracting the 128-dimensional embeddings? Or are they really not in this dataset?

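Answer

Solved it: the tfrecord files need to be read as sequence examples, not as plain examples. The code above works if the line

example = tf.train.Example()

is replaced by

example = tf.train.SequenceExample()

The embeddings and all other content can then be viewed by simply running

print(example)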
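Building on the SequenceExample fix, here is a minimal sketch of how the embeddings might be pulled into NumPy arrays instead of just printed. It is not from the original thread. The function name read_audioset_embeddings is hypothetical, the sketch assumes TensorFlow 2.x with eager execution, and it assumes the layout described on the AudioSet download page: context features video_id and labels, plus an audio_embedding feature list in which each frame holds one byte string of 128 values quantized to 8 bits.

```python
import numpy as np
import tensorflow as tf


def read_audioset_embeddings(tfrecords_filename):
    """Yield (video_id, labels, embeddings) for each record in one file.

    Hypothetical helper, not part of TensorFlow or AudioSet.
    """
    # tf.data.TFRecordDataset is the TF 2.x reader; under TF 1.x the
    # question's tf.python_io.tf_record_iterator plays the same role.
    for raw_record in tf.data.TFRecordDataset(tfrecords_filename):
        example = tf.train.SequenceExample()
        example.ParseFromString(raw_record.numpy())

        # Per-clip metadata lives in the context, not in example.features.
        context = example.context.feature
        video_id = context['video_id'].bytes_list.value[0].decode('utf-8')
        labels = list(context['labels'].int64_list.value)

        # Assumption: each frame stores one byte string of 8-bit quantized values.
        frames = example.feature_lists.feature_list['audio_embedding'].feature
        embeddings = np.array(
            [np.frombuffer(f.bytes_list.value[0], dtype=np.uint8) for f in frames]
        )
        yield video_id, labels, embeddings
```

Iterating over read_audioset_embeddings('embeddings/01.tfrecord') and printing embeddings.shape for each record is a quick way to check the assumption: under the layout above, each array should have one row per frame and 128 columns.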
