Comparison of cell arrays in MATLAB

Problem description:

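I have two cell arrays whose cells store unigrams and bigrams that I extracted from a text file. Now I have to compare each unigram with the bigrams to find how many times each unigram occurs in the bigrams, and later the probability. Can anyone help me sort this out? I have used strcmp, but it is not working. My code is below: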

for i = 1
    for j = 1:bigramRow
        bigram1 = regexp(splitBigramCellsA{j},'<s>|\w*|</s>','match');
        b1 = cellfun(@(x,y)[x], bigram1(1:end-1)','un',0)
        match = strcmp(splitUnigramCellsA, splitBigramCellsA{j,1});

        if match ==1
            bigram1count = splitbigramCellsB{j};
            unigram1count = splitUnigramCellsB{j};
            disp(bigram1count)
            disp(unigram1count)
        end
    end
end

Can you explain what unigrams and bigrams are? What does splitBigramCells contain? – Jonas

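Unigrams are the unique words in a sentence. Bigrams are two words taken at a time. For example, 'It is a lovely day' contains the bigrams 'It is', 'is a', 'a lovely', 'lovely day'. – Seema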

Answer

If you can fit the text in memory, you can do the following:

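Create a cell array of all the words (in order).
Call unique on the cell array and capture the third output. This is the original text represented as an array of indices, where each index points to a unigram.
Create all bigrams as bigrams = [indices(1:2:largestEven),indices(2:2:largestEven);indices(2:2:largestOdd),indices(3:2:largestOdd)], where largestEven is 2*floor(length(indices)/2) and largestOdd is 2*floor((length(indices)-1)/2)+1. This assumes indices is a column vector, so that each row of bigrams is one bigram.
Count, for example, the frequency of each unigram in the bigrams with tabulate(bigrams(:)) (tabulate requires the Statistics Toolbox).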
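
The steps above can be sketched roughly as follows. This is only an illustration under some assumptions: the word list is made up, the bigrams are built directly from all adjacent pairs with indices(1:end-1) and indices(2:end) (intended to give the same rows as the even/odd indexing, in a different order), accumarray stands in for tabulate so that no toolbox is needed, and the "probability" is assumed to mean count(w1 w2) divided by the number of bigrams starting with w1, which the question does not actually specify.

```matlab
% Hypothetical sample text; in practice the words would come from the text file.
words = {'it'; 'is'; 'a'; 'lovely'; 'day'; 'it'; 'is'};

% Unique unigrams, plus the text re-expressed as indices into them.
[unigrams, ~, indices] = unique(words);
indices = indices(:);                      % force a column vector

% Every pair of adjacent words, one bigram per row.
bigrams = [indices(1:end-1), indices(2:end)];

% How often each unigram appears anywhere in a bigram
% (accumarray used here in place of tabulate).
countInBigrams = accumarray(bigrams(:), 1, [numel(unigrams), 1]);

% How often each unigram appears as the first word of a bigram.
firstWordCount = accumarray(bigrams(:, 1), 1, [numel(unigrams), 1]);

% Count each distinct bigram.
[uniqueBigrams, ~, bigramIdx] = unique(bigrams, 'rows');
bigramCount = accumarray(bigramIdx(:), 1);

% Assumed definition: P(w2 | w1) = count(w1 w2) / count(bigrams starting with w1).
bigramProb = bigramCount ./ firstWordCount(uniqueBigrams(:, 1));

for k = 1:size(uniqueBigrams, 1)
    fprintf('%s %s: count = %d, P = %.2f\n', ...
        unigrams{uniqueBigrams(k, 1)}, unigrams{uniqueBigrams(k, 2)}, ...
        bigramCount(k), bigramProb(k));
end
```

As for the loop in the question: strcmp called with a cell array and a string returns a logical array with one entry per cell, and if match == 1 only runs its body when every entry is true. Something like any(match) or find(match) is probably closer to what was intended.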
