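Generating a list of all longest common substrings and variations

Problem description:

High level

I am trying to collapse the common strings in a list of sentences and list only the regions where they differ. So, taking this: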

Please don't kick any of the cats 
Please do kick any of the cats 
Please don't kick any of the dogs 
Please do kick any of the dogs 
Please don't kick any of the garden snakes 
Please do pet any of the garden snakes

and turning it into this:

Please [don't|do] [kick|pet] any of the [cats|dogs|garden snakes]

Another example

Please join thirteen of your friends at the Midnight Bash this Friday 
Don't forget to join your friend John at the Midnight Bash tomorrow 
Don't forget to join your friends John and Julie at the Midnight Bash tonight

becomes:

[Please|Don't forget to] 
join 
[thirteen of your friends|your friend John|your friends John and Julie] 
at the Midnight Bash 
[this Friday|tomorrow|tonight]

Maybe this approach

How about this approach...

for an array of sentences 
    loop with the remaining sentence 
    find the "first common substring (FCS)" 
    split the sentences on the FCS 
    every unique phrase before the FCS is part of the set of uncommon phrases 
    trim the sentence by the first uncommon phrase 
    end loop
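
One possible reading of the loop above, sketched in Python at the word level. This is only an illustration under assumptions: the helper names (find_run, first_common_run, collapse, render) are made up here, the "first common substring" is taken to be the earliest run of whole words in the first sentence that occurs in every remaining sentence, and each sentence is cut at the first occurrence of that run.

```python
def find_run(tokens, run):
    """Return the first index where the word list `run` occurs in `tokens`, or -1."""
    n = len(run)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == run:
            return i
    return -1


def first_common_run(sentences):
    """Earliest run of words from sentences[0] found in every sentence, extended greedily."""
    first = sentences[0]
    for start in range(len(first)):
        if all(find_run(s, first[start:start + 1]) >= 0 for s in sentences):
            end = start + 1
            while end < len(first) and all(
                find_run(s, first[start:end + 1]) >= 0 for s in sentences
            ):
                end += 1
            return first[start:end]
    return None  # no word is shared by all sentences


def unique(phrases):
    """Drop duplicates while keeping first-seen order."""
    return list(dict.fromkeys(phrases))


def collapse(lines):
    """Return a list mixing common phrases (str) and variant slots (list of str)."""
    remaining = [line.split() for line in lines]
    parts = []
    while any(remaining):
        run = first_common_run(remaining)
        if run is None:
            # Nothing shared is left: the tails form one last variant slot.
            parts.append(unique(" ".join(r) for r in remaining))
            break
        cuts = [find_run(r, run) for r in remaining]
        variants = unique(" ".join(r[:c]) for r, c in zip(remaining, cuts))
        if variants != [""]:
            parts.append(variants)
        parts.append(" ".join(run))
        # Trim every sentence past the common run and continue.
        remaining = [r[c + len(run):] for r, c in zip(remaining, cuts)]
    return parts


def render(parts):
    """Format the result in the [a|b|c] notation used in the question."""
    return " ".join(
        p if isinstance(p, str) else "[" + "|".join(p) + "]" for p in parts
    )
```

On the six "Please ... kick" sentences this gives "Please [don't kick|do kick|do pet] any of the [cats|dogs|garden snakes]". That is close to the target but not identical: "kick" is not shared by every sentence, so the greedy split leaves [don't|do] and [kick|pet] in a single slot. Separating them would need another pass over each slot.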

...what exactly are you asking? – JNYRanger

Edited to make it clearer what I am asking for... help with the algorithm. – theChrisMarsh

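Answer

Funny, I was thinking about creating something like this a long time ago, until I realized that it is really a kind of AI. There are too many factors to consider: grammar, syntax, context, errors, and so on. But if your input is always as fixed as "Please [A1|A2|..] [B1|B2|..] any of the [C1|C2|..]", then maybe a simple regex pattern will do: "^Please\s*(?(do|do))\s*(?\w+)+\s*any of the\s*(?)*$".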
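
A rough Python sketch of the fixed-template idea. The pattern quoted in this answer appears to have lost its group names, so the regex below is a guess at the intent, not the answerer's exact expression. The group names (aux, verb, obj) and the slots helper are hypothetical.

```python
import re

# Assumed template: "Please <aux> <verb> any of the <obj>".
# Group names are illustrative; they are not taken from the original answer.
PATTERN = re.compile(
    r"^Please\s+(?P<aux>don't|do)\s+(?P<verb>\w+)\s+any of the\s+(?P<obj>.+?)\s*$"
)


def slots(lines):
    """Collect the distinct values seen in each named slot, in first-seen order."""
    found = {"aux": [], "verb": [], "obj": []}
    for line in lines:
        m = PATTERN.match(line)
        if m is None:
            continue  # the line does not fit the fixed template; skip it
        for name, values in found.items():
            value = m.group(name)
            if value not in values:
                values.append(value)
    return found
```

This only works while every sentence follows the one template. The second example in the question ("... at the Midnight Bash ...") would need a different pattern, which is the limitation this answer points out.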

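Answer

Map each unique word to a single object. Then build a conditional probability table (see Markov chains) that counts how many times each word follows each sequence.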
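
A minimal sketch of such a table in Python, assuming whitespace tokenization and using the word strings themselves as the "objects". The names follower_counts and branch_points, and the order parameter (the length of the preceding sequence), are invented for illustration.

```python
from collections import Counter, defaultdict


def follower_counts(lines, order=1):
    """Map each sequence of `order` words to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for line in lines:
        words = line.split()
        for i in range(len(words) - order):
            context = tuple(words[i:i + order])
            table[context][words[i + order]] += 1
    return table


def branch_points(table):
    """Contexts followed by more than one distinct word, i.e. candidate variant slots."""
    return {ctx: dict(counts) for ctx, counts in table.items() if len(counts) > 1}
```

Dividing each count by the total for its context turns the counts into conditional probabilities. For the six "Please ..." sentences with order=1, the contexts with more than one follower are ("Please",), ("do",) and ("the",), which line up with the places where the sentences diverge.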
