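Generating a list of all the longest common substrings and their variations
Problem description:
High level
I'm trying to collapse a list of sentences into the strings they have in common plus, for each region where they differ, a list of the alternatives. So, take this: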
Please don't kick any of the cats
Please do kick any of the cats
Please don't kick any of the dogs
Please do kick any of the dogs
Please don't kick any of the garden snakes
Please do pet any of the garden snakes
And turn it into this:
Please [don't|do] [kick|pet] any of the [cats|dogs|garden snakes]
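More details
- I've been looking at longest common substring algorithms, but they seem to compare only two strings.
- I'm only interested in comparing whole words within the strings.
- I only want to evaluate the strings from left to right.
- The uncommon substrings won't all have the same number of words ("cats" vs. "garden snakes").
I'm looking for help with the algorithm. I think this is a variation of the LCS problem, and I think it involves some kind of suffix tree processing. An explanation, possibly with a pseudocode implementation, would be ideal.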
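To make the "more than two strings, whole words only" requirement concrete, here is a minimal brute-force sketch in Python. It is not a suffix-tree solution; it simply tries every run of words from the shortest sentence, longest first, and returns the first one that occurs in every sentence. The function names `contains` and `longest_common_word_run` are made up for this illustration.

```python
def contains(words, phrase):
    """True if `phrase` (a list of words) occurs contiguously in `words`."""
    n = len(phrase)
    return any(words[i:i + n] == phrase for i in range(len(words) - n + 1))


def longest_common_word_run(lines):
    """Longest run of whole words shared by every line (brute force).

    Assumption: words are separated by whitespace and compared exactly,
    so "do" and "don't" are different words.
    """
    sentences = [line.split() for line in lines]
    shortest = min(sentences, key=len)
    # Try candidate runs from the longest possible size down to one word.
    for size in range(len(shortest), 0, -1):
        for start in range(len(shortest) - size + 1):
            phrase = shortest[start:start + size]
            if all(contains(s, phrase) for s in sentences):
                return phrase
    return []


if __name__ == "__main__":
    print(longest_common_word_run([
        "Please don't kick any of the cats",
        "Please do kick any of the dogs",
        "Please do pet any of the garden snakes",
    ]))
```

This only finds a single shared run, so it is a building block at best. It is also far slower than a generalized suffix tree would be on large inputs, but for a handful of short sentences that probably doesn't matter.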
Another example
Please join thirteen of your friends at the Midnight Bash this Friday
Don't forget to join your friend John at the Midnight Bash tomorrow
Don't forget to join your friends John and Julie at the Midnight Bash tonight
Becomes:
[Please|Don't forget to]
join
[thirteen of your friends|your friend John|your friends John and Julie]
at the Midnight Bash
[this Friday|tomorrow|tonight]
Maybe this approach
How about this approach...
for an array of sentences
    loop with the remaining sentence
        find the "first common substring (FCS)"
        split the sentences on the FCS
        every unique phrase before the FCS is part of the set of uncommon phrases
        trim the sentence by the first uncommon phrase
    end loop
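The loop above can be sketched in Python roughly as follows. This is one possible reading of the pseudocode, not a definitive implementation. It assumes that the "first common substring" is the earliest run of whole words in the first sentence that also occurs in every other sentence, extended as far as it stays common. The names `find_phrase`, `first_common_phrase` and `collapse` are invented for the sketch.

```python
def find_phrase(words, phrase):
    """Index of the first occurrence of `phrase` in `words`, or -1."""
    n = len(phrase)
    for i in range(len(words) - n + 1):
        if words[i:i + n] == phrase:
            return i
    return -1


def first_common_phrase(sentences):
    """Earliest word run of the first sentence found in all sentences,
    grown to the right for as long as it remains common to all of them."""
    first = sentences[0]
    for start in range(len(first)):
        end = start
        while end < len(first) and all(
            find_phrase(s, first[start:end + 1]) != -1 for s in sentences
        ):
            end += 1
        if end > start:
            return first[start:end]
    return []


def collapse(lines):
    """Alternate between [variant|variant] groups and common phrases."""
    remaining = [line.split() for line in lines]
    parts = []
    while any(remaining):
        common = first_common_phrase(remaining)
        if common:
            # Split every sentence at its first occurrence of the phrase.
            cuts = [find_phrase(s, common) for s in remaining]
            variants = [" ".join(s[:c]) for s, c in zip(remaining, cuts)]
            remaining = [s[c + len(common):] for s, c in zip(remaining, cuts)]
        else:
            # Nothing in common any more: the tails are the last variants.
            variants = [" ".join(s) for s in remaining]
            remaining = [[] for _ in remaining]
        unique = list(dict.fromkeys(variants))  # de-duplicate, keep order
        if any(unique):
            parts.append("[" + "|".join(unique) + "]")
        if common:
            parts.append(" ".join(common))
    return parts


if __name__ == "__main__":
    print(" ".join(collapse([
        "Please don't kick any of the cats",
        "Please do kick any of the cats",
        "Please don't kick any of the dogs",
        "Please do kick any of the dogs",
        "Please don't kick any of the garden snakes",
        "Please do pet any of the garden snakes",
    ])))
    # prints: Please [don't kick|do kick|do pet] any of the [cats|dogs|garden snakes]
```

Note that this does not split `[don't|do]` from `[kick|pet]`, because no word between them is common to all six sentences. On the Midnight Bash example it also treats `your` as a common phrase on its own, which yields `[thirteen of|] your [friends|friend John|friends John and Julie]` rather than the single group shown earlier. Getting the exact groupings from the examples would need an extra rule on top of this loop.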
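Answer
Funny, I thought about creating something like this a long time ago, until I realized that it is really a kind of AI. There are too many factors to consider: grammar, syntax, context, errors, and so on. However, if your input always has a fixed shape such as "Please [A1|A2|..] [B1|B2|..] any [C1|C2|..]", then maybe a simple regex pattern will do: "^Please\s*(?<A>(do|don't))\s*(?<B>\w+)+\s*any\s*(?<C>.*)*$".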
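For what it's worth, here is a hedged sketch of how a fixed-shape pattern like that could be applied in Python. Python spells named groups `(?P<name>...)`. The group names `A`, `B`, `C`, the literal `any of the`, and the helper `parse` are assumptions made for this illustration to fit the example sentences; they are not taken verbatim from the pattern above.

```python
import re

# Assumed fixed shape: "Please <do|don't> <verb> any of the <object>".
PATTERN = re.compile(
    r"^Please\s+(?P<A>don't|do)\s+(?P<B>\w+)\s+any of the\s+(?P<C>.*)$"
)


def parse(sentence):
    """Return the three variable parts as a dict, or None if no match."""
    match = PATTERN.match(sentence)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sentences = [
        "Please don't kick any of the cats",
        "Please do pet any of the garden snakes",
    ]
    for sentence in sentences:
        print(parse(sentence))
```

Collecting the distinct values of each group across all sentences would give the three bracketed lists, but only for input that really follows this one template. The second example (the Midnight Bash sentences) would not match.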
...What exactly are you asking? – JNYRanger
Edited to make it clearer what I'm asking for... help with the algorithm. – theChrisMarsh