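Check whether one column's value is contained in another column's value (T-SQL)?
Question:
Hey, I have 2 tables with many columns, and I want to find the rows where the value of table1.somecolumn is contained in table2.someothercolumn. For example:
table1.somecolumn has Smith, Peter and
table2.someothercolumn has peter.smith
That should be a match. How would I do such a search?
Thanks :)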
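Answer
There are several possible solutions, depending on exactly what you need:
- Use a helper table that stores the keywords for each record, or for each record and field. E.g. table_helper(id int primary key, record_id int, keyword varchar), where record_id links to the source table. Populate this table from triggers on table1 and table2. The query for matching rows is then a simple intersection of table_helper with itself. You can create one helper for both table1 and table2, or use separate tables.
- Use full-text indexing.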
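The helper-table idea can be sketched roughly as follows. This is only an illustration under assumptions: it uses two separate helper tables, the names table1_helper and table2_helper and the column sizes are made up, the sample rows stand in for what the triggers would store, and each keyword is assumed to appear at most once per record.

```sql
-- Hypothetical helper tables (names and sizes are assumptions).
create table table1_helper
(
    id int identity(1, 1) primary key,
    record_id int not null,        -- id of the source row in table1
    keyword varchar(128) not null
)
create table table2_helper
(
    id int identity(1, 1) primary key,
    record_id int not null,        -- id of the source row in table2
    keyword varchar(128) not null
)

-- Sample keywords, standing in for what a trigger would insert.
insert into table1_helper (record_id, keyword) values (1, 'smith')
insert into table1_helper (record_id, keyword) values (1, 'peter')
insert into table2_helper (record_id, keyword) values (10, 'peter')
insert into table2_helper (record_id, keyword) values (10, 'smith')

-- Pairs of rows where every keyword of the table1 row
-- also appears among the keywords of the table2 row.
select h1.record_id as table1_id, h2.record_id as table2_id
from table1_helper h1
join table2_helper h2 on h2.keyword = h1.keyword
group by h1.record_id, h2.record_id
having count(*) = (select count(*)
                   from table1_helper k
                   where k.record_id = h1.record_id)
```

Because the comparison is done keyword by keyword, word order in the original columns no longer matters; whether the keywords are compared case-sensitively depends on the column collation.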
Answer
You could try the SOUNDEX or DIFFERENCE functions to help match string literals.
Example:
select difference('peter.green', 'Green, Peter')
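returns 2, where, according to the documentation:
The integer returned is the number of characters in the SOUNDEX values that are the same. The return value ranges from 0 through 4: 0 indicates weak or no similarity, and 4 indicates strong similarity or the same values.
See the SOUNDEX and DIFFERENCE topics on MSDN.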
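To get a feel for what these functions compare, it may help to look at individual words rather than whole strings. The snippet below is a small sketch using only literals; the column aliases are made up for illustration.

```sql
-- Word-level comparison: each word is encoded on its own.
select
    soundex('Peter')             as peter_code,  -- P360
    soundex('Green')             as green_code,  -- G650
    difference('Peter', 'peter') as same_word,   -- 4: both codes are identical
    difference('Peter', 'Green') as other_word   -- lower, since the codes differ
```

A single word compared with itself scores 4 regardless of case, which is why comparing word by word tends to be more useful here than comparing the full column values.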
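Update:
SOUNDEX and DIFFERENCE don't work well when word order has to be taken into account. However, if you have the full-text indexing feature installed, you can use the word-breaking and parsing ability of the full-text engine without creating an index. Assuming you are on SQL Server 2008, the following function returns a list of normalized terms: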
SELECT * FROM sys.dm_fts_parser('"Peter Green"', 1033, 0, 0)
which you can CROSS APPLY to the rest of your query.
See the sys.dm_fts_parser topic and section K, "Using APPLY", in the FROM topic for more information.
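As a smaller sketch of what the parser returns, the query below selects only a few of its columns for one sample value. The choice of columns and the filter on special_term are illustrative assumptions, not part of the original answer.

```sql
-- Break one sample value into terms.
-- 1033 is the English LCID; the two zeros select the system
-- stoplist and accent-insensitive parsing.
select occurrence, display_term, special_term
from sys.dm_fts_parser('"Green, Peter"', 1033, 0, 0)
where special_term = 'Exact Match'  -- skip noise words and end-of-sentence markers
order by occurrence
```

For a value like this, the parser would be expected to return the two names as separate terms, which is what makes a word-by-word comparison possible. Note that querying sys.dm_fts_parser requires elevated permissions, so this may not be usable from an ordinary application login.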
Example: (SQL Server 2008 Enterprise with the full-text search engine enabled)
if OBJECT_ID('Names1', 'U') is not null drop table Names1
if OBJECT_ID('Names2', 'U') is not null drop table Names2
create table Names1
(
id int identity(0, 1),
name nvarchar(128)
)
insert into Names1 (name) values ('Green, Peter')
insert into Names1 (name) values ('Smith, Peter')
insert into Names1 (name) values ('Aadland, Beverly')
insert into Names1 (name) values ('Aalda, Mariann')
insert into Names1 (name) values ('Aaliyah')
insert into Names1 (name) values ('Aames, Angela')
insert into Names1 (name) values ('Aames, Willie')
insert into Names1 (name) values ('Aaron, Caroline')
insert into Names1 (name) values ('Aaron, Quinton')
insert into Names1 (name) values ('Aaron, Victor')
insert into Names1 (name) values ('Abbay, Peter')
insert into Names1 (name) values ('Abbott, Dorothy')
insert into Names1 (name) values ('Abbott, Bruce')
insert into Names1 (name) values ('Abbott, Bud')
insert into Names1 (name) values ('Abbott, Philip')
insert into Names1 (name) values ('Abdoo, Rose')
insert into Names1 (name) values ('Abdul, Paula')
insert into Names1 (name) values ('Abel, Jake')
insert into Names1 (name) values ('Abel, Walter')
insert into Names1 (name) values ('Abeles, Edward')
insert into Names1 (name) values ('Abell, Tim')
insert into Names1 (name) values ('Aber, Chuck')
create table Names2
(
id int identity(200, 1),
name nvarchar(128)
)
insert into Names2 (name) values (LOWER('Peter.Green'))
insert into Names2 (name) values (LOWER('Peter.Smith'))
insert into names2 (name) values (LOWER('Beverly.Aadland'))
insert into names2 (name) values (LOWER('Mariann.Aalda'))
insert into names2 (name) values (LOWER('Aaliyah'))
insert into names2 (name) values (LOWER('Angela.Aames'))
insert into names2 (name) values (LOWER('Willie.Aames'))
insert into names2 (name) values (LOWER('Caroline.Aaron'))
insert into names2 (name) values (LOWER('Quinton.Aaron'))
insert into names2 (name) values (LOWER('Victor.Aaron'))
insert into names2 (name) values (LOWER('Peter.Abbay'))
insert into names2 (name) values (LOWER('Dorothy.Abbott'))
insert into names2 (name) values (LOWER('Bruce.Abbott'))
insert into names2 (name) values (LOWER('Bud.Abbott'))
insert into names2 (name) values (LOWER('Philip.Abbott'))
insert into names2 (name) values (LOWER('Rose.Abdoo'))
insert into names2 (name) values (LOWER('Paula.Abdul'))
insert into names2 (name) values (LOWER('Jake.Abel'))
insert into names2 (name) values (LOWER('Walter.Abel'))
insert into names2 (name) values (LOWER('Edward.Abeles'))
insert into names2 (name) values (LOWER('Tim.Abell'))
insert into names2 (name) values (LOWER('Chuck.Aber'));
-- Split every name in both tables into terms with the full-text parser.
with ftsNamesFirst (id, term) as
(
    select id, terms.display_term
    from names1 cross apply sys.dm_fts_parser('"' + name + '"', 1033, 0, 0) terms
), ftsNamesSecond (id, term) as
(
    select id, terms.display_term
    from names2 cross apply sys.dm_fts_parser('"' + name + '"', 1033, 0, 0) terms
)
select * from
(
    select
        -- Rank the candidate matches for each Names1 row by total score.
        ROW_NUMBER() over (partition by nFirst.id order by sum(DIFFERENCE(ftsNamesFirst.term, ftsNamesSecond.term)) desc) ranking,
        sum(DIFFERENCE(ftsNamesFirst.term, ftsNamesSecond.term)) Confidence,
        nFirst.id Names1ID,
        nFirst.name Names1Name,
        nSecond.id Names2ID,
        nSecond.name Names2Name
    from
        ftsNamesFirst cross join ftsNamesSecond
        left outer join names1 nFirst on nFirst.id = ftsNamesFirst.id
        left outer join names2 nSecond on nSecond.id = ftsNamesSecond.id
    -- Keep only term pairs whose SOUNDEX codes match completely.
    where DIFFERENCE(ftsNamesFirst.term, ftsNamesSecond.term) = 4
    group by
        nFirst.id, nFirst.name, nSecond.id, nSecond.name
) MatchedNames
where ranking = 1
Output:
For each name, only the match with the highest confidence is kept (all others are filtered out by the windowed ranking query).
Confidence  Names1ID  Names1Name        Names2ID  Names2Name
8           0         Green, Peter      200       peter.green
8           1         Smith, Peter      201       peter.smith
8           2         Aadland, Beverly  202       beverly.aadland
8           3         Aalda, Mariann    203       mariann.aalda
4           4         Aaliyah           204       aaliyah
8           5         Aames, Angela     205       angela.aames
8           6         Aames, Willie     206       willie.aames
It isn't perfect, but it is a good starting point that you can tune to improve the chance of a correct match.