T-SQL中的随机加权选择
如何根据所有候选行的应用权重在T-SQL中随机选择一个表行?T-SQL中的随机加权选择
例如,我有一个表格中的一组行,其权重分别为50,25和25(合计为100,但不需要),我想随机选择其中的一个统计结果相当于相应的重量。
丹恩的答案包括一个引入平方律的自我连接。 (n*n/2)
行之后的表中有n行。
更理想的是能够解析表格一次。
DECLARE @id int, @weight_sum int, @weight_point int
DECLARE @table TABLE (id int, weight int)
INSERT INTO @table(id, weight) VALUES(1, 50)
INSERT INTO @table(id, weight) VALUES(2, 25)
INSERT INTO @table(id, weight) VALUES(3, 25)
SELECT @weight_sum = SUM(weight)
FROM @table
SELECT @weight_point = FLOOR(((@weight_sum - 1) * RAND() + 1), 0)
SELECT
@id = CASE WHEN @weight_point < 0 THEN @id ELSE [table].id END,
@weight_point = @weight_point - [table].weight
FROM
@table [table]
ORDER BY
[table].Weight DESC
这将通过表格,设置@id
每个记录的id
值,而在同一时间递减@weight
点。最终,@weight_point
将消极。这意味着前面所有权重的SUM
大于随机选择的目标值。这是我们想要的记录,因此从那时起,我们将@id
设置为自己(忽略表中的任何ID)。
这只会贯穿表格一次,但即使所选值是第一条记录,也必须贯穿整个表格。因为平均仓位是通过表的一半(和更少,如果下令升重量)编写循环也可能会被更快......(特别是如果该权重是在公共组):
DECLARE @id int, @weight_sum int, @weight_point int, @next_weight int, @row_count int
DECLARE @table TABLE (id int, weight int)
INSERT INTO @table(id, weight) VALUES(1, 50)
INSERT INTO @table(id, weight) VALUES(2, 25)
INSERT INTO @table(id, weight) VALUES(3, 25)
SELECT @weight_sum = SUM(weight)
FROM @table
SELECT @weight_point = ROUND(((@weight_sum - 1) * RAND() + 1), 0)
SELECT @next_weight = MAX(weight) FROM @table
SELECT @row_count = COUNT(*) FROM @table
SET @weight_point = @weight_point - (@next_weight * @row_count)
WHILE (@weight_point > 0)
BEGIN
SELECT @next_weight = MAX(weight) FROM @table WHERE weight < @next_weight
SELECT @row_count = COUNT(*) FROM @table WHERE weight = @next_weight
SET @weight_point = @weight_point - (@next_weight * @row_count)
END
-- # Once the @weight_point is less than 0, we know that the randomly chosen record
-- # is in the group of records WHERE [table].weight = @next_weight
SELECT @row_count = FLOOR(((@row_count - 1) * RAND() + 1), 0)
SELECT
@id = CASE WHEN @row_count < 0 THEN @id ELSE [table].id END,
@row_count = @row_count - 1
FROM
@table [table]
WHERE
[table].weight = @next_weight
ORDER BY
[table].Weight DESC
您只需要对所有准备行的权重求和,然后在该总和中选择一个随机点,然后选择与该选定点协调的记录(每个记录递增地携带累加权重和)。
DECLARE @id int, @weight_sum int, @weight_point int
DECLARE @table TABLE (id int, weight int)
INSERT INTO @table(id, weight) VALUES(1, 50)
INSERT INTO @table(id, weight) VALUES(2, 25)
INSERT INTO @table(id, weight) VALUES(3, 25)
SELECT @weight_sum = SUM(weight)
FROM @table
SELECT @weight_point = ROUND(((@weight_sum - 1) * RAND() + 1), 0)
SELECT TOP 1 @id = t1.id
FROM @table t1, @table t2
WHERE t1.id >= t2.id
GROUP BY t1.id
HAVING SUM(t2.weight) >= @weight_point
ORDER BY t1.id
SELECT @id
我做了一个小您和MatBailie解决方案的基准,看起来您的时间大约是Mat的两倍。在2行和1千万次迭代的桌子上,Mat的解决方案耗时约45秒,解决方案约需85秒。 – 2016-09-01 18:40:50
的“逐步背着一个accumlating [原文]加权求和”部分是昂贵的,如果你有很多的记录。如果你已经有了很大的分数/重量范围(例如:范围足够宽,大多数记录重量是独一无二的,1-5颗星可能不会削减它),你可以做这样的事情来选择一个重量值。我使用VB.Net在这里展示,但是这可以很容易地在纯SQL来完成,以及:
Function PickScore()
'Assume we have a database wrapper class instance called SQL and seeded a PRNG already
'Get count of scores in database
Dim ScoreCount As Double = SQL.ExecuteScalar("SELECT COUNT(score) FROM [MyTable]")
' You could also approximate this with just the number of records in the table, which might be faster.
'Random number between 0 and 1 with ScoreCount possible values
Dim rand As Double = Random.GetNext(ScoreCount)/ScoreCount
'Use the equation y = 1 - x^3 to skew results in favor of higher scores
' For x between 0 and 1, y is also between 0 and 1 with a strong bias towards 1
rand = 1 - (rand * rand * rand)
'Now we need to map the (0,1] vector to [1,Maxscore].
'Just find MaxScore and mutliply by rand
Dim MaxScore As UInteger = SQL.ExecuteScalar("SELECT MAX(Score) FROM Songs")
Return MaxScore * rand
End Function
运行它,并选择具有最大记录得分比返回的重量更轻。如果分数超过一个记录,请随机挑选。这里的优点是你不必保留任何总和,并且你可以调整适合你的口味的概率公式。但是,再次,分数越大,它的效果就越好。
使用随机数生成器完成此操作的方法是集成概率密度函数。使用一组离散值可以计算前缀和(所有值的总和),并将其存储起来。使用此选项,您可以选择大于随机数的最小前缀总和(聚合到日期)值。
在数据库上插入后的后续值必须更新。如果数据集的更新和大小的相对频率没有使得这样做成本过高,这意味着可以从单个s-argable(可以通过索引查找来解析的谓词)查询中获取适当的值。
我做了一些经验性测试,发现您的解决方案对输入数据非常敏感。我的测试数据 - 权重:2,998,迭代次数:1M。重量2应该拿起约2k次。如果表中权重的顺序是升序(2,998),那么它的权重2就是大约500次。如果您反转订单,则约为2500次。如果您将`ROUND`更改为`FLOOR`,升序约为1000次,重量降低约2000倍。这是适当的概率。我已经更新了你的答案。 – 2016-09-01 19:01:46