Excel VBA删除重复vs筛选器
我有一个工作表,其中包含约8000行已被过滤。我试图从工作表列中删除重复项来获取值的集合。通过阅读这篇文章,有两种方法可以做到这一点。如果该值不在新集合中,则循环收集并复制到新集合。
或者将列中的数据复制到临时电子表格中,过滤并将数据复制到另一列,然后将其添加到集合中。Excel VBA删除重复vs筛选器
当处理大量数据时,复制过滤器具有最佳性能,但由于必须创建新工作表,因此该过滤器非常笨重。
我还没有看到它,但是有没有办法在内存中做复制过滤器,而不是创建一个工作表来做到这一点?
重述:
Sub GetColumnValues(Ws As Worksheet, Column As Long, CollValues As Collection)
Dim RowIndex As Long
For RowIndex = 1 To GetLastRow(Ws)
If CollValues.Count = 0 Then
CollValues.Add (Ws.Cells(RowIndex, Column).Value)
Else
If IsInCollection(CollValues, Ws.Cells(RowIndex, Column).Value) = False Then
CollValues.Add (Ws.Cells(RowIndex, Column).Value)
End If
End If
Next RowIndex
End Sub
过滤器和复制:
Sub GetColumnValues(Ws As Worksheet, Column As Long, CollValues As Collection)
Dim rowLast As Long
Dim c As Range
Dim tmpWS As Worksheet
Dim tmpWsName As String
tmpWsName = "TempWS"
Call DeleteWs(TsWb, tmpWsName)
Set tmpWS = TsWb.Sheets.Add()
tmpWS.Name = tmpWsName
rowLast = GetLastRow(Ws)
Ws.Range(Ws.Cells(1, Column), Ws.Cells(rowLast, Column)).Copy
tmpWS.Range("A1").PasteSpecial
rowLast = GetLastRow(tmpWS)
tmpWS.Range(tmpWS.Cells(1, 1), tmpWS.Cells(rowLast, 1)).AdvancedFilter _
Action:=xlFilterCopy, _
CopyToRange:=tmpWS.Range("B1"), _
Unique:=True
rowLast = GetLastRow(tmpWS)
For Each c In tmpWS.Range(tmpWS.Cells(1, 2), tmpWS.Cells(rowLast, 2))
If Len(c.value) > 0 Then
CollValues.Add (c.value)
End If
Next c
Call DeleteWs(TsWb, tmpWsName)
End Sub
我不知道为什么它必须是一个集合,但要获得快速的让所有值的数组无(过滤列表),双打,你可以做这样的:(非常接近你的第一个例子)
Function GetColVal(Ws As Worksheet, Column As Long) As Variant
Dim runner As Variant, outputVal() As Variant, i As Long
ReDim outputVal(Ws.Range(Ws.Cells(1, Column), Ws.Cells(GetLastRow(Ws), Column)).SpecialCells(xlCellTypeVisible).Count)
For Each runner In Ws.Range(Ws.Cells(1, Column), Ws.Cells(GetLastRow(Ws), Column)).SpecialCells(xlCellTypeVisible)
If i = 0 Then
outputVal(0) = runner.Value: i = 1
Else
If IsError(Application.Match(runner.Value, outputVal, 0)) Then outputVal(i) = runner.Value: i = i + 1
End If
Next
ReDim Preserve outputVal(i - 1)
GetColVal= outputVal
End Function
的Application.Match
是在VBA最快的功能之一,而IsInCollection
可以是非常慢......更好的运行For Each ...
循环到集合中添加的一切不是检查集合...
Dim a As Variant
For Each a in GetColVal(Worksheets("SheetX"),7)
MyCollection.Add a
Next
应该比你的例子快得多...仍然我推荐不使用集合,特别是如果你只是使用值...如果可能的话,更好地使用GetColVal
-array ... variantVariable = GetColVal(Worksheets("SheetX"),7)
然后使用变量变量你想干什么就干什么(你也可以粘贴在片直接某处)
一个简单的输出到工作表是这样的:
Dim a As Variant
a = GetColVal(Worksheets("Sheet1"),13) 'values from sheet1 column M
'pasting in one row (starting at A1 in Sheet2)
ThisWorkbook.Sheets("Sheet2").Range(Cells(1, 1), Cells(1, ubound(a) + 1)).value = a
'pasting in one column (starting at C5 in Sheet4)
ThisWorkbook.Sheets("Sheet4").Range(Cells(5, 3), Cells(ubound(a) + 5, 3)).value = Application.Transpose(a)
编辑
要显示不同的东西:
Function GetColumnValues(ws As Worksheet, Column As Long) As Range
With ws
Dim srcRng As Range, outRng As Range, runRng1 As Range, runRng2 As Range, dBool As Boolean
Set srcRng = .Range(.Cells(1, Column), .Cells(GetLastRow(ws), Column)).SpecialCells(xlCellTypeVisible)
For Each runRng1 In a
If outRng Is Nothing Then Set outRng = runRng1
For Each runRng2 In outRng
If Intersect(runRng1, runRng2) Is Nothing Then
If runRng2.Value = runRng1.Value Then dBool = True: Exit For
End If
Next
If dBool Then dBool = False Else Set outRng = Union(outRng, runRng1)
Next
End With
Set GetColumnValues = outRng
End Function
有了这个功能,你会得到一个范围内的所有可以选择或复制到另一个位置(与格式化和其他一切)的细胞。您仍然可以使用For Each ...
将所有元素添加到集合中。我也没有使用Match
来避免“Len> 255” - 错误
是的,只是做阵列,然后检查对数组,然后提供您的结果返回到工作表。我个人喜欢在内存中而不是通过应用程序IDE做事情。
它快得多(特别是数万行),您不必担心屏幕刷新,或让用户想知道如何快速移动所有内容。我通常处理内存中的所有内容,将其交回,然后激活我希望用户看到的工作表。
dim set1Array() as String
dim set2Array() as String
dim set1Rows as Long
dim set2Rows as Long
dim lngX as Long
dim lngY as Long
dim blnDebug as Boolean; blnDebug = true ' flag for debugging
' get count of rows so we know how big to make the arrays
set1Rows = GetLastRow(Ws1)
set2Rows = GetLastRow(Ws2)
' set arrays to the proper size
redim set1Rows(set1Rows - 1, 1)' 1 represents 2 columns since it's 0 based. the second column is a flag for duplicated.
redim set2Rows(set2Rows - 1, 0)' 0 represents 1 column since it's 0 based
' load the arrays with the sheet data
for lngX = 1 to set1Rows
set1Rows(lngX - 1, 0) = Worksheets("Sheet1").range("A" & lngX).Text
next lngX
for lngX = 1 to set2Rows
set2Rows(lngX - 1, 0) = Worksheets("Sheet2").range("A" & lngX).Text
next lngX
' I like to do a debug callout here to see what I got to make sure that I am good to go with the dataset
if blnDebug then
for lngX = 0 to Ubound(set1Rows)
debug.print "set1Rows(" & lngX & ") - col1: " & set1Rows(lngX, 0)
next lngX
for lngX = 0 to Ubound(set2Rows)
debug.print "set2Rows(" & lngX & ") - col1: " & set2Rows(lngX, 0)
next lngX
end if
' now do your comparison
for lngX = 0 to Ubound(set1Rows)
for lngY = 0 to Ubound(set2Rows)
if set1Rows(lngX, 0) = set2Rows(lngY, 0) then
set1Rows(lngX, 1) = "1"
end if
next lngY
next lngX
' now your duplicates are flagged in the set1Rows array
for lngX = 0 to Ubound(set1Rows)
if set1Rows(lngX, 1) = "1" then
' code for duplicated
else
' code for unique
end if
next lngX
正在对一列数据进行比较。所以理想情况下,我想将列复制到数组中,在该数组中删除任何重复的值。然后我会使用该数组进行额外的处理。 – SteveP65
收集的唯一原因是我从outputVal中删除重复项后删除了其他数据。使用集合而不是重新创建新数组更容易。我能想到的唯一方法是创建一个需要删除的单独索引数组,然后执行redim并一次复制到新数组。 虽然你已经给了我一些想法,但谢谢。 – SteveP65
我仍在调查此问题。我发现的一个问题是,如果一个Variant数组中有一个字符串长于255个字符,则Application.Match会失败,并显示类型不匹配错误。如果我将输出数组更改为字符串数组,则可以处理整个数据范围。 – SteveP65
@ SteveP65我添加了另一个功能,你可能想尝试(应该更快,但我无法确定)...显然你会得到一个范围,它可以保留一些你可能想要使用的好处:) –