Converting numeric character entity references to readable text
Question:
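I've been struggling to find a class that converts/decodes ASCII characters into readable text.
I found this method on Stack Overflow, and it converts a lot of the characters into readable text. But I'm still struggling with, for example: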
#&44;
#&46;
#&58;
#&39;
...and so on.
I receive my data from an XML file using TBXML, and the encoding of the XML is:
iso-8859-1
Does anyone have a method that converts/decodes all of these ASCII characters into readable text?
- (NSString *)stringByDecodingXMLEntities {
    NSUInteger myLength = [self length];
    NSUInteger ampIndex = [self rangeOfString:@"&" options:NSLiteralSearch].location;
    // Short-circuit if there are no ampersands.
    if (ampIndex == NSNotFound) {
        return self;
    }
    // Make result string with some extra capacity.
    NSMutableString *result = [NSMutableString stringWithCapacity:(myLength * 1.25)];
    // First iteration doesn't need to scan to & since we did that already, but for code simplicity's sake we'll do it again with the scanner.
    NSScanner *scanner = [NSScanner scannerWithString:self];
    [scanner setCharactersToBeSkipped:nil];
    NSCharacterSet *boundaryCharacterSet = [NSCharacterSet characterSetWithCharactersInString:@" \t\n\r;"];
    do {
        // Scan up to the next entity or the end of the string.
        NSString *nonEntityString;
        if ([scanner scanUpToString:@"&" intoString:&nonEntityString]) {
            [result appendString:nonEntityString];
        }
        if ([scanner isAtEnd]) {
            goto finish;
        }
        // Scan either a HTML or numeric character entity reference.
        if ([scanner scanString:@"&amp;" intoString:NULL])
            [result appendString:@"&"];
        else if ([scanner scanString:@"&apos;" intoString:NULL])
            [result appendString:@"'"];
        else if ([scanner scanString:@"&quot;" intoString:NULL])
            [result appendString:@"\""];
        else if ([scanner scanString:@"&lt;" intoString:NULL])
            [result appendString:@"<"];
        else if ([scanner scanString:@"&gt;" intoString:NULL])
            [result appendString:@">"];
        else if ([scanner scanString:@"&#" intoString:NULL]) {
            BOOL gotNumber;
            unsigned charCode;
            NSString *xForHex = @"";
            // Is it hex or decimal?
            if ([scanner scanString:@"x" intoString:&xForHex]) {
                gotNumber = [scanner scanHexInt:&charCode];
            }
            else {
                gotNumber = [scanner scanInt:(int*)&charCode];
            }
            if (gotNumber) {
                if (charCode > 0xFFFF && charCode <= 0x10FFFF) {
                    // Code points above U+FFFF need a UTF-16 surrogate pair; %C takes only one unichar.
                    unichar pair[2];
                    pair[0] = (unichar)(0xD800 + ((charCode - 0x10000) >> 10));
                    pair[1] = (unichar)(0xDC00 + ((charCode - 0x10000) & 0x3FF));
                    [result appendString:[NSString stringWithCharacters:pair length:2]];
                }
                else {
                    [result appendFormat:@"%C", (unichar)charCode];
                }
                [scanner scanString:@";" intoString:NULL];
            }
            else {
                NSString *unknownEntity = @"";
                [scanner scanUpToCharactersFromSet:boundaryCharacterSet intoString:&unknownEntity];
                [result appendFormat:@"&#%@%@", xForHex, unknownEntity];
                //[scanner scanUpToString:@";" intoString:&unknownEntity];
                //[result appendFormat:@"&#%@%@;", xForHex, unknownEntity];
                NSLog(@"Expected numeric character entity but got &#%@%@;", xForHex, unknownEntity);
            }
        }
        else {
            NSString *amp;
            [scanner scanString:@"&" intoString:&amp]; //an isolated & symbol
            [result appendString:amp];
            NSString *unknownEntity = @"";
            [scanner scanUpToString:@";" intoString:&unknownEntity];
            NSString *semicolon = @"";
            [scanner scanString:@";" intoString:&semicolon];
            [result appendFormat:@"%@%@", unknownEntity, semicolon];
            NSLog(@"Unsupported XML character entity %@%@", unknownEntity, semicolon);
        }
    }
    while (![scanner isAtEnd]);
finish:
    return result;
}
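For comparison, here is a minimal sketch of the same kind of decoding in plain C (which also compiles unchanged as Objective-C). It is not the method above and not part of TBXML or MWFeedParser; the names `decode_xml_refs` and `put_utf8` are hypothetical. It assumes the input is already UTF-8 text (for example from `-[NSString UTF8String]`), decodes decimal `&#NNN;` and hexadecimal `&#xHH;` references plus the five predefined XML entities, and copies anything it does not recognise through unchanged.

```c
#include <ctype.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Write code point cp as UTF-8 into out; return the byte count, or 0 if cp
   is not a valid, non-NUL Unicode scalar value (nothing is written then). */
static size_t put_utf8(uint32_t cp, char *out) {
    if (cp == 0 || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    if (cp < 0x80) { out[0] = (char)cp; return 1; }
    if (cp < 0x800) {
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (char)(0xF0 | (cp >> 18));
    out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: decode numeric character references and the five
   predefined XML entities in a UTF-8 string. Returns a malloc'd string that
   the caller must free, or NULL if allocation fails. */
char *decode_xml_refs(const char *in) {
    static const struct { const char *text; char ch; } named[] = {
        {"&amp;", '&'}, {"&apos;", '\''}, {"&quot;", '"'},
        {"&lt;", '<'}, {"&gt;", '>'},
    };
    size_t len = strlen(in);
    char *out = malloc(len + 1);   /* a decoded reference is never longer than its source text */
    if (out == NULL) return NULL;
    size_t i = 0, o = 0;
    while (i < len) {
        int decoded = 0;
        if (in[i] == '&') {
            for (size_t k = 0; k < sizeof named / sizeof named[0]; k++) {
                size_t n = strlen(named[k].text);
                if (strncmp(in + i, named[k].text, n) == 0) {
                    out[o++] = named[k].ch;
                    i += n;
                    decoded = 1;
                    break;
                }
            }
            if (!decoded && in[i + 1] == '#') {
                int hex = (in[i + 2] == 'x' || in[i + 2] == 'X');
                size_t start = i + 2 + (size_t)hex, j = start;
                uint32_t cp = 0;
                /* At most 8 digits, so cp cannot overflow 32 bits. */
                while (j - start < 8 && (hex ? isxdigit((unsigned char)in[j])
                                             : isdigit((unsigned char)in[j]))) {
                    unsigned char c = (unsigned char)in[j++];
                    cp = cp * (hex ? 16u : 10u)
                       + (uint32_t)(isdigit(c) ? c - '0' : tolower(c) - 'a' + 10);
                }
                size_t w;
                if (j > start && in[j] == ';' && (w = put_utf8(cp, out + o)) > 0) {
                    o += w;
                    i = j + 1;
                    decoded = 1;
                }
            }
        }
        if (!decoded) out[o++] = in[i++];  /* ordinary byte or unrecognised '&' */
    }
    out[o] = '\0';
    return out;
}
```

For example, `decode_xml_refs("a&#44;b&#39;c")` should yield `a,b'c`; from Objective-C the result could be turned back into an `NSString` with `[NSString stringWithUTF8String:]` and then freed. Like the method above, it works in a single pass, so `&amp;#44;` becomes the literal text `&#44;` rather than a comma.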
Answer:
Normally you would let NSXMLParser handle this for you; you don't need to do the conversion by hand.
If you google NSXMLParser, you will find plenty of examples.
A note on terminology: these are not "ASCII characters", they are "numeric character entity references". – 2010-09-14 16:52:18
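To make the terminology concrete: a numeric character reference is `&#` followed by a decimal number, or `&#x` followed by a hexadecimal number, and then `;`. The number is a Unicode code point, so `&#44;` and `&#x2C;` name the same character. The small C sketch below (the function name `char_ref_code_point` is hypothetical) only extracts that code point; it does not build any text.

```c
#include <ctype.h>

/* Hypothetical helper: parse a numeric character reference at the start of s
   ("&#44;" or "&#x2C;"). Returns the Unicode code point it names, or -1 if s
   does not start with a well-formed reference. */
long char_ref_code_point(const char *s) {
    if (s[0] != '&' || s[1] != '#') return -1;
    int hex = (s[2] == 'x' || s[2] == 'X');
    const char *p = s + 2 + hex;
    long cp = 0;
    int digits = 0;
    while (hex ? isxdigit((unsigned char)*p) : isdigit((unsigned char)*p)) {
        int d = isdigit((unsigned char)*p) ? *p - '0'
                                           : tolower((unsigned char)*p) - 'a' + 10;
        cp = cp * (hex ? 16 : 10) + d;
        if (cp > 0x10FFFF) return -1;   /* beyond the Unicode range */
        p++;
        digits++;
    }
    return (digits > 0 && *p == ';') ? cp : -1;
}
```

For instance, `&#44;`, `&#46;`, `&#58;` and `&#39;` give 44, 46, 58 and 39, which are the code points of `,` `.` `:` and `'`.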
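Aha, thanks. Do you know what I can do to achieve what I want? Following the answer I got from Anders, I tried reading my XML document with NSXMLParser, but it gave the same result as TBXML. – 2010-09-14 17:10:52
I have now also tried MWFeedParser's method stringByEncodingXMLEntities, which handles some of the characters. But there are still many more, such as these - and so on. – 2010-09-14 17:48:05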