Tuesday, September 24, 2013

HTML file parsing


Could any experts take a look at where my HTML parsing goes wrong? Any advice would be appreciated, thank you.
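NSStringEncoding encoding = CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGB_18030_2000);

// Load the page over HTTP (initWithContentsOfFile: only reads local paths and returns nil for a URL)
NSData *data = [NSData dataWithContentsOfURL:[NSURL URLWithString:@"http://finance.sina.com.cn/china/20130519/140315511917.shtml"]];
// Decode the GB18030 bytes; this returns nil if the data is not valid in that encoding
NSString *str = [[NSString alloc] initWithData:data encoding:encoding];
// Change the declared charset so the HTML parser does not treat the UTF-8 bytes as GB2312
str = [str stringByReplacingOccurrencesOfString:@"\"gb2312\"" withString:@"\"utf-8\""];
NSData *newData = [str dataUsingEncoding:NSUTF8StringEncoding];
TFHpple *hpple = [[TFHpple alloc] initWithHTMLData:newData];
NSArray *elements = [hpple search:@"//h1"];
// Guard against an empty result instead of letting objectAtIndex:0 throw
TFHppleElement *element = elements.count > 0 ? [elements objectAtIndex:0] : nil;
NSString *content = [element content];
NSLog(@"%@", content);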
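The decoding step above turns GB18030 bytes into text that is then re-encoded as UTF-8. As a rough illustration of that one step outside Foundation, here is a sketch in plain C (which also compiles inside an Objective-C file) that uses POSIX iconv instead of NSString. The function name gb18030_to_utf8 is hypothetical, and the sketch assumes the platform's iconv accepts the name "GB18030". Like -[NSString initWithData:encoding:] returning nil, it returns NULL when the bytes are not valid in the source encoding, which is one possible origin of a null further down the chain.

```c
#include <assert.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: convert a GB18030 buffer to a newly allocated,
   NUL-terminated UTF-8 string. Assumes the platform's iconv accepts the
   name "GB18030". Returns NULL if the converter is unavailable, the input
   is not valid GB18030, or allocation fails. The caller frees the result. */
char *gb18030_to_utf8(const char *in, size_t in_len)
{
    assert(in != NULL);
    iconv_t cd = iconv_open("UTF-8", "GB18030");
    if (cd == (iconv_t)-1)
        return NULL;

    /* GB18030 uses 1, 2 or 4 bytes per character and UTF-8 needs at most
       1, 3 or 4 bytes respectively for the same character, so twice the
       input length plus a terminator is always enough room. */
    size_t out_cap = 2 * in_len + 1;
    char *out = malloc(out_cap);
    if (out == NULL) {
        iconv_close(cd);
        return NULL;
    }

    char *src = (char *)in;          /* iconv() takes a non-const char ** */
    size_t src_left = in_len;
    char *dst = out;
    size_t dst_left = out_cap - 1;   /* keep one byte for the terminator */

    /* Fails with EILSEQ (invalid byte sequence) or EINVAL (input ends in
       the middle of a character). */
    if (iconv(cd, &src, &src_left, &dst, &dst_left) == (size_t)-1) {
        free(out);
        iconv_close(cd);
        return NULL;
    }
    *dst = '\0';
    iconv_close(cd);
    return out;
}
```

For example, the GB18030 bytes C4 E3 BA C3 (你好) come out as the UTF-8 bytes E4 BD A0 E5 A5 BD, while a buffer that stops halfway through a two-byte character yields NULL.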


Reply:
Sorry, I pasted the wrong code. This is the version I am actually running:
NSStringEncoding gbEncoding = CFStringConvertEncodingToNSStringEncoding(kCFStringEncodingGB_18030_2000);

// PATH: presumably a macro holding the page URL (not shown in the post)
NSData *htmlData = [NSData dataWithContentsOfURL:[NSURL URLWithString:PATH]];
NSString *htmlStr = [[NSString alloc] initWithData:htmlData encoding:gbEncoding];
// Presumably meant to rewrite the page's charset declaration; both literals are empty as posted
NSString *utf8HtmlStr = [htmlStr stringByReplacingOccurrencesOfString:@""
                                                           withString:@""];
NSData *htmlDataUTF8 = [utf8HtmlStr dataUsingEncoding:NSUTF8StringEncoding];
TFHpple *xpathParser = [[TFHpple alloc] initWithHTMLData:htmlDataUTF8];

NSArray *elements = [xpathParser searchWithXPathQuery:@"//h1"];
// Avoid an exception when the query matches nothing
TFHppleElement *element = elements.count > 0 ? [elements objectAtIndex:0] : nil;
NSString *ele = [element content];
NSLog(@"%@", ele);
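The stringByReplacingOccurrencesOfString:withString: call above exists to change the page's declared charset before the UTF-8 data reaches TFHpple. A minimal sketch of that replace-every-occurrence step in plain C follows; replace_all is a hypothetical helper, and the label it targets (charset=gb2312, as it might appear inside a meta tag) is an assumption rather than something the post shows.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a newly allocated copy of s with every
   non-overlapping occurrence of `from` replaced by `to`, scanning left to
   right. An empty `from` leaves s unchanged. Returns NULL on allocation
   failure; the caller frees the result. */
char *replace_all(const char *s, const char *from, const char *to)
{
    assert(s != NULL && from != NULL && to != NULL);
    size_t from_len = strlen(from), to_len = strlen(to);

    /* First pass: count matches so the output can be sized exactly. */
    size_t count = 0;
    if (from_len > 0)
        for (const char *p = s; (p = strstr(p, from)) != NULL; p += from_len)
            count++;

    size_t out_len = strlen(s) - count * from_len + count * to_len;
    char *out = malloc(out_len + 1);
    if (out == NULL)
        return NULL;

    /* Second pass: copy the text before each match, then the replacement. */
    char *dst = out;
    const char *p = s;
    for (size_t i = 0; i < count; i++) {
        const char *hit = strstr(p, from);
        size_t gap = (size_t)(hit - p);
        memcpy(dst, p, gap);
        dst += gap;
        memcpy(dst, to, to_len);
        dst += to_len;
        p = hit + from_len;
    }
    strcpy(dst, p);  /* remaining tail, including the terminator */
    return out;
}
```

Called as replace_all(html, "charset=gb2312", "charset=utf-8"), it would turn a declaration such as content="text/html; charset=gb2312" into content="text/html; charset=utf-8" and leave the rest of the page untouched.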



The output is (null).
