Univocity - parsing an irregular CSV

Question:

I have irregular (though consistent) "csv" files that I need to parse. The contents look like this:

Field1: Field1Text 
Field2: Field2Text 

Field3 (need to ignore) 
Field4 (need to ignore) 

Field5 
Field5Text 

// Cars - for example 
#,Col1,Col2,Col3,Col4,Col5,Col6 
#1,Col1Text,Col2Text,Col3Text,Col4Text,Col5Text,Col6Text 
#2,Col1Text,Col2Text,Col3Text,Col4Text,Col5Text,Col6Text 
#3,Col1Text,Col2Text,Col3Text,Col4Text,Col5Text,Col6Text 

Ideally I would like to use an approach similar to the one here

The object I want to end up with looks like this:

String field1; 
String field2; 
String field5; 
List<Car> cars; 

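I currently have the following questions:

  • After adding some exploratory tests, I found that lines starting with a hash (#) are ignored. I don't want this; is there a way to escape it?
  • My intention is to use a BeanListProcessor for the cars section and handle the other fields with a separate row processor, then merge the results into the object above. Am I missing any tricks here?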

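Your first problem is that # is treated as the comment character by default. To prevent lines starting with # from being treated as comments, do this: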

parserSettings.getFormat().setComment('\0'); 

As for parsing your structure, there is no way to do it out of the box, but it is easy to leverage the API for it. The following works:

import java.io.File;
import java.util.ArrayList;

import com.univocity.parsers.common.processor.BeanListProcessor;
import com.univocity.parsers.csv.CsvParser;
import com.univocity.parsers.csv.CsvParserSettings;

CsvParserSettings settings = new CsvParserSettings();
settings.getFormat().setComment('\0'); // prevent lines starting with # from being parsed as comments

// Create a parser
CsvParser parser = new CsvParser(settings);

// Open the input
parser.beginParsing(new File("/path/to/input.csv"), "UTF-8");

// Create a BeanListProcessor for instances of Car, and initialize it
BeanListProcessor<Car> carProcessor = new BeanListProcessor<Car>(Car.class);
carProcessor.processStarted(parser.getContext());

String[] row;
Parent parent = null;
while ((row = parser.parseNext()) != null) { // read rows one by one
    if (row[0].startsWith("Field1:")) { // when Field1 is found, create your parent instance
        if (parent != null) { // a parent already exists, so its cars have been read
            parent.cars = new ArrayList<Car>(carProcessor.getBeans()); // copy the list of cars from the processor
            carProcessor.getBeans().clear(); // clear the processor's list
            // you probably want to do something with your parent bean here
        }
        parent = new Parent(); // create a fresh parent instance
        parent.field1 = row[0]; // assign the fields as appropriate
    } else if (row[0].startsWith("Field2:")) {
        parent.field2 = row[0]; // and so on
    } else if (row[0].startsWith("Field5")) {
        // in the sample input the Field5 label is on its own line and its text on the next one
        String[] next = parser.parseNext();
        if (next != null) {
            parent.field5 = next[0];
        }
    } else if (row[0].startsWith("#") && row[0].length() > 1) {
        // got a "Car" row (#1, #2, ...; the bare "#" header row is skipped):
        // invoke the rowProcessed method of the carProcessor
        carProcessor.rowProcessed(row, parser.getContext());
    }
}

// at the end, if there is a parent, get the cars parsed
if (parent != null) {
    parent.cars = carProcessor.getBeans();
}
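
Note that the loop stores the whole line, label included, in fields such as field1 and field2. If you only want the text after the label, something like the following plain-Java sketch might do. It assumes the `Label: value` layout shown in the question; `FieldText` and `valueAfterLabel` are hypothetical names made up for this illustration and are not part of univocity.

```java
/** Hypothetical helper for illustration only; not part of univocity. */
final class FieldText {
    private FieldText() {}

    /**
     * Returns the trimmed text after the first ':' in a line such as
     * "Field1: Field1Text". If the line has no ':', the trimmed line
     * itself is returned.
     */
    static String valueAfterLabel(String line) {
        int colon = line.indexOf(':');
        String value = colon < 0 ? line : line.substring(colon + 1);
        return value.trim();
    }
}
```

Inside the loop you would then write, for example, `parent.field1 = FieldText.valueAfterLabel(row[0]);`.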

For the BeanListProcessor to work, you need to have declared your class like this:

public static final class Car { 
    @Parsed(index = 0) 
    String id; 
    @Parsed(index = 1) 
    String col1; 
    @Parsed(index = 2) 
    String col2; 
    @Parsed(index = 3) 
    String col3; 
    @Parsed(index = 4) 
    String col4; 
    @Parsed(index = 5) 
    String col5; 
    @Parsed(index = 6) 
    String col6; 
} 

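You could use headers instead, but that would make you write more code. If the headers are always the same, you can assume the positions are fixed.

Hope this helps.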
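
To give a rough idea of the extra work that name-based lookup involves, here is a plain-Java sketch that resolves columns by header name from the `#,Col1,...` row. It is only an illustration of the idea, not how univocity maps headers to annotated fields; `HeaderIndex` is a hypothetical class invented for this example.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of univocity. */
final class HeaderIndex {
    // Maps each header name to its column position in the header row.
    private final Map<String, Integer> positions = new HashMap<>();

    HeaderIndex(String[] headerRow) {
        for (int i = 0; i < headerRow.length; i++) {
            positions.put(headerRow[i], i);
        }
    }

    /** Returns the cell under the given header, or null if the header or cell is missing. */
    String get(String[] row, String header) {
        Integer i = positions.get(header);
        return (i == null || i >= row.length) ? null : row[i];
    }
}
```

You would build the index once when the bare `#` header row is read, then call `index.get(row, "Col3")` for each car row instead of relying on a fixed position.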


Thanks for taking the time to reply, Jeronimo. Really enjoying using the parser too! – Hurricane