Regex to find in between

Problem description:

I have the following regex, which works when there is no leading \d,"There is 1 interface on the system: or trailing ",2017-01-...

Here is the regex:

(?m)(?<_KEY_1>\w+[^:]+?):\s(?<_VAL_1>[^\r\n]+)$

Here is a sample of what I am trying to parse:

1,"There is 1 interface on the system: 
    Name    : Mobile Broadband Connection 
    Description  : Qualcomm Gobi 2000 HS-USB Mobile Broadband Device 250F 
    GUID    : {1234567-12CD-1BC1-A012-C1A1234CBE12} 
    Physical Address : 00:a0:c6:00:00:00 
    State    : Connected 
    Device type  : Mobile Broadband device is embedded in the system 
    Cellular class  : CDMA 
    Device Id   : A10000f67 
    Manufacturer  : Qualcomm Incorporated 
    Model    : Qualcomm Gobi 2000 
    Firmware Version : 09010091 
    Provider Name  : Verizon Wireless 
    Roaming   : Not roaming 
    Signal    : 67%",2017-01-20T16:00:07.000-0700

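I am trying to extract the field names and values, where for example Cellular class would equal CDMA, but only for the fields that start after:

1,"There is 1 interface on the system: (where 1 increments 1, 2, 3, 4 and so on)

and that come before the trailing ",2017-01 ....

Any help is greatly appreciated!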

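May I ask why your string is so long? Couldn't each piece of information be its own string? – jdmdevdotnet

Please see https://regex101.com/r/qmuNpg/2. Something like that? – ClasG

Or rather https://regex101.com/r/qmuNpg/3 – ClasG

Answer

You can use lookahead to ensure that the string you match occurs before a ",\d sequence and does not itself include a ". The latter ensures that you only match between a pair of double quotes where the second one is followed by ,\d: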

/^\h*(?<_KEY_1>[\w\h]+?)\h*:\h*(?<_VAL_1>[^\r\n"]+)(?="|$)(?=[^"]*",\d)/gm

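See it on regex101.

Note: I put the g and m modifiers at the end, but if your environment requires the (?m) notation at the start, that will of course work too.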
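
For readers who want to try the lookahead idea outside regex101, here is a minimal sketch in Python. It is an adaptation, not the pattern above verbatim: Python's `re` module has no `\h`, so `[ \t]` stands in for horizontal whitespace, and named groups use the `(?P<name>...)` syntax. The group names `key` and `val`, the helper `extract_fields`, and the shortened sample are assumptions made for illustration.

```python
import re

# Adapted pattern: [ \t] replaces \h, (?P<name>...) replaces (?<name>...).
# The two lookaheads require the value to end at a quote or end of line,
# and require a closing '",<digit>' somewhere after it with no other quote between.
PATTERN = re.compile(
    r'^[ \t]*(?P<key>[\w \t]+?)[ \t]*:[ \t]*(?P<val>[^\r\n"]+)'
    r'(?="|$)(?=[^"]*",\d)',
    re.MULTILINE,
)

# Shortened version of the sample record from the question.
SAMPLE = (
    '1,"There is 1 interface on the system: \n'
    '    Name    : Mobile Broadband Connection \n'
    '    Physical Address : 00:a0:c6:00:00:00 \n'
    '    Cellular class  : CDMA \n'
    '    Signal    : 67%",2017-01-20T16:00:07.000-0700\n'
)


def extract_fields(text):
    """Return a dict of key/value pairs found inside the quoted block."""
    # Values may carry trailing spaces from the source line, so strip them.
    return {m.group('key'): m.group('val').strip() for m in PATTERN.finditer(text)}


fields = extract_fields(SAMPLE)
print(fields['Cellular class'])  # CDMA
```

The header line is skipped because it starts with `1,` and the key class does not allow a comma, and the last value stops at the closing quote, so the timestamp is not captured.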

Answer

Your sample string appears to be a record from a CSV file. Here is how I would accomplish this task with Python (2.7 or 3.x):

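import csv

# On Python 3, the csv docs recommend opening the file with newline=''.
with open('file.csv', 'r') as fh:
    reader = csv.reader(fh)
    results = []

    for fields in reader:
        # fields[1] is the quoted multi-line block; its first line is the header.
        lines = fields[1].splitlines()
        # Split each remaining line on the first colon and strip whitespace.
        keyvals = [list(map(str.strip, line.split(':', 1))) for line in lines[1:]]
        results.append(keyvals)

    print(results)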

It can be done in a similar way in other languages.
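
As a variation on the CSV approach, the following sketch reads the record from an in-memory string instead of a file and builds one dictionary per record, which makes lookups such as Cellular class direct. The function name `parse_records` and the shortened sample are assumptions for illustration; it targets Python 3.

```python
import csv
import io

# Shortened version of the sample record from the question.
SAMPLE = (
    '1,"There is 1 interface on the system: \n'
    '    Name    : Mobile Broadband Connection \n'
    '    Physical Address : 00:a0:c6:00:00:00 \n'
    '    Cellular class  : CDMA \n'
    '    Signal    : 67%",2017-01-20T16:00:07.000-0700\n'
)


def parse_records(stream):
    """Yield one dict of field name -> value per CSV record."""
    for row in csv.reader(stream):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        record = {}
        # The first line of the quoted field is the "There is N interface..." header.
        for line in row[1].splitlines()[1:]:
            # Split on the first colon only, so values like MAC addresses stay intact.
            key, sep, value = line.partition(':')
            if sep:  # ignore lines without a colon
                record[key.strip()] = value.strip()
        yield record


records = list(parse_records(io.StringIO(SAMPLE)))
print(records[0]['Cellular class'])  # CDMA
```

Because the CSV reader handles the quoting, the leading record number and the trailing timestamp end up in separate columns and never need to be excluded by a regex.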

Answer

You have not responded to my comments or to any of the answers, but here is my answer anyway. Try:

^\s*(?<_KEY_1>[\w\s]+?)\s*:\s*(?<_VAL_1>[^\r\n"]+).*$

See it here at regex101.
