Assigning custom categories to JSON data - pandas

Problem description:

I want to assign labels to the raw data rather than get new indicator columns from get_dummies. Something like this:

json_input:

[{"id": 100, "vehicle_type": "Car", "time": "2017-04-06 01:39:43", "zone": "A", "type": "Checked"}, {"id": 101, "vehicle_type": "Truck", "time": "2017-04-06 02:35:45", "zone": "B", "type": "Unchecked"}, {"id": 102, "vehicle_type": "Truck", "time": "2017-04-05 03:20:12", "zone": "A", "type": "Checked"}, {"id": 103, "vehicle_type": "Car", "time": "2017-04-04 10:05:04", "zone": "C", "type": "Unchecked"}]

Result:

  • id, vehicle_type, listed time range, zone, type
  • 100,0,1,1,1
  • 101,1,1,2,0
  • 102,1,2,1,1
  • 103,0,3,3,0

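Here TS stands for timestamp. The columns vehicle_type and type are binary, the listed time range is coded as 1 -> (TS1-TS2), 2 -> (TS3-TS4), 3 -> (TS5-TS6), and zone is categorical (1, 2 or 3). I want these labels to be assigned automatically when I load the flattened JSON into a pandas DataFrame. Is this possible? (I do not want the zone_1, type_1, vehicle_type_3 indicator columns that get_dummies produces.) If this is not possible in pandas, please suggest a Python library that can automate it.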

Show us your JSON and what you want the result to look like.

This is what I could come up with. I am not sure what you are looking for regarding the time range.

import io
import pandas as pd

df_string='[{"id":100,"vehicle_type":"Car","time":"2017-04-06 01:39:43","zone":"A","type":"Checked"},{"id":101,"vehicle_type":"Truck","time":"2017-04-06 02:35:45","zone":"B","type":"Unchecked"},{"id":102,"vehicle_type":"Truck","time":"2017-04-05 03:20:12","zone":"A","type":"Checked"},{"id":103,"vehicle_type":"Car","time":"2017-04-04 10:05:04","zone":"C","type":"Unchecked"}]'
df = pd.read_json(io.StringIO(df_string))
# Convert each label column to a categorical dtype
df['zone'] = pd.Categorical(df['zone'])
df['vehicle_type'] = pd.Categorical(df['vehicle_type'])
df['type'] = pd.Categorical(df['type'])
# cat.codes gives one integer per category, starting at 0, in sorted category order
df['zone_int'] = df['zone'].cat.codes
df['vehicle_type_int'] = df['vehicle_type'].cat.codes
df['type_int'] = df['type'].cat.codes
df.head()
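
With `cat.codes`, the integers follow the sorted order of the categories and start at 0, so "Checked" becomes 0 and zone "A" becomes 0. If the codes should match the table in the question (Checked = 1, zones numbered from 1), one option is to pass the category order explicitly. The sketch below is only an illustration: the orderings in `category_orders` and the names `records` and `labeled` are assumptions chosen to reproduce that table, not something the question specifies.

```python
import io
import pandas as pd

records = '[{"id":100,"vehicle_type":"Car","zone":"A","type":"Checked"},{"id":101,"vehicle_type":"Truck","zone":"B","type":"Unchecked"},{"id":102,"vehicle_type":"Truck","zone":"A","type":"Checked"},{"id":103,"vehicle_type":"Car","zone":"C","type":"Unchecked"}]'
labeled = pd.read_json(io.StringIO(records))

# Assumed orderings: the position in each list becomes the integer code.
category_orders = {
    "vehicle_type": ["Car", "Truck"],   # Car -> 0, Truck -> 1
    "zone": ["A", "B", "C"],            # A -> 0, B -> 1, C -> 2
    "type": ["Unchecked", "Checked"],   # Unchecked -> 0, Checked -> 1
}
for column, order in category_orders.items():
    # Replace the labels in place with their codes instead of adding new columns
    labeled[column] = pd.Categorical(labeled[column], categories=order).codes

# Shift zone so that it is numbered from 1, as in the question's expected result
labeled["zone"] += 1
```

A value that is missing from the given order gets the code -1, which makes unexpected labels easy to spot.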

Edit: this is what I could come up with.

import datetime
import io
import math
import pandas as pd
#Taken from http://*.com/questions/13071384/python-ceil-a-datetime-to-next-quarter-of-an-hour
def ceil_dt(dt, num_seconds=900):
    # Round dt up to the next multiple of num_seconds past the hour
    nsecs = dt.minute*60 + dt.second + dt.microsecond*1e-6
    delta = math.ceil(nsecs/num_seconds) * num_seconds - nsecs
    return dt + datetime.timedelta(seconds=delta)

df_string='[{"id":100,"vehicle_type":"Car","time":"2017-04-06 01:39:43","zone":"A","type":"Checked"},{"id":101,"vehicle_type":"Truck","time":"2017-04-06 02:35:45","zone":"B","type":"Unchecked"},{"id":102,"vehicle_type":"Truck","time":"2017-04-05 03:20:12","zone":"A","type":"Checked"},{"id":103,"vehicle_type":"Car","time":"2017-04-04 10:05:04","zone":"C","type":"Unchecked"}]'
df = pd.read_json(io.StringIO(df_string))
# Integer codes for the label columns
df['zone'] = pd.Categorical(df['zone'])
df['vehicle_type'] = pd.Categorical(df['vehicle_type'])
df['type'] = pd.Categorical(df['type'])
df['zone_int'] = df['zone'].cat.codes
df['vehicle_type_int'] = df['vehicle_type'].cat.codes
df['type_int'] = df['type'].cat.codes
# Calendar features derived from the timestamp
df['time'] = pd.to_datetime(df['time'])
df['dayofweek'] = df['time'].dt.dayofweek
df['month_int'] = df['time'].dt.month
df['year_int'] = df['time'].dt.year
df['day'] = df['time'].dt.day
df['date'] = df['time'].apply(lambda x: x.date())
df['month'] = df['date'].apply(lambda x: datetime.date(x.year, x.month, 1))
df['year'] = df['date'].apply(lambda x: datetime.date(x.year, 1, 1))
df['hour'] = df['time'].dt.hour
df['mins'] = df['time'].dt.minute
df['seconds'] = df['time'].dt.second
# Time-of-day buckets, numbered from 1 (e.g. hours 0-2 -> 1, 3-5 -> 2 for the 3-hour version)
df['time_interval_3hour'] = df['hour'].apply(lambda x: math.floor(x/3)+1)
df['time_interval_6hour'] = df['hour'].apply(lambda x: math.floor(x/6)+1)
df['time_interval_12hour'] = df['hour'].apply(lambda x: math.floor(x/12)+1)
# dayofweek is 0 for Monday, so 5 and 6 are Saturday and Sunday
df['weekend'] = df['dayofweek'].apply(lambda x: x>4)

df['ceil_quarter_an_hour'] = df['time'].apply(lambda x: ceil_dt(x))
df['ceil_half_an_hour'] = df['time'].apply(lambda x: ceil_dt(x, num_seconds=1800))
df.head()
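
The fixed-width buckets above assume every range has the same length. For ranges of arbitrary width, whether hours of the day or any other numeric column, `pd.cut` may be a better fit. The following is a minimal sketch: the edges in `hour_edges` are hypothetical, picked only because they reproduce the time-range column from the question, and `times`, `time_range` and `time_range_int` are illustrative names.

```python
import pandas as pd

times = pd.Series(pd.to_datetime([
    "2017-04-06 01:39:43",
    "2017-04-06 02:35:45",
    "2017-04-05 03:20:12",
    "2017-04-04 10:05:04",
]))

# Hypothetical range edges in hours: [0, 3) -> 1, [3, 6) -> 2, [6, 24) -> 3
hour_edges = [0, 3, 6, 24]

# right=False makes each range include its left edge and exclude its right edge
time_range = pd.cut(times.dt.hour, bins=hour_edges, right=False, labels=[1, 2, 3])

# pd.cut returns a categorical; convert to plain integers if that is preferred
time_range_int = time_range.astype(int)
```

The same call works on a plain numeric column by swapping `times.dt.hour` for that column and adjusting the edges.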

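I am looking for something like ranges over the hour of the day, and then grouping them into a category. Basically, a way to categorize based on ranges, for both times and numbers. – Milee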

Thanks. Perfect. – Milee