[Python] Converting a Pandas groupby result into a dictionary of lists

When working with data made up of many rows, you frequently need to group the rows by the values in one particular column.

pandas' groupby handles this very simply.

First, let's create some simple data.

Column1 Column2 Column3
0       23      1
1       5       2
1       2       3
1       19      5
2       56      1
2       22      2
3       2       4
3       14      5
4       59      1
5       44      1
5       1       2
5       87      3

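Let's save these rows as a text file named data.txt (without the header line, since the code below parses every field as an integer), read it back, and convert it into a DataFrame.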

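path = "data.txt"  # replace with the actual path to data.txt

# Read every line of the text file
with open(path) as f:
    lines = f.readlines()

# Split each line on whitespace and convert the fields to integers
data = []
for line in lines:
    data.append([int(el) for el in line.split()])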

Printing the data gives the following.

In [81]: data
Out[81]:
[[0, 23, 1],
 [1, 5, 2],
 [1, 2, 3],
 [1, 19, 5],
 [2, 56, 1],
 [2, 22, 2],
 [3, 2, 4],
 [3, 14, 5],
 [4, 59, 1],
 [5, 44, 1],
 [5, 1, 2],
 [5, 87, 3]]

Let's store it in a pandas DataFrame, specifying the column names.

import pandas as pd

df = pd.DataFrame(data=data, columns=['column1', 'column2', 'column3'])

display(df)  # display() is available in Jupyter/IPython; use print(df) in a plain script
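As an alternative to parsing the lines by hand, pandas can read whitespace-separated text directly with pd.read_csv. The sketch below assumes the data has no header line. The names df_alt and raw_text are only for illustration, and the rows are held in a string so the example runs without a file on disk.

```python
import io

import pandas as pd

# The sample rows from this post, kept in memory for the sake of a runnable sketch
raw_text = """\
0 23 1
1 5 2
1 2 3
1 19 5
2 56 1
2 22 2
3 2 4
3 14 5
4 59 1
5 44 1
5 1 2
5 87 3
"""

df_alt = pd.read_csv(
    io.StringIO(raw_text),  # a path to data.txt could be passed here instead
    sep=r"\s+",             # split fields on any run of whitespace
    header=None,            # the data has no header line
    names=["column1", "column2", "column3"],
)

print(df_alt.shape)
```

This gives the same three integer columns as the manual loop, in one call.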

Now let's group it by column1.

In [90]: df.groupby(by=['column1'])['column3'].apply(list).to_dict()
Out[90]: {0: [1], 1: [2, 3, 5], 2: [1, 2], 3: [4, 5], 4: [1], 5: [1, 2, 3]}
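To make the one-liner easier to follow, here is a self-contained sketch that splits the chain into its three steps. The intermediate names (rows, grouped, lists_per_key, result) are made up for illustration, and the rows are typed in directly rather than read from a file.

```python
import pandas as pd

# Same sample rows as above, entered inline so the sketch runs on its own
rows = [
    [0, 23, 1], [1, 5, 2], [1, 2, 3], [1, 19, 5],
    [2, 56, 1], [2, 22, 2], [3, 2, 4], [3, 14, 5],
    [4, 59, 1], [5, 44, 1], [5, 1, 2], [5, 87, 3],
]
df = pd.DataFrame(rows, columns=["column1", "column2", "column3"])

# Step 1: group the rows by column1 and select column3 within each group
grouped = df.groupby("column1")["column3"]

# Step 2: turn each group's column3 values into a plain Python list
lists_per_key = grouped.apply(list)

# Step 3: convert the resulting Series (index = column1) into a dictionary
result = lists_per_key.to_dict()

print(result)
# {0: [1], 1: [2, 3, 5], 2: [1, 2], 3: [4, 5], 4: [1], 5: [1, 2, 3]}
```

Each key is a distinct value of column1, and each list keeps the column3 values in their original row order.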

With the data grouped like this, you can carry out whatever analysis you need.
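As one possible follow-up, the sketch below works on the dictionary produced by the groupby call above using plain Python. The variable names (groups, sizes, multi) are hypothetical, and the particular questions asked are only an example of the kind of analysis this structure makes easy.

```python
# Dictionary copied from the groupby result shown earlier
groups = {0: [1], 1: [2, 3, 5], 2: [1, 2], 3: [4, 5], 4: [1], 5: [1, 2, 3]}

# Number of column3 values collected for each column1 key
sizes = {key: len(values) for key, values in groups.items()}

# Keys whose list holds more than one value
multi = [key for key, values in groups.items() if len(values) > 1]

print(sizes)  # {0: 1, 1: 3, 2: 2, 3: 2, 4: 1, 5: 3}
print(multi)  # [1, 2, 3, 5]
```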
