
[Big Data Analysis Engineer (빅데이터분석기사) Practical Exam] The corr() Function and the numeric_only Option

2024. 11. 25. 17:34


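The corr() Function and the numeric_only Option

Introduction

Starting with pandas 2.0.0, the default value of the numeric_only option of the corr function changed to False.
This post summarizes what that change means in practice.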

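Explanation

Since pandas 2.0.0, numeric_only defaults to False, so DataFrame.corr no longer skips non-numeric columns on its own.
- In earlier versions the default behaved as True, so the option did not have to be passed explicitly.
Therefore, from the 9th session of the Big Data Analysis Engineer practical exam onward, where pandas 2.0.0 or later is used, numeric_only=True should be specified whenever corr is called; otherwise any non-numeric column in the DataFrame causes an error.
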
import pandas as pd

df = pd.read_csv("data/Titanic.csv")

corr_table = df.corr(numeric_only=True)   # pass numeric_only=True explicitly
print(corr_table)

              PassengerId  Survived    Pclass  ...     SibSp     Parch      Fare
PassengerId     1.000000 -0.005007 -0.035144  ... -0.057527 -0.001652  0.012658
Survived       -0.005007  1.000000 -0.338481  ... -0.035322  0.081629  0.257307
Pclass         -0.035144 -0.338481  1.000000  ...  0.083081  0.018443 -0.549500
Age             0.036847 -0.077221 -0.369226  ... -0.308247 -0.189119  0.096067
SibSp          -0.057527 -0.035322  0.083081  ...  1.000000  0.414838  0.159651
Parch          -0.001652  0.081629  0.018443  ...  0.414838  1.000000  0.216225
Fare            0.012658  0.257307 -0.549500  ...  0.159651  0.216225  1.000000
 
[7 rows x 7 columns]
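
As a self-contained illustration that does not depend on data/Titanic.csv, the sketch below builds a tiny toy DataFrame (the column names and values are hypothetical, not taken from the exam data) and compares numeric_only=True with selecting the numeric columns by hand through select_dtypes. It assumes pandas 1.5 or later, where the numeric_only parameter exists.

```python
import pandas as pd

# Hypothetical toy data: one text column and two numeric columns.
df = pd.DataFrame(
    {
        "name": ["Kim", "Lee", "Park", "Choi"],
        "age": [22, 38, 26, 35],
        "fare": [7.25, 71.28, 7.93, 53.10],
    }
)

# Option 1: let corr() skip the non-numeric column.
corr_a = df.corr(numeric_only=True)

# Option 2: select the numeric columns explicitly, then correlate.
corr_b = df.select_dtypes(include="number").corr()

print(corr_a)
print(corr_b)
```

Both calls should produce a 2 x 2 matrix over age and fare. The select_dtypes form is a little longer, but it makes the column filtering visible, which can help when checking which columns actually entered the calculation.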

If the option is omitted, an error like the one below occurs, because corr() tries to convert text columns such as Name to float.

Makefile:6: recipe for target 'py3_run' failed
make: *** [py3_run] Error 1
Traceback (most recent call last):
  File "/goorm/Main.out", line 11, in <module>
    corr_table = df.corr()
                 ^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pandas/core/frame.py", line 11049, in corr
    mat = data.to_numpy(dtype=float, na_value=np.nan, copy=False)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pandas/core/frame.py", line 1993, in to_numpy
    result = self._mgr.as_array(dtype=dtype, copy=copy, na_value=na_value)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 1694, in as_array
    arr = self._interleave(dtype=dtype, na_value=na_value)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 1753, in _interleave
    result[rl.indexer] = arr
    ~~~~~~^^^^^^^^^^^^
ValueError: could not convert string to float: 'Braund, Mr. Owen Harris'
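
The failure can be reproduced without the exam data. The following sketch uses a small toy DataFrame (hypothetical columns and values) and assumes pandas 2.0 or later; on older versions the call without the option would likely succeed and the except branch would simply not run. It catches the error and retries with numeric_only=True.

```python
import pandas as pd

# Hypothetical toy data with one text column, similar in spirit to the Name column.
df = pd.DataFrame(
    {
        "name": ["Braund", "Cumings", "Heikkinen"],
        "age": [22.0, 38.0, 26.0],
        "fare": [7.25, 71.28, 7.93],
    }
)

try:
    # On pandas 2.0+ this tries to convert "name" to float and fails.
    corr_table = df.corr()
except (ValueError, TypeError) as exc:
    print(f"corr() failed: {exc}")
    # Retry while skipping the non-numeric column.
    corr_table = df.corr(numeric_only=True)

print(corr_table)
```

In an exam setting it is simpler to pass numeric_only=True from the start; the try/except here is only meant to show where the error comes from.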

If you forget the option name numeric_only, use the help function as shown below to check the signature and the usage examples.

import pandas as pd

help(pd.DataFrame.corr)

Help on function corr in module pandas.core.frame:
 
corr(self, method: 'CorrelationMethod' = 'pearson', min_periods: 'int' = 1, numeric_only: 'bool' = False) -> 'DataFrame'
    Compute pairwise correlation of columns, excluding NA/null values.
 
    Parameters
    ----------
    method : {'pearson', 'kendall', 'spearman'} or callable
        Method of correlation:
 
        * pearson : standard correlation coefficient
        * kendall : Kendall Tau correlation coefficient
        * spearman : Spearman rank correlation
        * callable: callable with input two 1d ndarrays
            and returning a float. Note that the returned matrix from corr
            will have 1 along the diagonals and will be symmetric
            regardless of the callable's behavior.
    min_periods : int, optional
        Minimum number of observations required per pair of columns
        to have a valid result. Currently only available for Pearson
        and Spearman correlation.
    numeric_only : bool, default False
        Include only `float`, `int` or `boolean` data.
 
        .. versionadded:: 1.5.0
 
        .. versionchanged:: 2.0.0
            The default value of ``numeric_only`` is now ``False``.
 
    Returns
    -------
    DataFrame
        Correlation matrix.
 
    See Also
    --------
    DataFrame.corrwith : Compute pairwise correlation with another
        DataFrame or Series.
    Series.corr : Compute the correlation between two Series.
 
    Notes
    -----
    Pearson, Kendall and Spearman correlation are currently computed using pairwise complete observations.
 
    * `Pearson correlation coefficient <https://en.wikipedia.org/wiki/Pearson_correlation_coefficient>`_
    * `Kendall rank correlation coefficient <https://en.wikipedia.org/wiki/Kendall_rank_correlation_coefficient>`_
    * `Spearman's rank correlation coefficient <https://en.wikipedia.org/wiki/Spearman%27s_rank_correlation_coefficient>`_
 
    Examples
    --------
    >>> def histogram_intersection(a, b):
    ...     v = np.minimum(a, b).sum().round(decimals=1)
    ...     return v
    >>> df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],
    ...                   columns=['dogs', 'cats'])
    >>> df.corr(method=histogram_intersection)
          dogs  cats
    dogs   1.0   0.3
    cats   0.3   1.0
 
    >>> df = pd.DataFrame([(1, 1), (2, np.nan), (np.nan, 3), (4, 4)],
    ...                   columns=['dogs', 'cats'])
    >>> df.corr(min_periods=3)
          dogs  cats
    dogs   1.0   NaN
    cats   NaN   1.0
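
When only the option name or its default is needed, the full help text may be more than necessary. As a small sketch using the standard library's inspect module (an addition here, not something the original post uses), the signature can be queried directly:

```python
import inspect

import pandas as pd

# Read the signature of DataFrame.corr without printing the whole docstring.
sig = inspect.signature(pd.DataFrame.corr)

# List the parameter names, skipping "self".
param_names = [name for name in sig.parameters if name != "self"]
print(param_names)

# Show the default value of numeric_only in the installed pandas version.
print(sig.parameters["numeric_only"].default)
```

The printed default depends on the installed pandas version, so this also works as a quick check of which behavior the current environment uses.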

References

pandas.DataFrame.corr — pandas 2.2.3 documentation

Include only float, int or boolean data. Changed in version 2.0.0: The default value of numeric_only is now False.

pandas.pydata.org


License: Attribution, NonCommercial, NoDerivatives
