24 things to know about pandas series

To get started with pandas, you need to get comfortable with its two core data structures: Series and DataFrame. In this post, we will walk through common operations on a Series.

1. Creating a series

pandas.Series(data=None, index=None, dtype=None, name=None, copy=False, fastpath=False)

import pandas as pd

# method 1: from a list
ser1 = pd.Series(data=[5, -7, 0])
# method 2: from a dict
mdata = {'Ohio': 35000, 'Texas': 70000, 'Utah': 5000}
ser2 = pd.Series(data=mdata)

2. index

Series.index
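
As a minimal sketch, the snippet below rebuilds the two Series from section 1 and inspects their indexes. The variable names default_labels and state_labels and the short labels 'OH', 'TX', 'UT' are illustrative only.

```python
import pandas as pd

ser1 = pd.Series([5, -7, 0])
ser2 = pd.Series({'Ohio': 35000, 'Texas': 70000, 'Utah': 5000})

# A Series built from a list gets a default integer index (0, 1, 2, ...).
default_labels = list(ser1.index)

# A Series built from a dict uses the dict keys as its index.
state_labels = list(ser2.index)

# Assigning a sequence of the same length relabels the Series in place.
ser2.index = ['OH', 'TX', 'UT']
print(ser2)
```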

3. Indexing & selection

Series indexing (ser2[...]) works analogously to NumPy array indexing, except you can use the Series’s index values instead of only integers.

Moreover, you can also select a subset of the rows from a series with NumPy-like notation using either axis labels (loc) or integers (iloc).

The slice syntax ser2[:2] is provided as a convenience and selects by position. Passing a single label or a list of labels to the [] operator selects the matching elements.

Slicing with labels like ser2['Ohio':'Texas'] behaves differently than normal Python slicing in that the end-point is inclusive.
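
The selection styles described above can be sketched as follows, assuming the ser2 Series from section 1. All variable names on the left-hand side are illustrative.

```python
import pandas as pd

ser2 = pd.Series({'Ohio': 35000, 'Texas': 70000, 'Utah': 5000})

by_label = ser2['Texas']                # single label -> scalar
by_label_list = ser2[['Utah', 'Ohio']]  # list of labels -> sub-Series in that order
by_loc = ser2.loc['Ohio']               # explicit label-based access
by_iloc = ser2.iloc[0]                  # explicit position-based access
mask = ser2[ser2 > 10000]               # boolean filtering keeps matching labels

# Positional slices exclude the end point; label slices include it.
pos_slice = ser2.iloc[0:1]              # only 'Ohio'
label_slice = ser2.loc['Ohio':'Texas']  # 'Ohio' and 'Texas'
```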

4. Hierarchical indexing

Hierarchical indexing is an important feature of pandas that enables you to have multiple (two or more) index levels on an axis. It provides a way to work with higher-dimensional data in a lower-dimensional form.

data = pd.Series(range(6),
                 index=[['a', 'a', 'a', 'b', 'b', 'c'],
                        [1, 2, 3, 1, 2, 3]])

With a hierarchically indexed object, so-called partial indexing is possible: you can concisely select subsets of the data by specifying only some of the index levels.
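
A short sketch of partial indexing on the data Series defined above; the names outer, several_outer and inner are illustrative.

```python
import pandas as pd

data = pd.Series(range(6),
                 index=[['a', 'a', 'a', 'b', 'b', 'c'],
                        [1, 2, 3, 1, 2, 3]])

# Select everything under one outer label; the result is indexed by the inner level.
outer = data['b']

# Select several outer labels at once; the result keeps both levels.
several_outer = data.loc[['a', 'c']]

# Select by the inner level across all outer labels.
inner = data.loc[:, 2]
```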

Hierarchical indexing plays an important role in reshaping data and in group-based operations such as forming a pivot table. For instance, you can rearrange a hierarchically indexed Series into a DataFrame with the unstack method.
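
A minimal sketch of unstack on the same data Series: the inner level becomes the columns, and combinations that do not exist in the Series show up as NaN. The name frame is illustrative.

```python
import pandas as pd

data = pd.Series(range(6),
                 index=[['a', 'a', 'a', 'b', 'b', 'c'],
                        [1, 2, 3, 1, 2, 3]])

# Outer level -> rows, inner level -> columns.
frame = data.unstack()
print(frame)
```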

5. reindex

An important method on pandas objects is reindex, which creates a new object with the data conformed to a new index.

Series.reindex(self, index=None, **kwargs)

ser1.reindex([2, 0, 1])
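
To make the behaviour concrete, here is a small sketch with ser1 from section 1. It also shows what happens when the new index contains a label that was not present before; the names reordered, extended and filled are illustrative.

```python
import pandas as pd

ser1 = pd.Series([5, -7, 0])

# Existing labels are simply rearranged.
reordered = ser1.reindex([2, 0, 1])

# A label that did not exist (3) is introduced with a missing value.
extended = ser1.reindex([0, 1, 2, 3])

# fill_value supplies a default for new labels instead of NaN.
filled = ser1.reindex([0, 1, 2, 3], fill_value=0)
```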

6. drop

The drop method returns a new object with the indicated label or labels removed from an axis.

Series.drop(self, labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')

ser1.drop(1)
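
A brief sketch, again with ser1 from section 1, showing that drop works on index labels and leaves the original untouched by default. The result names are illustrative.

```python
import pandas as pd

ser1 = pd.Series([5, -7, 0])

dropped = ser1.drop(1)            # remove the element labelled 1
dropped_many = ser1.drop([0, 2])  # remove several labels at once

# With the default inplace=False, ser1 itself still has all three elements.
print(len(ser1))
```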

7. Arithmetic

Using NumPy functions or NumPy-like operations, such as filtering with a boolean array, scalar multiplication, or applying math functions, preserves the index-value link.
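
The three kinds of operation mentioned above can be sketched like this, assuming ser2 from section 1. In each case the result keeps the original labels.

```python
import numpy as np
import pandas as pd

ser2 = pd.Series({'Ohio': 35000, 'Texas': 70000, 'Utah': 5000})

big = ser2[ser2 > 10000]   # boolean filtering
doubled = ser2 * 2         # scalar multiplication
logged = np.log(ser2)      # NumPy math function applied element-wise
```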

When you add two Series, pandas aligns them by index label. The index of the result is the union of the two indexes, and any label that is missing from either operand gets NaN in the result.
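
A small sketch of index alignment with two made-up Series, s1 and s2, that share only the label 'Texas'. The add method with fill_value is shown as one way to avoid NaN for non-overlapping labels.

```python
import pandas as pd

s1 = pd.Series({'Ohio': 1, 'Texas': 2})
s2 = pd.Series({'Texas': 10, 'Utah': 20})

# Only 'Texas' exists in both, so 'Ohio' and 'Utah' come out as NaN.
total = s1 + s2

# Treat a label missing from one side as 0 before adding.
filled = s1.add(s2, fill_value=0)
```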

8. in

The in operator checks whether a label is present in the Series's index (not whether it appears among the values).
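
A quick sketch with ser2 from section 1; the variable names are illustrative. Testing against .values is shown as one way to check the data rather than the labels.

```python
import pandas as pd

ser2 = pd.Series({'Ohio': 35000, 'Texas': 70000, 'Utah': 5000})

has_label = 'Ohio' in ser2              # True: 'Ohio' is an index label
value_as_label = 35000 in ser2          # False: 35000 is a value, not a label
has_value = 35000 in ser2.values        # True: membership test on the values
```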

9. isnull

Detect missing values for an array-like object.

pandas.isnull(obj)
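
A minimal sketch with a made-up Series containing both NaN and None; both are treated as missing.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.0, np.nan, 3.0, None])

mask = pd.isnull(ser)        # top-level function
same_mask = ser.isnull()     # equivalent Series method
n_missing = int(mask.sum())  # True counts as 1
```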

10. sorting

Sort Series by index labels.
Series.sort_index(self, axis=0, level=None, ascending=True, inplace=False, kind='quicksort', na_position='last', sort_remaining=True, ignore_index: bool = False)

Sort by the values.
Series.sort_values(self, axis=0, ascending=True, inplace=False, kind='quicksort', na_position='last', ignore_index=False)
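
Both sorting methods can be sketched with a small made-up Series. Each call returns a new, sorted Series and leaves the original order untouched.

```python
import pandas as pd

ser = pd.Series([4, 7, -3, 2], index=['d', 'a', 'b', 'c'])

by_index = ser.sort_index()                  # order by label
by_value = ser.sort_values()                 # order by value, ascending
desc = ser.sort_values(ascending=False)      # order by value, descending
```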

11. ranking

Compute numerical data ranks (1 through n) along axis.
Series.rank(self: ~FrameOrSeries, axis=0, method: str = 'average', numeric_only: Union[bool, NoneType] = None, na_option: str = 'keep', ascending: bool = True, pct: bool = False)

method: {‘average’, ‘min’, ‘max’, ‘first’, ‘dense’}, default ‘average’. How to rank a group of records that have the same value:

  • average: average rank of the group
  • min: lowest rank in the group
  • max: highest rank in the group
  • first: ranks assigned in order they appear in the array
  • dense: like ‘min’, but rank always increases by 1 between groups.
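
To compare the tie-breaking methods listed above, here is a sketch on a made-up Series in which the value 10 appears twice. The dictionary ranks is just a convenient container for the results.

```python
import pandas as pd

ser = pd.Series([10, 20, 10, 30])

# Rank the same data with every tie-breaking method.
ranks = {m: ser.rank(method=m).tolist()
         for m in ['average', 'min', 'max', 'first', 'dense']}

for method, values in ranks.items():
    print(method, values)
```

The two tied values occupy ranks 1 and 2, so 'average' gives both 1.5, 'min' gives both 1, 'max' gives both 2, 'first' gives 1 and 2 in order of appearance, and 'dense' gives both 1 while the next value gets 2.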

12. is_unique

Return True if all values in the object are unique, False otherwise.
Series.is_unique
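
A two-line sketch with made-up data. Note that is_unique is an attribute, so it is used without parentheses.

```python
import pandas as pd

all_different = pd.Series([1, 2, 3]).is_unique   # True
has_repeat = pd.Series([1, 2, 2]).is_unique      # False
```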

13. isin()

Check whether each element of the Series is contained in the passed values; the result is a boolean Series.
Series.isin(self, values)
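
A short sketch with a made-up Series of letters. The boolean result is typically used to filter the original Series.

```python
import pandas as pd

ser = pd.Series(['c', 'a', 'd', 'a', 'b'])

mask = ser.isin(['a', 'b'])   # True where the element is 'a' or 'b'
subset = ser[mask]            # keep only the matching elements
```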

14. Computing descriptive statistics
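
As a sketch of the most common reductions, the snippet below uses a made-up numeric Series. The method names are standard pandas; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.0, 2.0, 3.0, 4.0])

total = ser.sum()
average = ser.mean()
middle = ser.median()
spread = ser.std()          # sample standard deviation (ddof=1)
top_label = ser.idxmax()    # index label of the largest value
summary = ser.describe()    # count, mean, std, min, quartiles, max in one Series

# Missing values are skipped by default (skipna=True).
partial_sum = pd.Series([1.0, np.nan, 2.0]).sum()
```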

15. unique values

Return unique values of Series object.
Series.unique(self)

Return number of unique elements in the object.
Series.nunique(self, dropna=True)
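
A minimal sketch on made-up data. unique returns the distinct values in order of first appearance, and nunique ignores missing values unless dropna=False is passed.

```python
import numpy as np
import pandas as pd

ser = pd.Series([3, 1, 3, 2, 1])

distinct = ser.unique()        # NumPy array, not sorted
n_distinct = ser.nunique()

with_nan = pd.Series([1.0, np.nan, 1.0])
n_without_nan = with_nan.nunique()               # NaN not counted
n_with_nan = with_nan.nunique(dropna=False)      # NaN counted as a value
```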

16. value_counts

Return a Series containing counts of unique values.
Series.value_counts(self, normalize=False, sort=True, ascending=False, bins=None, dropna=True)
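
A sketch with a made-up Series of letters. By default the result is sorted from most to least frequent, and normalize=True returns relative frequencies instead of counts.

```python
import pandas as pd

ser = pd.Series(['a', 'b', 'a', 'c', 'a', 'b'])

counts = ser.value_counts()                 # index: distinct values, data: counts
shares = ser.value_counts(normalize=True)   # proportions that sum to 1
```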

17. Filtering out missing data

Return a new Series with missing values removed.
Series.dropna(self, axis=0, inplace=False, how=None)

Detect existing (non-missing) values.
Series.notnull(self)
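
The two approaches give the same result, as this sketch on a made-up Series suggests. The names cleaned and kept are illustrative.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.0, np.nan, 3.5, np.nan, 7.0])

cleaned = ser.dropna()          # drop missing entries, keep original labels
kept = ser[ser.notnull()]       # same result via boolean filtering
```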

18. Filling in missing data

Fill NA/NaN values using the specified method.
Series.fillna(self, value=None, method=None, axis=None, inplace=False, limit=None, downcast=None)

  • method: {‘backfill’, ‘bfill’, ‘pad’, ‘ffill’, None}, default None. Method to use for filling holes in reindexed Series.
    • pad / ffill: propagate last valid observation forward to next valid
    • backfill / bfill: use next valid observation to fill gap
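
A sketch of the filling strategies on a made-up Series. It uses the dedicated ffill and bfill methods for forward and backward filling, because recent pandas releases deprecate passing method= to fillna; on older versions, fillna(method='ffill') should behave the same way.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.0, np.nan, np.nan, 4.0])

constant = ser.fillna(0)        # replace every NaN with a fixed value
forward = ser.ffill()           # propagate the last valid value forward
backward = ser.bfill()          # use the next valid value to fill the gap
limited = ser.ffill(limit=1)    # fill at most one consecutive NaN
```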

19. Removing duplicates

Indicate duplicate Series values.
Series.duplicated(self, keep='first')

  • keep: {‘first’, ‘last’, False}, default ‘first’. Method to handle dropping duplicates:
    • first : Mark duplicates as True except for the first occurrence.
    • last : Mark duplicates as True except for the last occurrence.
    • False : Mark all duplicates as True.

Return Series with duplicate values removed.
Series.drop_duplicates(self, keep='first', inplace=False)
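
The effect of each keep option can be sketched on a made-up Series in which 'a' and 'b' both appear twice.

```python
import pandas as pd

ser = pd.Series(['a', 'b', 'a', 'c', 'b'])

first = ser.duplicated()                 # later repeats are marked True
last = ser.duplicated(keep='last')       # earlier repeats are marked True
every = ser.duplicated(keep=False)       # all members of a repeated group are True

deduped = ser.drop_duplicates()          # keeps the first occurrence of each value
```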

20. map

Map values of Series according to input correspondence.
Used for substituting each value in a Series with another value, which may be derived from a function, a dict, or a Series.
Series.map(self, arg, na_action=None)
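
A sketch of map with a dict and with a function, using made-up animal names. Values that have no key in the dict become NaN.

```python
import pandas as pd

ser = pd.Series(['cat', 'dog', 'cow'])

# Dict lookup: 'cow' has no entry, so it maps to NaN.
young = ser.map({'cat': 'kitten', 'dog': 'puppy'})

# Function applied to each element.
upper = ser.map(str.upper)
```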

21. apply

Invoke function on values of Series.
The function can be a ufunc (a NumPy function that applies to the entire Series) or a Python function that only works on single values.

Series.apply(self, func, convert_dtype=True, args=(), **kwds)
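
Both kinds of function can be sketched as follows. The helper add_n is a hypothetical function written for this example; extra positional arguments reach it through args.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 4, 9])

# A NumPy ufunc.
roots = ser.apply(np.sqrt)

# A plain Python function that works on one value at a time (hypothetical helper).
def add_n(x, n):
    return x + n

shifted = ser.apply(add_n, args=(10,))
```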

22. replace

Replace values given in to_replace with value.
Series.replace(self, to_replace=None, value=None, inplace=False, limit=None, regex=False, method='pad')
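
A sketch of three common call styles, using made-up data in which -999 and -1000 stand for sentinel values.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1.0, -999.0, 2.0, -999.0, -1000.0, 3.0])

single = ser.replace(-999, np.nan)                     # one value -> one replacement
several = ser.replace([-999, -1000], np.nan)           # many values -> one replacement
per_value = ser.replace({-999: np.nan, -1000: 0})      # a different replacement per value
```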

23. concat

Concatenate pandas objects along a particular axis with optional set logic along the other axes.
pandas.concat(objs, axis=0, join='outer', ignore_index=False, keys=None, levels=None, names=None, verify_integrity=False, sort=False, copy=True)
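
A minimal sketch of concat with two made-up Series that have no labels in common. The variable names are illustrative.

```python
import pandas as pd

s1 = pd.Series([0, 1], index=['a', 'b'])
s2 = pd.Series([2, 3, 4], index=['c', 'd', 'e'])

# axis=0 (default): glue the Series end to end.
stacked = pd.concat([s1, s2])

# axis=1: each Series becomes a column; missing labels are NaN.
side_by_side = pd.concat([s1, s2], axis=1)

# keys adds an outer index level that records where each piece came from.
keyed = pd.concat([s1, s2], keys=['one', 'two'])

# ignore_index discards the original labels and renumbers from 0.
renumbered = pd.concat([s1, s2], ignore_index=True)
```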

24. numpy.where()

Return elements chosen from x or y depending on condition.
numpy.where(condition[, x, y])
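
A sketch of numpy.where applied to ser1 from section 1. The result is a NumPy array rather than a Series, so the example wraps it back into a Series when the index should be kept.

```python
import numpy as np
import pandas as pd

ser1 = pd.Series([5, -7, 0])

# Pick a label per element depending on the condition.
labels = np.where(ser1 > 0, 'positive', 'non-positive')

# Keep positive values, replace everything else with 0, and restore the index.
clipped = pd.Series(np.where(ser1 > 0, ser1, 0), index=ser1.index)
```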
