`pandas` provides high-level data structures and functions designed to
make working with structured or tabular data fast, easy, and expressive.
In this post I’ll introduce its two workhorse data structures: *Series* and *DataFrame*.

## Series

A Series is a one-dimensional array-like object containing a sequence of values
and an associated array of data labels, called its *index*.

The simplest Series is formed from only an array of data:
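As a minimal sketch (the values and the name `obj` are illustrative, not taken from the original post), a Series can be built from a plain Python list:

```python
import pandas as pd

# Build a Series from a list; no index is given, so pandas
# supplies a default integer index starting at 0.
obj = pd.Series([4, 7, -5, 3])
print(obj)
```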

The string representation of a Series displayed interactively shows the index on
the left and the values on the right. To specify the index yourself, pass the `index`
parameter to `pd.Series()`. You can get the array representation
and index object of the Series via its `values` and `index` attributes, respectively:
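A short sketch of both ideas, using made-up labels and the illustrative names `obj2`, `values`, and `labels`:

```python
import pandas as pd

# Pass index= to attach a custom label to each value.
obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

values = obj2.values  # the data as a NumPy array
labels = obj2.index   # the labels as a pandas Index object
print(values)
print(labels)
```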

Compared with NumPy arrays, you can use labels in the index when selecting single values or a set of values:
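For instance, assuming a Series with string labels (the data here is illustrative):

```python
import pandas as pd

obj2 = pd.Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

# A single label returns a single value.
single = obj2["a"]

# A list of labels returns a new Series in the requested order.
subset = obj2[["c", "a", "d"]]
print(single)
print(subset)
```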

Another way to think about a Series is as a fixed-length, ordered dict, as it is a mapping of index values to data values. If you have data contained in a Python dict, you can create a Series from it by passing the dict:
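A sketch with a hypothetical dict named `sdata`; the keys become the index and the dict values become the data:

```python
import pandas as pd

sdata = {"Ohio": 35000, "Texas": 71000, "Oregon": 16000, "Utah": 5000}

# Keys become index labels, values become the Series data.
obj3 = pd.Series(sdata)
print(obj3)

# Like a dict, a Series supports membership tests on its labels.
has_texas = "Texas" in obj3
```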

Both the Series object itself and its index have a `name` attribute, which
integrates with other key areas of pandas functionality:
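One place where the names show up is `reset_index()`, which turns the index into a regular column. A small sketch, with the illustrative names `"population"` and `"state"`:

```python
import pandas as pd

obj4 = pd.Series({"Ohio": 35000, "Texas": 71000})
obj4.name = "population"    # name of the values
obj4.index.name = "state"   # name of the index

# reset_index() moves the index into a column; both names
# are reused as column labels in the resulting DataFrame.
frame = obj4.reset_index()
print(frame)
```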

A Series’s index can be altered in place by assignment:
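A minimal sketch, with placeholder names as the new labels:

```python
import pandas as pd

obj = pd.Series([4, 7, -5, 3])

# Replace the default integer index; the new index must have
# the same length as the Series.
obj.index = ["Bob", "Steve", "Jeff", "Ryan"]
print(obj)
```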

## DataFrame

A DataFrame represents a rectangular table of data and contains an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.). There are many ways to construct a DataFrame, though one of the most common is from a dict of equal-length lists or NumPy arrays:
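As a sketch, here is a DataFrame built from a dict of equal-length lists (the `data` dict is illustrative sample data):

```python
import pandas as pd

data = {
    "state": ["Ohio", "Ohio", "Ohio", "Nevada", "Nevada", "Nevada"],
    "year": [2000, 2001, 2002, 2001, 2002, 2003],
    "pop": [1.5, 1.7, 3.6, 2.4, 2.9, 3.2],
}

# Each dict key becomes a column; the row index defaults to 0..n-1.
frame = pd.DataFrame(data)
print(frame.head())  # head() shows the first five rows
```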

The `columns` parameter arranges the DataFrame’s columns in the order given. If it is omitted,
older pandas versions placed the columns in sorted order, while recent versions keep the dict’s insertion order.
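A sketch of the `columns` parameter, using illustrative data. Here `"debt"` is a label that does not appear in the dict, so it comes back as a column of missing values:

```python
import pandas as pd

data = {
    "state": ["Ohio", "Ohio", "Nevada"],
    "year": [2000, 2001, 2001],
    "pop": [1.5, 1.7, 2.4],
}

# columns= fixes the column order; a label missing from the dict
# ("debt") yields a column filled with NaN.
frame2 = pd.DataFrame(data, columns=["year", "state", "pop", "debt"])
print(frame2)
```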

A column in a DataFrame can be retrieved as a Series either by dict-like notation or by attribute:
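Both notations, sketched on a small illustrative frame:

```python
import pandas as pd

frame = pd.DataFrame({
    "state": ["Ohio", "Nevada"],
    "year": [2000, 2001],
})

by_key = frame["state"]  # dict-like notation
by_attr = frame.year     # attribute notation

# Either way the result is a Series that shares the frame's index.
print(type(by_key).__name__, type(by_attr).__name__)
```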

Columns can be modified by assignment. For example, the empty ‘debt’ column could be assigned a scalar value or an array of values.
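A sketch of both kinds of assignment, assuming a frame with an empty `debt` column (the data and the value 16.5 are illustrative):

```python
import numpy as np
import pandas as pd

frame2 = pd.DataFrame(
    {"state": ["Ohio", "Ohio", "Nevada"], "year": [2000, 2001, 2001]},
    columns=["year", "state", "debt"],
)

# A scalar is broadcast to every row.
frame2["debt"] = 16.5
scalar_fill = frame2["debt"].tolist()

# An array must match the length of the DataFrame.
frame2["debt"] = np.arange(3.0)
print(frame2)
```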

Assigning a column that doesn’t exist will create a new column. The `del`
keyword will delete columns as with a dict.
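A minimal sketch, using a hypothetical boolean column called `eastern`:

```python
import pandas as pd

frame = pd.DataFrame({"state": ["Ohio", "Nevada"], "pop": [1.5, 2.4]})

# Assigning to a label that does not exist yet creates the column.
frame["eastern"] = frame["state"] == "Ohio"
had_eastern = "eastern" in frame.columns

# del removes the column again, just like deleting a dict key.
del frame["eastern"]
print(frame.columns.tolist())
```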

Dicts of Series are treated in much the same way:
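A sketch with a hypothetical dict of Series named `pop`: each key becomes a column, and the Series indexes are combined into the row index, with missing entries filled with NaN.

```python
import pandas as pd

pop = {
    "Nevada": pd.Series([2.4, 2.9], index=[2001, 2002]),
    "Ohio": pd.Series([1.5, 1.7, 3.6], index=[2000, 2001, 2002]),
}

# Keys become columns; the row index is the union of the Series
# indexes. Nevada has no value for 2000, so that cell is NaN.
frame3 = pd.DataFrame(pop)
print(frame3)
```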

As with Series, the `values` attribute returns the data contained in the
DataFrame as a two-dimensional ndarray:
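A small sketch on illustrative data. Newer pandas releases also offer `to_numpy()`, which returns the same array here.

```python
import pandas as pd

frame = pd.DataFrame({"year": [2000, 2001], "pop": [1.5, 1.7]})

# .values returns a 2-D NumPy array; mixing int and float columns
# makes pandas pick a common dtype (float64 in this case).
arr = frame.values
print(arr.shape, arr.dtype)

same = frame.to_numpy()  # equivalent result in newer pandas
```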

### Notes

`df[column]` works for any column name, but `df.column` only works when the column name is a valid Python variable name.
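To make the difference concrete, here is a sketch with a hypothetical column whose name contains a space:

```python
import pandas as pd

df = pd.DataFrame({"price": [1.0, 2.0], "unit price": [0.5, 0.25]})

# "price" is a valid Python identifier, so both notations work.
by_attr = df.price
by_key = df["price"]

# "unit price" is not a valid identifier (df.unit price would be a
# syntax error), so only bracket notation can reach it.
valid_name = "unit price".isidentifier()
spaced = df["unit price"]
print(valid_name)
```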

### Caution

- New columns can be created with the `df[column]` syntax, but CANNOT be created with the `df.column` syntax.
- The column returned from indexing a DataFrame is a **view** on the underlying data, **not a copy**. Thus, any in-place modifications to the Series will be reflected in the DataFrame. The column can be explicitly copied with the Series’s `copy` method.
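Both cautions can be sketched as follows. The names `bonus`, `total`, and `safe` are illustrative. The example shows only the explicit-copy side of the second point, which holds whatever the pandas version.

```python
import warnings
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})

# Attribute assignment only sets a plain Python attribute; pandas
# emits a warning about it, silenced here to keep the demo quiet.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    df.bonus = [10, 20, 30]
created_by_attr = "bonus" in df.columns  # no column was created

# Bracket assignment does create the column.
df["total"] = [10, 20, 30]
created_by_key = "total" in df.columns

# An explicit copy can be modified without touching the DataFrame.
safe = df["a"].copy()
safe.iloc[0] = 99
print(df["a"].tolist(), safe.tolist())
```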

## Reference

[1] Wes McKinney. 2017. “Chapter 5: Getting Started with pandas.” *Python for Data
Analysis: Data Wrangling with Pandas, NumPy, and IPython*, pp. 124-136.