Python - Pandas Essentials: A Clean, Original Reference Guide

Pandas Essentials: Complete Reference Guide

This guide gives an overview of the most commonly used Pandas functions for loading, transforming, analyzing, and visualizing data. It is aimed at Python developers, data engineers, and machine learning practitioners.

1. Data Loading & Saving

pd.read_csv() – Import CSV files
pd.read_excel() – Load Excel spreadsheets
pd.read_json() – Read JSON data
pd.read_sql(query, con) – Fetch data from SQL
pd.read_html() – Extract tables from HTML
pd.read_clipboard() – Load clipboard content
df.to_csv() – Export to CSV
df.to_excel() – Export to Excel
df.to_json() – Convert to JSON
df.to_sql() – Write to SQL table
df.to_clipboard() – Copy DataFrame to clipboard
df.to_markdown() – Export as Markdown
df.to_latex() – Export as LaTeX
df.to_html() – Export as HTML
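
As a rough illustration of the load/save pairs above, the sketch below round-trips a small DataFrame through CSV. The sample data and the in-memory buffer are assumptions made so the example runs without touching the disk; a file path works the same way.

```python
import io

import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"name": ["Ada", "Linus"], "score": [91, 85]})

# Write CSV text to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Rewind the buffer and load it back with read_csv.
buffer.seek(0)
restored = pd.read_csv(buffer)
```

Passing index=False keeps the row index out of the file, so the restored frame has the same two columns as the original.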

2. Inspecting DataFrames

df.head() – View first rows
df.tail() – View last rows
df.info() – Summary of structure
df.describe() – Statistical summary
df.dtypes – Column data types
df.columns – Column names
df.index – Index values
df.axes – Row and column labels
df.shape – Dimensions
df.memory_usage() – Memory usage
df.size – Total elements
df.empty – Check if empty
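
A quick sketch of a few of these inspection attributes on a toy DataFrame (the column names and values are made up for the example):

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

shape = df.shape          # (rows, columns)
size = df.size            # rows * columns
first_two = df.head(2)    # first two rows
summary = df.describe()   # count, mean, std, min, quartiles, max per column
is_empty = df.empty       # False, because the frame has rows
```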

3. Selecting & Indexing

df["col"] – Select a column
df[["col1","col2"]] – Select multiple columns
df.loc[] – Label-based selection
df.iloc[] – Position-based selection
df.at[] – Fast scalar access (label)
df.iat[] – Fast scalar access (position)
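df.where() – Keep values where the condition is True; replace the rest (NaN by default)
df.mask() – Replace values where the condition is True (NaN by default)
df.query() – Filter rows with a boolean expression string
df.take() – Select rows or columns by integer position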
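
The difference between label-based and position-based selection is easiest to see side by side. This is a minimal sketch with an invented DataFrame and index labels:

```python
import pandas as pd

# Hypothetical data with string index labels.
df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Pune"], "temp": [4, 19, 31]},
    index=["a", "b", "c"],
)

by_label = df.loc["b", "temp"]      # row label "b", column "temp"
by_position = df.iloc[0, 1]         # first row, second column
warm = df.query("temp > 10")        # rows where the expression holds
masked = df["temp"].where(df["temp"] > 10)  # non-matching values become NaN
taken = df.take([2, 0])             # rows at positions 2 and 0, in that order
```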

4. Modifying Data

df.assign() – Add or modify columns
df.insert() – Insert new column
df.update() – Update values from another DataFrame
df.drop() – Remove rows or columns
df.rename() – Rename labels
df.replace() – Replace values
df.eval() – Evaluate expressions
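
Several of these methods return a new DataFrame, so they can be chained. The sketch below assumes a small invented price/quantity table; note that insert() is the exception and modifies the frame in place.

```python
import pandas as pd

# Hypothetical order data.
df = pd.DataFrame({"price": [10.0, 20.0], "qty": [3, 1]})

# assign, rename, and drop each return a new DataFrame.
out = (
    df.assign(total=lambda d: d["price"] * d["qty"])
    .rename(columns={"qty": "quantity"})
    .drop(columns=["price"])
)

# insert works in place, so operate on a copy to leave df untouched.
with_id = df.copy()
with_id.insert(0, "id", [101, 102])
```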

5. Handling Missing Data

df.isna() – Detect missing values
df.notna() – Opposite of isna
df.fillna() – Fill missing values
df.dropna() – Drop rows or columns that contain missing values
df.interpolate() – Interpolate values
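
A minimal sketch of the three common ways to deal with gaps, using an invented Series (the same methods exist on DataFrames):

```python
import pandas as pd

# Hypothetical series with two missing values.
s = pd.Series([1.0, None, 3.0, None, 5.0])

n_missing = int(s.isna().sum())   # count the gaps
filled = s.fillna(0.0)            # replace gaps with a constant
dropped = s.dropna()              # discard the gaps
interpolated = s.interpolate()    # linear interpolation between neighbours
```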

6. Sorting & Ranking

df.sort_values() – Sort by values
df.sort_index() – Sort by index
df.rank() – Rank values
df.nlargest() – Largest N values
df.nsmallest() – Smallest N values
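
The following sketch shows sorting, ranking with ties, and top-N selection on an invented score table. The method="min" choice for ties is just one option among several that rank() supports.

```python
import pandas as pd

# Hypothetical scores, including a tie at 95.
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [70, 95, 85, 95]})

ordered = df.sort_values("score", ascending=False)

# Tied values share the lowest rank in the tie when method="min".
ranks = df["score"].rank(ascending=False, method="min")

top2 = df.nlargest(2, "score")
```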

7. Aggregation & Statistics

df.min(), df.max() – Min/Max
df.sum(), df.mean() – Sum/Mean
df.median() – Median
df.mode() – Mode
df.std(), df.var() – Std/Variance
df.count() – Count non-null
df.cumsum() – Cumulative sum
df.cumprod() – Cumulative product
df.cummin(), df.cummax() – Cumulative min/max
df.any(), df.all() – Boolean checks
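
By default these reductions work column by column and return a Series. A small sketch on made-up numbers:

```python
import pandas as pd

# Hypothetical numeric data.
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [2.0, 2.0, 4.0, 8.0]})

means = df.mean()              # one value per column
counts = df.count()            # non-null entries per column
running = df["x"].cumsum()     # running total, same length as the input

# all() reduces per column first; a second all() reduces across columns.
all_positive = bool((df > 0).all().all())
```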

8. Grouping & Window Functions

df.groupby() – Group data
df.agg() – Aggregations
df.transform() – Transform values
df.groupby().ngroup() – Number each group
df.groupby().size() – Rows per group
df.rolling() – Rolling window
df.expanding() – Expanding window
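
The sketch below contrasts agg (one row per group) with transform (one value per original row), and shows that size and ngroup are called on the grouped object. The team/points data is invented for illustration.

```python
import pandas as pd

# Hypothetical per-game points for two teams.
df = pd.DataFrame({"team": ["a", "a", "b", "b", "b"], "pts": [1, 3, 2, 4, 6]})

grouped = df.groupby("team")

totals = grouped["pts"].agg(["sum", "mean"])   # one row per team
team_mean = grouped["pts"].transform("mean")   # aligned with the original rows
sizes = grouped.size()                         # rows per team
group_ids = grouped.ngroup()                   # integer id of each row's group

# Rolling mean over a window of two consecutive rows (no grouping).
rolling = df["pts"].rolling(window=2).mean()
```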

9. String Operations

str.upper(), str.lower() – Case conversion
str.len() – Length
str.strip() – Trim leading and trailing whitespace
str.split() – Split text
str.get() – Extract the element at a given position
str.contains() – Substring check
str.replace() – Replace text
str.startswith(), str.endswith() – Start/End check
str.extract() – Regex extraction
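
These methods are reached through the .str accessor of a Series and skip over missing values. A minimal sketch on invented names:

```python
import pandas as pd

# Hypothetical names with stray whitespace and one missing entry.
s = pd.Series(["  Alice Smith ", "bob jones", None])

clean = s.str.strip().str.lower()

# split produces a list per row; get(0) picks the first piece.
first = clean.str.split(" ").str.get(0)

# na=False treats missing entries as "no match" instead of NaN.
has_smith = clean.str.contains("smith", na=False)

# Each capture group becomes a column in the result.
initials = clean.str.extract(r"^(\w)\w*\s+(\w)")
```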

10. Categorical Data

astype("category") – Convert to category
cat.categories – List categories
cat.codes – Category codes
cat.add_categories() – Add category
cat.remove_unused_categories() – Clean categories
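
The .cat accessor is available on a Series with category dtype. This sketch uses an invented list of labels; for plain strings, the categories are inferred in sorted order.

```python
import pandas as pd

# Hypothetical labels converted to a categorical Series.
s = pd.Series(["low", "high", "low", "mid"]).astype("category")

categories = list(s.cat.categories)   # sorted unique labels
codes = s.cat.codes.tolist()          # integer position of each label

# Adding a category does not change any values.
extended = s.cat.add_categories(["extreme"])

# Categories that never occur can be dropped again.
trimmed = extended.cat.remove_unused_categories()
```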

11. Indexing & Reindexing

df.set_index() – Set index
df.reset_index() – Reset index
df.reindex() – Align to new index
df.set_axis() – Assign new labels to an axis
df.swaplevel() – Swap MultiIndex levels
df.sort_index() – Sort index
df.reorder_levels() – Reorder MultiIndex
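
A short sketch of moving a column into the index, realigning to a new set of labels, and moving it back. The key/val data is made up; labels that are absent from the original index come back as NaN after reindex.

```python
import pandas as pd

# Hypothetical keyed data.
df = pd.DataFrame({"key": ["x", "y", "z"], "val": [1, 2, 3]})

indexed = df.set_index("key")

# "w" does not exist in the index, so its row is filled with NaN.
realigned = indexed.reindex(["z", "x", "w"])

restored = indexed.reset_index()
```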

12. MultiIndex Tools

pd.MultiIndex.from_tuples() – Create MultiIndex
df.xs() – Cross-section
df.stack() – Columns to rows
df.unstack() – Rows to columns
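
The sketch below builds a two-level index from tuples, takes a cross-section, and pivots one level into columns. The year/quarter labels and sales figures are invented.

```python
import pandas as pd

# Hypothetical two-level index: (year, quarter).
idx = pd.MultiIndex.from_tuples(
    [("2023", "q1"), ("2023", "q2"), ("2024", "q1")],
    names=["year", "quarter"],
)
df = pd.DataFrame({"sales": [10, 12, 15]}, index=idx)

# All rows for one year; the "year" level is dropped from the result.
y2023 = df.xs("2023", level="year")

# Move the "quarter" level from the rows into the columns.
wide = df["sales"].unstack("quarter")
```

Combinations that are missing from the index, here 2024 q2, appear as NaN in the unstacked result.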

13. Time Series Tools

pd.to_datetime() – Convert to datetime
.dt.year, .dt.month, .dt.day – Extract components
.dt.weekday – Day of week (Monday = 0)
.dt.is_month_end – Month-end flag
.dt.is_leap_year – Leap year flag
df.resample() – Resample by time
df.asfreq() – Change frequency
df.shift() – Shift values
df.diff() – Row difference
df.pct_change() – Percent change
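
A minimal time-series sketch, assuming three invented daily readings that straddle a month boundary. The "MS" (month start) frequency alias is used for resampling because it is spelled the same way across pandas versions.

```python
import pandas as pd

# Hypothetical daily readings around the end of January 2024.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2024-01-30", "2024-01-31", "2024-02-01"]),
        "value": [10.0, 11.0, 12.0],
    }
)

years = df["date"].dt.year
weekdays = df["date"].dt.weekday          # Monday = 0
month_end = df["date"].dt.is_month_end

# Time-based methods such as resample need a datetime index.
ts = df.set_index("date")["value"]
monthly = ts.resample("MS").sum()         # one total per calendar month
delta = ts.diff()                         # change from the previous row
shifted = ts.shift(1)                     # values moved down by one row
```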

14. Reshaping & Combining

df.melt() – Unpivot
df.pivot() – Pivot
df.pivot_table() – Pivot with aggregation
pd.concat() – Concatenate
df.merge() – SQL-style merge
df.join() – Join on index
df.add(), df.sub(), df.mul(), df.div() – Arithmetic
df.combine_first() – Fill missing from another DataFrame
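
The sketch below shows a merge on a shared key, a row-wise concatenation, and a melt/pivot round trip. All table contents are invented; note that concat is a top-level function (pd.concat), not a DataFrame method.

```python
import pandas as pd

# Hypothetical tables sharing an "id" key.
left = pd.DataFrame({"id": [1, 2, 3], "name": ["a", "b", "c"]})
right = pd.DataFrame({"id": [2, 3, 4], "score": [80, 90, 70]})

# Inner merge keeps only ids present in both tables.
merged = left.merge(right, on="id", how="inner")

# Stack two frames on top of each other and renumber the rows.
stacked = pd.concat([left, left], ignore_index=True)

# Wide to long and back again.
wide = pd.DataFrame({"id": [1, 2], "jan": [5, 6], "feb": [7, 8]})
long = wide.melt(id_vars="id", var_name="month", value_name="value")
back = long.pivot(index="id", columns="month", values="value")
```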

15. Apply & Map

df.apply() – Apply function across axis
df.applymap() – Apply function to each cell (deprecated since pandas 2.1 in favor of df.map())
Series.map() – Map values in a Series
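
A small sketch of apply along each axis and of Series.map with a lookup dictionary. The data is invented. Element-wise application over a whole DataFrame is left out here because the method name depends on the pandas version (applymap before 2.1, DataFrame.map from 2.1 on).

```python
import pandas as pd

# Hypothetical numeric data.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]})

# axis=1 passes each row to the function.
row_sums = df.apply(lambda row: row["a"] + row["b"], axis=1)

# The default axis=0 passes each column to the function.
col_ranges = df.apply(lambda col: col.max() - col.min())

# Series.map translates values through a dict (or a function).
labels = df["a"].map({1: "one", 2: "two"})
```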

16. Visualization

df.plot() – Line plot (the default plot kind)
df.plot.bar() – Bar chart
df.plot.hist() – Histogram
df.plot.box() – Box plot
df.plot.area() – Area chart
df.plot.scatter() – Scatter plot
