Pandas Essentials: Complete Reference Guide
This guide is a quick reference to the most commonly used Pandas functions for loading, transforming, analyzing, and visualizing data. It is aimed at Python developers, data engineers, and machine learning practitioners.
1. Data Loading & Saving
- pd.read_csv() – Import CSV files
- pd.read_excel() – Load Excel spreadsheets
- pd.read_json() – Read JSON data
- pd.read_sql(query, con) – Fetch data from SQL
- pd.read_html() – Extract tables from HTML
- pd.read_clipboard() – Load clipboard content
- df.to_csv() – Export to CSV
- df.to_excel() – Export to Excel
- df.to_json() – Convert to JSON
- df.to_sql() – Write to SQL table
- df.to_clipboard() – Copy DataFrame to clipboard
- df.to_markdown() – Export as Markdown (requires the tabulate package)
- df.to_latex() – Export as LaTeX
- df.to_html() – Export as HTML
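As a minimal sketch of the load/save round trip, the example below reads CSV text from an in-memory buffer instead of a file, then exports it again. The sample data and variable names are made up for illustration.

```python
import io
import pandas as pd

# Hypothetical in-memory CSV standing in for a file on disk.
csv_text = "name,score\nAda,91\nLinus,78\n"

df = pd.read_csv(io.StringIO(csv_text))

# With no path given, to_csv returns the CSV as a string.
# index=False leaves out the row index column.
round_trip = df.to_csv(index=False)

# orient="records" produces one JSON object per row.
as_json = df.to_json(orient="records")
```

Passing a file path instead of a buffer works the same way for both reading and writing.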
2. Inspecting DataFrames
- df.head() – View first rows
- df.tail() – View last rows
- df.info() – Summary of structure
- df.describe() – Statistical summary
- df.dtypes – Column data types
- df.columns – Column names
- df.index – Index values
- df.axes – Row and column labels
- df.shape – Dimensions
- df.memory_usage() – Memory usage
- df.size – Total elements
- df.empty – Check if empty
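The inspection attributes above can be tried on a small frame. This is an illustrative sketch with invented data, not a recommended workflow.

```python
import pandas as pd

df = pd.DataFrame({"city": ["Oslo", "Lima", "Pune"], "temp_c": [4.5, 19.0, 31.2]})

n_rows, n_cols = df.shape   # (rows, columns)
total_cells = df.size       # rows * columns
is_empty = df.empty         # True only when the frame has no elements
first_two = df.head(2)      # first two rows
summary = df.describe()     # count, mean, std, min, quartiles, max (numeric columns)
```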
3. Selecting & Indexing
- df["col"] – Select a column
- df[["col1","col2"]] – Select multiple columns
- df.loc[] – Label-based selection
- df.iloc[] – Position-based selection
- df.at[] – Fast scalar access (label)
- df.iat[] – Fast scalar access (position)
- df.where() – Keep values where the condition holds; replace the rest (NaN by default)
- df.mask() – Replace values matching condition
- df.query() – SQL-like filtering
- df.take() – Select rows or columns by integer position
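A short sketch of how the selection tools differ, assuming a small frame with string row labels. The column names and values are invented for the example.

```python
import pandas as pd

df = pd.DataFrame(
    {"price": [10, 25, 40], "qty": [3, 0, 5]},
    index=["a", "b", "c"],
)

by_label = df.loc["b", "price"]    # row label "b", column "price"
by_position = df.iloc[0, 1]        # first row, second column
scalar = df.at["c", "qty"]         # fast single-value lookup by label
in_stock = df.query("qty > 0")     # rows where the expression is true
kept = df.where(df > 4)            # values failing the condition become NaN
picked = df.take([2, 0])           # rows by integer position, in the given order
```

Note that loc and at work with labels, while iloc, iat, and take work with integer positions.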
4. Modifying Data
- df.assign() – Add or modify columns
- df.insert() – Insert new column
- df.update() – Update values from another DataFrame
- df.drop() – Remove rows or columns
- df.rename() – Rename labels
- df.replace() – Replace values
- df.eval() – Evaluate expressions
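Several of these methods return a new DataFrame by default, so they can be chained. The sketch below uses made-up column names and leaves the original frame untouched.

```python
import pandas as pd

df = pd.DataFrame({"item": ["pen", "ink"], "price": [2.0, 5.0], "qty": [10, 4]})

result = (
    df.assign(total=lambda d: d["price"] * d["qty"])   # add a derived column
      .rename(columns={"qty": "quantity"})             # rename a column label
      .replace({"item": {"ink": "ink bottle"}})        # replace a value in one column
      .drop(columns=["price"])                         # remove a column
)
```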
5. Handling Missing Data
- df.isna() – Detect missing values
- df.notna() – Opposite of isna
- df.fillna() – Fill missing values
- df.dropna() – Remove missing values
- df.interpolate() – Interpolate values
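The same missing-data methods exist on Series, which keeps the example small. This is a minimal sketch with invented values.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

missing_mask = s.isna()          # True where a value is missing
filled = s.fillna(0.0)           # replace missing values with a constant
dropped = s.dropna()             # drop missing entries
interpolated = s.interpolate()   # linear interpolation is the default method
```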
6. Sorting & Ranking
- df.sort_values() – Sort by values
- df.sort_index() – Sort by index
- df.rank() – Rank values
- df.nlargest() – Largest N values
- df.nsmallest() – Smallest N values
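A brief sketch of sorting versus ranking, using a hypothetical score table with one tie so the effect of the rank method is visible.

```python
import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [70, 95, 70, 88]})

ordered = df.sort_values("score", ascending=False)        # highest score first
ranks = df["score"].rank(method="min", ascending=False)   # ties share the lowest rank
top_two = df.nlargest(2, "score")                         # two rows with the largest score
```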
7. Aggregation & Statistics
- df.min(), df.max() – Min/Max
- df.sum(), df.mean() – Sum/Mean
- df.median() – Median
- df.mode() – Mode
- df.std(), df.var() – Std/Variance
- df.count() – Count non-null
- df.cumsum() – Cumulative sum
- df.cumprod() – Cumulative product
- df.cummin(), df.cummax() – Cumulative min/max
- df.any(), df.all() – Boolean checks
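Most aggregations reduce along an axis: by default they return one value per column, and axis=1 gives one value per row. A small sketch with invented numbers:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10.0, 20.0, 30.0, 40.0]})

col_sums = df.sum()                    # one value per column
row_means = df.mean(axis=1)            # one value per row
running = df["x"].cumsum()             # cumulative sum down the column
all_positive = (df > 0).all().all()    # reduce each column, then the resulting Series
```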
8. Grouping & Window Functions
- df.groupby() – Group data
- df.agg() – Aggregations
- df.transform() – Transform values
- df.groupby(...).ngroup() – Number each group
- df.groupby(...).size() – Rows per group
- df.rolling() – Rolling window
- df.expanding() – Expanding window
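The sketch below shows the usual pattern: build a groupby object once, then call aggregation, transform, ngroup, and size on it. The ngroup and size methods belong to the groupby object rather than to the DataFrame itself. Team names and points are invented.

```python
import pandas as pd

df = pd.DataFrame(
    {"team": ["red", "blue", "red", "blue", "red"], "points": [3, 1, 4, 1, 5]}
)
grouped = df.groupby("team")

totals = grouped["points"].agg(["sum", "mean"])              # one row per group
share = df["points"] / grouped["points"].transform("sum")    # same length as df
group_ids = grouped.ngroup()                                 # integer id of each row's group
sizes = grouped.size()                                       # number of rows per group
rolling_mean = df["points"].rolling(window=2).mean()         # first value is NaN
```

agg collapses each group to one row, while transform returns a result aligned to the original rows, which is why it can be divided into the original column directly.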
9. String Operations
- str.upper(), str.lower() – Case conversion
- str.len() – Length
- str.strip() – Trim spaces
- str.split() – Split text
- str.get() – Extract element at a given position
- str.contains() – Substring check
- str.replace() – Replace text
- str.startswith(), str.endswith() – Start/End check
- str.extract() – Regex extraction
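These methods are reached through the .str accessor on a Series, not called on Python strings. A minimal sketch with made-up names:

```python
import pandas as pd

s = pd.Series(["  Alice Smith ", "bob jones", "Carol White"])

clean = s.str.strip().str.lower()                # trim, then lowercase
first_names = clean.str.split(" ").str.get(0)    # first element of each split list
has_o = clean.str.contains("o", regex=False)     # literal substring test
surname = clean.str.extract(r"\s(\w+)$")[0]      # the capture group becomes column 0
```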
10. Categorical Data
- astype("category") – Convert to category
- cat.categories – List categories
- cat.codes – Category codes
- cat.add_categories() – Add category
- cat.remove_unused_categories() – Clean categories
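Categorical methods live under the .cat accessor of a Series. The sketch below uses invented labels; note that categories inferred by astype("category") are sorted, so the codes follow that sorted order.

```python
import pandas as pd

s = pd.Series(["low", "high", "low", "mid"]).astype("category")

categories = list(s.cat.categories)                    # sorted unique values
codes = s.cat.codes.tolist()                           # integer code per row
extended = s.cat.add_categories(["critical"])          # new category, no rows use it yet
trimmed = extended.cat.remove_unused_categories()      # drops "critical" again
```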
11. Indexing & Reindexing
- df.set_index() – Set index
- df.reset_index() – Reset index
- df.reindex() – Align to new index
- df.set_axis() – Assign new labels to an axis
- df.swaplevel() – Swap MultiIndex levels
- df.sort_index() – Sort index
- df.reorder_levels() – Reorder MultiIndex
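A small sketch of moving a column into the index and back, and of aligning to a new set of labels. The ids and replacement column names are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"id": ["x", "y", "z"], "value": [1, 2, 3]})

indexed = df.set_index("id")                          # "id" becomes the row index
restored = indexed.reset_index()                      # the index goes back to a column
aligned = indexed.reindex(["z", "x", "w"])            # "w" is absent, so its row is NaN
relabeled = df.set_axis(["key", "amount"], axis=1)    # replace all column labels
```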
12. MultiIndex Tools
- pd.MultiIndex.from_tuples() – Create MultiIndex
- df.xs() – Cross-section
- df.stack() – Columns to rows
- df.unstack() – Rows to columns
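To illustrate, here is a sketch that builds a two-level index from tuples, takes a cross-section, and moves one level between rows and columns. The sales figures are invented.

```python
import pandas as pd

index = pd.MultiIndex.from_tuples(
    [("2023", "Q1"), ("2023", "Q2"), ("2024", "Q1"), ("2024", "Q2")],
    names=["year", "quarter"],
)
df = pd.DataFrame({"sales": [10, 12, 15, 18]}, index=index)

year_2024 = df.xs("2024", level="year")     # rows for one key of the outer level
wide = df["sales"].unstack("quarter")       # inner index level becomes columns
long_again = wide.stack()                   # columns go back into the index
```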
13. Time Series Tools
- pd.to_datetime() – Convert to datetime
- .dt.year, .dt.month, .dt.day – Extract components
- .dt.weekday – Day of week (Monday = 0)
- .dt.is_month_end – Month-end flag
- .dt.is_leap_year – Leap year flag
- df.resample() – Resample by time
- df.asfreq() – Change frequency
- df.shift() – Shift values
- df.diff() – Row difference
- df.pct_change() – Percent change
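A minimal time series sketch, assuming four consecutive daily observations with invented values. It uses a 2-day resampling rule purely to keep the arithmetic easy to follow.

```python
import pandas as pd

dates = pd.to_datetime(["2024-01-30", "2024-01-31", "2024-02-01", "2024-02-02"])
s = pd.Series([100.0, 110.0, 99.0, 120.0], index=dates)

years = s.index.year                            # a DatetimeIndex exposes components directly
month_end = pd.Series(dates).dt.is_month_end    # .dt is the accessor on a Series
previous = s.shift(1)                           # previous row's value on the current row
change = s.diff()                               # s - s.shift(1)
pct = s.pct_change()                            # change relative to the previous value
two_day = s.resample("2D").sum()                # bin into 2-day buckets and sum each
```

shift, diff, and pct_change all leave NaN in the first position because there is no earlier row to compare against.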
14. Reshaping & Combining
- df.melt() – Unpivot
- df.pivot() – Pivot
- df.pivot_table() – Pivot with aggregation
- pd.concat() – Concatenate DataFrames (top-level function)
- df.merge() – SQL-style merge
- df.join() – Join on index
- df.add(), df.sub(), df.mul(), df.div() – Arithmetic
- df.combine_first() – Fill missing from another DataFrame
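The sketch below reshapes a wide table to long form and back, then combines frames. Concatenation is done with the top-level pd.concat function, since DataFrame has no concat method. All table contents are made up.

```python
import pandas as pd

wide = pd.DataFrame({"id": [1, 2], "jan": [5, 7], "feb": [6, 8]})

# Unpivot: one row per (id, month) pair.
long = wide.melt(id_vars="id", var_name="month", value_name="units")

# Pivot back: months become columns again.
back = long.pivot(index="id", columns="month", values="units")

# Append rows from another frame.
extra = pd.DataFrame({"id": [3], "jan": [9], "feb": [4]})
stacked = pd.concat([wide, extra], ignore_index=True)

# Left join: ids without a match in `names` get NaN.
names = pd.DataFrame({"id": [1, 2, 4], "name": ["ann", "ben", "dee"]})
merged = stacked.merge(names, on="id", how="left")
```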
15. Apply & Map
- df.apply() – Apply function across axis
- df.applymap() – Apply function to each cell (deprecated since pandas 2.1 in favor of df.map())
- Series.map() – Map values in a Series
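A short sketch of the difference between applying a function along an axis and mapping values in a single column. The frame and the lookup dictionary are invented for illustration.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

col_range = df.apply(lambda col: col.max() - col.min())          # one call per column
row_total = df.apply(lambda row: row["a"] + row["b"], axis=1)    # one call per row
labels = df["a"].map({1: "one", 2: "two"})                       # unmapped values become NaN
```

For simple arithmetic like the row total, plain column operations such as df["a"] + df["b"] are usually faster than apply with axis=1.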
16. Visualization
- df.plot() – Line plot by default (plotting requires matplotlib)
- df.plot.bar() – Bar chart
- df.plot.hist() – Histogram
- df.plot.box() – Box plot
- df.plot.area() – Area chart
- df.plot.scatter() – Scatter plot