pandas is a popular open-source Python library for handling and analyzing data. It offers tools for working with tabular data: reading and writing data in a variety of formats, cleaning and transforming it, selecting and filtering rows by condition, aggregating and summarizing, and plotting. This post walks through a few of these capabilities. To explore further, see the official pandas documentation.

Install Pandas

pip install pandas

Load File

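Pandas can read many file formats, including CSV, Excel (XLSX), JSON, HTML, and XML, as well as delimited text (TXT) files. Readers such as read_csv can also open compressed files, for example a ZIP archive containing a single CSV.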


  import pandas as pd
  dataframe = pd.read_csv('filename.csv') # Loading data from a CSV file
  dataframe = pd.read_excel('filename.xlsx') # Loading data from an Excel file
  dataframe = pd.read_json('filename.json') # Loading data from a JSON file
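To try a reader without a file on disk, here is a minimal sketch that parses CSV text from an in-memory buffer. The column names and values are made up for illustration, and the variable name people is just a placeholder.

```python
import io
import pandas as pd

# Hypothetical CSV content; in practice this would live in a file on disk.
csv_text = "name,age,city\nAlice,30,Paris\nBob,25,Berlin\n"

# read_csv accepts a path or any file-like object, so StringIO works here.
people = pd.read_csv(io.StringIO(csv_text))

print(people.shape)          # (number of rows, number of columns)
print(list(people.columns))  # column names taken from the header row
```

The same call works with a real path such as 'filename.csv' in place of the buffer.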

Viewing Data

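Use the head method to view the first few rows of the DataFrame, and the tail method to view the last few rows. The nlargest and nsmallest methods return the rows with the largest or smallest values in a column, while info and describe summarize the DataFrame as a whole.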


  dataframe.head() # Displays the first 5 rows
  dataframe.head(10) # Displays the first 10 rows
  dataframe.tail() # Displays the last 5 rows
  dataframe.tail(10) # Displays the last 10 rows
  dataframe.nlargest(2, 'column_name') # The 2 rows with the largest values in column_name
  dataframe.nsmallest(2, 'column_name') # The 2 rows with the smallest values in column_name
  dataframe.info() # Displays the summary of the dataframe
  dataframe.describe() # Generates descriptive statistics

Additional methods:

dataframe.columns - View column names
dataframe.index - View the index (row labels)
dataframe['column_name'].value_counts() - Count how often each unique value occurs
dataframe['column_name'].tolist() - Convert column values to a Python list
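A small worked example may make these inspection helpers more concrete. The DataFrame below is invented for illustration; none of its names or numbers come from a real dataset.

```python
import pandas as pd

# Hypothetical data, used only to demonstrate the inspection methods.
scores = pd.DataFrame({
    "student": ["Ann", "Ben", "Cara", "Dev"],
    "grade": ["A", "B", "A", "C"],
    "score": [91, 78, 88, 65],
})

top_two = scores.nlargest(2, "score")           # rows for Ann and Cara
grade_counts = scores["grade"].value_counts()   # how often each grade occurs
column_names = scores.columns.tolist()          # plain Python list of names

print(top_two)
print(grade_counts)
print(column_names)
```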

Data Selection

Use square brackets to select columns, and the loc and iloc indexers to select specific rows and columns:


  dataframe['column_name'] # Selecting a single column
  dataframe[['column1', 'column2']] # Selecting multiple columns
  dataframe.loc[row_index, 'column_name'] # Label-based selection
  dataframe.iloc[row_index, column_index] # Integer position-based selection
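The difference between loc and iloc is easiest to see on a DataFrame whose index is not just 0, 1, 2. The sketch below uses a made-up inventory table with string labels; all names are illustrative.

```python
import pandas as pd

# Hypothetical table indexed by product name rather than by number.
inventory = pd.DataFrame(
    {"price": [1.5, 0.5, 2.0], "stock": [10, 40, 5]},
    index=["apple", "banana", "cherry"],
)

by_label = inventory.loc["banana", "stock"]    # look up by row and column label
by_position = inventory.iloc[1, 1]             # same cell, by integer position

# Label slices include the end label; position slices exclude the end position.
label_slice = inventory.loc["apple":"banana"]  # 2 rows
position_slice = inventory.iloc[0:1]           # 1 row

print(by_label, by_position)
print(len(label_slice), len(position_slice))
```

The inclusive end of a loc slice is a common surprise for anyone used to ordinary Python slicing.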

Data Manipulation

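Pandas provides methods for adding, dropping, and renaming columns and for replacing values. With inplace=True the DataFrame is modified directly; without it, each method returns a new DataFrame and leaves the original unchanged: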


  dataframe['new_column'] = dataframe['column1'] + dataframe['column2'] # Create new column
  dataframe.drop(['column1', 'column2'], axis=1, inplace=True) # Drop columns
  dataframe.rename(columns={'old_name': 'new_name'}, inplace=True) # Rename columns
  dataframe.replace(to_replace='old_value', value='new_value', inplace=True) # Replace values
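As an alternative to inplace=True, the same steps can be chained so that each one returns a new DataFrame. This is only a sketch on invented data; the column names and the cleaned variable are assumptions for illustration.

```python
import pandas as pd

# Hypothetical order data.
orders = pd.DataFrame({
    "price": [10.0, 20.0, 5.0],
    "quantity": [2, 1, 3],
    "status": ["new", "old_value", "new"],
})

# Derive a new column from two existing ones.
orders["total"] = orders["price"] * orders["quantity"]

# Each call returns a new DataFrame, so `orders` itself keeps all its columns.
cleaned = (
    orders
    .drop(columns=["price", "quantity"])
    .rename(columns={"status": "state"})
    .replace("old_value", "new_value")
)

print(cleaned)
```

Chaining keeps the original around, which can help when an intermediate result needs to be inspected or reused.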

Filtering Data


  dataframe[dataframe['column_name'] > value] # Filter by condition
  dataframe[dataframe['column_name'].isin(['value1', 'value2'])] # Filter by list of values
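Conditions can also be combined. The sketch below, on made-up employee data, shows the & (and) and ~ (not) operators; each condition needs its own parentheses because these operators bind more tightly than comparisons.

```python
import pandas as pd

# Hypothetical employee data.
employees = pd.DataFrame({
    "name": ["Ann", "Ben", "Cara", "Dev"],
    "dept": ["sales", "hr", "sales", "it"],
    "salary": [50000, 42000, 61000, 58000],
})

# Both conditions must hold: salary above 45000 AND department is sales.
well_paid_sales = employees[
    (employees["salary"] > 45000) & (employees["dept"] == "sales")
]

# Membership test against a list of values.
hr_or_it = employees[employees["dept"].isin(["hr", "it"])]

# Negate a condition with ~ to keep everything except sales.
not_sales = employees[~(employees["dept"] == "sales")]

print(well_paid_sales)
```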

Grouping Data


  dataframe.groupby('category')['value'].mean() # Group by category and calculate mean
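Here is a minimal sketch of the same groupby on invented data, extended with agg to compute several statistics in one pass. The category and value columns mirror the snippet above; the numbers are made up.

```python
import pandas as pd

# Hypothetical sales data.
sales = pd.DataFrame({
    "category": ["fruit", "fruit", "dairy", "dairy", "dairy"],
    "value": [10, 20, 5, 7, 9],
})

# One statistic per group: a Series indexed by category.
mean_by_category = sales.groupby("category")["value"].mean()

# Several statistics per group: a DataFrame with one column per statistic.
summary = sales.groupby("category")["value"].agg(["mean", "sum", "count"])

print(mean_by_category)
print(summary)
```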

Sorting Data


  dataframe.sort_values('column_name', ascending=True) # Sort by column
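sort_values returns a sorted copy rather than reordering the DataFrame in place, and it accepts several columns at once. A short sketch on made-up data:

```python
import pandas as pd

# Hypothetical match results.
results = pd.DataFrame({
    "team": ["red", "blue", "red", "blue"],
    "points": [3, 7, 9, 1],
})

ascending = results.sort_values("points")
descending = results.sort_values("points", ascending=False)

# Sort by team A-Z, then by points high-to-low within each team.
by_team_then_points = (
    results
    .sort_values(["team", "points"], ascending=[True, False])
    .reset_index(drop=True)  # renumber rows 0..n-1 after sorting
)

print(by_team_then_points)
```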

Merge, Concat, and Join


  dataframe1.merge(dataframe2, on='column_name', how='inner') # Inner join
  pd.concat([dataframe1, dataframe2], axis=0) # Concatenate rows (stack vertically)
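To see how the how argument changes a merge, the sketch below joins two small invented tables on a shared customer_id column. All table and column names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical tables sharing a customer_id key.
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Ann", "Ben", "Cara"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3, 4], "amount": [20, 35, 15, 50]})

# Inner join: only keys present in both tables (customers 1 and 3).
inner = customers.merge(orders, on="customer_id", how="inner")

# Left join: every customer is kept; Ben has no orders, so his amount is NaN.
left = customers.merge(orders, on="customer_id", how="left")

# concat stacks rows; ignore_index=True renumbers the combined index.
more_customers = pd.DataFrame({"customer_id": [4], "name": ["Dev"]})
stacked = pd.concat([customers, more_customers], axis=0, ignore_index=True)

print(inner)
print(left)
print(stacked)
```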

String Operations


  dataframe['column_name'].str.lower() # Convert text to lowercase
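The .str accessor exposes many other string methods, and they can be chained. A minimal sketch on made-up email addresses, including one missing value to show that missing entries pass through without raising an error:

```python
import pandas as pd

# Hypothetical contact list with messy whitespace, mixed case, and a gap.
contacts = pd.DataFrame({"email": ["  Ann@Example.COM", "ben@test.org ", None]})

# Trim whitespace, then lowercase; the missing value stays missing.
cleaned = contacts["email"].str.strip().str.lower()

# na=False treats missing entries as "does not match".
is_org = cleaned.str.endswith(".org", na=False)

# Split on "@" and take the second piece of each result.
domain = cleaned.str.split("@").str[1]

print(cleaned)
print(is_org)
print(domain)
```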

Reshaping Data


  dataframe.pivot_table(values='value', index='index_column', columns='column_name') # Pivot table (aggregates with the mean by default)
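A pivot table is easier to picture with numbers. In this sketch on invented sales data, rows become regions, columns become products, and each cell holds an aggregate of revenue. The aggfunc argument is set explicitly to show the contrast with the default mean.

```python
import pandas as pd

# Hypothetical sales records in "long" form: one row per sale.
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south", "north"],
    "product": ["pen", "ink", "pen", "ink", "pen"],
    "revenue": [100, 40, 80, 60, 50],
})

# Default aggregation is the mean of each (region, product) group.
mean_pivot = sales.pivot_table(values="revenue", index="region", columns="product")

# Pass aggfunc to total the revenue instead.
sum_pivot = sales.pivot_table(
    values="revenue", index="region", columns="product", aggfunc="sum"
)

print(mean_pivot)
print(sum_pivot)
```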

Handling Time Series Data


  dataframe['date_column'] = pd.to_datetime(dataframe['date_column']) # Convert to datetime
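Once a column holds datetimes, the .dt accessor exposes its components. The sketch below uses made-up readings to pull out the month and weekday and to total values per calendar month; the value column and the variable names are assumptions for illustration.

```python
import pandas as pd

# Hypothetical readings with dates stored as plain strings.
readings = pd.DataFrame({
    "date_column": ["2024-01-01", "2024-01-02", "2024-02-15"],
    "value": [10, 20, 30],
})

# Convert the strings to real datetimes.
readings["date_column"] = pd.to_datetime(readings["date_column"])

# Extract components through the .dt accessor.
readings["month"] = readings["date_column"].dt.month
readings["weekday"] = readings["date_column"].dt.day_name()

# Group by calendar month and total the values.
monthly_total = readings.groupby(
    readings["date_column"].dt.to_period("M")
)["value"].sum()

print(readings)
print(monthly_total)
```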

In conclusion, pandas is a powerful library for data processing and analysis in Python. It provides a wide range of capabilities for working with tabular data effectively.