Pandas is a popular open-source Python library for handling and analyzing data. It offers efficient tools for working with tabular data: reading and writing data in a variety of formats, cleaning and transforming it, selecting and filtering it by various criteria, aggregating and summarizing it, and visualizing it. This blog covers a few of pandas' capabilities. To explore more, check out the official pandas documentation.
Install Pandas
pip install pandas
Load File
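Pandas can read many file formats, including CSV, XLSX, JSON, XML, HTML, and plain text (TXT). It can also read compressed files such as ZIP directly. Reading Excel files requires an optional engine such as openpyxl to be installed.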
import pandas as pd
dataframe = pd.read_csv('filename.csv') # Loading data from a CSV file
dataframe = pd.read_excel('filename.xlsx') # Loading data from an Excel file
dataframe = pd.read_json('filename.json') # Loading data from a JSON file
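To try the loading functions without a file on disk, here is a minimal sketch that reads CSV text from memory. The sample data and the variable names (csv_text, loaded, round_trip) are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV content, kept in memory so the example needs no file on disk
csv_text = "name,city,sales\nAsha,Pune,120\nBen,Delhi,95\nCara,Pune,143\n"

# read_csv accepts a file path or any file-like object
loaded = pd.read_csv(io.StringIO(csv_text))

# Write the DataFrame back out as CSV text; index=False leaves out the row labels
round_trip = loaded.to_csv(index=False)

print(loaded)
```

Swapping io.StringIO(csv_text) for a path such as 'filename.csv' gives the same result when the file has the same content.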
Viewing Data
Use the head method to view the first few rows of the DataFrame, and the tail method to view the last few rows. Related methods rank rows by a column or summarize the whole DataFrame.
dataframe.head() # Displays the first 5 rows
dataframe.head(10) # Displays the first 10 rows
dataframe.tail() # Displays the last 5 rows
dataframe.tail(10) # Displays the last 10 rows
dataframe.nlargest(2, 'column_name') # The 2 rows with the largest values in column_name
dataframe.nsmallest(2, 'column_name') # The 2 rows with the smallest values in column_name
dataframe.info() # Displays the summary of the dataframe
dataframe.describe() # Generates descriptive statistics
Additional methods:
dataframe.columns # View column names
dataframe.index # View the index (row labels)
dataframe['column_name'].value_counts() # Count occurrences of each unique value
dataframe['column_name'].tolist() # Convert column values to a Python list
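Here is a small sketch of several of these viewing methods on a made-up DataFrame. The sales data and its column names are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical sample data
sales = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
    "amount": [120, 95, 143, 80],
})

first_two = sales.head(2)                  # first 2 rows
top_two = sales.nlargest(2, "amount")      # 2 rows with the largest amount
city_counts = sales["city"].value_counts() # how often each city appears
column_names = sales.columns.tolist()      # column names as a plain list
amounts = sales["amount"].tolist()         # column values as a plain list

print(top_two)
print(city_counts)
```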
Data Selection
Use square brackets to select columns, and the loc and iloc indexers to select specific rows and columns:
dataframe['column_name'] # Selecting a single column
dataframe[['column1', 'column2']] # Selecting multiple columns
dataframe.loc[row_index, 'column_name'] # Label-based selection
dataframe.iloc[row_index, column_index] # Position-based (integer) selection
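The difference between loc and iloc is easiest to see when the row labels are not plain integers. The sketch below uses a made-up scores table with names as row labels.

```python
import pandas as pd

# Hypothetical data with string row labels
scores = pd.DataFrame(
    {"math": [90, 72, 85], "art": [65, 88, 79]},
    index=["asha", "ben", "cara"],
)

by_label = scores.loc["ben", "math"]  # row labelled "ben", column "math"
by_position = scores.iloc[1, 0]       # second row, first column (same cell)

# Label slices include both endpoints; position slices exclude the stop
label_slice = scores.loc["asha":"ben", "math"]  # 2 rows
position_slice = scores.iloc[0:1, 0]            # 1 row

print(by_label, by_position)
```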
Data Manipulation
Pandas provides methods for adding, dropping, and renaming columns and for replacing values:
dataframe['new_column'] = dataframe['column1'] + dataframe['column2'] # Create new column
dataframe.drop(['column1', 'column2'], axis=1, inplace=True) # Drop columns
dataframe.rename(columns={'old_name': 'new_name'}, inplace=True) # Rename columns
dataframe.replace(to_replace='old_value', value='new_value', inplace=True) # Replace values
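The same operations can also be chained without inplace=True, so that each step returns a new DataFrame and the original stays untouched. This is only one possible style, sketched on a made-up orders table.

```python
import pandas as pd

# Hypothetical sample data
orders = pd.DataFrame({
    "price": [10.0, 20.0, 5.0],
    "quantity": [2, 1, 4],
    "status": ["old_value", "shipped", "old_value"],
})

cleaned = (
    orders
    .assign(total=orders["price"] * orders["quantity"])  # create a new column
    .drop(columns=["quantity"])                          # drop a column
    .rename(columns={"price": "unit_price"})             # rename a column
    .replace("old_value", "new_value")                   # replace values
)

print(cleaned)
```

Because nothing is modified in place, orders still has its original three columns after this runs.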
Filtering Data
dataframe[dataframe['column_name'] > value] # Filter by condition
dataframe[dataframe['column_name'].isin(['value1', 'value2'])] # Filter by list of values
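Conditions can be combined with & (and), | (or), and ~ (not), as long as each condition is wrapped in its own parentheses. The sketch below uses a made-up people table and also shows the query method as an alternative spelling.

```python
import pandas as pd

# Hypothetical sample data
people = pd.DataFrame({
    "name": ["Asha", "Ben", "Cara", "Dev"],
    "age": [31, 24, 45, 38],
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
})

over_30 = people[people["age"] > 30]
in_cities = people[people["city"].isin(["Pune", "Mumbai"])]

# Both conditions must hold; note the parentheses around each one
both = people[(people["age"] > 30) & (people["city"] == "Pune")]

# The same filter written as a query string
same_with_query = people.query("age > 30 and city == 'Pune'")

print(both)
```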
Grouping Data
dataframe.groupby('category')['value'].mean() # Group by category and calculate mean
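A group can be summarized with more than one statistic at a time by using agg. Here is a short sketch on made-up data, reusing the 'category' and 'value' column names from the line above.

```python
import pandas as pd

# Hypothetical sample data
sales = pd.DataFrame({
    "category": ["toys", "books", "toys", "books", "toys"],
    "value": [10, 20, 30, 60, 50],
})

# One statistic per group
mean_by_category = sales.groupby("category")["value"].mean()

# Several statistics per group, one column per statistic
summary = sales.groupby("category")["value"].agg(["count", "sum", "mean"])

print(summary)
```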
Sorting Data
dataframe.sort_values('column_name', ascending=True) # Sort by column
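sort_values returns a new, sorted DataFrame and can sort by several columns at once. The sketch below assumes a made-up products table.

```python
import pandas as pd

# Hypothetical sample data
products = pd.DataFrame({
    "name": ["pen", "book", "lamp", "bag"],
    "category": ["office", "media", "home", "office"],
    "price": [2.5, 12.0, 30.0, 45.0],
})

# Most expensive first
by_price_desc = products.sort_values("price", ascending=False)

# Category A to Z, then price high to low within each category
by_category_then_price = products.sort_values(
    ["category", "price"], ascending=[True, False]
)

# Sorting keeps the old row labels; reset_index renumbers them from 0
reindexed = by_price_desc.reset_index(drop=True)

print(by_category_then_price)
```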
Merge, Concat, and Join
dataframe1.merge(dataframe2, on='column_name', how='inner') # Inner join
pd.concat([dataframe1, dataframe2], axis=0) # Stack the rows of both DataFrames
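To see how the join type changes the result, here is a sketch with two made-up tables, customers and orders, that share a customer_id column. It also shows join, which matches rows on the index instead of on a column.

```python
import pandas as pd

# Hypothetical sample data
customers = pd.DataFrame({"customer_id": [1, 2, 3], "name": ["Asha", "Ben", "Cara"]})
orders = pd.DataFrame({"customer_id": [1, 1, 3, 4], "amount": [50, 20, 75, 10]})

# Inner join: only customer_ids present in both tables
inner = customers.merge(orders, on="customer_id", how="inner")

# Left join: every customer, with NaN where there is no order
left = customers.merge(orders, on="customer_id", how="left")

# join matches on the index, so customer_id is moved there first
joined = customers.set_index("customer_id").join(
    orders.set_index("customer_id"), how="inner"
)

# concat stacks rows; ignore_index=True renumbers them from 0
stacked = pd.concat([customers, customers], axis=0, ignore_index=True)

print(left)
```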
String Operations
dataframe['column_name'].str.lower() # Convert text to lowercase
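The .str accessor offers many more text methods, and they can be chained. Here is a small sketch on a made-up Series of names that includes a missing value.

```python
import pandas as pd

# Hypothetical sample data with untidy text and a missing value
names = pd.Series(["  Alice SMITH ", "bob jones", None])

# Trim whitespace, then lowercase; missing values stay missing
cleaned = names.str.strip().str.lower()

# na=False treats missing values as "no match"
has_smith = cleaned.str.contains("smith", na=False)

# Split on spaces and keep the first piece
first_names = cleaned.str.split(" ").str[0]

print(cleaned)
```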
Reshaping Data
dataframe.pivot_table(values='value', index='index_column', columns='column_name') # Pivot table (mean of 'value' by default)
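pivot_table aggregates with the mean unless told otherwise; the aggfunc parameter selects a different aggregation. Here is a sketch on a made-up sales table with region and product columns.

```python
import pandas as pd

# Hypothetical sample data
sales = pd.DataFrame({
    "region": ["north", "north", "south", "south", "north"],
    "product": ["pen", "book", "pen", "book", "pen"],
    "value": [10, 20, 30, 40, 50],
})

# Rows are regions, columns are products, cells hold the mean value
mean_table = sales.pivot_table(values="value", index="region", columns="product")

# Same layout, but cells hold the sum
sum_table = sales.pivot_table(
    values="value", index="region", columns="product", aggfunc="sum"
)

print(sum_table)
```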
Handling Time Series Data
dataframe['date_column'] = pd.to_datetime(dataframe['date_column']) # Convert to datetime
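Once a column holds datetimes, the .dt accessor exposes parts of each date, and a datetime index allows resampling to a coarser frequency. The sketch below uses a made-up events table; the visits column is an assumption for illustration.

```python
import pandas as pd

# Hypothetical sample data with dates stored as text
events = pd.DataFrame({
    "date_column": ["2024-01-15", "2024-01-20", "2024-02-03"],
    "visits": [100, 150, 90],
})

# Convert the text column to datetime
events["date_column"] = pd.to_datetime(events["date_column"])

# Pull out parts of each date
events["month"] = events["date_column"].dt.month
events["weekday"] = events["date_column"].dt.day_name()

# Total visits per month ("MS" labels each group by the month's start date)
monthly = events.set_index("date_column")["visits"].resample("MS").sum()

print(monthly)
```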
In conclusion, pandas is a powerful library for data processing and analysis in Python. It provides a wide range of capabilities for working with tabular data effectively.
