
Anand


Step-by-Step with Pandas: Basic Operations to Intermediate Mastery 🐍🐼

Pandas is a powerful and flexible data manipulation library for Python. It provides data structures like Series (one-dimensional) and DataFrame (two-dimensional) for working with structured data efficiently. Here, I'll cover some basic and intermediate concepts in Pandas.


Basic Concepts

  1. Series:
    • A one-dimensional array-like object containing a sequence of values and an associated array of data labels, called its index.
   import pandas as pd
   s = pd.Series([1, 3, 5, 6, 8])
   print(s)
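
To make the "index" part of that definition concrete, here is a small sketch that gives the same values explicit labels. The labels `a` to `e` are made up for illustration; they are not part of the original example.

```python
import pandas as pd

# Same values as above, but with hypothetical string labels as the index
s = pd.Series([1, 3, 5, 6, 8], index=["a", "b", "c", "d", "e"])

by_label = s["c"]        # look up a value by its index label
by_position = s.iloc[2]  # look up the same value by integer position

print(s.index.tolist())
print(by_label, by_position)
```

Both lookups return the same element here; the label is simply a name attached to that position.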
  2. DataFrame:
    • A two-dimensional, size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns).
   df = pd.DataFrame({
       'A': [1, 2, 3],
       'B': [4, 5, 6],
       'C': [7, 8, 9]
   })
   print(df)
  3. Reading and Writing Data:

    • Reading data from CSV:
     df = pd.read_csv('data.csv')

    • Writing data to CSV:
     df.to_csv('output.csv', index=False)  # index=False leaves the row index out of the file
    
  4. Indexing and Selection:

    • Selecting a column:
     df['A']

    • Selecting multiple columns:
     df[['A', 'B']]

    • Selecting rows by index:
     df.iloc[0]  # First row, by integer position
     df.loc[0]   # Row whose index label is 0
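
With the default 0, 1, 2 index, `iloc[0]` and `loc[0]` return the same row, which hides the difference. The sketch below uses a made-up index of 10, 20, 30 (an assumption for illustration) so the two behave differently.

```python
import pandas as pd

# Hypothetical non-default index labels, chosen so position and label differ
df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]}, index=[10, 20, 30])

first_row = df.iloc[0]  # positional: the first row, whatever its label
row_20 = df.loc[20]     # label-based: the row labeled 20

# No row is labeled 0 here, so a label lookup for 0 fails
try:
    df.loc[0]
    label_zero_exists = True
except KeyError:
    label_zero_exists = False

print(first_row["A"], row_20["A"], label_zero_exists)
```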
    
  5. Data Cleaning:

    • Handling missing values:
     df.dropna()   # Returns a new DataFrame without rows that contain missing values
     df.fillna(0)  # Returns a new DataFrame with missing values replaced by 0
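
As a quick illustration, here is a sketch on a small frame with two missing values. The data is invented for the example. Note that neither call modifies `df`; each returns a new object that you need to assign.

```python
import numpy as np
import pandas as pd

# Made-up data with one missing value in each column
df = pd.DataFrame({"A": [1.0, np.nan, 3.0], "B": [4.0, 5.0, np.nan]})

missing_per_column = df.isna().sum()  # count missing values per column
dropped = df.dropna()                 # keeps only rows with no missing values
filled = df.fillna(0)                 # replaces every missing value with 0

print(missing_per_column)
print(dropped)
print(filled)
```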
    

Intermediate Concepts

  1. GroupBy:

    • Grouping data and performing aggregate functions.
     grouped = df.groupby('A')
     grouped.mean()
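
Grouping is easier to follow on data with repeated keys. The sketch below assumes a hypothetical `sales` frame with `region` and `units` columns; those names are not from the original example.

```python
import pandas as pd

# Hypothetical sales data: two regions, two rows each
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south"],
    "units": [10, 20, 30, 40],
})

# Mean of one column per group
mean_units = sales.groupby("region")["units"].mean()

# Several aggregations at once
summary = sales.groupby("region")["units"].agg(["sum", "mean", "count"])

print(mean_units)
print(summary)
```

Selecting the column before aggregating keeps the result focused and avoids trying to average non-numeric columns.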
    
  2. Merging and Joining:

    • Combining DataFrames using merge and join operations.
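     df1 = pd.DataFrame({'key': ['A', 'B', 'C'], 'value': [1, 2, 3]})
     df2 = pd.DataFrame({'key': ['B', 'C', 'D'], 'value': [4, 5, 6]})
     # An inner join keeps only the keys found in both frames ('B' and 'C');
     # the clashing 'value' columns come back as value_x and value_y
     merged = pd.merge(df1, df2, on='key', how='inner')
     print(merged)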
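
The `how` argument controls which keys survive the merge. As a sketch using the same two frames, the example below names the overlapping columns with `suffixes` and uses `indicator=True` on an outer join to show where each row came from. The suffix names are my own choice.

```python
import pandas as pd

df1 = pd.DataFrame({"key": ["A", "B", "C"], "value": [1, 2, 3]})
df2 = pd.DataFrame({"key": ["B", "C", "D"], "value": [4, 5, 6]})

# Inner join with explicit suffixes instead of the default _x / _y
inner = pd.merge(df1, df2, on="key", how="inner", suffixes=("_left", "_right"))

# Outer join keeps every key; the _merge column records the source of each row
outer = pd.merge(df1, df2, on="key", how="outer", indicator=True)

print(inner)
print(outer)
```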
    
  3. Pivot Tables:

    • Creating pivot tables to summarize data.
     # Assumes df has 'key', 'category', and 'value' columns
     df.pivot_table(values='value', index='key', columns='category', aggfunc='sum')
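
The `df` used earlier has no `key`, `category`, or `value` columns, so here is a self-contained sketch with invented data that does. It shows how repeated key/category pairs are combined by `aggfunc`.

```python
import pandas as pd

# Invented data: the ('A', 'x') pair appears twice and gets summed
data = pd.DataFrame({
    "key": ["A", "A", "B", "B", "A"],
    "category": ["x", "y", "x", "y", "x"],
    "value": [1, 2, 3, 4, 5],
})

table = data.pivot_table(values="value", index="key", columns="category", aggfunc="sum")
print(table)
```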
    
  4. Applying Functions:

    • Applying custom functions to DataFrames.
     df['new_column'] = df['A'].apply(lambda x: x * 2)
    
  5. Reshaping Data:

    • Melting and pivoting DataFrames to reshape data.
     melted = pd.melt(df, id_vars=['A'], value_vars=['B', 'C'])
     pivoted = melted.pivot(index='A', columns='variable', values='value')
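
To see what these two calls do, the sketch below runs them on the small `A`/`B`/`C` frame from the DataFrame section and then undoes the reshaping. It relies on the values in `A` being unique, since `pivot` raises an error on duplicate index/column pairs.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]})

# Wide to long: one row per (A, variable) pair
melted = pd.melt(df, id_vars=["A"], value_vars=["B", "C"])

# Long to wide again: the 'variable' values become columns
pivoted = melted.pivot(index="A", columns="variable", values="value")

# Move 'A' back to a regular column and drop the leftover columns name
restored = pivoted.reset_index()
restored.columns.name = None

print(melted)
print(restored)
```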
    
  6. Time Series:

    • Handling and manipulating time series data.
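     df['date'] = pd.to_datetime(df['date'])  # Parse the 'date' column into datetime values
     df.set_index('date', inplace=True)       # resample() needs a datetime-like index
     df.resample('M').mean()  # Monthly mean; pandas 2.2+ prefers the alias 'ME' for month-end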
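
Here is a self-contained sketch of the same steps on invented data. It uses the month-start alias `'MS'` rather than `'M'`, purely so the example behaves the same on older and newer pandas versions; the `ts` frame and its column names are assumptions.

```python
import pandas as pd

# Invented readings: two in January, two in February
ts = pd.DataFrame({
    "date": ["2024-01-10", "2024-01-20", "2024-02-05", "2024-02-25"],
    "value": [10, 20, 30, 50],
})

ts["date"] = pd.to_datetime(ts["date"])
ts = ts.set_index("date")

# One row per month, labeled with the first day of that month
monthly = ts.resample("MS")["value"].mean()
print(monthly)
```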
    
  7. Handling Duplicate Data:

    • Removing or handling duplicate rows in DataFrames.
     df.drop_duplicates()
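
By default, `drop_duplicates()` only removes rows that match on every column. The sketch below, on invented `id`/`score` data, also shows `duplicated()` for inspecting duplicates first and the `subset` and `keep` arguments for deduplicating on one column.

```python
import pandas as pd

# Invented data: rows 0 and 1 are identical; id 2 appears with two scores
df = pd.DataFrame({"id": [1, 1, 2, 2], "score": [90, 90, 80, 85]})

dup_mask = df.duplicated()      # True for rows that repeat an earlier row
deduped = df.drop_duplicates()  # drops only the fully identical row

# Keep the last row seen for each id
latest_per_id = df.drop_duplicates(subset="id", keep="last")

print(dup_mask.tolist())
print(deduped)
print(latest_per_id)
```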
    
  8. Advanced Indexing:

    • Using hierarchical indexing for multi-level data.
     import numpy as np

     # Two parallel arrays become the two levels of the row index
     arrays = [np.array(['bar', 'bar', 'baz', 'baz']),
               np.array(['one', 'two', 'one', 'two'])]
     df = pd.DataFrame(np.random.randn(4, 2), index=arrays, columns=['A', 'B'])
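
Building the index is only half the story; the payoff is in selection. The sketch below rebuilds a similar frame with fixed values instead of random ones and with level names `first` and `second` (both my own assumptions), then selects from it in three ways.

```python
import numpy as np
import pandas as pd

# Level names 'first' and 'second' are hypothetical, added for readability
index = pd.MultiIndex.from_arrays(
    [["bar", "bar", "baz", "baz"], ["one", "two", "one", "two"]],
    names=["first", "second"],
)
df = pd.DataFrame(np.arange(8).reshape(4, 2), index=index, columns=["A", "B"])

bar_rows = df.loc["bar"]               # all rows under the outer label 'bar'
single = df.loc[("baz", "two"), "A"]   # one cell, addressed by both levels
ones = df.xs("one", level="second")    # cross-section on the inner level

print(bar_rows)
print(single)
print(ones)
```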
    
  9. Performance Optimization:

    • Using techniques like vectorization, avoiding loops, and using efficient data structures to improve performance.
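
As a rough sketch of two of these ideas: a vectorized expression gives the same result as an element-wise `apply` without calling a Python function per row, and the `category` dtype can shrink columns with few distinct values. The data and sizes below are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": np.arange(100_000)})

# Element-wise Python call for every row
looped = df["A"].apply(lambda x: x * 2)

# Same result as a single vectorized operation
vectorized = df["A"] * 2

# A column with only three distinct labels, stored plainly and as a category
labels = pd.Series(["red", "green", "blue"] * 20_000)
as_category = labels.astype("category")

plain_bytes = labels.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print((looped == vectorized).all())
print(plain_bytes, category_bytes)
```

Actual speed and memory gains depend on your data and pandas version, so it is worth measuring on your own workload.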

Conclusion

Mastering Pandas is essential for anyone involved in data analysis and manipulation. By understanding the basics such as Series and DataFrames, indexing, and data cleaning, you build a solid foundation. Progressing to intermediate concepts like GroupBy operations, merging DataFrames, pivot tables, and time series analysis allows you to handle more complex data tasks efficiently. Leveraging these skills not only enhances your ability to analyze data but also optimizes your workflow, making you a more effective and proficient data professional. With Pandas, you can unlock powerful capabilities to turn raw data into actionable insights.


About Me:
🖇️ LinkedIn
🧑‍💻 GitHub
