PIQ: 101-150

🛠️ Data Engineering and Wrangling

How do you create a DataFrame from nested dictionaries or JSON structures?
How would you split a column with delimited strings (e.g., "a,b,c") into separate columns?
How do you conditionally update values in one column based on another column’s values?
What does cut() vs qcut() do in Pandas? How are they different?
How do you sample a fraction of a DataFrame while maintaining stratification (proportions)?
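
A minimal sketch of how a few of the wrangling questions above might be approached. The data and column names (`tags`, `score`, `grade`) are hypothetical, chosen only for illustration.

```python
import pandas as pd

# Hypothetical data, made up for illustration.
df = pd.DataFrame({"tags": ["a,b,c", "d,e,f"], "score": [40, 90]})

# Split a delimited string column into separate columns.
parts = df["tags"].str.split(",", expand=True)
parts.columns = ["tag_1", "tag_2", "tag_3"]

# Conditionally update one column based on another column's values.
df["grade"] = "fail"
df.loc[df["score"] >= 50, "grade"] = "pass"

values = pd.Series([1, 2, 3, 4, 100])

# cut(): equal-width bins over the value range, so the outlier sits alone.
by_width = pd.cut(values, bins=2, labels=["low", "high"])

# qcut(): quantile-based bins, so the counts per bin are roughly equal.
by_count = pd.qcut(values, q=2, labels=["low", "high"])
```

With the skewed `values` above, `cut()` places four of the five points in the first bin, while `qcut()` splits them three and two.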

⚙️ Pandas Internals and Indexing

What is the difference between a shallow copy and a deep copy in Pandas?
How does Pandas handle operations with mismatched indexes between Series or DataFrames?
How can you reindex a DataFrame to align with a new index, and how do you handle missing rows?
How do you convert a MultiIndex into a flat DataFrame with normal columns?
How does Pandas handle arithmetic operations when null values are involved?
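
The alignment and null-handling questions above can be illustrated with a small sketch. The two Series and their labels are made up for the example.

```python
import pandas as pd

# Hypothetical Series with partially overlapping index labels.
a = pd.Series([1, 2, 3], index=["x", "y", "z"])
b = pd.Series([10, 20], index=["y", "z"])

# Arithmetic aligns on index labels; a label missing on either side gives NaN.
total = a + b

# fill_value substitutes for the missing side before the operation.
total_filled = a.add(b, fill_value=0)

# reindex() aligns to a new index; rows absent from the original
# become NaN unless a fill_value is supplied.
reindexed = a.reindex(["w", "x", "y"], fill_value=0)
```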

📏 Data Summary & Quality Assurance

How would you summarize the distribution of numeric data in a DataFrame?
How can you identify duplicate columns in a DataFrame?
What’s the best way to identify columns with constant or low variance?
How can you check if a column is monotonically increasing or decreasing?
How would you validate if a DataFrame is sorted by a column (or multiple columns)?
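
One possible way to approach several of the quality checks above, sketched on a small all-numeric frame. The column names are hypothetical, and the transpose-based duplicate check is only one option; it can be slow or memory-hungry on wide or very large frames.

```python
import pandas as pd

# Hypothetical all-numeric data for illustration.
df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "id_copy": [1, 2, 3, 4],
    "const": [7, 7, 7, 7],
    "value": [4.0, 3.0, 2.0, 1.0],
})

# Summarize the distribution of the numeric columns.
summary = df.describe()

# Duplicate columns: transpose so columns become rows, then flag repeats.
duplicate_cols = df.columns[df.T.duplicated()].tolist()

# Constant columns: exactly one distinct value (counting NaN as a value).
constant_cols = [c for c in df.columns if df[c].nunique(dropna=False) == 1]

# Monotonicity (and therefore sortedness by one column) is a Series property.
sorted_by_id = df["id"].is_monotonic_increasing
value_decreasing = df["value"].is_monotonic_decreasing
```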

🚀 Performance Optimization

How can you speed up large file ingestion in Pandas? Mention specific parameters.
What techniques can you use to reduce memory usage in large DataFrames?
How would you vectorize a function that was previously applied via apply() or loops?
When should you consider using dask.dataframe or polars instead of Pandas?
How do you chunk a large file and process it iteratively in Pandas without loading the full dataset into memory?
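
A minimal sketch of chunked ingestion, touching the `read_csv` parameters the questions above hint at (`usecols`, `dtype`, `chunksize`). An in-memory string stands in for a large file here, and the column names are hypothetical.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a large file on disk.
csv_data = io.StringIO("key,amount,note\na,1,x\nb,2,x\na,3,x\nb,4,x\na,5,x\n")

# chunksize makes read_csv return an iterator of DataFrames.
# usecols skips unneeded columns; dtype avoids wider default types.
reader = pd.read_csv(
    csv_data,
    usecols=["key", "amount"],
    dtype={"amount": "int32"},
    chunksize=2,
)

# Aggregate each chunk separately, then combine the partial results.
partials = [chunk.groupby("key")["amount"].sum() for chunk in reader]
total_by_key = pd.concat(partials).groupby(level=0).sum()
```

Only one chunk plus the small partial aggregates is held in memory at a time, which is the point of the pattern.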

📊 Exploratory Data Analysis (EDA)

How do you get the number of unique values across all columns in a DataFrame?
How can you display summary statistics grouped by a categorical feature?
How do you find the most frequent value in each column of a DataFrame?
How do you count the number of missing values per column and per row?
How would you use corr() to explore relationships between numerical variables?
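
Several of the EDA questions above reduce to one-liners. The sketch below uses a small hypothetical frame with a few missing values.

```python
import numpy as np
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "group": ["a", "a", "b", "b"],
    "x": [1.0, np.nan, 3.0, 3.0],
    "y": [2.0, 4.0, np.nan, np.nan],
})

# Distinct non-null values per column.
unique_counts = df.nunique()

# Missing values per column and per row.
missing_per_col = df.isna().sum()
missing_per_row = df.isna().sum(axis=1)

# Most frequent value per column; mode() can return ties, so take the first row.
most_frequent = df.mode().iloc[0]

# Summary statistics of x grouped by a categorical feature.
grouped_stats = df.groupby("group")["x"].describe()
```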

🔄 Transformation & Feature Engineering

How do you perform one-hot encoding using Pandas without using sklearn?
How can you generate lag features for a time series DataFrame using Pandas?
How do you apply a different function to each column using apply() or agg()?
How do you handle mixed data types in a single column (e.g., strings and numbers)?
How do you normalize values in each row to sum to 1?
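
A short sketch covering one-hot encoding, lag features, per-column aggregation, and row normalization from the questions above. The frame and column names are hypothetical.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "color": ["red", "blue", "red"],
    "a": [1.0, 2.0, 3.0],
    "b": [3.0, 2.0, 1.0],
})

# One-hot encoding without sklearn.
encoded = pd.get_dummies(df, columns=["color"])

# Lag feature: the previous row's value (assumes rows are in time order).
df["a_lag1"] = df["a"].shift(1)

# A different function for each column.
summary = df.agg({"a": "sum", "b": "max"})

# Normalize each row so its values sum to 1.
numeric = df[["a", "b"]]
row_normalized = numeric.div(numeric.sum(axis=1), axis=0)
```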

🗃 MultiIndex and Hierarchical Data

How do you create a hierarchical index from multiple columns?
How do you swap levels in a MultiIndex DataFrame?
How do you filter rows from a MultiIndex DataFrame using a condition on one level?
How do you unstack and restack MultiIndex levels?
What are the advantages and pitfalls of using MultiIndex structures in Pandas?
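
The MultiIndex operations asked about above can be sketched as follows, using a hypothetical sales table. This is an illustration of the basic calls, not a recommendation on when a MultiIndex is the right structure.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({
    "region": ["east", "east", "west", "west"],
    "year": [2020, 2021, 2020, 2021],
    "sales": [10, 20, 30, 40],
})

# Build a hierarchical index from two columns.
indexed = df.set_index(["region", "year"])

# Swap the order of the two levels.
swapped = indexed.swaplevel("region", "year")

# Filter rows using a condition on one level.
only_2021 = indexed[indexed.index.get_level_values("year") == 2021]

# Unstack the year level into columns, then stack it back into the index.
wide = indexed["sales"].unstack("year")
restacked = wide.stack()

# Flatten the MultiIndex back into ordinary columns.
flat = indexed.reset_index()
```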

🔗 Integration & Interoperability

How can you pass a DataFrame directly to a SQL database using Pandas?
How do you load data from an API returning JSON using pd.json_normalize()?
How do you export a DataFrame with datetime columns to Excel and preserve formatting?
How can you compare Pandas and Polars in terms of performance and syntax?
When would you use .values, .to_numpy(), or .to_list() — and what are the trade-offs?
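
A minimal sketch of the interoperability questions above, assuming an in-memory SQLite database and a hand-written list of records in place of a real API response. The table name `users` and the record fields are hypothetical.

```python
import sqlite3
import pandas as pd

# Hypothetical nested records, standing in for a parsed JSON API response.
records = [
    {"id": 1, "user": {"name": "Ann", "city": "Oslo"}},
    {"id": 2, "user": {"name": "Bo", "city": "Rome"}},
]

# Flatten nested fields into columns; sep controls how nested keys are joined.
flat = pd.json_normalize(records, sep="_")

# Write the DataFrame to a SQL table and read it back.
conn = sqlite3.connect(":memory:")
flat.to_sql("users", conn, index=False)
round_trip = pd.read_sql("SELECT * FROM users", conn)
conn.close()

# to_numpy() returns a NumPy array; to_list() returns a plain Python list.
ids_array = flat["id"].to_numpy()
ids_list = flat["id"].to_list()
```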
