PIQ : 51-100

🧮 Data Manipulation & Analysis

How do you apply a custom function row-wise or column-wise in a DataFrame?
What is the purpose of explode() in Pandas? Give an example use case.
How do you flatten a hierarchical (MultiIndex) column structure after a groupby() or pivot_table()?
How do you calculate correlation between columns in a Pandas DataFrame?
How would you calculate percentage change between rows in a numeric column?
How do you remove rows where a column contains only whitespace or empty strings?
How do you detect and remove outliers using the IQR method in Pandas?
What’s the difference between drop_duplicates() and duplicated()?
How do you convert strings like "1,200.50" to float in Pandas?
How can you normalize or standardize columns in a DataFrame?
What are the different join types supported by merge() in Pandas?
How do you merge DataFrames on multiple columns (composite key)?
What’s the difference between stack() and melt()? When should you use each?
How do you perform a cross join (Cartesian product) between two DataFrames?
How do you shift time series data forward or backward in Pandas?
How do you interpolate missing datetime values in a time series DataFrame?
How do you extract year, month, or weekday from a datetime column in Pandas?
How do you downsample or upsample time series data?
What are vectorized operations in Pandas? Why are they preferred over loops?
How can you identify memory usage of each column in a DataFrame and optimize it?
How do you find the top 3 most frequent values in a column using Pandas?
Explain how Pandas handles NaT (Not a Time) and how it differs from NaN.
How would you compare two DataFrames and find differences (rows or cells)?
What’s the difference between DataFrame.iterrows() and DataFrame.itertuples()? Which is faster and why?
How do you create a new column that ranks rows within each group (e.g., using groupby)?
Describe how to perform a left join with merge() and keep only the left-side rows that have no match in the right DataFrame (an anti-join).
How do you reshape a DataFrame from wide to long format and back using melt() and pivot()?
What are chained assignments in Pandas, and why should they be avoided?
How do you bin continuous numeric data into discrete intervals (e.g., age groups)?
How would you use Pandas to detect data quality issues in a newly loaded dataset (e.g., missing values, unexpected data types, outliers)?
What’s the difference between query() and traditional filtering with boolean masks? Which one is more readable or efficient?
How do you apply different aggregation functions to different columns in groupby().agg()?
Explain pipe() in Pandas. How can it help in writing cleaner transformation pipelines?
What are the differences between merge_asof() and merge()? Give a use case for merge_asof().
What’s the difference between isin() and str.contains() when filtering rows?
How do you validate that a column contains only values from a known set?
How would you unit test a function that returns a Pandas DataFrame?
What causes a SettingWithCopyWarning in Pandas, and how do you prevent it?
How do you compare two DataFrames for equality, including NaNs?
What does DataFrame.equals() check compared to == or .eq()?
How do you quickly plot histograms or line charts from DataFrame columns?
How would you summarize and visualize missing values in a large dataset?
What’s the role of value_counts(normalize=True)? Where would you use it?
How can you write a DataFrame to a compressed file (like .zip, .gz, or .parquet)?
What’s the difference between saving to .csv, .pickle, and .parquet? When would you choose each?
How do you integrate Pandas with APIs (e.g., parsing JSON into a DataFrame)?
How can you read multiple files (e.g., CSVs in a folder) and concatenate them into one DataFrame?
How do you handle column name conflicts when merging DataFrames?
How would you perform batch data transformation (ETL style) using Pandas with 1M+ rows?
How do you persist intermediate DataFrames during a long processing pipeline efficiently?
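
As an illustration of how three of the questions above might be approached (converting strings like "1,200.50" to float, removing outliers with the IQR method, and a left join that keeps only unmatched rows), here is a minimal sketch. The `orders` and `customers` DataFrames, their column names, and their values are hypothetical and invented for this example. The 1.5 × IQR fence is a common convention, not the only reasonable choice.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
orders = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "amount": ["1,200.50", "980.00", "1,050.25", "1,100.00", "990.75", "25,000.00"],
})
customers = pd.DataFrame({"customer_id": [1, 2, 3, 4, 5]})

# Strings like "1,200.50" to float: drop the thousands separator, then convert.
orders["amount"] = pd.to_numeric(orders["amount"].str.replace(",", "", regex=False))

# IQR outlier filter: keep rows inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, q3 = orders["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = orders["amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
no_outliers = orders[in_range]

# Left join that keeps only unmatched rows (an anti-join):
# indicator=True adds a "_merge" column showing where each row came from.
merged = orders.merge(customers, on="customer_id", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

print(no_outliers)
print(unmatched)
```

When the source is a CSV file, `pd.read_csv(..., thousands=",")` can parse such numbers at load time, which avoids the separate string-replacement step.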


Last updated 8 months ago