PIQ : 51-100
🧮 Data Manipulation & Analysis
How do you apply a custom function row-wise or column-wise in a DataFrame?
What is the purpose of explode() in Pandas? Give an example use case.
How do you flatten a hierarchical (MultiIndex) column structure after a groupby() or pivot_table()?
How do you calculate correlation between columns in a Pandas DataFrame?
How would you calculate percentage change between rows in a numeric column?
How do you remove rows where a column contains only whitespace or empty strings?
How do you detect and remove outliers using the IQR method in Pandas?
What’s the difference between drop_duplicates() and duplicated()?
How do you convert strings like "1,200.50" to float in Pandas?
How can you normalize or standardize columns in a DataFrame?
What are the different join types supported by merge() in Pandas?
How do you merge DataFrames on multiple columns (composite key)?
What’s the difference between stack() and melt()? When should you use each?
How do you perform a cross join (Cartesian product) between two DataFrames?
How do you shift time series data forward or backward in Pandas?
How do you interpolate missing datetime values in a time series DataFrame?
How do you extract year, month, or weekday from a datetime column in Pandas?
How do you downsample or upsample time series data?
What are vectorized operations in Pandas? Why are they preferred over loops?
How can you identify memory usage of each column in a DataFrame and optimize it?
How do you find the top 3 most frequent values in a column using Pandas?
Explain how Pandas handles NaT (Not a Time) and how it differs from NaN.
How would you compare two DataFrames and find differences (rows or cells)?
What’s the difference between DataFrame.iterrows() and DataFrame.itertuples()? Which is faster and why?
How do you create a new column that ranks rows within each group (e.g., using groupby)?
Describe how to perform a left join with merge() and keep only unmatched rows.
How do you reshape a DataFrame from wide to long format and back using melt() and pivot()?
What are chained assignments in Pandas, and why should they be avoided?
How do you bin continuous numeric data into discrete intervals (e.g., age groups)?
How would you use Pandas to detect data quality issues in a newly loaded dataset? (E.g., missing values, unexpected data types, outliers)
What’s the difference between query() and traditional filtering with boolean masks? Which one is more readable or efficient?
How do you apply different aggregation functions to different columns in groupby().agg()?
Explain pipe() in Pandas. How can it help in writing cleaner transformation pipelines?
What are the differences between merge_asof() and merge()? Give a use case for merge_asof().
What’s the difference between isin() and str.contains() when filtering rows?
How do you validate that a column contains only values from a known set?
How would you unit test a function that returns a Pandas DataFrame?
What causes a SettingWithCopyWarning in Pandas, and how do you prevent it?
How do you compare two DataFrames for equality, including NaNs?
What does DataFrame.equals() check compared to == or .eq()?
How do you quickly plot histograms or line charts from DataFrame columns?
How would you summarize and visualize missing values in a large dataset?
What’s the role of value_counts(normalize=True)? Where would you use it?
How can you write a DataFrame to a compressed file (like .zip, .gz, or .parquet)?
What’s the difference between saving to .csv, .pickle, and .parquet? When would you choose each?
How do you integrate Pandas with APIs (e.g., parsing JSON into a DataFrame)?
How can you read multiple files (e.g., CSVs in a folder) and concatenate them into one DataFrame?
How do you handle column name conflicts when merging DataFrames?
How would you perform batch data transformation (ETL style) using Pandas with 1M+ rows?
How do you persist intermediate DataFrames during a long processing pipeline efficiently?
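
As a starting point for practice, three of the questions above (converting strings like "1,200.50" to float, removing outliers with the IQR method, and a left join that keeps only unmatched rows) can be sketched together as follows. This is one possible approach, not a model answer. The orders and customers tables, their column names, and all values are hypothetical and invented for illustration. The 1.5 multiplier is the conventional IQR fence, assumed here because the question does not specify one.

```python
import pandas as pd

# Hypothetical sample data, invented for illustration only.
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5, 6],
    "customer_id": [10, 11, 12, 13, 14, 15],
    "amount": ["1,200.50", "980.00", "1,050.25", "1,100.00", "990.75", "25,000.00"],
})
customers = pd.DataFrame({"customer_id": [10, 11, 12, 13]})

# Convert strings like "1,200.50" to float:
# drop the thousands separator, then cast.
orders["amount"] = orders["amount"].str.replace(",", "", regex=False).astype(float)

# IQR method: keep rows inside [Q1 - 1.5 * IQR, Q3 + 1.5 * IQR].
q1 = orders["amount"].quantile(0.25)
q3 = orders["amount"].quantile(0.75)
iqr = q3 - q1
in_range = orders["amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
orders_no_outliers = orders[in_range].copy()

# Left join that keeps only unmatched rows (an anti-join):
# indicator=True adds a "_merge" column recording where each row came from.
merged = orders.merge(customers, on="customer_id", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

print(orders_no_outliers)
print(unmatched)
```

If the strings come from a CSV file, passing thousands="," to read_csv() is an alternative that avoids the manual replace step.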