Mastering Data Analysis: A Deep Dive into Trade Flows using Pandas Grouping

The transition from foundational learning to executing a complete data analysis project is a critical step in any data science internship. This publication reflects the core technical and analytical skills gained during my DataraFlow internship, culminating in a detailed analysis of Nigeria's international trade flows using real-world data from the United Nations Comtrade database.
Core Learning: The Power of Data Grouping
My ninth week focused intensely on advanced data manipulation techniques, particularly leveraging the Pandas library in Python. The central concept mastered was the Split-Apply-Combine paradigm, a powerful model for solving complex data analysis problems.
This paradigm breaks down intricate tasks into three manageable steps:
Split: Using the
groupby()method to divide a large dataset into multiple smaller "mini-dataframes" based on key features (likeCommodityorTrade Flow).Apply: Applying a function to each of these isolated groups. This could be either a summary/aggregation function (e.g., calculating a total, average, maximum) or a filtration function (e.g., keeping only groups that meet a certain threshold).
Combine: Automatically recombining the results from each group back into a single, comprehensive output DataFrame.
We learned how to use built-in aggregation functions like sum(), mean(), max(), and min() directly within the aggregate() method, allowing for concise and powerful generation of summary reports. Furthermore, we practiced grouping by multiple columns (e.g., ['Commodity', 'Year']) to generate highly granular reports, such as breaking down annual trade into categories like 'Processed Milk Exports in 2014'.
Project Methodology and Approach
The capstone task was to use the skills gained to conduct an exploratory data analysis on a self-selected national trade dataset. I chose to analyze Nigeria's 2023 monthly trade data for three key commodity categories:
Mineral fuels, mineral oils and products of their distillation; bituminous substances; mineral waxes. (HS Code 27)
Cocoa beans; whole or broken, raw or roasted (HS Code 180100)
Fertilizers, mineral or chemical; nitrogenous, urea, whether or not in aqueous solution (HS Code 310210)
1. Data Acquisition and Preparation
The first challenge was successfully loading the trade data. The raw CSV contained a parsing issue due to an unexpected trailing comma.


Handling Raw Data: I addressed the CSV structure issue by inspecting the raw file, anticipating the extra column, and then using Pandas to load the data with appropriate data types for commodity codes.
Cleaning and Structuring: The initial dataset had 48 columns. I pared this down to a manageable subset, focusing on:
Year,Period,Trade Flow,Partner,Commodity, andTrade Value (US$). I then created a more readable metric, 'Trade Value in Million (US$)', which is a conversion of theTrade Value (US$)into million.# define the columns to subset COLUMNS = ['refYear', 'period','flowDesc','reporterDesc', 'partnerDesc', 'cmdDesc','cmdCode','primaryValue'] # subset the columns tradeDataSub = tradeData[COLUMNS] # There is a need to define the column headers to something more descriptive new_column = {"refYear":"Year", "period":"Period", "flowDesc":"Trade Flow", "reporterDesc":"Reporter", "partnerDesc":"Partner", "cmdDesc":"Commodity", "cmdCode":"Commodity Code", "primaryValue":"Trade Value (US$)" } tradeDataSub = tradeDataSub.rename(columns=new_column) # Define function to convert Trade Value to Million US$ def millionUSD(value): return value / 1000000 # apply the function to the Trade Value column tradeDataSub["Trade Value in Million (US$)"] = tradeDataSub["Trade Value (US$)"].apply(millionUSD)Subsetting for Country Analysis: The original data included a 'World' partner row, which represents the total trade for that month. I created a dedicated DataFrame that excluded these 'World' totals to ensure the analysis focused only on trade with individual country partners.
tradeDataSub_world = tradeDataSub[tradeDataSub['Partner'] == 'World'] tradeDataSub_countries = tradeDataSub[tradeDataSub['Partner'] != 'World']
2. Exploratory Data Analysis (EDA)
The subsequent analysis focused on answering fundamental questions about Nigeria's 2023 trade:
A. Trade Balance and Monthly Trends
By grouping the data by Period (month) and calculating the total trade values, the monthly patterns emerged:
# FIND WHICH MONTHS SAW THE LARGEST TOTAL VALUE OF IMPORTS AND EXPORTS
# IMPORTS BY MONTH
imports_by_month = tradeDataSub_countries_imports.groupby('Period')['Trade Value in Million (US$)'].sum().sort_values(ascending=False)
print("\nMonthly Import Values (sorted from highest to lowest):")
for period, value in imports_by_month.items():
print(f" Period {period}: ${value:>12,.2f}M")
print(f"\nMonth with LARGEST imports: Period {imports_by_month.idxmax()} with ${imports_by_month.max():,.2f}M")
print(f"Month with SMALLEST imports: Period {imports_by_month.idxmin()} with ${imports_by_month.min():,.2f}M")
print(f"Average monthly imports: ${imports_by_month.mean():,.2f}M")
# EXPORTS BY MONTH
exports_by_month = tradeDataSub_countries_exports.groupby('Period')['Trade Value in Million (US$)'].sum().sort_values(ascending=False)
print("\nMonthly Export Values (sorted from highest to lowest):")
for period, value in exports_by_month.items():
print(f" Period {period}: ${value:>12,.2f}M")
print(f"\nMonth with LARGEST exports: Period {exports_by_month.idxmax()} with ${exports_by_month.max():,.2f}M")
print(f"Month with SMALLEST exports: Period {exports_by_month.idxmin()} with ${exports_by_month.min():,.2f}M")
print(f"Average monthly exports: ${exports_by_month.mean():,.2f}M")
# COMBINED ANALYSIS
# Create a combined view
combined_monthly = pd.DataFrame({
'Imports': imports_by_month,
'Exports': exports_by_month
})
combined_monthly['Total Trade'] = combined_monthly['Imports'] + combined_monthly['Exports']
combined_monthly = combined_monthly.sort_values('Total Trade', ascending=False)
print("\n" + combined_monthly.to_string())
print(f"\n\nMonth with LARGEST total trade: Period {combined_monthly['Total Trade'].idxmax()} with ${combined_monthly['Total Trade'].max():,.2f}M")
print(f"Month with SMALLEST total trade: Period {combined_monthly['Total Trade'].idxmin()} with ${combined_monthly['Total Trade'].min():,.2f}M")
Largest Total Trade: November 2023 (Period 202311) recorded the highest total trade value (Imports + Exports) at $9,531.65 million.
Monthly Extremes:
Largest Imports: January 2023 at $2,365.99 million.
Largest Exports: December 2023 at $7,397.30 million.

Below is a visualization to support the values obtained:

B. Main Trading Partners (Overall)
By grouping the data by Commodity Code and summing the trade value for all partners, I identified the primary trade relationships for each commodity:
Commodity (HS Code) | Top Overall Partner | Total Trade Value (Million US$) |
Mineral Fuels (27) | Netherlands | $10,306.60M |
Fertilizers (310210) | Brazil | $494.81M |
Cocoa Beans (180100) | Netherlands | $263.27M |
The Netherlands stands out as the most significant partner, dominating trade in both high-value Mineral Fuels and agricultural Cocoa Beans.
3. Advanced Grouping and Insights
To derive deeper insights, the data was grouped by the triple key: ['Trade Flow', 'Commodity', 'Year'].
A. Import vs. Export Partner Concentration
This multi-level grouping confirmed key trade dependencies:
Commodity | Trade Flow | Top Partner (2023) | Total Value (Million US$) | Insight |
Mineral Fuels | Export | Netherlands | $7,459.73M | Netherlands serves as a major hub for Nigerian crude oil and gas exports. |
Mineral Fuels | Import | Belgium | $5,294.94M | Nigeria relies on Belgium as a primary source for imported mineral fuels (likely refined products). |
Cocoa Beans | Export | Netherlands | $263.14M | Confirms the Netherlands' role as a key processing/distribution hub for Nigerian cocoa. |
Fertilizers | Export | Brazil | $494.81M | Brazil is a dominant market for exported Nigerian fertilizers, reflecting its large agricultural sector. |
Fertilizers | Import | Egypt | $23.88M | Imports are significantly smaller than exports, indicating a strong trade surplus in this commodity. |
B. Assessment of Seasonal Trends
By visually assessing the aggregated monthly totals:
Imports (largely Mineral Fuels) showed a notable mid-year dip (June/July), followed by a recovery, suggesting potential seasonal or logistical bottlenecks for imports during the rainy season or in the middle of the year.
Exports displayed a strong end-of-year peak (November/December), which is typical for agricultural commodities like cocoa (harvest season) and possibly related to year-end global demand for fuel products.
Future Analytical Questions
The successful application of grouping techniques opens the door to much more complex and valuable analysis. Moving forward, potential questions that could be addressed using this methodology include:
Partner Concentration & Dependency: What percentage of Nigeria's trade value is concentrated in the top 5 partners? Are we over-dependent on a single country, identifying potential supply chain risk?
Trade Balance by Partner: For the top 10 partners, what is the net trade balance (Exports minus Imports)? Which relationships generate the largest surplus or deficit?
Commodity Specialization: Which partners are exclusive buyers or sellers of a specific commodity, indicating sole-source dependencies?
Growth and Trends: What is the year-over-year growth or decline rate in trade for each of the top partners?
This project demonstrated that mastering data grouping is essential for transforming raw, heterogeneous trade data into structured, meaningful, and actionable business intelligence.



