ECON3173 – Cross Section and Panel Data Analysis

Individual Project: Guidelines and Questions

This document provides guidelines and questions for the Individual Project of ECON3173, which accounts for 40% of the total marks.

Honoring the precepts of academic integrity and applying their principles are fundamental responsibilities of all students and scholars at BNBU. You are advised to read through the BNBU Guidelines for Handling Academic Dishonesty file on iSpace before you start your assignment. Any form of plagiarism or cheating can result in disciplinary and corrective action. Using generative AI tools is not allowed.

Deadline: December 14, 2025.

Submission Method:

a)    Please submit your typed assignment report as a single PDF file to ‘Turnitin Submission Link: Report’ via iSpace. The file name of your PDF submission should have the following format: ECON3173_Project_Student ID_Name in Pinyin (e.g., ECON3173_Project_190000001_Mi Lin).

b)    Save your data and .do file(s) in a zip file. Name your zip file ECON3173_Project_Student ID_Name in Pinyin. Then upload it to ‘Submission Link: Stata Data and Program’ via iSpace. You are expected to submit two .do files, PartA.do and PartB.do, which should replicate each part of your submitted work.

c)    Use the ‘ECON3173_Individual Project_Report Template’ file on iSpace to write your report. Ensure you provide a question number for each part of your work.

Format Requirements

Cover page:

Please enter your name and student ID at the top of the report template cover page, available on iSpace.

Word limit:

The required minimum word count is 1,500 words, with a maximum of 2,000 words in total, excluding tables, graphs, and appendices.

Referencing:

Your report should include appropriate references in APA format to a variety of necessary literature sources and a wide-ranging bibliography of academic aspects of economics.

Font / Size:

Cambria 12 or Times New Roman 12.

Spacing / Sides:

1.0 / Single-sided / Single-line spacing between two paragraphs.

Pagination required:

Yes

Margins:

At 2.50 to both left and right, and ‘justified’.

Project Theme: Access to External Finance and Firm Performance

Introduction:

In this project, you are invited to empirically investigate the determinants of financial access and its subsequent causal impact on firm performance (measured by Sales) using the World Bank Enterprise Survey Data (WBESD).

One of the most cited constraints for firms in developing economies is a lack of access to external finance. You will test whether alleviating financial constraints (e.g., gaining access to credit) causes firms to expand output.

The project is divided into two analytical stages:

●  Determinants of Credit: Using cross-sectional techniques to model the probability of a firm having a loan;

●  Impact of Credit: Using panel data techniques to test if gaining access to credit causes firms to expand output.

The WBESD database collects information on firm performance, growth, and related factors. The entire database is available to researchers and includes all survey questions at the firm level.

Guidelines to download and prepare data for this individual project:

a)   Please visit https://login.enterprisesurveys.org/ to register your user account for the WBESD database (see the snapshot below). Registration is free.

b)  There are a total of 168 economies represented in the World Bank Enterprise Surveys Database (WBESD). Among these, 83 economies have a time span of at least three years. For the individual project, each student is required to use a panel built from a randomly assigned combination of three different economies out of these 83.

Data allocation protocol: Students must first pick a lottery ticket number. An “Individual Project Lottery Ticket Sign-up Sheet” will be available in iSpace from 9 p.m. on Friday, 28/11/2025. Please sign up for a lottery ticket number by Sunday, 30/11/2025. We will operate on a 'first-come, first-served' basis.

A lucky draw will be conducted in class on Monday, 01/12/2025, to assign specific economies to each lottery ticket number.

c)   Once registration is completed, log in and download the data following the steps below:

i.     Log in with your username and password. You will be directed to the ‘Full Survey Data’ page.

ii.     Select ‘Panel data’ under ‘Survey Type’ on the left. Ensure you are on the ‘Data by Economy’ view instead of ‘Combined Data’. See the snapshot below.

iii.     Download your economies’ corresponding data and documentation for all the available years.

For example, Afghanistan has two panel data files, one for 2005 and 2009, and the other for 2008, 2010, and 2014. In that case, download both of them.

iv.     Extract the data and survey documentation files into a working folder on your PC.

The data file is now ready to open in Stata.

d)  Appendix A at the end of this document offers guidelines for data construction and cleaning when working with WBESD data. Read it carefully before you begin.

Answer ALL of the Following Questions

Note that this is not an essay-type assignment. Please answer the questions one by one. For each question, the performance of the Stata do files accounts for 20% of the marks. Support your answers with regression tables, graphs, Stata output, and explanations/discussions.

Part A: Data Management and Exploratory Analysis (15%)

Q1 (5%) Data Preparation:

Use the Stata command “append” to combine data from all years and all the selected economies into a single Stata data file with a panel data format and complete the following data preparation tasks:

●     Select and rename the variables according to Table 1 below. ‘Old name’ refers to the variable name in the original dataset, while ‘New name’ is the new corresponding name to be defined.

●     Generate a new dummy variable creditdum: Equals 1 if the firm has a line of credit or loan from a financial institution (k8 = yes); otherwise 0.

●     Generate a new dummy variable Femaledum: Equals 1 if the firm has female participation in ownership (b4 = yes); otherwise 0.

●     Generate a new variable ln(sales): The natural logarithm of total annual sales.

Table 1: Variable List

| Survey Questions | Old name | New name |
|---|---|---|
| The year the survey was conducted | year | year |
| Panel ID (the same ID for each firm across different years) | panelid | panelid |
| What percentage of this firm is owned by private foreign individuals, companies, or organizations (%) | b2b | foreign |
| During the past fiscal year, what were this establishment’s total annual sales? | d2 | sales |
| Total number of permanent, full-time workers at the end of the last fiscal year | l1 | labor |
| Year of Survey – Year establishment began operations + 1 | year - b5 + 1 | age |

Q2     (10%) Conduct exploratory data analysis:

●  Provide summary statistics for the variables created in Q1.

●  Compare the average ln(Sales) for firms with credit (creditdum = 1) versus those without (creditdum = 0). Is the difference statistically significant?

●  Briefly comment on the prevalence of credit access across the different economies in your sample.

Part B: Cross-Sectional Analysis (20%)

Q3     (20%) Determinants of Access to Credit:

Before analyzing the effect of credit, we must understand who gets credit. Restrict your sample to the most recent survey year only (treat this sub-sample as cross-sectional data).

Estimate the probability of having a credit line based on firm characteristics:

$$\Pr(\text{creditdum}_i = 1 \mid x) = F\big(\beta_0 + \beta_1 \ln(\text{Labor})_i + \beta_2 \text{Age}_i + \beta_3 \text{Foreign}_i + \beta_4 \text{Femaledum}_i\big) \tag{1}$$

●  Estimate the model using both the Probit and Logit estimators. Report the results side-by-side. Compare the Pseudo-$R^2$. Do the models yield consistent inferences regarding significance?

●  Interpret the coefficient of $\text{Femaledum}_i$ from the Logit model. Then, calculate and report the average marginal effects for all variables in the Probit model.

●  Explain why the raw coefficients in non-linear binary response models cannot be interpreted as simple marginal effects (unlike in OLS).

Part C: Panel Regression and Causal Inference (65%)

Q4     (15%) Baseline Fixed Effects Model

Revert to the full Panel Dataset (all years and all three economies). Consider a standard performance model in which sales depend on labor inputs and firm characteristics. Report all the results side by side.

$$\ln(\text{Sales})_{it} = \beta_0 + \beta_1 \ln(\text{Labor})_{it} + \beta_2 \text{Age}_{it} + \beta_3 \text{Foreign}_{it} + u_{it} \tag{2}$$

●  Estimate equation (2) using OLS, Fixed Effects (FE) estimator controlling for time-invariant individual effects, FE estimator controlling for individual-invariant time effects, and FE estimator controlling for both time and individual effects. Provide examples of individual effects and time effects in the current context. Comment on your regression results.

●  Compare the result of the FE estimator controlling for both time and individual effects to a Random Effects (RE) model using the Hausman test. Interpret the test result.

●  Comment on the elasticity of sales with respect to labor in your preferred model.

Q5 (15%) The Effect of Credit Access (Naive Approach)

Expand your model from Q4 to include credit_dum as the main variable of interest.

$$\ln(\text{sales})_{it} = \beta_0 + \beta_1 \text{credit\_dum}_{it} + \gamma x_{it} + \mu_i + \delta_t + \varepsilon_{it} \tag{3}$$

●  Explore the WBESD database to include appropriate other control variables based on the literature as you see fit. Give justifications for adding these extra control variables.

●  Run the regression, interpret the coefficient $\beta_1$, and explain the estimated result.

●  Discuss to what extent the estimated coefficient on $\text{credit\_dum}_{it}$ can be used for causal inference.

Q6     (15%) Causal Inference: Further Investigation

To better address causality, implement a Difference-in-Differences (DiD) strategy focusing on firms that changed their credit status.

●  Define a Treatment Group (firms that did not have credit in period $t - 1$ but gained it in period $t$) and a Control Group (firms that never had credit).

●  Estimate the standard Two-Way Fixed Effects (TWFE) DiD equation:

$$y_{it} = \alpha_i + \lambda_t + \delta_{DiD}(\text{Treat}_i \times \text{post}_t) + \beta x_{it} + \varepsilon_{it} \tag{4}$$

●  Report the estimate of $\delta_{DiD}$.

●  Discuss the Parallel Trends Assumption required for this estimator to be valid.

Q7     (20%) Robustness

To what extent could we use the estimated coefficient on $\text{Treat}_i \times \text{post}_t$ obtained in Q6 for causal inference? How could we ensure that the Parallel Trends Assumption holds? Is the treatment effect long-lasting? Is the treatment effect homogeneous?

Illustrate a suitable empirical strategy for the above questions. Estimate the model using your chosen approach, and compare the results with those from Q6. Interpret and discuss the findings. Explore the WBESD database to include other variables as you see fit.

Appendix A: Guidelines for Data Construction and Cleaning

(Read this carefully before starting your Stata analysis)

The World Bank Enterprise Survey Data (WBESD) is a rich resource, but it requires careful cleaning to be usable for empirical studies. Real-world data is rarely “ready to run”. Follow the steps below to construct your dataset.

Phase 1: Data Merging and Compilation

1. File Selection:

●  Do not download single-year cross-section files (e.g., “Vietnam 2015”).

●  Download the “Panel” datasets. These files usually have names like Vietnam-2015-2023-Panel-Data.dta. They contain the crucial “panelid” variable that links firms across time.

2. Combining Economies (The append Strategy):

●  You need three economies. Do not try to merge them side-by-side. You want to stack them on top of each other (long format).

●  Stata Workflow: Open the first country’s dataset, generate a country ID, save it. Open the second, generate a country ID, append the first, etc.

●  Code Hint in Stata:

use "Vietnam_Panel.dta", clear

gen country_name = "Vietnam"

save "combined_data.dta", replace

use "Senegal_Panel.dta", clear

gen country_name = "Senegal"

append using "combined_data.dta"

save "combined_data.dta", replace

3. Variable Standardization:

●  Check variable names across countries. While the World Bank tries to standardize (e.g., d2 is always Sales), sometimes older files use d2_2015 or sales_val.

●  Use the command lookfor sales or lookfor labor to find the correct variable codes in each dataset before appending.
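The check described above can be sketched as follows. This is only an illustration under stated assumptions: the file name is the one used in the earlier code hint, and d2_2015 is the example legacy name mentioned above, not a name guaranteed to exist in your files.

```stata
* Sketch: harmonise the sales variable in one country file before appending.
use "Senegal_Panel.dta", clear

* List variables whose name or label mentions sales
lookfor sales

* If the standard name d2 is absent, rename the legacy variable.
* d2_2015 is an assumed example; use whatever lookfor reports.
capture confirm variable d2
if _rc != 0 {
    rename d2_2015 d2
}

gen country_name = "Senegal"
```

Doing the rename before append matters because append stacks variables by name: two differently named sales variables would end up as two half-empty columns.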

Phase 2: Cleaning and Consistency

1. Handling Missing Values and Codes:

●  WBESD often uses special codes for missing data:

○ -9 = Don't Know

○ -7 = Refusal

○ -8 = Does not apply

●  Crucial Step: You must convert these to Stata missing values (.) before calculating means or running regressions. If you treat -9 as a real number, your averages will be wrong.

●  Code Hint in Stata:

mvdecode _all, mv(-9 -8 -7)
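A more targeted variant, followed by a quick check, might look like the sketch below. It assumes the original WBESD names from Table 1 (before renaming) and that these variables are stored as numeric; adjust the list to your own files.

```stata
* Sketch: recode the survey codes to missing for the analysis variables only.
mvdecode d2 l1 b2b b5 k8 b4, mv(-9 -8 -7)

* Check: each count should be 0 if no negative codes remain.
* Stata treats missing values as larger than any number,
* so "< 0" never counts missing observations.
foreach v of varlist d2 l1 b2b b5 {
    count if `v' < 0
}
```

Restricting the recode to named variables avoids overwriting a variable in which -7, -8, or -9 is a legitimate value.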

2. Outliers and Monetary Values:

●  Sales (d2) are reported in local currency units (LCU).

●  Do not compare raw nominal sales between Vietnam (Dong) and Senegal (CFA Franc) directly.

●  Solution: We use log_sales and Country Fixed Effects (or Firm Fixed Effects). Taking logs turns a multiplicative difference in currency scale into an additive shift, which the fixed effects then absorb.

●  Winsorizing: Real data often has data entry errors (e.g., a firm reporting 1000% growth). It is good practice to winsorize the top/bottom 1% of continuous variables, such as sales and employee counts.

●  Code Hint in Stata (requires ssc install winsor2):

winsor2 sales, cut(1 99) replace
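Because sales are in local currency, pooled percentiles may be driven by whichever economy has the largest nominal values. One possible alternative, sketched here with built-in commands only, is to winsorize within each economy and then take logs. It assumes the variables sales and country_name already exist as defined earlier; sales_p1 and sales_p99 are temporary helper names introduced for this example.

```stata
* Sketch: winsorize sales at the 1st/99th percentile within each economy.
bysort country_name: egen double sales_p1  = pctile(sales), p(1)
bysort country_name: egen double sales_p99 = pctile(sales), p(99)

* Pull extreme values in to the bounds; missing values are left untouched
replace sales = sales_p1  if sales < sales_p1  & !missing(sales)
replace sales = sales_p99 if sales > sales_p99 & !missing(sales)
drop sales_p1 sales_p99

* Log of sales; zero or negative sales become missing
gen log_sales = ln(sales) if sales > 0
```

Use either this or the winsor2 line above, not both, and state in your report which one you chose.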

Phase 3: Handling Panel Time Gaps

This is the most challenging part of using WBESD. Unlike annual stock market data, these surveys happen irregularly (e.g., 2013, 2016, 2020).

1. Declaring Panel Data:

●  You cannot just use panelid if IDs are repeated across countries (e.g., Firm #1 in Vietnam and Firm #1 in Peru).

●  Create a unique ID: egen unique_id = group(country_name panelid)

●  Declare data: xtset unique_id year
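Put together, with two checks added, these steps might look like the sketch below. The checks (duplicates report and isid) are additions for illustration, not part of the required workflow.

```stata
* Sketch: build a firm identifier that is unique across economies.
egen unique_id = group(country_name panelid)

* Each firm should appear at most once per survey year.
duplicates report unique_id year

* isid stops with an error if unique_id and year do not uniquely
* identify the observations, or if either of them is missing.
isid unique_id year

xtset unique_id year
```

If isid fails, inspect the offending rows (for example with duplicates list unique_id year) before going further, since xtset would reject the data for the same reason.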

2. Defining the "Treatment" (Switchers):

●  A firm is "Treated" in the DiD sense if it goes from No Credit (k8=0) in one wave to Yes Credit (k8=1) in the next.

●  Identify the year the switch happened. Since there are gaps, we assume the switch happened between the survey waves.
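One way to flag switchers and never-credit firms is sketched below. It assumes creditdum (from Q1), unique_id, and a sequential wave index already exist; lag_credit, gained, treat, ever_credit, and did_group are hypothetical helper names introduced here. Treat it as a starting point rather than a definitive definition of the groups.

```stata
* Sketch: identify firms that gain credit between consecutive waves.
xtset unique_id wave

* Credit status in the previous wave (missing if that wave is absent)
gen byte lag_credit = L.creditdum

* 1 in the wave where the firm moves from no credit to credit
gen byte gained = (creditdum == 1 & lag_credit == 0) if !missing(creditdum, lag_credit)

* Firm-level flags: ever switched on, ever had credit.
* egen max() ignores missing waves, so check firms with gaps by hand.
bysort unique_id: egen byte treat = max(gained)
bysort unique_id: egen byte ever_credit = max(creditdum)

* 1 = treatment group, 0 = control group, missing = neither
gen byte did_group = .
replace did_group = 1 if treat == 1
replace did_group = 0 if ever_credit == 0

tabulate did_group, missing
```

Firms that had credit in every observed wave fall into neither group and stay missing in did_group.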

3. Imputing Dynamics for Event Studies (this is only relevant if you choose to conduct event studies):

●  Because you don’t have data for every year (e.g., data exists for t = 2015 and t = 2019, but missing 2016, 2017, 2018), you cannot create a standard “Year-1, Year-2” event plot.

●  The “Relative Wave” Solution: Instead of "Years since treatment", use “Waves since treatment”.

●  Constructing the Variable: If a firm is treated in 2019 (it had no credit in 2015, but has credit in 2019):

○ 2015 is Time t = -1 (Pre-treatment)

○ 2019 is Time t = 0 (Treatment/Post)

○ 2023 is Time t = 1 (Post-treatment persistence)

Use these “Relative Time” indicators to plot your coefficients if needed.
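A relative-wave variable along these lines could be built as in the sketch below. It assumes creditdum, unique_id, and a sequential wave index are already defined; on_switch, treat_wave, and rel_wave are hypothetical names used only for this illustration.

```stata
* Sketch: waves since treatment for firms that gain credit.
xtset unique_id wave

* 1 in any wave where the firm moves from no credit to credit
gen byte on_switch = (creditdum == 1 & L.creditdum == 0)

* First wave in which the firm switches on (missing for non-switchers)
bysort unique_id: egen treat_wave = min(cond(on_switch == 1, wave, .))

* -1 = last pre-treatment wave, 0 = switch wave, 1 = first wave after
gen rel_wave = wave - treat_wave

tabulate rel_wave, missing
```

Firms that never switch have a missing rel_wave, so decide explicitly how they enter any event-study specification.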

Phase 4: Common Pitfalls to Avoid

The “Inconsistent Panel” Trap:

●  Some firms appear in 2015, 2018, and 2023 but are missing in 2020.

●  For first-difference or lagged models, Stata will drop the affected observations because it cannot calculate (t) - (t - 1).

●  Check: Use xtdescribe to see your pattern. Ideally, keep firms that are present in consecutive waves for the DiD analysis.

●  Creating a Time Index: Do not use the calendar year as your time index for xtset. Instead, generate a sequential Wave Index, e.g.,

gen wave = .

replace wave = 1 if year == 2015   // for example

replace wave = 2 if year == 2018   // ... and so on

Use xtset unique_id wave to declare the panel.
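When the survey years differ across your three economies, typing one replace line per year becomes error-prone. A possible shortcut, sketched here under the assumption that country_name, year, and unique_id exist as defined above, numbers the distinct survey years within each economy automatically.

```stata
* Sketch: sequential wave index within each economy.
* Within a country, rows are sorted by year; the running sum goes up
* by one each time the year changes, so the earliest year gets wave 1.
* Rows with a missing year sort last and get their own wave; drop or
* fix them first.
bysort country_name (year): gen wave = sum(year != year[_n-1])

* Confirm the mapping from survey years to waves
tabulate year wave if country_name == "Vietnam"

xtset unique_id wave
xtdescribe
```

The tabulate line uses Vietnam only as an example; repeat it for each of your economies to confirm that every survey year maps to exactly one wave.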

