LING 226 Assignment 2, 2023

Short Program and Written Reflection 2 (25% of total grade)

The goal of this assignment is to ask and answer a question about the linguistic profiles of different texts. The question being asked should be motivated by your understanding of and possible assumptions about the texts. These motivations and assumptions do not necessarily have to be correct, but they should be logical/reasonable/justifiable. For example, we could assume that children’s books are written with less sophisticated vocabulary when compared to adult books – this is a reasonable assumption. We might also expect a horror novel to include more negative language when compared to a romance novel. In both cases, we can test these assumptions by constructing a linguistic profile of the texts and comparing them.

For the child/adult book comparison, we may want to create a lexical profile of frequency, diversity, concreteness, and potentially other measures of complexity. For the horror and romance comparison, we may want to construct a profile based on emotional vocabulary, sentiment, and other measures related to affect. In both cases, more than one measure would be employed to compare the texts (hence creating a linguistic profile). If more than one text is used for each category, we would calculate an average score for each category, and compare the results. Your job is to generate a research question which you answer by calculating and comparing the linguistic profiles of different categories of texts. Specifically, please generate and test one question related to:

1. A small corpus of your own design which contains at least two categories of texts

◦ A good minimum for your corpus will be ten texts per category, with two categories, with approximately equal total words per category.

◦ You are free to use texts from anywhere, including data provided by Stephen, questions from The Current, or elsewhere.
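The per-category averaging described above can be sketched as follows. This is a minimal illustration, not the course's own code: the `CONCRETENESS` dictionary, the two measures, the function names, and the toy corpus are all assumptions made for the example, and a real analysis would load a published ratings file and at least ten texts per category.

```python
from statistics import mean

# Hypothetical concreteness ratings (word -> score); a real analysis would
# load a published norms file instead of this toy dictionary.
CONCRETENESS = {"dog": 4.9, "ball": 5.0, "idea": 1.6, "truth": 1.9, "run": 4.0}


def profile(tokens, ratings):
    """Return a small lexical profile for one non-empty tokenised text."""
    rated = [ratings[t] for t in tokens if t in ratings]
    return {
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "mean_concreteness": mean(rated) if rated else float("nan"),
    }


def category_average(texts, ratings):
    """Average each measure over all texts in one category."""
    profiles = [profile(tokens, ratings) for tokens in texts]
    return {key: mean(p[key] for p in profiles) for key in profiles[0]}


# Toy corpus: two categories, each a list of tokenised texts.
corpus = {
    "children": [["dog", "run", "ball"], ["dog", "dog", "ball", "run"]],
    "adult": [["idea", "truth", "idea"], ["truth", "run", "idea", "dog"]],
}

averages = {
    category: category_average(texts, CONCRETENESS)
    for category, texts in corpus.items()
}
for category, scores in averages.items():
    print(category, scores)
```

Note that type-token ratio is sensitive to text length, which is one reason the brief asks for approximately equal total words per category.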

In addition to developing and comparing the linguistic profiles of your texts, you need to further complicate your analysis by examining the nature of your results when accounting for at least one distributional or syntactic property of the texts. This means information such as part of speech, collocations, or word vs. phrase-level measures (ngrams) should be incorporated into your research questions. For example, instead of just comparing the lexical sophistication of children’s and adult books across all words in a text, you would instead run separate analyses for nouns and verbs. Or, you might be interested in profiling the nature of collocations and/or ngrams for different measures. The choice is yours and might further interact with your research question (e.g., you might find that sentiment ratings of adjectives pattern with specific nouns that follow).
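A separate analysis for nouns and verbs might look like the sketch below. It assumes the tokens have already been tagged with Penn Treebank tags (the kind of `(word, tag)` pairs a tagger such as NLTK's `pos_tag` returns); the `AOA` ratings, the function names, and the sample sentence are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical age-of-acquisition ratings (word -> age in years).
AOA = {"dog": 2.5, "eat": 2.9, "theory": 9.8, "contemplate": 11.2, "ball": 2.7}


def coarse_pos(tag):
    """Collapse Penn Treebank tags into noun / verb / other."""
    if tag.startswith("NN"):
        return "noun"
    if tag.startswith("VB"):
        return "verb"
    return "other"


def mean_rating_by_pos(tagged_tokens, ratings):
    """Mean rating per coarse part of speech, skipping unrated words."""
    buckets = defaultdict(list)
    for word, tag in tagged_tokens:
        word = word.lower()
        if word in ratings:
            buckets[coarse_pos(tag)].append(ratings[word])
    return {pos: mean(values) for pos, values in buckets.items()}


# Pre-tagged sample; in practice these pairs would come from a POS tagger.
tagged = [
    ("Dog", "NN"), ("eat", "VBP"), ("theory", "NN"),
    ("contemplate", "VB"), ("the", "DT"),
]
by_pos = mean_rating_by_pos(tagged, AOA)
print(by_pos)
```

Running this once per text and then averaging within each category gives a noun profile and a verb profile that can be compared across categories.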

Comparison of lexicon and distributional/syntactic information

Lexicon information	Distributional and syntactic information
Sentiment & emotion ratings	Bigrams, ngrams
Age of acquisition	Part of speech
Concreteness	Collocates
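As one way of pairing a lexicon measure from the left column with a distributional measure from the right, the sketch below collects adjective + noun bigrams and averages the adjective's sentiment rating for each following noun. The `SENTIMENT` values, the function names, and the tagged sample are assumptions for illustration; the tags follow the Penn Treebank convention.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical sentiment ratings (word -> score from -1 to 1).
SENTIMENT = {"dark": -0.6, "lovely": 0.8, "cold": -0.4}


def adjective_noun_bigrams(tagged_tokens):
    """Return lowercased (adjective, noun) pairs of adjacent tokens."""
    pairs = zip(tagged_tokens, tagged_tokens[1:])
    return [
        (w1.lower(), w2.lower())
        for (w1, t1), (w2, t2) in pairs
        if t1.startswith("JJ") and t2.startswith("NN")
    ]


def adjective_sentiment_by_noun(bigrams, ratings):
    """Mean sentiment of the rated adjectives that precede each noun."""
    by_noun = defaultdict(list)
    for adjective, noun in bigrams:
        if adjective in ratings:
            by_noun[noun].append(ratings[adjective])
    return {noun: mean(values) for noun, values in by_noun.items()}


tagged = [
    ("the", "DT"), ("dark", "JJ"), ("house", "NN"), ("and", "CC"),
    ("the", "DT"), ("cold", "JJ"), ("night", "NN"),
    ("dark", "JJ"), ("house", "NN"),
]
bigrams = adjective_noun_bigrams(tagged)
bigram_counts = Counter(bigrams)
noun_sentiment = adjective_sentiment_by_noun(bigrams, SENTIMENT)
print(bigram_counts.most_common(3))
print(noun_sentiment)
```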

Your Python Code

You should create code cells and functions which:

• Load in and preprocess your text(s)

◦ (The specific choices you make for preprocessing should be appropriate for your analysis)

• Analyse your data for various lexical and syntactic features

• Output your analysis either into the notebook, or written to file in a spreadsheet/text document

• Your data should be made available either by reading it in through a URL or provided with your submission
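The load, preprocess, and output steps listed above could be wired together roughly as follows. This is a sketch under stated assumptions: the folder layout (`<root>/<category>/*.txt`), the function names, and the tokenising rule are all hypothetical choices, and a temporary directory with two invented sentences stands in for a real corpus.

```python
import csv
import re
import tempfile
from pathlib import Path


def preprocess(text):
    """Lowercase the text and keep alphabetic tokens (e.g. "didn't")."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def load_corpus(root):
    """Read root/<category>/*.txt into {category: {filename: tokens}}."""
    corpus = {}
    for path in sorted(Path(root).glob("*/*.txt")):
        tokens = preprocess(path.read_text(encoding="utf-8"))
        corpus.setdefault(path.parent.name, {})[path.name] = tokens
    return corpus


def write_summary(corpus, out_path):
    """Write one CSV row per text with its token and type counts."""
    with open(out_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["category", "text", "tokens", "types"])
        for category, texts in corpus.items():
            for name, tokens in texts.items():
                writer.writerow([category, name, len(tokens), len(set(tokens))])


# Demo with a throwaway directory standing in for a real corpus folder.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "horror").mkdir()
    (root / "romance").mkdir()
    (root / "horror" / "a.txt").write_text(
        "The house was dark. The DARK door creaked!", encoding="utf-8"
    )
    (root / "romance" / "b.txt").write_text(
        "She smiled, and he didn't look away.", encoding="utf-8"
    )
    corpus = load_corpus(root)
    summary_path = root / "summary.csv"
    write_summary(corpus, summary_path)
    summary_rows = summary_path.read_text(encoding="utf-8").splitlines()

print(summary_rows)
```

Whether to lowercase, keep contractions, or remove stop words depends on the research question, so the `preprocess` rule here should be treated as a placeholder.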

Just as before, the course notebooks have everything you need to create these functions. You can reuse any and all of the functions in the course notebooks to create your program. You will however likely need to make modifications in order to adapt these functions / code cells to your particular analysis.

Your Written Reflection

In your notebook, you should prepare a report describing your analysis, which should include these sections:

• Research Questions

◦ Clear statements of your research questions

◦ The rationale behind your research questions

◦ Predictions of what you will find

• Data

◦ Explanation of the data and what categories it represents

◦ How you gathered the data for your own corpus

• Analysis

◦ Description of the linguistic profiles

◦ Decisions about preprocessing and other preparations of the texts

◦ Which lexical features were included, and why?

• Results & Discussion

◦ Interpretation and discussion of your results (i.e., answering your research questions)

■ Did your expectations hold? Why or why not?

◦ Any remaining questions / limitations based on what happened during your analysis

Your report should be about 500-600 words long (up to 900 if attempting the challenge). You should submit your assignment as a .ipynb notebook file in Nuku/Canvas by the due date. You should also provide your corpus data (or load the data in via URL). Your notebook should have a text cell at the start which includes your name, your student ID, and whether you are attempting to complete the challenge (see below). The notebook should include all of the code cells, plus your written report as text cells. You are free to mix code and text cells as you deem appropriate.

Marking Guidelines

A-level papers will contain two clearly articulated research questions and the rationale behind the questions. There are clearly stated decisions behind why certain linguistic features are chosen to construct the linguistic profiles, as well as the choice(s) behind distributional and syntactic measures. The presentation and interpretation of the results are used to provide direct answers to the research questions. The report also reflects on any limitations or remaining questions. A link and/or data files are provided for the corpus. All of the code cells will work properly. The paper includes a successful attempt at the challenge.

B-level papers will contain two research questions and provide some rationale for the questions. The decisions behind why certain linguistic features were chosen may be unclear, as are the reasons for choosing particular distributional and syntactic measures. Answers to the research questions are provided. A link and/or data files are provided for the corpus. All of the code cells will work properly. A challenge may be attempted, but with limited success.

C-level papers will contain unclear research questions and/or research questions with unclear motivation or justification. The linguistic profiles may be under explained and/or weakly connected to the research questions. Data presentation and interpretation will lack detail and/or not clearly connect to the research questions. Some attempt is made to answer the research questions. A link and/or data files are provided for the corpus. All of the code cells will work properly. No challenge is attempted.

D-level papers will have unclear research questions and/or poorly motivated linguistic profiles. Attempts are made to answer the research questions, which may be only partially successful. Data may not be included, and some code cells may not work. No challenge is attempted.

A-level Challenge

A-level papers need to go above and beyond the rest. Students need a way to play to their strengths. The challenge provides that opportunity. Students can either flex their computer science skills, showcase their critical thinking abilities and/or domain knowledge outside of computer science, or some combination of both. In either case, you should be driven by a desire to have your assignment used as an exemplar for next year’s cohort of students.

I want to leverage my computer science skills:

Just as in assignment 1, expand the boundaries of your program beyond the level of the code provided in the notebooks in order to provide more sophisticated presentation and analyses of your data. You might want to explore different visualisations in Python, and/or play around with some classification functions (e.g., some machine learning or statistical comparisons). You would still write a report which meets the criteria of A.
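One possible statistical comparison is a permutation test on the per-text scores of the two categories, sketched below with the standard library only. The technique is swapped in here as an illustration and is not prescribed by the brief; the function name, the number of permutations, and the score lists are all hypothetical.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns the estimated p-value: the share of random relabellings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    size_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:size_a]) - mean(pooled[size_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical per-text mean concreteness scores, ten texts per category.
children_scores = [4.6, 4.7, 4.5, 4.8, 4.4, 4.6, 4.7, 4.5, 4.9, 4.6]
adult_scores = [3.1, 2.9, 3.4, 3.0, 3.3, 2.8, 3.2, 3.1, 3.0, 3.5]

p_value = permutation_test(children_scores, adult_scores)
print(f"p = {p_value:.4f}")
```

A permutation test makes few distributional assumptions, which suits a small corpus of around ten texts per category, though the result still needs to be interpreted alongside the size of the difference.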

I want to leverage my non-computer science knowledge:

Include more justification and external research for your research questions and analysis. You might conduct your analysis as a replication of some published research, or instead continue a research direction already established in prior literature. Your interpretation of your results is connected not only to your data, but also to the results of previous studies.
