Project
Project Ideas
The following are possible project ideas (you are not limited to
these).
- Develop a working MT/ASR/TTS/ST system for a new, low-resource
language pair for which no high-quality MT is available
(e.g. Spanish-to-Tamil), and explore and solve the challenges
along the way
- Improving methods to better utilize monolingual data
- Extending and improving vocabulary and tokenization for NMT
- Improving evaluation quality and efficiency, e.g. building tools
that assist human evaluators and conducting an evaluation study
- Computer-assisted and interactive translation methods
- MT for multimodal data, e.g. video translation, speech
translation
- Integrating domain knowledge into an MT/ASR/TTS system
- Novel hardware-based MT/ASR/TTS/ST systems, e.g. compressing an
MT model to a very small size and building a system (inference
only, no training) that runs on mobile phones, or extending an
existing CUDA library (e.g. LightSeq) to support more complex models
- Extending a massively multilingual NMT model (e.g. LegoMT) to a
few more languages
Project Proposal
Please submit your project proposal (one submission per team and be
sure to include all your teammates with the submission) to Canvas.
The proposal document should be about 1 page (maximum of 2 pages).
Be sure to describe the following elements:
- What research problem are you planning to address?
- What multilingual challenge(s) are you planning to address as
part of this project?
- What are the existing state-of-the-art methods for this problem?
Is the source code/model available?
- What are the possible directions for your proposed method?
- What datasets are you planning to use? Please provide the link
to the dataset accompanied by a small description. If you have
yet to download the dataset, please explain any issues you may
be having.
- What is the evaluation metric?
- Who is your team and how are you planning to split the
workload between team members? Can you provide a rough
timeline/milestones you plan to follow?
- What CPU, GPU and storage infrastructure do you need for this
project? Are you interested in using Amazon Web Services (AWS)
or Google Cloud Platform (GCP) as part of your project? Please
estimate the amount of computation time required.
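For the computation-time estimate, a back-of-the-envelope calculation is usually enough. The Python sketch below shows one possible way to do it. The function name, the overhead factor, and every number in the example are hypothetical placeholders rather than course requirements; replace them with the corpus size and the throughput you actually measure on your hardware.

```python
def estimate_gpu_hours(train_tokens, epochs, tokens_per_second,
                       num_runs=1, overhead=1.2):
    """Rough single-GPU training time in hours.

    train_tokens      -- tokens in the training set (assumed figure)
    epochs            -- passes over the training set
    tokens_per_second -- measured training throughput on one GPU
    num_runs          -- number of full training runs (baselines, ablations)
    overhead          -- assumed multiplier for validation, checkpointing,
                         and failed runs
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    seconds_per_run = train_tokens * epochs / tokens_per_second
    return seconds_per_run * num_runs * overhead / 3600.0


if __name__ == "__main__":
    # All values below are made-up placeholders for illustration only.
    hours = estimate_gpu_hours(
        train_tokens=50_000_000,
        epochs=10,
        tokens_per_second=20_000,
        num_runs=3,
        overhead=1.2,
    )
    print(f"Estimated GPU hours: {hours:.1f}")
```

Reporting the inputs alongside the final number makes it easier to check whether the request is realistic.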
Project Final Report
The final report is intended to be structured as a
workshop/conference paper, as mentioned in the first lecture slides.
Y'all may have gotten an idea of the typical sections involved in a
paper in HW3. Nonetheless, the broad sections typically found in a
research paper are outlined below. Kindly note that this is meant as
a guide, and you can be flexible if you have experience writing
research papers. The report is capped at 8 pages; the Appendix
length is unrestricted.
- Introduction/Motivation
This lays out the motivation for the problem, explains why we
need to work on it, and states the key contributions presented
in the work.
- Related Work/Background
This covers the key papers/works that provide context for your
current work. Instead of listing many past works, focus on the
ones closest to your work and explain how yours differs.
- Methodology
This section describes your method, raises research questions,
and explains how you are going to address them.
- Experimental Section/Results
This section can describe your experiments and the results you
obtain. Experiments can also be merged with the methodology
section if that is more appropriate.
- Analysis/Ablations
Typically, you would have multiple factors involved in your
experimental setting. Analysis sections help you probe deeper
into the results and tease apart the contributions of the
individual modeling decisions you made.
- Conclusion/Discussion
This would list the main takeaways from your work, discuss some
future ideas (if any) and engage in discussion.
- Limitations
This section lays out some known limitations of your work.
- [Project Only] Team Member Contributions
List out each individual's contributions in this section.
Kindly make note of the grading scheme that will be followed for the
project. Please note that each project is different so some things
can be subjective (especially in the A to A+ range). However, we
will try our best to follow this scheme:
- A+: A respectable research contribution that is novel
and effective, and could be submitted largely as-is as a paper
to an academic conference. All elements above are high quality.
- A: A respectable contribution that is largely complete
and promising, but the description, sophistication of
methodology, experiments, or analysis are not as fleshed out and
complete as A+ assignments.
- A-: All required elements are present, but one or more
are lacking to some extent.
- B+ or B: The project is complete, but one or more of
the required elements is seriously lacking.
- B- or below: The project or description thereof is
seriously incomplete.
If you are unsure about what project to pick, need more information,
or want to discuss, the instructor and TAs are here to help you!
Please come to office hours (as a group) to discuss your proposal.