How to do a Literature Search - CS290: Seminar on Effective Research Practices & Academic Culture / Fall 2023

The literature search is likely one of the first foundational skills you’ll practice as a researcher. It provides you context for the significance/implication of your problem, what makes it difficult to solve, what has been previously done, and what approach you wish to try. Many of the papers you will skim throughout your search may further end up in the related works section of a future paper, where you will contrast them with your work.

Conducting a literature search is hard for several reasons!

First, you may be starting with a broad/non-specific research prompt in a field where, at this point, most of the vocabulary is new to you! This means that simple Google searches may not turn up any useful papers.
Moreover, you may find it takes you a long time to read even a single paper. This may leave you wondering how to sift through tens to hundreds of them.
Lastly, even when focusing on a single paper, you may find it difficult to discern which parts of a technical paper relate to your problem and how.

Especially at the start of your Ph.D., the above difficulties are normal and expected; through deliberate and guided practice, you will improve at these skills. In this guide, we provide you with a strategy to get you started.

Overview: the ins-and-outs (or inputs and outputs) of a literature search

Broadly, one can break the literature search into four main components. We will provide an overview for them here, and a more detailed strategy in the following section:

Refining your search criteria to better utilize search engines (e.g. Google Scholar):
- In: A project statement, likely in the form of a high-level motivation and desired outcome
- Out: A set of keywords that will lead to relevant papers in a Google search
  
  Tip: Ask colleagues/mentors familiar with your line of work for a suggested set of keywords
Finding papers
- In: A set of keywords
- Out: A broad, unstructured list of relevant papers
  
  Tip: Your papers will likely come from googling keywords, searching up the citation tree (i.e. what papers cited this one?), and searching down the citation tree (i.e. what papers did this one cite?). The hardest part here is being able to quickly deduce whether or not the paper is relevant at all (likely from skimming the abstract/introduction)!
Categorizing papers:
- In: A paper from your broad list of papers
- Out: A categorization of how this paper relates to your project, and an idea of how much time to devote reading it
  
  Tip: Decide whether the paper motivates your problem, solves the same problem (read for understanding), solves a similar problem (read for a contrastive sentence), or solves a subproblem (read for assumptions/feasibility in your setting).
Organizing your literature list:
- In: A set of categorized papers
- Out: An organized set of papers (i.e. grouped by topic, with succinct descriptions)
  
  Tip: This is possibly the most underrated part! Be realistic about tracking your papers in a way that works for you. Try to organize so that your future self can reference this list and get the main takeaways from each paper without cognitive overload (e.g. rereading the whole paper).

The four components, in detail

Your literature search is an active process; as you engage in each component below, you should always be asking yourself whether/how each paper relates to your project and contribution.

1. Refining your search criteria

At this first step of the literature search, you probably have a rough idea of what kind of problem setting you’re in but need precision to best utilize search engines like Google Scholar. This is the part where you associate your broad question with a set of keywords.

Sometimes you may stumble upon the appropriate keywords with a few minutes of searching on Google; however, often you may not. This is normal – consider asking colleagues/mentors in the relevant field for a suggested set of keywords (or for some keywords to avoid and why). Even better, they may be able to recommend some relevant papers to start with.

2. Finding papers

There are two tricks to the art of finding relevant papers. The first is very intuitive and involves growing your set of papers: from your set of keywords, you can find papers (e.g. via Google Scholar). You may also choose to look up keyword + “lecture notes,” “survey,” or “blog” to give you a sense of what’s long established in a field vs. what’s recent/new. Note that it is important we are conscious about our citation practices so that we do not further perpetuate systemic inequities in our scientific communities. As such, we additionally ask you to familiarize yourself with scholars in your field who are from minoritized backgrounds, and to specifically look at their work when conducting your literature search.

Tip: If a paper is behind a paywall, you probably can still access it via Harvard Hollis. Follow these instructions to set up convenient access.

If you find even a single relevant paper, you’re in luck! From this paper – the “root” paper – you can quickly find others by:

Following the “citation tree” down – what other papers did the root paper cite?
Following the “citation tree” up – what other papers have cited this one? Note: you can conduct this step by using the “cited by” link of a paper on Google Scholar.
Oftentimes, repeating this process recursively on papers you find by traversing the citation tree will yield a large number of potentially relevant papers.
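The recursive traversal described above can be thought of as a breadth-first walk over a citation graph. The sketch below is only an illustration under stated assumptions: `CITES` is a made-up toy dataset (paper names like "A" and "root" are placeholders, not real papers), and `cited_by` and `explore` are hypothetical helper names. In practice you would follow reference lists and the “cited by” link by hand rather than query a dictionary.

```python
from collections import deque

# Hypothetical toy data: each paper maps to the papers it cites ("down" the tree).
CITES = {
    "root": ["A", "B"],
    "A": ["C"],
    "B": ["C"],
    "D": ["root"],
    "E": ["D"],
}


def cited_by(paper, cites):
    """Papers that cite `paper` ("up" the tree), derived from the toy data."""
    return [p for p, refs in cites.items() if paper in refs]


def explore(root, cites, max_depth=2):
    """Breadth-first walk up and down the citation tree from `root`.

    Returns {paper: hops_from_root} for every paper within `max_depth` hops.
    Tracking `seen` avoids reopening a paper that was already found.
    """
    seen = {root: 0}
    queue = deque([root])
    while queue:
        paper = queue.popleft()
        depth = seen[paper]
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        neighbors = cites.get(paper, []) + cited_by(paper, cites)
        for other in neighbors:
            if other not in seen:
                seen[other] = depth + 1
                queue.append(other)
    return seen


candidates = explore("root", CITES, max_depth=2)
for paper, depth in sorted(candidates.items(), key=lambda item: item[1]):
    print(f"{depth} hop(s) from root: {paper}")
```

The depth limit is a design choice: each extra hop can multiply the number of candidate papers, so capping it keeps the list manageable until you have triaged what you already found.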

While following the citation tree, however, you may notice the number of tabs open in your browser growing at an unprecedented rate. This is where the second trick comes in: learning to quickly identify whether or not a paper is relevant to your project. We emphasize that this is not a trivial skill, but it is an important one to develop, as it will save you lots of time. You can do this as follows:

The main question you should ask yourself when opening a paper is: “Is this paper solving a problem that is related to my own?”
There are two places in a paper where you’re most likely to find the answer: the abstract/introduction (what problem are they solving?) and the beginning of the methods/problem setting section (what formalism are they using to solve the problem? and is this formalism valid for your setting?).
While keywords are helpful, they are not perfect, so avoid depending on them completely. For example:
- A paper that doesn’t use any of the keywords you identified may actually solve a very similar problem, though from a completely different perspective.
- A paper that uses all the right keywords may in fact have nothing to do with your problem! The paper may solve a problem that is too specific to relate to your setting or the keywords may have a different meaning in a different field (e.g. in statistics, the terms “fixed effects” and “random effects” carry different meanings depending on the sub-community).

3. Categorizing papers

A paper can be relevant to your project in a variety of ways, and this may affect how much time you spend reading the paper, what you look for when reading it, and how to summarize it:

Category A: provides motivation for your problem

What to read for? A statement of the problem, why it’s important/significant.
How it relates to your paper? You will likely reference it in your introduction, and it may inform your evaluation
What to include in your summary?
- A sentence about the problem, its importance/significance

Category B: attempts to solve the same problem

What to read for? A deep understanding (see how to read a research paper)
How it relates to your paper? Likely a baseline, contingent on how well your problem settings align
What to include in your summary?
- A sentence that describes the method
- A sentence about whether experiments support this result (and why), as well as intuition on settings in which this method may not generalize
- Intuition about where you think the method shouldn’t work
- You may additionally want to search briefly for a codebase and include a link

Category C: attempts to solve a similar problem

What to read for? A contrastive sentence – a sentence that describes how the paper differs from your research problem or fails to solve your problem:
- Does the method rely on assumptions that don’t generalize to your setting?
- Does your problem have other criteria that the paper is not designed to meet?
How it relates to your paper? Likely a part of your related works, in which you will use the contrastive sentence to relate their paper to yours
What to include in your summary? (you guessed it!) the contrastive sentence.

Category D: attempts to solve a subproblem

What to read for? Feasibility – are the assumptions required for this method applicable to your own problem setting?
How it relates to your paper? It could inspire part of your methods/solution, or lead you to other papers that inspire part of your methods/solution
How to summarize?
- Assumptions the paper makes
- Whether it could work in your setting, and why/why not
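If you keep notes in a structured form, the four categories above can be encoded as a small lookup table that tells you what to read for and which summary fields to fill in. This is a minimal sketch, not a prescribed tool: the names `Category`, `READING_PLAN`, and `summary_template` are hypothetical, and the field wording is condensed from the lists above.

```python
from enum import Enum


class Category(Enum):
    MOTIVATION = "A"        # provides motivation for your problem
    SAME_PROBLEM = "B"      # attempts to solve the same problem
    SIMILAR_PROBLEM = "C"   # attempts to solve a similar problem
    SUBPROBLEM = "D"        # attempts to solve a subproblem


# For each category: (what to read for, fields to include in the summary).
READING_PLAN = {
    Category.MOTIVATION: (
        "the problem statement and its significance",
        ["problem and its importance"],
    ),
    Category.SAME_PROBLEM: (
        "a deep understanding",
        ["method", "do experiments support the result, and why",
         "where the method may not work", "codebase link"],
    ),
    Category.SIMILAR_PROBLEM: (
        "a contrastive sentence",
        ["contrastive sentence"],
    ),
    Category.SUBPROBLEM: (
        "feasibility in your setting",
        ["assumptions the paper makes", "could it work in your setting, and why"],
    ),
}


def summary_template(category):
    """Return a blank, fill-in-the-fields note for a paper in `category`."""
    read_for, fields = READING_PLAN[category]
    lines = [f"Category {category.value}: read for {read_for}"]
    lines += [f"- {field}:" for field in fields]
    return "\n".join(lines)


print(summary_template(Category.SAME_PROBLEM))
```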

4. Organizing your literature list

Staying organized is key to ensuring you don’t waste time re-reading or re-searching papers in the future. There are many tools that can help you stay organized:

Citation managers (e.g. Zotero, Mendeley, EndNote), which help you save papers through a browser extension, organize them into folders/categories, and automatically create BibTeX files for your paper.
Since citation managers don’t always export the correct publication info when exporting to BibTeX, you may additionally want to use the Google Scholar extension.
Documents to summarize the papers you read (e.g., a good old-fashioned Google Doc, HackMD, Overleaf, etc.)

Your peers, collaborators and mentors may all have additional great suggestions for how they stay organized, and we recommend you ask them!

So how should you summarize each paper? One approach you may consider is:

Problem statement (one sentence)
Relevance to my project from above (same problem, similar problem, or sub-problem)?
Brief description: usually one sentence of contributions, or an expanded explanation of how it relates to my paper.

But of course, experiment to see what works best for you.
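If you prefer a structured list over a free-form document, the three-part summary above could be stored as one record per paper. The following is a sketch under assumptions: `PaperSummary`, `to_entry`, and `group_by_relevance` are hypothetical names, and the two example entries are placeholders rather than real papers.

```python
from dataclasses import dataclass


@dataclass
class PaperSummary:
    title: str
    problem: str       # problem statement (one sentence)
    relevance: str     # "same problem", "similar problem", or "sub-problem"
    description: str   # contributions, or how the paper relates to your project

    def to_entry(self):
        """Format the summary as a short plain-text entry."""
        return (
            f"{self.title}\n"
            f"  Problem: {self.problem}\n"
            f"  Relevance: {self.relevance}\n"
            f"  Description: {self.description}"
        )


def group_by_relevance(summaries):
    """Group summaries by their relevance label, preserving input order."""
    groups = {}
    for summary in summaries:
        groups.setdefault(summary.relevance, []).append(summary)
    return groups


# Placeholder entries for illustration only.
papers = [
    PaperSummary("Paper X (placeholder)", "Solves task T under assumption S.",
                 "similar problem", "Assumes S, which does not hold in my setting."),
    PaperSummary("Paper Y (placeholder)", "Solves task T.",
                 "same problem", "Candidate baseline."),
]

for relevance, group in group_by_relevance(papers).items():
    print(f"== {relevance} ==")
    for paper in group:
        print(paper.to_entry())
```

Grouping by relevance mirrors the way these papers tend to be used later: same-problem papers as baselines, similar-problem papers in related work.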