Podcast: Study Design

The purpose of this podcast is to define
some basic epidemiologic concepts and the study designs that you will most
often come across and need to understand as you read clinical research studies. By
the end of this podcast, you should understand two major study designs: the
cohort study and the case control study; why a case control study can be thought
of as nested within a cohort study; why the case control design is useful; and why
the commonly used descriptions prospective and retrospective are better
avoided. As this podcast progresses, we expect you to have a basic understanding
of the difference between incidence and prevalence and the difference between a
cross-sectional study and a longitudinal study. For the sake of time, we will not
discuss these definitions now. If you are unsure of these terms, please review the
short podcast available on the same webpage where you found this podcast.

What usually interests us in medical research is some version of the question, ‘Does
factor X cause condition Y?’ X could be a drug, an environmental factor, an
exposure to some kind of agent, a diagnostic test, an exercise program,
whatever. Y could be the development of a disease, the cure of a disease, the
prevention of a condition, the diagnosis of some disease, etc. Cause is therefore a
loose term that stands for X having some effect on Y. In all cases, to answer the
question one needs persons both with and without X and with and without Y. If we
have only a group of people with disease Y, for example, we can find out how many
have factor X, but we cannot know whether X caused Y. Why? Because it could be that
factor X exists in the general population in the same proportion as in
our subjects with disease Y, in which case it would be unlikely that X caused
Y. To show that X causes Y, we would want to see that more persons with X develop
Y than do persons without X. So to determine causation, we need to have some kind
of a comparison group. The way in which we define our study population, including
our comparison group, is a matter of study
design. Note that we will often refer to X as exposure, but that this can refer
to an intervention such as a drug or procedure. Likewise, we will usually refer
to Y as disease, but this can also refer to an outcome of some kind. Since
we have two major factors, exposure and disease, we have two major study designs
based on whether we find subjects for the study based on exposure or disease.
When our study population, including our comparison group, is defined by exposure
and non-exposure and we then determine whether the subjects develop the disease
over the follow-up time, the study is a cohort study. When the study population
is defined by disease – that is we find persons who have the disease and persons
who don’t have the disease – and we then determine the exposure for each subject,
then the study is a case-control study. Cohort studies come in two forms. When
the exposure is assigned to a subject by the study investigators, it is a clinical
trial. When the exposure or intervention is determined by the subject, or the
subject’s doctor, or the environment, or circumstance, but not the study
investigators, then the study is just called a cohort study. In a cohort study,
we start with a group of subjects who are almost always initially free of the
disease or outcomes that we are interested in. In this case, let’s say
that our outcome of interest is a GI bleed. This cohort is free of persons who
have had a previous GI bleed. At the beginning of the study, we usually know
whether the subjects are exposed or unexposed. In this example, perhaps we
have a group of aspirin users – the exposure – and are comparing them to a
group of non-aspirin users. We then follow the subjects over time and
observe whether or not they develop a GI bleed. We can then determine whether
subjects who use aspirin develop a GI bleed more often than subjects who do
not use aspirin. We will talk about how we do this in a later podcast. For now,
the important thing to note is that the study population is defined by exposure –
that is, whether the subjects are exposed or unexposed, whether they use aspirin
or not. Again, we are choosing our study population based on exposure status;
outcome – whether or not they develop a GI bleed – is assessed
afterwards. This is the essence of a cohort study. Cohort studies come in two
varieties. When the investigator assigns an exposure or intervention to each
subject, in this case whether or not the subject uses aspirin, the study is
a controlled trial. In this design, subject preference or any other factor
related to the subject plays no role in whether the subject is exposed or
unexposed. It therefore simulates a laboratory study where the investigator
changes one variable while keeping all other variables constant and then
assesses the outcome. In an observational cohort study, which is usually just
called a cohort study, the investigator only observes how exposure is
distributed among the population. Exposure can therefore depend upon many factors
including subject preference, physician preference, underlying disease, age, sex,
and many other variables. For example, a patient and her physician might consider
the patient’s age, a history of heart disease, her renal function, and her
personal preference before deciding to start her on daily aspirin. All these
could be related both to why she is starting aspirin, and why she is at risk
of a GI bleed. Factors like these may make a cohort study much more difficult
to analyze than a controlled trial. Ideally trials are not just controlled, but also
randomized. Remember that under ideal circumstances, like in a laboratory
experiment, we would like only the exposure or intervention to differ
between the two groups we are studying. Here we only want the use of aspirin to
be different between the two groups. This way, we can tease out the effect of
aspirin on GI bleeding without having to worry about all the other factors that
could influence a patient to take aspirin. For instance, our two groups
should have the same distributions of age, gender, and number of people with H.
pylori infection because all these can be related to developing a GI bleed. We
could try to manually assign each subject to one of the two groups, trying
to make sure that each of these other factors was equally distributed between
the two groups, but there are two big problems with this. One, it’s very time
intensive. Two, what if some other factor that you haven’t thought about is also
important? Maybe this factor – say alcohol abuse – is actually causing the outcome
rather than the intervention you are interested in. Randomization makes it likely that
all these confounding factors, both measured and unmeasured, are distributed
evenly between the two exposure groups. So rather than having to manually assign
subjects to one of the two groups, we let chance take care of it for us. Obviously
the more people we randomize, the more likely it is that all these confounders
will be equally distributed between the two groups. When small numbers of
subjects are randomized, it is more likely that there will be differences between
the two groups, due to the vagaries of chance alone. We’ve learned that if the
study groups are defined by exposure, then the study is a cohort study. If the
groups are defined by disease or outcome status, the study is a case control study.
We choose our study population based on disease and then go back and determine
exposure. In our example, we would first choose subjects based on whether they
had or did not have the GI bleed and then go back and determine whether or
not they took aspirin. The best way to think of a case control study is as a
sample that is nested within a cohort study. Here we start with the same study
population we’ve already seen in the cohort study. In the corresponding case
control study, we first find all the subjects who had a GI bleed. These are
the cases. If we then determine their exposure status, we will find the 10 on top
have taken aspirin and the 5 on the bottom have not. However we know that we need to have a comparison group of subjects who have not had a GI bleed to
answer a cause-and-effect question. In a case control study, our comparison
subjects are called controls. Here we found the same number of controls,
persons without a GI bleed, as cases. You can also see that the same numbers of
cases and controls used aspirin. Therefore aspirin must not have had any
effect on the development of GI bleeding in this example. Often the hardest part
of a case control study is deciding who to select as controls.
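
The aspirin comparison just described can be checked with a few lines of arithmetic. What follows is only a rough sketch, not part of the original podcast material: the case counts (10 aspirin users, 5 non-users) come from the example, the control counts are assumed from the statement that the same numbers of cases and controls used aspirin, and the helper name `exposure_proportion` is made up for illustration. It compares the proportion exposed among cases with the proportion exposed among controls; equal proportions are what we would expect to see if aspirin had no effect.

```python
from fractions import Fraction


def exposure_proportion(exposed: int, unexposed: int) -> Fraction:
    """Proportion of a group that was exposed (here: used aspirin)."""
    total = exposed + unexposed
    if total == 0:
        raise ValueError("group is empty")
    return Fraction(exposed, total)


# Counts from the example: 15 cases (GI bleed), 10 of whom used aspirin.
cases = {"aspirin": 10, "no_aspirin": 5}
# Assumption: "the same numbers of cases and controls used aspirin".
controls = {"aspirin": 10, "no_aspirin": 5}

p_cases = exposure_proportion(cases["aspirin"], cases["no_aspirin"])
p_controls = exposure_proportion(controls["aspirin"], controls["no_aspirin"])

print(f"exposed among cases:    {p_cases}")
print(f"exposed among controls: {p_controls}")
print("same exposure distribution:", p_cases == p_controls)
```

If the control counts were changed so that fewer controls than cases had used aspirin, the two proportions would differ, which is the pattern that would suggest an association between aspirin and GI bleeding.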
Because we want our control group to be similar to our case group, we want our
controls to come from the same underlying population as the cases. So
the best group to get our controls from is the same underlying group that we got
our cases from. Here that would be the same group that would have formed our
cohort, if we had done a cohort study. The preferred method of finding controls is
to choose them randomly from the underlying cohort without regard to
whether or not they ever develop disease. We’ll see why this is best in the next
podcast on effect measures. But the other and more common way to choose controls
is to choose them from subjects without disease. This is shown here. Our controls
in this case are selected from subjects in the underlying cohort who never
developed a GI bleed. You might notice one problem with this. If we choose
controls from only the non-diseased, either aspirin users or non-aspirin
users are likely to be over-represented. Here you can see that there is a higher
proportion of non-aspirin users to aspirin users than in the original
cohort, where they were equally represented. If we take a random sample
from this group who never developed a GI bleed, we are more likely to select
controls who do not use aspirin. This method of selecting controls, though very common, is
less preferable than the first method. Put another way, the purpose of the
control group in a case control study is to determine how exposure is distributed
in the underlying population that gave rise to the cases. This is the exposure
distribution we want to compare in cases and controls. Yet another way to say this
is that controls should represent the whole underlying source population, not
just the non-diseased. By now, you may have gotten the idea that a case control
study can be more difficult to design and implement than a cohort study. So why
would anyone bother to do a case control study in the first place? Why not only
design cohort studies? What if, instead of having only 15 cases among 40 subjects,
we had only 15 cases among this many subjects? Here the disease is rare. In the
case of a rare disease like scleroderma or a glycogen storage disease, we would
need to follow up a very large cohort of patients over a long period of
time to see just a few cases of disease develop. This would be a very time
consuming and costly cohort study, both in terms of money and manpower. On the
other hand with a case control study, the investigator can first find a sufficient
number of cases, then identify a much smaller number of randomly selected
controls and then compare exposures in this more limited group. This is likely
to be much more efficient in terms of time, money, and manpower. Finally a short
comment on so-called retrospective and prospective studies. Traditionally cohort
studies are thought of as being prospective, and case control studies as
retrospective. This is because cohort studies were traditionally followed over
time, collecting data all the while, while case control studies were not. They
required investigators to actually question cases and controls about their
exposures. Because collecting data in this way was subject to recall bias,
something we’ll talk about in a future podcast, these studies’ results were less
trustworthy than those from cohort studies. Hence prospective studies were
thought to be better than retrospective studies. There’s an important point here
though. The key is how data are collected. If the data are collected in real time,
then the study is prospective. If one has to go back to try to collect data, it’s
retrospective. So if a case control study is done in a cohort of, for example,
Medicare users – a group who have already had data about their treatments
collected – and exposure is assessed using drug prescriptions for, say, aspirin, is
the study retrospective or prospective? We would argue that this is a
prospective study. The data, in this case prescriptions for aspirin, are collected
in real time. The subjects did not have to think back to remember their drug use.
The point to remember is that just because the study is a case control
study, it is not necessarily a retrospective study. Data could have been
collected prospectively. Because of this potential confusion, we prefer to avoid
these terms altogether. We prefer to identify whether the study is a cohort
or case control study and then determine how data were collected. This way readers
can judge for themselves whether there is
potential for bias in how the data were collected. So now let’s do some examples.
In your manila folders are three studies. Let’s first look at the one by Solomon
and colleagues entitled “Cardiovascular outcomes in new users of coxibs and
non-steroidal anti-inflammatory drugs”. As stated in the abstract, the purpose of
this study was to examine in a large group of new users of coxibs and
NSAIDs, the rate of cardiovascular events, their time course, and whether baseline
cardiovascular risk modified the rate ratios for future events. The purpose is
restated again in the introduction, just before the methods section, which is
normally where we would find the study objectives. Identifying the study design
in a study is usually straightforward. There is usually a statement early in
the methods section that identifies it. Here, it is in the second paragraph and
clearly identified by the subheading study design. This is a cohort study. In
this paper, the authors identified this back in the abstract as well as under the
methods section. However, not all papers give such a straightforward
statement of design, so it is wise to get into the habit of looking deeper to find
what the authors actually did. In the very next sentence of this paper, the
authors go on to say, “In the primary analysis, these groups – that is, those who
used either NSAIDs or coxibs – were compared with subjects who did not use
one of these agents, but who did initiate use of unrelated agents for the
treatment of hypothyroidism or glaucoma.” This sentence makes it clear that they
are identifying their sample group by exposure. The participants either used
NSAIDs or coxibs, the exposed groups, or they used some other unrelated drug, the
unexposed group. By looking a little deeper like this, we also realize that
there are in a sense two parallel studies going on here. One in which the exposure
is NSAID use and the comparison is some other drug, and another in which the
exposure is coxib use and the comparison is some other drug, but NSAIDs are not
being directly compared to coxibs. Note that the authors of this study described
it as a longitudinal cohort study, but do not describe it as a prospective or
retrospective study. Which do you think it is?
The authors are using a patient care database in which data has been
collected in real time. The authors did not go back to the subjects and ask them
about their past NSAID exposures. In this sense, this is clearly a prospective
study. However this database was not developed for research. No one sat down
before the data was collected and tried to determine what information needed to
be collected for the purpose of the study, and the authors are clearly
standing at the end of this study and are looking back in time. So in this
sense this could be thought of as a retrospective study. However we would
argue that this is in fact a prospective study, but even better we would urge you
to avoid these terms completely and ignore them when reading a paper. It is
much more informative and important to examine the details of the study to see
what the authors have actually done and to try to find features of the design,
whatever they are, that could bias the results. The second study we’ll look at is
“Teriparatide or alendronate in glucocorticoid-induced osteoporosis” by
Saag and colleagues. The purpose of this study can once again be found in both
the abstract and in the last paragraph of the introduction. It is to compare
alendronate, a bisphosphonate used to treat osteoporosis, with teriparatide, an
analog of PTH, for the treatment of osteoporosis associated with
glucocorticoid use. As in the previous study, the design is located right at the
beginning of the methods section. It says “in this randomized double-blind clinical
trial”. Again, if we look a little deeper in the same paragraph, we will find how the
study sample is defined: in this case, by exposure to alendronate or teriparatide. Note that in
this cohort study, unlike in the first, subjects are assigned to exposure to one
drug or the other. Finally, locate “Association of chronic inflammation, not
its treatment, with increased lymphoma risk in rheumatoid arthritis” by Baecklund
and colleagues. The purpose of this study is again stated in both the abstract and
in the last paragraph of the introduction. The authors note that RA
is associated with an elevated risk of malignant lymphoma and that the goal was
to investigate which patients are at highest risk of malignant lymphoma and
specifically whether this risk is driven by
level of inflammatory activity. The study design is identified in the abstract.
This is a matched case-control study. We’re going to ignore the matching for
now and simply observe that it is identified as a case control study. Note
that this time, the authors did not explicitly state that this is a case
control study in the methods section. However, they do in the last paragraph of
the introduction, and it becomes evident as they describe the study. Note that in
this paper, the authors first identify the underlying cohort of subjects that
they have chosen the cases and controls from. A group of patients, all diagnosed with RA, is identified from
a Swedish national inpatient register. So they’ve told us
exactly what the underlying cohort is from which they’re going to select cases
and controls. They then go on to describe cases as all patients from this cohort
who were diagnosed with lymphoma as identified in a linked database of
Swedish cancer patients. Note that the cases are clearly identified on the
basis of their disease status. For each case, the authors then randomly selected
a control from the same RA cohort, describing how controls were matched to
cases. Again, no mention is made of exposure in selecting controls, so even
if they had not been explicit in identifying the study design, the authors
are clearly describing a case control study. In this podcast, we’ve discussed a
number of important concepts. We’ve discussed the two major study designs:
the cohort study and the case control study. We’ve observed that a controlled
clinical trial is simply a form of cohort study. We’ve also discussed how
case control studies can be thought of as being nested within an underlying
cohort study. We also understand that case control studies are used because they’re
often more efficient than cohort studies. Finally, we discussed why the terms
prospective and retrospective are somewhat arbitrary, potentially confusing,
and better avoided. This concludes this podcast.
