Activist-driven transgender research methods are reckless and will lead to harms

The current field of transgender medicine is like a reckless, “Wild West” free-for-all in which activist clinicians run small, terribly biased observational studies and then “spin” narratives that seem to “confirm” benefit. Based on these studies, other researchers go through the motions of conducting systematic reviews and developing evidence-based guidelines. The motions they go through, however, are only a masquerade of evidence-free smoke and mirrors, just for show. These practices will undoubtedly lead to harms.

In the past decade there has been a sharp increase in the numbers of people presenting to care with gender dysphoria [1-4]. The growth has been especially marked in adolescents and young adults [1-4]. Reports from clinics around the world have also noted an inversion of the expected sex ratio, and gender dysphoria is now far more commonly seen in young female patients than it is in young males [1-5]. There is evidence that most gender dysphoria in young people may be part of a social contagion [6].

The treatment model currently in vogue among clinicians who care for patients with gender dysphoria, including child patients, is called affirmative care [7-11]. In the “affirmative” regime, no-one is allowed to question whether any human being was ever born with an innate, opposite-sex “gender identity,” and of course there is no scientific evidence that anyone ever was. If a female patient says she “identifies as a man,” she must in fact be a man. Doctors then “affirm” such patients into a treatment regimen that normally includes a lifetime taking opposite-sex hormones, as well as receiving major surgeries [12]. The slow and careful “transsexual” gatekeeping process of previous decades, including a year or two spent “living as” a member of the opposite sex, before any hormones or surgery were offered, is long gone [12-13]. In the United States, patients may be prescribed opposite-sex hormones at their first clinical visit [14]. In Los Angeles, girls (who believe they are boys) as young as age 13 are having their healthy breasts amputated [15].

Given the adoption of “affirmative care” practices by many physicians, psychologists, and professional medical societies [8] – practices that confer “patient for life” status on healthy young people – one would expect there to be rather strong evidence of benefit for these drastic interventions, as well as for harms if they are not offered. This is not the case at all.

The transgender intervention literature is wholly observational and almost entirely without controls. It is a miasma of selection bias, unmeasured confounding and missing data. Many transgender research studies in the past few decades have been conducted by activist researchers who seem deeply committed to “proving” the benefits of transgender interventions, no matter how speculative or tendentious the research question. As I will show later in this Commentary, there may be substantial “spin” in reports of such studies, portraying study methods and outcomes favorably and minimizing (or not reporting) adverse events and harms. Convenience sample data are used to make solemn pronouncements about suicide risk [16]. In long-term follow-up of patients after “sex reassignment” surgery, it is common to find that one-quarter, one-third, one-half or even larger proportions of patients have simply disappeared, with investigators failing to account adequately for them [17-18]. There is good reason to be highly skeptical of the reported benefits of any transgender intervention. One cannot draw firm conclusions from this evidence, except to conclude that it is abysmally poor evidence. The reasons why investigators do their work so badly are obscure. This laissez-faire attitude also seems to have trickled down into secondary analyses of transgender research.

For example, in 2017, anonymous authors at Cornell University produced a document titled “What does the scholarly research say about the effect of gender transition on transgender well-being?” [19]. This document purports to be a “systematic literature review,” but is not one. It pretends to use a rigorous systematic review process to create the impression that transgender interventions are safe and effective. Because anecdotal reports suggest that many people who do not know better are taking the Cornell review’s spurious “findings” at face value, I evaluated the methods and reporting of that document.


I conduct my analysis of the Cornell document with two instruments commonly used to critically appraise systematic reviews. “A Measurement Tool to Assess Systematic Reviews” version 2 (AMSTAR 2) is a 16-item checklist used to assess whether a systematic review’s methods are unbiased, comprehensive and indeed systematic [20]. The “Preferred Reporting Items for Systematic Reviews and Meta-Analyses” (PRISMA) checklist includes 27 items and is designed to improve consistency and transparency in the reporting of systematic review methods [21]. The instruments are complementary to one another.

While the purpose of AMSTAR 2 is to assess the quality and rigor of a systematic review’s methods, the PRISMA checklist is designed to help systematic review authors to report their methods and findings consistently and transparently. It is also intended for the use of others in evaluating whether a given systematic review reports its methods consistently and transparently. Many peer-reviewed scientific journals now require that submitted systematic reviews be accompanied by a completed PRISMA checklist, with all items noted as having been done.

Following my assessment with these instruments, I enumerate the Cornell review’s other serious problems in a narrative discussion. I then discuss another attempt by clinicians in transgender medicine to “hijack” evidence-based medicine [22] by merely pretending to follow evidence-based methods. Finally, using an example from the recent transgender literature, I show the biased methods used and spurious outcomes reported by activist investigators of primary studies.


AMSTAR 2: The Cornell document fared poorly under examination with the AMSTAR 2 instrument. All questions answered with “No” or “Not reported” would optimally have been answered with “Yes.” This review’s methods appear to have been grossly inadequate. Please see the Appendix for details.

PRISMA checklist: The authors of the Cornell review failed to meet nearly every criterion of the PRISMA checklist. All items denoted as “Not done” would optimally have been answered with “Done.” Reporting of this review’s methods and findings was very sloppy. Indeed, the review could hardly have been reported with less rigor. Please see the Appendix for details.


I evaluated the Cornell review in terms of its methods and its reporting of these methods. The review fared very badly on both accounts. From the perspective of evidence-based medicine and scientific rigor, the review failed completely in meeting even the most meagre standards.

I have seen many bad systematic reviews published, but none have been so far off the mark as this Cornell review. Perhaps it is not surprising that this review’s authors chose to remain anonymous. Had their names been associated with this document, it could have had a negative impact on their careers. What is surprising is that despite there being plenty of guidance in systematic review methods available online, for free, with workshops even offered at their own university, these authors chose simply to “go through the motions.” They even mentioned PRISMA at the beginning of their “Methods” section, but then conducted and reported their review in an irresponsible ad hoc manner.

Although PRISMA is intended to guide reporting of systematic reviews, and not serve as a “roadmap” to conducting them, the authors had the PRISMA guidance in hand. Surely, they must have seen the long list of things they were failing to do. One or two more clicks on the internet, and they might have found some proper guidance in methods for conducting an adequate systematic review. Instead, these authors cloaked themselves in PRISMA’s name and then did as they pleased.

The Cornell review’s research question was vague and very poorly formulated, almost as though the authors were looking into a crystal ball when they developed it: “What does the scholarly research say about the effect of gender transition on transgender well-being?” The inclusion criteria for their review were so permissive that eligible reports could be on any topic relevant to transgender “transition,” or as I would define it, the medicalized performance of opposite-sex stereotypes. Studies of any design were eligible, even qualitative studies and studies without comparator conditions. They needed only to reflect some intervention recommended by the World Professional Association for Transgender Health (WPATH) [23]. Reported outcomes could be any expression of current feelings, self-reported quality of life, relationship satisfaction and other measures – and the authors explicitly did not require a minimum follow-up period. The authors’ choice of “quality of life” or “well-being” as key outcomes meant that they would likely have excluded studies that reported depression, suicide attempts and completed suicide but did not formally or informally assess “quality of life” or “well-being” as such.

Considering the poor methodological quality of the transgender literature – and especially considering that they were, after all, trying to conduct a systematic review – it is bizarre that the authors did not consider it necessary to assess the risk of bias in each study.

Did these authors ever intend to be objective in their work? I doubt it. They did not even pay attention to what they were doing. We can see this in the “objectives” of the Cornell authors’ review, which may have been written by a computer, or by someone who cared very little about communicating clearly and transparently with the review’s readers.

Our objective was to aggregate scholarship that adds in some way to the world’s knowledge about the policy issue in question. Adding to knowledge does not necessarily mean drawing new conclusions but can include strengthening existing knowledge by corroborating what prior studies have shown. Our purpose is not to pick and choose research that endorses a particular policy view but to include the broadest reasonable range of relevant scholarship so that users may both obtain an overview of the present state of scholarly knowledge on topics that are currently matters of public debate, and further examine that research directly if desired. We recognize that the peer-review process is imperfect but we operate on the principle that it represents the best method we have for holding research accountable to both good faith and sound methodologies. [19]

It is rare to see such stilted, uncomfortable, meaningless writing. It is like writing that somehow materialized at a séance.

The next passage mentions a “strict set of criteria for selecting studies based on credibility, relevance and usefulness,” but these criteria are nowhere reported. The passage then seems to ramble on incoherently, and it appears that at least some of the passage was copied and pasted from an unrelated project’s document.

The Cornell authors do not report conducting any sort of analysis. They do not bother to assess bias risk. They do not even put together a table with study characteristics, much less a list of excluded studies, with reasons for exclusion. All outcomes reported favorably in their included studies are affirmed without question as solid scientific evidence.

These are the first four “findings” of the Cornell document, out of eight in total:

  1. The scholarly literature makes clear that gender transition is effective in treating gender dysphoria and can significantly improve the well-being of transgender individuals.
  2. Among the positive outcomes of gender transition and related medical treatments for transgender individuals are improved quality of life, greater relationship satisfaction, higher self-esteem and confidence, and reductions in anxiety, depression, suicidality, and substance use.
  3. The positive impact of gender transition on transgender well-being has grown considerably in recent years, as both surgical techniques and social support have improved.
  4. Regrets following gender transition are extremely rare and have become even rarer as both surgical techniques and social support have improved. Pooling data from numerous studies demonstrates a regret rate ranging from .3 percent to 3.8 percent. Regrets are most likely to result from a lack of social support after transition or poor surgical outcomes using older techniques.

These “findings” are absurdly optimistic and grossly exceed the limits of what the evidence shows. Their statements characterizing this evidence are made without the slightest caution. The authors pretend to be certain of the benefit in all outcomes of transgender interventions, when in fact these outcomes are highly uncertain. Written in a quasi-authoritative tone, the “findings” seem designed to encourage people in the ruminative phase of gender dysphoria to justify their transgender “transition.” People who have no means to assess the value of this evidence are likely to be led astray by the review’s enthusiastic statements.

In view of the uncertain outcomes, the authors missed their chance to do something useful. One excellent thing about rigorously-conducted systematic reviews on topics in which the methodologic quality of primary literature is very poor, with large gaps in what is known about long-term outcomes, is that the review can specifically point out these deficiencies and evidence gaps, both to suggest caution to users of systematic reviews and to guide the planning and conduct of future primary studies. Transgender research could only have benefited from this.

Faking the GRADE. “Going through the motions” is apparently not rare in analyses by activist researchers of transgenderism. The University of California, San Francisco (UCSF) is home to the Center of Excellence for Transgender Health. In 2016, this entity published a document titled “Guidelines for the Primary and Gender-Affirming Care of Transgender and Gender Nonbinary People” [24]. The document was edited by Dr. Madeline Deutsch, an associate professor of family and community medicine, who also identifies as transgender. In the first chapter, “Grading the evidence,” the authors claim they will use the GRADE methodology to develop their guideline’s recommendations [25]. GRADE is the “global standard” methodology used to assess the certainty of evidence from rigorous, up-to-date systematic reviews, and based on that evidence, to develop health recommendations [26-28].

The UCSF authors do cite an appropriate article to provide general information about GRADE [29]. As the authors describe their process, however, it becomes clear that they have done no systematic reviews and are completely unfamiliar with GRADE.

Selected recommendations in these Guidelines have been graded using adaptation of some components of the GRADE scoring system, with the addition of two additional domains to describe details of the research which underlies the recommendation, as well as the population(s) in which such research was conducted. Each graded recommendation will include mention of the population(s) in which research was conducted (transgender (T), non-transgender (NT), or both (T/NT) (Table 1); an indication of, among all sources informing that particular recommendation, the strongest form of underlying evidence (meta-analyses, randomized trials, observational studies, expert opinion) (Table 2). Lastly, an overall grading of the strength of recommendation is made (strong, moderate, weak) which is based on the above criteria as well as strength of the consensus recommendation as determined by expert opinion interpretation of available data (Table 3.). [25]

The methods proposed by the UCSF authors, a word-salad of empty terms, have nothing at all to do with GRADE. It was just name-dropping. Did they really think no-one would notice? This blather might fool readers who themselves are unfamiliar with GRADE but could not possibly fool anyone who has actually used these methods.

The guidance in the UCSF document is astonishing. In a chapter called “Initiating hormone therapy,” written by Deutsch, we find that almost anyone, even a naturopathic provider (!) is eligible to start a patient on a lifelong drug regimen, one with significant risk of harm [30-33]:

Prescribing gender affirming hormones is well within the scope of a range of medical providers, including primary care physicians, obstetricians-gynecologists, and endocrinologists, advanced practice nurses, and physician assistants. Depending on the practice setting and jurisdiction, other providers with prescriptive rights (naturopathic providers, nurse midwives) may also be appropriate to prescribe and manage this care. [14]

A terrible cohort. Let us examine the kinds of problems that systematic reviewers who are serious about their work may find in “gender-affirming” primary studies.

A 2018 paper by Olson-Kennedy and colleagues [15] reports on a cohort of 68 adolescent and young adult women, self-diagnosed as transgender “men,” whom Olson-Kennedy had referred to surgeons for bilateral radical mastectomy (“top surgery”).

Thirty-one (46%) young women were age ≤ 17 years; 16 (24%) were age ≤ 15 years. Two were 13 years old. The outcome of “regret” was assessed in a follow-up period of “less than one year” to five years, with outcomes for 59 (87%) women assessed at one year or less. Olson-Kennedy and colleagues report this in a table but do not otherwise mention it, nor do they seem to consider that such short follow-up might be inappropriate. Twenty-eight (41%) young women had only begun their medicalized transgender experience very recently, starting a testosterone regimen within the preceding two years. Six (9%) young women had been on testosterone for less than six months; at least one young woman began taking testosterone less than one month prior to surgery. Investigators did not obtain data from 26 (28%) patients lost to follow-up, a proportion that in most areas of clinical care would be considered unacceptable.
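The percentages quoted for this cohort can be rechecked from the counts. The following is a minimal sketch in Python, assuming that each count is a share of the 68 patients in the cohort; the names `COHORT_SIZE`, `reported`, `percent` and `check_reported` are illustrative and do not come from the study. The lost-to-follow-up figure is left out because its denominator is not stated in the text.

```python
# Counts and percentages as quoted in the paragraph above.
# Assumption: every count is a share of the 68-patient cohort.
COHORT_SIZE = 68

reported = {
    "age <= 17 years": (31, 46),
    "age <= 15 years": (16, 24),
    "assessed at one year or less": (59, 87),
    "started testosterone within two years": (28, 41),
    "on testosterone under six months": (6, 9),
}


def percent(count, total):
    """Return count/total as a percentage rounded to a whole number."""
    return round(100 * count / total)


def check_reported(rows, total):
    """Map each label to (recomputed percentage, matches reported value)."""
    return {
        label: (percent(n, total), percent(n, total) == pct)
        for label, (n, pct) in rows.items()
    }


results = check_reported(reported, COHORT_SIZE)
for label, (pct, ok) in results.items():
    status = "matches" if ok else "differs from"
    print(f"{label}: {pct}% ({status} the reported value)")
```

A recheck of this kind confirms only the arithmetic. It says nothing about whether the follow-up period was adequate.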

Most patients suffered at least one post-surgical complication. These included temporary loss of nipple sensation (59%); loss of sensation in other areas (41%); long-term loss of nipple sensation (32%); excessive scarring (15%); postoperative hematoma (10%); complications from anesthesia (7%); and other complications. Notwithstanding these problems, Olson-Kennedy and colleagues sought to minimize them by saying that “[S]erious complications were rare” [15].

Although some patients feel regret very soon after transgender surgery, it commonly takes several years, often 10 years or more, for patients to realize they have made a mistake [18, 33]. It was thus far too soon to obtain a meaningful estimate of “regret” from these patients. There was one acknowledged case of regret at that early follow-up point. Even so, Olson-Kennedy and colleagues spin their results, declaring in the article’s abstract that there was “close to zero” regret among the patients [15]. Given the irreversible nature of this drastic, experimental surgery in healthy young women, the study’s cross-sectional design, the very premature data collection, the high complication rate and the large proportion of missing data, Olson-Kennedy and colleagues then make a completely unwarranted recommendation. Based on their study’s results, they say, “changes in clinical practice and in insurance plans’ requirements for youth with gender dysphoria who are seeking surgery seem essential. Youth should be referred for chest surgery based on their individual needs, rather than their age or time spent taking medication” [15].

I give this example to illustrate the fact that while rigorous systematic reviews look carefully at every detail of included studies, noting several types of epidemiologic bias, missing data, adverse events, conflicts of interest, spin, and things that just don’t make sense, the Cornell authors seem automatically to have believed the conclusions of every study reporting positive outcomes. Studies reporting negative outcomes or no effect were not taken into account.

This is a potentially dangerous approach. Policy makers and clinicians often take systematic reviews quite seriously, but most lack the skills or the time to discern whether a systematic review is done well or poorly, or to know whether a review’s overly favorable “findings” should be trusted. It is a similar situation with health guidelines.


Activist-driven transgender research methods are incompetent and reckless. Anonymous Cornell University authors did very poorly in conducting a systematic review. “Findings” of this document should be ignored. Similarly, UCSF’s transgender guidelines were developed using spurious, ad hoc methods. Both institutions should strongly consider removing these documents from their web sites to prevent potential patient harms that may accrue if individuals, clinicians and policy-makers were to take their “findings” and “recommendations” at face value.


