Data Methods Discussion Site

collaboration
teaching
2018
This article lays out the rationale and overall design of a new discussion site about quantitative methods.
Author
Affiliation

Vanderbilt University
School of Medicine
Department of Biostatistics

Published

June 19, 2018

Background

I have learned more from Twitter than I ever thought possible, from those I follow and from my followers. Quick pointers to useful resources have been invaluable. I have also gotten involved in longer discussions. Some of those, particularly those related to design and interpretation of newly published studies (especially randomized clinical trials, or RCTs), have gotten very involved and controversial. Twitter is not designed for in-depth discourse, and I soon lose track of the discussion and others’ previous points. This is particularly true if I’m away from a discussion for more than 24 hours. Also, some Twitter discussions would have been more civil had there been a moderator.

There are excellent discussion boards related to statistics, e.g. stats.stackexchange.com, medstats, and talkstats, and a variety of sites related to medical research (including clinical trials), epidemiology, and machine learning. An informal Twitter poll was conducted from 2018-04-26 to 2018-04-27, resulting in 242 responses from those in my Twitter sphere. Of those, 71% were in favor of creating a new site and 29% wanted to use Twitter alone for discussions on the intended topics.
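As a rough check on the poll arithmetic, the reported proportions (0.71 and 0.29 of 242 responses) can be converted back to approximate counts. The sketch below also computes a Wilson score interval for the proportion in favor. That interval is an assumption-laden illustration only: it treats the responses as a simple random sample, which a self-selected Twitter poll is not, and the counts are reconstructed from rounded proportions rather than taken from the poll itself.

```python
from math import sqrt


def approx_counts(n, proportions):
    """Convert reported proportions back to approximate whole-number counts."""
    return [round(n * p) for p in proportions]


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width


n_responses = 242  # total responses reported for the poll
favor, twitter_only = approx_counts(n_responses, [0.71, 0.29])

# Illustrative only: assumes simple random sampling, which does not hold here.
low, high = wilson_interval(favor, n_responses)

print(f"approx. in favor of a new site: {favor}")
print(f"approx. preferring Twitter only: {twitter_only}")
print(f"Wilson interval for proportion in favor: ({low:.3f}, {high:.3f})")
```

The Wilson form is used here instead of the simpler Wald interval only because it behaves better for moderate sample sizes. Either way, the interval says nothing about the selection bias inherent in polling one's own followers.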

Plan

After much research, I’ve chosen discourse.org as the platform for a new discussion board. This will require putting up a server to host the site. Fortunately all the software needed (Linux, Ruby, Discourse, etc.) is free. After the site is up and running, more moderators will be required. The site name will be datamethods.org. We hope to have it running in July 2018.

Because communication and collaboration between quantitative experts and clinical/translational researchers is a key function of the Biostatistics, Epidemiology, and Research Design (BERD) program of the national Clinical and Translational Science Award (CTSA) program of NIH/NCATS, the Vanderbilt BERD program will support this discussion site under its CTSA-funded VICTR center, and the national CTSA BERD consortium is likely to be involved as well. This has the potential to bring dozens of experienced statisticians and epidemiologists to the table to assist clinical and translational investigators and research consumers with their study design, analysis, and interpretation questions.

discourse.org recognizes participation and helpfulness. A good example may be found here. The software also makes it very easy to find your place in a large number of discussions, and to upvote answers to questions.

Topics

The areas that will be emphasized in the new discussion site follow. Global emphasis is on fostering communication between quantitatively-skilled persons and researchers not specializing in the math side of things.

  • quantitative methods in general, including enhancing numeracy of those participants who are not into math or statistics
  • general statistical issues such as analysis of change scores and categorization of continuous variables
  • measurement issues
  • interpretation of published statistical analyses
  • statistical design of particular studies/clinical trials
  • statistical analysis issues in published biomedical and epidemiologic research papers
  • choosing optimal statistical graphics for presenting study results
  • discussing statistical models and machine learning for biomedical and epidemiologic problems

The site will be organized into the following major and minor categories, with lots of tags available to further distinguish and cross-reference topics. Your input about these categories is needed. Words in brackets are the topic names used within the web site, if different from the description.

  • statistics and data analysis [data analysis]
    • descriptive and exploratory [descriptive]
    • formal analysis [formal], including inference, confidence limits, statistical tests
    • uncertainty
    • missing data and measurement error [data problems]
    • models
    • modeling strategies (including model specification, nonlinearities, interactions and heterogeneity of treatment effect, avoiding categorization, how to sequence multiple steps) [modeling strategy]
    • variable selection
    • data reduction, clustering, unsupervised learning [data reduction]
    • accuracy and information measures [accuracy]
    • model validation and interpretation [validation]
    • bayes
    • machine learning
    • comparative methods performance [comparative methods]
    • causal inference
    • probability
    • inference and generalizability [generalization]
  • research methods
    • measurement
    • study and experimental design [design]
    • sample survey design [sample survey]
    • study interpretation
    • meta-analysis
    • research data management [data management]
  • treatment comparison
    • general issues and ethics [general]
    • design issues for a specific trial [design]
    • statistical interpretation of specific studies [interpret]
    • methods for comparison of nonrandomized treatments [observational] including propensity scores and instrumental variables approaches
  • computing
    • systems
    • tools
    • languages
    • web applications
    • databases
    • big data
  • graphics
    • static
    • dynamic
    • programming
  • education
    • teaching and learning methods [teaching learning]
    • statistical [stat]
    • math and numeracy [math numeracy]
    • scientific method & design [scientific method]
    • collaboration
    • knowledge dissemination [dissemination]
    • career development
  • topical areas
    • diagnosis
    • biomarker research
    • cardiology
    • cancer
    • psychology
    • nutrition
    • epidemiology
    • health policy
    • drug development
  • journal articles
    • journal club
    • reviews
  • news (courses, webcasts, meetings, etc.)
  • meta (discussion, pointers, etiquette about the site)

More major categories can be added as needed. As alluded to above, a number of tags (by default limited to 5) can be added to any post to allow easy search and cross-referencing beyond just using categories and subcategories. Once a user gains a certain “trust level” she can define new tags in the system for everyone. Initially defined tags are propensity and data-reduction. Discourse makes it easy to construct shareable URLs that bring up all posts in a certain category or containing a certain tag.
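As an illustration of the shareable URLs mentioned above, here is a minimal sketch of how category and tag links might be built. The path patterns `/c/<category>/<subcategory>` and `/tag/<tag>` are assumptions about Discourse routing and may differ by version or site configuration. The category slugs `data-analysis` and `modeling-strategy` are hypothetical, derived from the bracketed topic names listed earlier. The tag names and the 5-tag default limit come from the text.

```python
from urllib.parse import quote

BASE_URL = "https://datamethods.org"  # site name from the article
MAX_TAGS_PER_POST = 5                 # default limit stated in the article


def category_url(*slugs, base=BASE_URL):
    """Build a link to a category (and optional subcategory).

    Assumes a '/c/<slug>/<slug>' path pattern, which may vary by version.
    """
    path = "/".join(quote(slug, safe="") for slug in slugs)
    return f"{base}/c/{path}"


def tag_url(tag, base=BASE_URL):
    """Build a link listing all posts with a tag (assumed '/tag/<tag>' pattern)."""
    return f"{base}/tag/{quote(tag, safe='')}"


def check_tags(tags, limit=MAX_TAGS_PER_POST):
    """Reject a tag list that exceeds the per-post limit."""
    if len(tags) > limit:
        raise ValueError(f"at most {limit} tags per post, got {len(tags)}")
    return list(tags)


# Hypothetical category slugs; tag names are the two initially defined tags.
print(category_url("data-analysis", "modeling-strategy"))
for tag in check_tags(["propensity", "data-reduction"]):
    print(tag_url(tag))
```

Percent-encoding each path segment keeps the links valid if a slug or tag ever contains spaces or other reserved characters.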

To discuss this proposal, post a tweet mentioning @f2harrell, or use the commenting facility on this post.

Role of Twitter

Informed by this, my opinion is that Twitter is good at the following things:

  • short-term discussions that are not emotional
    • Short messages
    • Discussions needing a “memory” are best done using a modern discussion board platform
    • Those that are emotional need moderators to flag inappropriate content (including personal attacks) for review and fixing, or to delete content
  • pointers to other resources including longer discussions on discussion boards
  • requests for interested persons to join the longer discussions elsewhere
  • quick polls
  • news alerts

Twitter is time-oriented, but discussion boards are topic-oriented (then time-ordered within topic/subtopic). Discussion boards such as discourse.org have a permanent memory and do not have any real message length limits.

Discourse Information

Ways to Create an Account

  • Link to your Google, Twitter, or Github account
  • Soon to be added: authentication by Facebook and Yahoo

Discussion Archive

The following comments were added to the original blog platform for this article in 2018.


Drew Levy: “We never think alone.” An initiative such as this could have significant ramifications for both individual ignorance and collective wisdom. Our collective wisdom is fostered in the ways we create intelligence using the community around us. [Language borrowed from Steven Pinker.]

How can I help?

Yadev: Hi Prof. Harrell. I am a surgeon with interest in research and biostatistics. Thanks a lot for your venture. I am your follower on Twitter. I would like to know how to access this.

Sreenath Madathil: Dear Prof. Harrell, Thank you very much. I really enjoyed the Stan Discourse site, and it helped a lot with my thesis. As the Discourse website says, this would be a real escape from the email silos. Will join definitely.

David C. Norris, MD: Thank you for your leadership and community-building, Frank! I can already think of one discussion I would like to start in your forum, around some analyses I’m pursuing in NINDS data. Might Ethics deserve a category of its own, even if it necessarily pervades all of the others (and indeed statistical practice as a whole)?

Frank Harrell: Thanks for saying that, David. That’s a great question. Since this is going to be a data-oriented site, I’m not sure if ethics should be its own category. On the other hand, ethics pervades other categories, especially those related to human subjects research. So I’m torn. Would love to hear more opinions.

Duncan Lowes: Dear Prof Harrell. I am very interested in this proposal and will join. I hope to be able to contribute where I can.

Frank Harrell: Thanks Duncan. The best thing that you and others can do right now is to think of the major categories we should use, and the rules of etiquette.