I refreshed my memory of R using a free workbook on DataCamp. I have visited many tutorial websites, attended a couple of workshops, and taken a few MOOCs. I would say the training on DataCamp is far better.
Main obstacles that deterred me from learning R
1. Language
Trainers and lecturers use R jargon excessively, without explaining what it means or relating it to other software packages. Don't tell me to "pipe" if you have not explained that the pipe is %>%, which passes the object on its left to the command on the following line.
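To make the pipe concrete, here is a minimal sketch, assuming the dplyr package (part of tidyverse) is installed. The `scores` vector is made up for illustration. Both versions compute the same number; the piped one simply reads top to bottom.

```r
library(dplyr)  # makes the %>% pipe available

scores <- c(4, 9, 16)  # made-up numbers for illustration

# Without the pipe: read from the inside out.
nested <- round(mean(sqrt(scores)), 1)

# With the pipe: each %>% passes the result on its left
# as the first argument of the function on its right.
piped <- scores %>%
  sqrt() %>%
  mean() %>%
  round(1)

identical(nested, piped)  # both routes give the same answer
```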
2. Version control
This applies to online courses. Their version of R is old. Some exercises were written for that version, while time has moved on, and so have R and RStudio. Even when I complete an exercise correctly, the old R/RStudio setup on the MOOC does not accept it and does not let me progress.
3. Tuition
I often think statisticians live in their own world, communicating with others in formulas. Most of the time, they do not have to think any further once a problem is solved mathematically. For example, if an R exercise says 'Add a variable to show how much time was made up/lost during the flight', I bet many new R learners get confused (or maybe it is just me?) about what this instruction wants us to do.
Yes, I can tell that I need to add a variable by creating an object followed by <-. And then what?
Maybe mutate? Oh, don't forget to call the original data frame and pipe it before mutating. Then sort the result with 'arrange'? But which variables do I use to create what I am supposed to create?
A little more step-by-step instruction would be helpful. That is always on my mind when I create a Stata practicum for students. Explain why we are doing an exercise and how we might accomplish it by referring back to key concepts!
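For the flight exercise above, the sequence might be sketched like this. It is only one reading of the instruction: I assume "time made up" means departure delay minus arrival delay, and the data frame and its column names (`dep_delay`, `arr_delay`) are made up for illustration, not taken from the course.

```r
library(dplyr)

# Made-up stand-in for the exercise data; delays are in minutes.
flights <- data.frame(
  flight    = c("A1", "B2", "C3"),
  dep_delay = c(10, 0, 25),
  arr_delay = c(2, 15, 25)
)

result <- flights %>%                        # 1. start from the original data frame
  mutate(gain = dep_delay - arr_delay) %>%   # 2. add the new variable
  arrange(desc(gain))                        # 3. sort, largest gain first

result
```

A positive `gain` means the flight made up time in the air; a negative one means it lost time.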
What did DataCamp do?
DataCamp may not be 100% foolproof, but I would say skip the intro and intermediate courses if you are only interested in running statistical analyses in R. After going through the Data Wrangling (sorting, manipulating, filtering) exercises using the "tidyverse" package, I finally saw the light. I can use R!
My suggestions for teaching R to new learners
1. Explain the importance of tidyverse + how to install and load the package in RStudio
I will not give detailed instructions here because there are tons of videos and written instructions on how to do it, and it is straightforward. But I urge all instructors to encourage learners. Many learners who want to take up R/RStudio are motivated to learn and already have experience in data analysis.
Show them they have nothing to fear, provided that they have installed tidyverse and load it right after opening RStudio each time. Yes, each time. And do not forget to mention that they need to update the package periodically.
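As a rough sketch of that routine: the install and update lines are commented out because they need an internet connection and only have to be run once or occasionally, while the `library()` line is the one to run in every session.

```r
# One-time setup: download tidyverse from CRAN.
# install.packages("tidyverse")

# Every session: attach tidyverse right after opening RStudio.
library(tidyverse)

# Every now and then: bring installed packages up to date.
# update.packages(ask = FALSE)
```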
2. We do not need to memorise a command to import non-R datasets. RStudio has an "Import Dataset" menu.
The good thing about RStudio is that it is a friendly interface to R. It is self-explanatory, and you can just click "Import". Your data will be there.
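For the curious, the import menu writes the code for you, typically using packages such as readr or haven. A minimal sketch of the typed equivalent, assuming readr is installed; the tiny CSV and the file names in the comments are made up for illustration.

```r
library(readr)  # part of tidyverse

# Write a tiny made-up CSV to a temporary file so the sketch runs anywhere.
path <- tempfile(fileext = ".csv")
writeLines(c("id,age", "1,34", "2,51"), path)

# The typed equivalent of clicking "Import" on a text file.
patients <- read_csv(path)

# For other formats, the haven package offers, for example:
# haven::read_dta("file.dta")   # Stata
# haven::read_sav("file.sav")   # SPSS
```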
3. Don't riddle learners with jargon
R has its own vocabulary that makes more sense to programmers, such as data frame, function, list, and vector. I say don't spend hours and hours on programming basics.
Show the fundamental procedures for 'analyses', i.e. data wrangling, along with what people should and should not do, such as the fact that a line of command ends with a hard return.
4. Make references to other software packages like SPSS and Stata
Believe me, referring to what we already know is the best way to remember and apply something new.
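As an illustration of what such a reference could look like, here is a small sketch that pairs dplyr verbs with their rough Stata counterparts in the comments. The data are made up, and the Stata lines are approximate equivalents rather than exact translations.

```r
library(dplyr)

# Made-up data for illustration.
survey <- data.frame(
  id     = 1:4,
  weight = c(70, 82, 55, 95),         # kg
  height = c(1.75, 1.80, 1.60, 1.70)  # metres
)

analysed <- survey %>%
  mutate(bmi = weight / height^2) %>%  # Stata: generate bmi = weight / height^2
  filter(bmi >= 25) %>%                # Stata: keep if bmi >= 25
  arrange(desc(bmi)) %>%               # Stata: gsort -bmi
  select(id, bmi)                      # Stata: keep id bmi

analysed
```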
But why do we need to learn R? Isn't it the same difference?
What amazes me this time is R Markdown. It is a powerful tool for producing Word/HTML/PowerPoint documents. Stata has a Markdown feature, too, for generating tables and reports. In my previous entries on Stata, I also mentioned the command putdocx, which creates a seamless report in Word. But R Markdown can do this alongside analysing the data.
In my opinion, R has more graphical capabilities. They might be well hidden in Stata, but R can do 3D visuals with heatmaps, so it is easier to see interaction effects in multiple linear/logistic regression.
Being free is the main factor, but R/RStudio is also far faster in action than any of the main statistical programmes. With the paradigm of data science moving towards Big Data and Machine Learning, sticking to Stata does *NOT* do researchers any favours anymore. And more importantly, because of the cost, there are more R users than users of all other programmes combined.
That means more analytical packages are available to do the job. Yes, analytical capabilities.
So if you are an early- or mid-career quantitative researcher who still has more to do, I highly recommend you convert to R.
And some are even moving to Python! Our research paradigm is moving very fast, folks.