random, infrequent posts about R and programming.

mediocre economist.
aspiring data scientist.
ultimate player.

I created a new, small package called xmltools that helps simplify the process of converting XML data into tidy data frames. It has not yet been tested on many XML files, so it may have some bugs. I also haven't written any tests yet. But, at least for me, it drastically cuts down on the code I have to write to get the data I want from an XML file.
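xmltools itself is an R package, and the snippet below is not its API. It is a language-neutral illustration of the underlying idea, written in Python. The sketch assumes each repeated element (here a hypothetical `<player>`) is one record, and that its attributes and the text of its direct children become columns. That yields one "tidy" row per record. The sample XML and the function name `xml_to_rows` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: each <player> element is one record (one row).
SAMPLE = """
<team>
  <player id="1"><name>Ann</name><position>handler</position></player>
  <player id="2"><name>Bo</name><position>cutter</position></player>
</team>
"""


def xml_to_rows(xml_text, record_tag):
    """Flatten every <record_tag> element into a dict (one tidy row).

    Attributes and the text of direct children become columns;
    deeper nesting is ignored in this sketch.
    """
    root = ET.fromstring(xml_text.strip())
    rows = []
    for record in root.iter(record_tag):
        row = dict(record.attrib)  # attributes, e.g. id="1"
        for child in record:
            row[child.tag] = (child.text or "").strip()
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in xml_to_rows(SAMPLE, "player"):
        print(row)
```

If pandas is installed, `pandas.DataFrame(xml_to_rows(SAMPLE, "player"))` turns the list of rows into a data frame with columns `id`, `name`, and `position`.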

RStudio’s Mine Cetinkaya-Rundel had a post about the highcharter package, a wrapper for the Highcharts JavaScript library that lets you create super sweet interactive charts in R. Joshua Kunst’s highcharter package has become my go-to plotting package once I reach the production phase and know I will be using HTML.

This post is just a copy of the README.md file for the repo https://github.com/ultinomics/rmarkdown2docx. But it’s got everything you need to get your R Markdown file (.Rmd) to a clean, useful MS Word file (.docx).

Description

This set of scripts helps convert the output of Rmd files to docx files. It works by creating a clean HTML file, then opening, converting, and saving the HTML as docx using AppleScript and Microsoft Word.

Often, one gets a PDF file that is a scan of a book or text, which cannot be searched (boo!). A good (but not perfect) solution is to use Optical Character Recognition (OCR) to convert the PDF to a txt file and search that instead. Here is my solution.

Requirements

Command line tools:

- convert (from ImageMagick)
- tesseract

I installed both using Homebrew. I’m using Mac OS X 10.
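The excerpt doesn't show the exact commands, so the following is a minimal sketch of a convert-then-tesseract pipeline written in Python. It makes three assumptions. First, each page is rasterized at 300 dpi with `convert -density 300`, which relies on Ghostscript to read PDFs. Second, `tesseract` runs once per page image. Third, the per-page text files are concatenated into one searchable txt file. The helper names `convert_cmd`, `tesseract_cmd`, and `ocr_pdf` are hypothetical.

```python
import glob
import os
import subprocess


def convert_cmd(pdf_path, out_dir, density=300):
    """Build an ImageMagick `convert` call that rasterizes each PDF page to a PNG."""
    # %03d zero-pads the page number, so a plain sort keeps pages in order (up to 1000 pages).
    pattern = os.path.join(out_dir, "page-%03d.png")
    return ["convert", "-density", str(density), pdf_path, pattern]


def tesseract_cmd(image_path):
    """Build a `tesseract` call; tesseract writes <output base>.txt."""
    base, _ = os.path.splitext(image_path)
    return ["tesseract", image_path, base]


def ocr_pdf(pdf_path, out_dir, txt_path):
    """Rasterize the PDF, OCR every page, and concatenate the text into txt_path."""
    os.makedirs(out_dir, exist_ok=True)
    subprocess.run(convert_cmd(pdf_path, out_dir), check=True)
    pages = sorted(glob.glob(os.path.join(out_dir, "page-*.png")))
    with open(txt_path, "w", encoding="utf-8") as out:
        for page in pages:
            subprocess.run(tesseract_cmd(page), check=True)
            base, _ = os.path.splitext(page)
            with open(base + ".txt", encoding="utf-8") as page_text:
                out.write(page_text.read())
```

With both tools on the `PATH`, calling `ocr_pdf("scan.pdf", "ocr_pages", "scan.txt")` would leave one image and one txt file per page in `ocr_pages/`, plus the combined `scan.txt` to search. The command-building is kept separate from execution so the exact arguments are easy to inspect or tweak, for example to raise `density` for small print.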