I created a new, small package called xmltools that helps simplify the process of converting XML data into tidy data frames. It has not yet been tested on many XML files, so it may have some bugs, and I have not written any automated tests for it yet. But, at least for me, it drastically cuts down on the code I have to write to get the data I want from an XML file.
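To make the idea of "XML to tidy rows" concrete, here is a minimal sketch in Python using only the standard library. It is not the xmltools API and does not reflect how the package works internally. The function name xml_to_rows, the record_tag parameter, and the sample catalog are all made up for illustration. It assumes each record is a flat element whose children hold simple text values.

```python
import xml.etree.ElementTree as ET


def xml_to_rows(xml_text, record_tag):
    """Return one dict per <record_tag> element (hypothetical helper).

    Attributes of the record and the text of its direct children become
    columns; nested children are not expanded in this sketch.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for record in root.iter(record_tag):
        row = dict(record.attrib)  # attributes first, e.g. id="b1"
        for child in record:
            row[child.tag] = (child.text or "").strip()
        rows.append(row)
    return rows


# Made-up sample document: two flat <book> records.
SAMPLE = """<catalog>
  <book id="b1"><title>First</title><price>10.5</price></book>
  <book id="b2"><title>Second</title><price>7</price></book>
</catalog>"""

rows = xml_to_rows(SAMPLE, "book")
for row in rows:
    print(row)
```

Each dict is one observation with one key per variable, which is the shape a data frame constructor expects. All values stay as strings here; type conversion is left out on purpose.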
This post is just a copy of the README.md file for the repo https://github.com/ultinomics/rmarkdown2docx. But it’s got everything you need to get your R Markdown file (.Rmd) to a clean, useful MS Word file (.docx). Description: This set of scripts helps convert the output of Rmd files to docx files. It does this by creating a clean html file, then opening, converting, and saving the html as docx using AppleScript and Microsoft Word.
Often, one gets a PDF file that is a scan of a book or text and cannot be searched (boo!). A good (but not perfect) solution is to use Optical Character Recognition (OCR) to convert the PDF to a txt file and search that instead. Here is my solution. Requirements: the command line tools convert and tesseract. I installed both using Homebrew. I’m using Mac OS X 10.
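The two-step idea (rasterize the PDF with convert, then run tesseract on the image) can be sketched as a small Python wrapper around the two command line tools. This is an assumption about how the steps fit together, not the exact commands from the post. The function names, the 300 dpi density, and the intermediate .tiff file are illustrative choices.

```python
import shutil
import subprocess
from pathlib import Path


def build_ocr_commands(pdf_path, density=300):
    """Build the two commands without running them (hypothetical helper)."""
    pdf = Path(pdf_path)
    tiff = pdf.with_suffix(".tiff")  # intermediate image for OCR
    out_base = pdf.with_suffix("")   # tesseract appends .txt itself
    convert_cmd = [
        "convert", "-density", str(density), str(pdf),
        "-depth", "8", "-strip", "-background", "white", "-alpha", "off",
        str(tiff),
    ]
    tesseract_cmd = ["tesseract", str(tiff), str(out_base)]
    return convert_cmd, tesseract_cmd


def ocr_pdf(pdf_path, density=300):
    """Run both steps; raises if a tool is missing or a step fails."""
    for tool in ("convert", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    for cmd in build_ocr_commands(pdf_path, density):
        subprocess.run(cmd, check=True)
    return Path(pdf_path).with_suffix(".txt")


# Usage (assumes scan.pdf exists and both tools are installed):
# text_file = ocr_pdf("scan.pdf")
```

Building the commands separately from running them makes it easy to print and inspect them first. A higher density usually helps recognition at the cost of a larger intermediate file.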