Free Tools for Freelancers
Gathering, cleaning, and analyzing data can take time, but it doesn’t need to be costly. Here are some of our favorite free data tools:
QGIS
What Is It? QGIS is open-source geographic information system (GIS) software.
Where to Find It? You can download QGIS here.
What Do We Use It For? QGIS has much of the same functionality as other GIS software like ArcMap. It is useful for visually representing data that has a location component.
R
What Is It? R is an open-source programming language and software environment for statistics, data analysis, and visualization.
Where to Find It? Base R can be downloaded here. Many people use the RStudio IDE for programming in R; RStudio makes it easier to manage and view files, variables, and plots. Because R is open-source, you can download packages that add functionality for many needs base R doesn’t cover. For information on installing packages and a list of some of the most commonly used packages, check out this link.
What Do We Use It For? We regularly use R for statistical modeling (such as linear regression), data manipulation and analysis, and visualization. You can check out some ways we’ve used R here.
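In R, a simple linear regression is usually a one-line call such as lm(y ~ x, data = df). The sketch below shows what that fit computes. It is a minimal simple ordinary least squares fit in Python on made-up toy numbers. It is not how R implements lm() internally; R uses a QR decomposition and returns far more diagnostics.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Slope = covariance-style sum of products / sum of squared x deviations
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    # The fitted line always passes through the point (x_bar, y_bar)
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical toy data, for illustration only
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

intercept, slope = fit_line(hours, scores)
print(f"score ≈ {intercept:.2f} + {slope:.2f} * hours")
```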
Google Fusion Tables
What Is It? This is an online tool for mapping and data visualization.
What Do We Use It For? We have used Fusion Tables to create choropleth maps, heat maps, and point maps. Some other examples are available here.
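A choropleth map shades each region according to which value class it falls into. Tools like Fusion Tables do this behind the scenes. The Python sketch below illustrates the underlying idea: it splits hypothetical regional rates into equal-width classes and assigns each class a color. Equal intervals are only one classification scheme; quantiles and natural breaks are common alternatives. The region names, rates, and hex colors are all placeholders.

```python
from bisect import bisect_right

def equal_interval_breaks(values, n_classes):
    """Return the n_classes - 1 inner break points of equal-width classes."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes
    return [lo + width * i for i in range(1, n_classes)]

def classify(value, breaks):
    """Class index 0..len(breaks): the number of break points the value has reached."""
    return bisect_right(breaks, value)

# Hypothetical per-region rates and a hypothetical light-to-dark color ramp
rates = {"A": 2.0, "B": 5.5, "C": 9.0, "D": 14.0}
colors = ["#fee0d2", "#fc9272", "#de2d26"]

breaks = equal_interval_breaks(list(rates.values()), len(colors))
classes = {region: classify(rate, breaks) for region, rate in rates.items()}
for region, cls in classes.items():
    print(region, rates[region], colors[cls])
```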
PDFTables
What Is It? PDFTables converts tables in PDF files into Excel files (.xlsx). It’s useful when tables copied out of a PDF don’t paste with the same structure they have in the PDF.
Where to Find It? https://pdftables.com/
What Do We Use It For? We helped a data journalism student here at UNC convert North Carolina homicide data from .pdf files into Excel files, then we produced a quick guide for the student on how we would go about cleaning the dataset.
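Tables converted from multi-page PDFs often carry layout leftovers: the header row repeated at the top of every page, blank spacer rows, and numbers stored as text with thousands separators. Below is a minimal Python sketch of one cleanup pass. It assumes the converted spreadsheet has been saved as CSV. The column names and values are hypothetical, and this is not the guide we wrote for the student.

```python
import csv
import io

# Hypothetical CSV export of a PDF-converted table: note the blank spacer row,
# the header repeated after a page break, and a count stored as "1,020".
RAW = """Area,Year,Count
County A,2015,"1,020"
County B,2015,4

Area,Year,Count
County C,2015,29
"""

def clean_pdf_rows(text):
    """Return a list of dicts, skipping blank rows and repeated header rows."""
    reader = csv.reader(io.StringIO(text))
    header = None
    records = []
    for row in reader:
        cells = [cell.strip() for cell in row]
        if not any(cells):
            continue  # blank spacer row
        if header is None:
            header = cells  # first non-blank row is the real header
            continue
        if cells == header:
            continue  # header repeated at a page break
        record = dict(zip(header, cells))
        # Convert text like "1,020" to the integer 1020
        record["Count"] = int(record["Count"].replace(",", ""))
        records.append(record)
    return records

cleaned = clean_pdf_rows(RAW)
for record in cleaned:
    print(record)
```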
OutWit Hub
What Is It? OutWit Hub lets people who aren’t familiar with Python scrape structured information from a webpage. Building an effective query on your first try takes a lot of logical thinking and trial and error, especially if the information isn’t well structured.
Where to Find It? https://www.outwit.com/products/hub/
What Do We Use It For? After helping the data journalism student with the project mentioned above, we found OutWit. We wondered how helpful it would have been for scraping another dataset she had asked about, one embedded in the HTML of a webpage rather than in a PDF. The screenshot above shows how far we got after spending just a few hours with it. There are quite a few errors in there, but with perhaps another hour or two, we suspect the scraped data would have been relatively clean and needed little further manual cleaning. (This time commitment is part of a balancing act: How much time do you want to spend obtaining and cleaning your data, versus how urgent or important is this particular data to your project?)
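OutWit Hub is point-and-click, but the same basic idea can be sketched in Python. The version below uses only the standard library's html.parser and is a minimal sketch, not a full scraper. It assumes the data sits in a plain HTML <table> with explicit closing </td> and </tr> tags, and the page content is hypothetical. Real pages usually need more handling, such as nested tables, omitted closing tags, and pagination.

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # row currently being built
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Ignore text outside cells (e.g. whitespace between tags)
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None

# Hypothetical page content
PAGE = """
<table>
  <tr><th>Name</th><th>Year</th></tr>
  <tr><td>Record <b>one</b></td><td>2016</td></tr>
  <tr><td>Record two</td><td>2017</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(PAGE)
print(scraper.rows)
```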
Cometdocs
What Is It? Cometdocs does much the same thing as PDFTables, with the added bonus that it can also convert files into HTML, Word documents, plain text, and other formats.
Where to Find It? https://www.cometdocs.com/
What Do We Use It For? We used this platform to convert a large PDF file with data on volunteer fire departments into an Excel file. The original dataset was very well structured and was probably very useful to the government office that created it. But its structure wasn’t ideal for analysis in Excel or SQL. Because Cometdocs (and likely any other PDF scraper) simply copied the layout of the PDF, cleaning the output afterward took about a week and a half.
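The fire-department file isn’t reproduced here. One common reason PDF-derived tables need heavy cleaning, though, is a report-style layout: a group label (say, a county) sits on its own row, followed by detail rows that belong to it. Below is a minimal Python sketch of flattening that layout into one self-contained record per row. The layout is assumed and the values are hypothetical.

```python
def flatten_grouped_rows(rows):
    """Turn report-style rows into one flat record per detail row.

    Assumes (hypothetically) that a row with exactly one non-empty cell is a
    group heading, and every other non-blank row is a detail row belonging
    to the most recent heading.
    """
    records = []
    group = None
    for row in rows:
        filled = [cell.strip() for cell in row if cell.strip()]
        if not filled:
            continue  # blank row
        if len(filled) == 1:
            group = filled[0]  # new group heading, e.g. a county name
            continue
        # Dropping empty cells assumes detail rows have no legitimately blank fields
        records.append([group] + filled)
    return records

# Hypothetical rows as they might come out of a PDF converter
rows = [
    ["County A", "", ""],
    ["", "Dept 1", "25"],
    ["", "Dept 2", "12"],
    ["County B", "", ""],
    ["", "Dept 3", "40"],
]

flat = flatten_grouped_rows(rows)
for record in flat:
    print(record)
```

Dropping empty cells keeps the sketch short. A real file with optional fields would call for matching on column positions instead.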
Stack Overflow
What Is It? Stack Overflow is a Q&A website for programming.
Where to Find It? http://stackoverflow.com
What Do We Use It For? Stack Overflow has saved our bacon many a time when it comes to programming. About 99% of the time that we get an error message or have a coding question, we Google it and look for the first Stack Overflow link. Nearly always, someone else has asked the same question, or at least one similar enough that we can figure out our issue.
R For Data Science
What Is It? This is a textbook covering how to use R for data cleaning, analysis, and visualization. One of the authors, Hadley Wickham, is an R guru and something of a stats/R celebrity. He is the main developer behind the tidyverse, a collection of R packages that includes dplyr and ggplot2, so this book covers much of its functionality.
What Do We Use It For? We have used this textbook to learn much of the R that we know, especially dplyr data manipulation and ggplot2 data visualization functions.
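Much of the dplyr material in the book comes down to steps like df %>% group_by(county) %>% summarise(avg = mean(calls)). The Python standard-library sketch below makes that group-then-summarise pattern concrete outside R. It is a rough equivalent only, and the field names and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def summarise_mean(records, group_key, value_key):
    """Group records by group_key and return the mean of value_key per group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[group_key]].append(rec[value_key])
    return {group: mean(values) for group, values in groups.items()}

# Hypothetical records
records = [
    {"county": "A", "calls": 10},
    {"county": "A", "calls": 20},
    {"county": "B", "calls": 5},
]

summary = summarise_mean(records, "county", "calls")
print(summary)
```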
W3Schools
Where to Find It? https://www.w3schools.com/
What Do We Use It For? We have primarily used W3Schools for SQL. Even after we learn a SQL technique, we constantly have to look up the syntax, and this is one of our primary references.
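Here is an example of the kind of syntax we keep looking up: a GROUP BY query run against an in-memory table through Python’s built-in sqlite3 module. The table and its values are hypothetical. Details can vary slightly between SQL dialects, such as SQLite versus PostgreSQL or MySQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE departments (county TEXT, name TEXT, volunteers INTEGER)")
conn.executemany(
    "INSERT INTO departments VALUES (?, ?, ?)",
    [("County A", "Dept 1", 25), ("County A", "Dept 2", 12), ("County B", "Dept 3", 40)],
)

# One summary row per county, largest total first
query = """
SELECT county,
       COUNT(*)        AS n_departments,
       SUM(volunteers) AS total_volunteers
FROM departments
GROUP BY county
ORDER BY total_volunteers DESC
"""
rows = conn.execute(query).fetchall()
conn.close()

for row in rows:
    print(row)
```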
Seeing Theory
What Is It? This website covers basic concepts in probability and statistics through interactive visualizations.
Where to Find It? http://students.brown.edu/seeing-theory/
What Do We Use It For? We just discovered this a few days ago, so it hasn’t come in handy yet. We will likely use it to learn or brush up on statistics.
Read the Docs
What Is It? This is a multifunctional site. It’s largely a documentation host for the open-source community, but some of the documentation it hosts consists of guides for building things or using tools.
Where to Find It? https://docs.readthedocs.io/en/latest/
What Do We Use It For? We used it for a tutorial on writing a scraper in Python. You’ll need to know how to use your computer’s terminal or command-line interface. Beyond that, the tutorial covers where to download or find the four prerequisites (software, languages, etc.) you’ll be working with.
Codecademy + beyond
What Is It? Codecademy is a great free resource for beginners to learn and practice languages from HTML/CSS and SQL to Python and Ruby.
Where to Find It? https://www.codecademy.com/
What Do We Use It For? We haven’t used Codecademy for anything specific. We at Carolina Data Desk are students with spring breaks and winter breaks and such, and many of us take what free time we have to learn new skills. Maybe that Python scraper mentioned above was more challenging than we anticipated, or maybe we got started working with a database in SQL but got stuck somewhere and realized our SQL chops weren’t what we thought they were. That’s a good time to pause that project and take some time with Codecademy or a similar platform to learn a little more about Python, the command line or SQL.
Note: Many people finish a Codecademy or similar tutorial and think the next step is to tackle a big project. While that may work for some, it can be daunting. You can also get stuck and frustrated and vow never to work with that language again. We don’t want you to do that. Check out this video on a few great post-Codecademy resources for exercising your new skills in that intermediate period before a big project.
YouTube
What Is It? That’s right, YouTube. The wealth of information on here is mind-blowing, even though it may not always seem like the obvious go-to.
Where to Find It? https://www.youtube.com/
What Do We Use It For? As with Codecademy, we haven’t used YouTube for any specific project. Unlike Codecademy, though, we don’t use YouTube to learn or train in a language or tool in depth. Rather, YouTube is useful for several things:
- finding quick solutions to a problem with one aspect of a tool or language
- watching reviews of useful tools and languages
- getting general background on what certain tools or languages are used for

For example, you may wonder what exactly Python does and how broadly it can be applied. You might find a video like this one called Why you should learn Python Programming, or this one called Python programming for beginners: What can you do with Python?