Reading And Writing CSV Files With C++
As a data scientist, reading and writing data from/to CSV is one of the most common tasks I do on the daily. R, my language of choice, makes this easy with read.csv() and write.csv() (although I tend to use fread() and fwrite() from the data.table package).
Hot Take. C++ is not R.
As far as I know, there is no CSV reader/writer built into the C++ standard library. That’s not a knock against C++; it’s just a lower-level language. If we want to read and write CSV files with C++, we’ll have to deal with file I/O, data types, and some low-level logic on how to read, parse, and write data. For me, this is a necessary step in order to build and test more fun programs like machine learning models.
Writing to CSV
We’ll start by creating a simple CSV file with one column of integer data. And we’ll give it the header Foo.
#include <fstream>

int main() {
    // Create an output filestream object
    std::ofstream myFile("foo.csv");

    // Send data to the stream
    myFile << "Foo\n";
    myFile << "1\n";
    myFile << "2\n";
    myFile << "3\n";

    // Close the file
    myFile.close();

    return 0;
}
Here, ofstream is an “output file stream”. Since it’s derived from ostream, we can treat it just like cout (which is also an ostream). The result of executing this program is that we get a file called foo.csv in the current working directory, which is usually the directory we ran the executable from. Let’s wrap this into a write_csv() function that’s a little more dynamic.
#include <string>
#include <fstream>
#include <vector>
#include <cstddef> // std::size_t

void write_csv(const std::string& filename, const std::string& colname, const std::vector<int>& vals){
    // Make a CSV file with one column of integer values
    // filename - the name of the file
    // colname - the name of the one and only column
    // vals - an integer vector of values

    // Create an output filestream object
    std::ofstream myFile(filename);

    // Send the column name to the stream
    myFile << colname << "\n";

    // Send data to the stream
    // (std::size_t avoids a signed/unsigned comparison with vals.size())
    for(std::size_t i = 0; i < vals.size(); ++i)
    {
        myFile << vals.at(i) << "\n";
    }

    // Close the file
    myFile.close();
}

int main() {
    // Make a vector of length 100 filled with 1s
    std::vector<int> vec(100, 1);

    // Write the vector to CSV
    write_csv("ones.csv", "Col1", vec);

    return 0;
}
Cool. Now we can use write_csv() to write a vector of integers to a CSV file with ease. Let’s expand on this to support multiple vectors of integers and corresponding column names.
#include <string>
#include <fstream>
#include <vector>
#include <utility> // std::pair
#include <cstddef> // std::size_t

void write_csv(const std::string& filename, const std::vector<std::pair<std::string, std::vector<int>>>& dataset){
    // Make a CSV file with one or more columns of integer values
    // Each column of data is represented by the pair <column name, column data>
    //   as std::pair<std::string, std::vector<int>>
    // The dataset is represented as a vector of these columns
    // Note that all columns should be the same size

    // Create an output filestream object
    std::ofstream myFile(filename);

    // Send column names to the stream
    for(std::size_t j = 0; j < dataset.size(); ++j)
    {
        myFile << dataset.at(j).first;
        if(j != dataset.size() - 1) myFile << ","; // No comma at end of line
    }
    myFile << "\n";

    // Number of rows, taken from the first column (0 if there are no columns,
    // which avoids dataset.at(0) throwing on an empty dataset)
    const std::size_t nrows = dataset.empty() ? 0 : dataset.at(0).second.size();

    // Send data to the stream
    for(std::size_t i = 0; i < nrows; ++i)
    {
        for(std::size_t j = 0; j < dataset.size(); ++j)
        {
            myFile << dataset.at(j).second.at(i);
            if(j != dataset.size() - 1) myFile << ","; // No comma at end of line
        }
        myFile << "\n";
    }

    // Close the file
    myFile.close();
}

int main() {
    // Make three vectors, each of length 100 filled with 1s, 2s, and 3s
    std::vector<int> vec1(100, 1);
    std::vector<int> vec2(100, 2);
    std::vector<int> vec3(100, 3);

    // Wrap into a vector
    std::vector<std::pair<std::string, std::vector<int>>> vals = {{"One", vec1}, {"Two", vec2}, {"Three", vec3}};

    // Write the vector to CSV
    write_csv("three_cols.csv", vals);

    return 0;
}
Here we’ve represented each column of data as a std::pair of <column name, column values>, and the whole dataset as a std::vector of such columns. Now we can write a variable number of integer columns to a CSV file.
Reading from CSV
Now that we’ve written some CSV files, let’s attempt to read them. For now, let’s assume (correctly, in this case) that our file contains only integer data plus one row of column names at the top.
#include <string>
#include <fstream>
#include <vector>
#include <utility> // std::pair
#include <stdexcept> // std::runtime_error
#include <sstream> // std::stringstream
#include <cstddef> // std::size_t

std::vector<std::pair<std::string, std::vector<int>>> read_csv(const std::string& filename){
    // Reads a CSV file into a vector of <string, vector<int>> pairs where
    // each pair represents <column name, column values>

    // Create a vector of <string, int vector> pairs to store the result
    std::vector<std::pair<std::string, std::vector<int>>> result;

    // Create an input filestream
    std::ifstream myFile(filename);

    // Make sure the file is open
    if(!myFile.is_open()) throw std::runtime_error("Could not open file");

    // Helper vars
    std::string line, colname;
    int val;

    // Read the column names
    if(myFile.good())
    {
        // Extract the first line in the file
        std::getline(myFile, line);

        // Create a stringstream from line
        std::stringstream ss(line);

        // Extract each column name
        while(std::getline(ss, colname, ',')){
            // Initialize and add <colname, int vector> pairs to result
            result.push_back({colname, std::vector<int> {}});
        }
    }

    // Read data, line by line
    while(std::getline(myFile, line))
    {
        // Create a stringstream of the current line
        std::stringstream ss(line);

        // Keep track of the current column index
        std::size_t colIdx = 0;

        // Extract each integer
        while(ss >> val){
            // Add the current integer to the 'colIdx' column's values vector
            // (at() throws std::out_of_range if a row has more fields than the header)
            result.at(colIdx).second.push_back(val);

            // If the next token is a comma, ignore it and move on
            if(ss.peek() == ',') ss.ignore();

            // Increment the column index
            colIdx++;
        }
    }

    // Close file
    myFile.close();

    return result;
}

int main() {
    // Read three_cols.csv and ones.csv
    std::vector<std::pair<std::string, std::vector<int>>> three_cols = read_csv("three_cols.csv");
    std::vector<std::pair<std::string, std::vector<int>>> ones = read_csv("ones.csv");

    // Write to another file to check that this was successful
    // (this uses the multi-column write_csv() from the previous section,
    // which needs to be defined above main() for this to compile)
    write_csv("three_cols_copy.csv", three_cols);
    write_csv("ones_copy.csv", ones);

    return 0;
}
This program reads our previously created CSV files and writes each dataset to a new file, essentially creating copies of our original files.
Going further
So far we’ve seen how to read and write datasets with integer values only. Extending this to read/write a dataset of only doubles or only strings should be fairly straightforward. Reading a dataset with unknown, mixed data types is another beast and beyond the scope of this article, but see this code review for possible solutions.
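To give a rough idea of what the single-type extension might look like, here is a sketch of a templated row parser. parse_row is a name I made up, and this is not the approach from the code review mentioned above. It assumes plain fields with no quoting and no embedded commas, and it converts each field with operator>>, which is fine for int and double but would cut a string field at the first space (for strings you would probably just keep the field as-is).

```cpp
#include <sstream>   // std::stringstream, std::istringstream
#include <stdexcept> // std::runtime_error
#include <string>
#include <vector>

// Hypothetical helper: splits one CSV line on commas and converts each field to T.
// Assumes no quoted fields and no commas inside fields.
template <typename T>
std::vector<T> parse_row(const std::string& line) {
    std::vector<T> values;
    std::stringstream ss(line);
    std::string field;

    // Pull out one comma-separated field at a time
    while (std::getline(ss, field, ',')) {
        // Convert the field's text to T using stream extraction
        std::istringstream fieldStream(field);
        T value;
        if (!(fieldStream >> value))
            throw std::runtime_error("Could not parse field: " + field);
        values.push_back(value);
    }
    return values;
}
```

A read_csv() for doubles could then call parse_row<double>(line) for each data line and append the values to the matching columns, in place of the inner while(ss >> val) loop used for integers.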
Special thanks to papagaga and Incomputable for helping me with this topic via codereview.stackexchange.com.