Export MongoDB Collection to CSV, and Import CSV Into DB


Node

Updated Mar 13th, 2022

Goals

You want the user to be able to click a button in the UI to download their data as a CSV. This is an important feature: the user can trust that they have access to their data outside of your app, which makes them more willing to use the app because they don't have to worry about lock-in. It's also about being able to manage data backups.

Importing data from a CSV or Excel file is about the convenience of bulk uploads. It's also necessary for restoring a backup and for data portability.

Abundance of Libraries To Help

In addition to Node's built-in fs module, there is a plethora of libraries to assist. I used "react-csv" for the export.

In a previous project I used the "xlsx" package. Interestingly, this library supports React and React Native, but it's 7.97MB!

For CSVs (not Excel files), I've seen the following (via YouTube):

Fast-CSV (564k weekly – 8.53kB)

json2csv (792k weekly – 383kB)

csvwriter (164k weekly)

objects-to-csv (42k weekly)

react-csv (276k weekly – 40.9 kB): This package works for both static and async data, where async calls an API and writes the result to CSV. The download had 3 vulnerabilities, 2 of them high.

react-papaparse (39.6k weekly)

For the Browser

This is a good explanation of the goal: Generate CSV and Download it Client Side from the Browser (In React/In NextJS). But since we are using NextJS, we could also have the server send the file.

There is a good video here, but I'm not sure if that approach works with React.

Note the browser has a way to create a URL from an object (a Blob):

const blob = new Blob([data], {type: "text/csv"})
const url = window.URL.createObjectURL(blob)

You then use "createElement" to add an "a" tag that triggers the download. I thought this was why this strategy would not work in React, but read this.

Implementing the strategy from the link above was super simple, and I will roll with it until it breaks. I'm assuming I will run into some issues with browser compatibility, parsing data when it comes to blanks or special characters, etc. For now this is the way to roll.

Note: it was pretty easy to implement this strategy in my project, but right off the bat I see issues with parsing/delimiting into rows and columns. Any question with a comma in it will be split into the next column. This strategy is the by-hand, no-library approach; a library should help fix this issue (see the escaping sketch after the code below).

interface DownloadInput {
  data: string
  fileName: string
  fileType: string
}

const downloadFile = (input: DownloadInput) => {
  // Create a blob with the data we want to download as a file
  const blob = new Blob([input.data], { type: input.fileType })
  // Create an anchor element and dispatch a click event on it
  // to trigger a download
  const a = document.createElement("a")
  const url = window.URL.createObjectURL(blob)
  a.download = input.fileName
  a.href = url
  const clickEvt = new MouseEvent("click", {
    view: window,
    bubbles: true,
    cancelable: true
  })
  a.dispatchEvent(clickEvt)
  a.remove()
  // Release the object URL so the blob can be garbage collected
  window.URL.revokeObjectURL(url)
}

const exportToCsv = (e: React.MouseEvent<HTMLButtonElement>) => {
  e.preventDefault()
  // Headers for each column
  const headers = ["Id,Name,Surname,Age"]
  // Convert users data to CSV rows
  const usersCsv = usersData.users.reduce((acc: string[], user) => {
    const { id, name, surname, age } = user
    acc.push([id, name, surname, age].join(","))
    return acc
  }, [])
  downloadFile({
    data: [...headers, ...usersCsv].join("\n"),
    fileName: "users.csv",
    fileType: "text/csv"
  })
}
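As a stopgap before reaching for a library, the comma issue mentioned above could be handled with by-hand quoting. This is only a sketch ("escapeCsvField" is a hypothetical helper, following the RFC 4180 convention of doubling embedded quotes):

// Hypothetical helper: quote a field when it contains a comma, quote,
// or newline, and double any embedded quotes (RFC 4180 style)
const escapeCsvField = (field: string | number): string => {
  const s = String(field)
  return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s
}

// Usage inside exportToCsv:
// acc.push([id, name, surname, age].map(escapeCsvField).join(","))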

An external JSON file is imported and looped over to create the rows of a table in the JSX:

<table className="usersTable">
  <thead>
    <tr>
      <th>ID</th>
      <th>Name</th>
      <th>Surname</th>
      <th>Age</th>
    </tr>
  </thead>
  <tbody>
    {usersData.users.map(user => {
      const { id, name, surname, age } = user
      return (
        <tr key={id}>
          <td>{id}</td>
          <td>{name}</td>
          <td>{surname}</td>
          <td>{age}</td>
        </tr>
      )
    })}
  </tbody>
</table>
<div className="actionBtns">
  <button type="button" onClick={exportToCsv}>
    Export to CSV
  </button>
</div>

When Do You Need a Parser

This explains the importance of a parser.

Downloading and uploading data is really about getting the data into the right format and handling all of the edge cases (special characters, empty fields, etc.).

csv-parse (2.2m weekly) and papaparse (1m weekly). (Note: there is a react-papaparse package, but it only has about 40k weekly downloads.)

Papaparse had 3 vulnerabilities, 1 moderate and 2 high.

Node Versus NextJS

Is there any major difference here?

In a previous project I dropped multiple CSV files into a folder on my hard drive and then ran a Node script via the "node whateverScriptName" command. The script does its thing and spits out a workbook with multiple sheets as its output.

How do you run a Node script in a web application? In NextJS this should be handled via an API endpoint.
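A minimal sketch of what that might look like (the route path and payload handling here are assumptions, not the app's actual code):

// pages/api/import.ts (hypothetical route) — the standalone script's logic
// moves into the request handler instead of running via "node scriptName"
import type { NextApiRequest, NextApiResponse } from "next"

export default async function handler(req: NextApiRequest, res: NextApiResponse) {
  if (req.method !== "POST") {
    return res.status(405).json({ message: "Method not allowed" })
  }
  // ...do whatever the standalone script used to do with the posted data...
  res.status(200).json({ ok: true })
}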

Node Stream API

The Node.js stream API is scalable and offers the greatest control over the data flow. It comes at the cost of being more verbose and harder to write.
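For example, a large CSV can be transformed without loading the whole file into memory (a sketch, with hypothetical input.csv/output.csv paths):

import { createReadStream, createWriteStream } from "fs"
import { Transform } from "stream"
import { pipeline } from "stream/promises"

// Uppercase each chunk as it streams through, never buffering the whole file
const upperCase = new Transform({
  transform(chunk, _encoding, callback) {
    callback(null, chunk.toString().toUpperCase())
  }
})

async function transformCsv() {
  await pipeline(
    createReadStream("input.csv"),
    upperCase,
    createWriteStream("output.csv")
  )
}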

Write File from Client or Server?

Is it best to use the blob/URL.createObjectURL method or create the file on the server and send it back as a response?

Express has a "sendFile" method. NextJS likely has something similar, right?
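For comparison, a NextJS API route can send a CSV as a download by setting the usual headers (a sketch; the route name and data are made up):

// pages/api/users-csv.ts (hypothetical)
import type { NextApiRequest, NextApiResponse } from "next"

export default function handler(req: NextApiRequest, res: NextApiResponse) {
  const csv = ["Id,Name,Surname,Age", "1,Ada,Lovelace,36"].join("\n")
  res.setHeader("Content-Type", "text/csv")
  // Content-Disposition tells the browser to download rather than render
  res.setHeader("Content-Disposition", 'attachment; filename="users.csv"')
  res.status(200).send(csv)
}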

That said, based on the solutions in this Stack Overflow post, it seems like the client side is the way to go.

React-CSV Approach For Writing the CSV

I kept getting an output of HTML rather than CSV.

I also had to deal with nested data, but with the package this is not a problem.

Handling Nested Data

Instead of a question having individual keys for the three incorrect answers, it has one "incorrect_answers" key that is an array of the three answers. As mentioned earlier, this is pretty simple to handle with "react-csv" for the export-to-CSV feature.

What about importing the data, though? And shouldn't the column header names match the data schema?
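One way to keep the export headers aligned with the schema is to flatten each question before writing, so the array becomes three explicit columns (a sketch; "questions" is assumed, and the column names match the import code later in these notes):

// Flatten each question so the incorrect_answers array becomes three columns
const rows = questions.map(q => ({
  CATEGORY: q.category,
  QUESTION: q.question,
  Correct_Answer: q.correct_answer,
  Incorrect_Answer_01: q.incorrect_answers[0],
  Incorrect_Answer_02: q.incorrect_answers[1],
  Incorrect_Answer_03: q.incorrect_answers[2]
}))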

FileReader()

I hadn't used this one before: FileReader is a native browser API.

const reader = new FileReader()

// Fires once the read completes; the file contents are in reader.result
reader.onloadend = () => {
  console.log(reader.result)
}

// `file` comes from the <input type="file"> element
reader.readAsText(file)

Importing

This was the article I used to help implement the import.

Using TS, I ran into a type error, "Object possibly null," for the file import. I tried guarding, and ran into this Stack Overflow post as well.

I turned off TS for this file by changing the extension back to “.js”

I could tell "Papa.parse" was reading the file because I logged the variable the results were stored in and saw an Object with data, errors, and meta properties, which is the expected result per the "papaparse" docs.

const csv = Papa.parse(target.result, { header: true })
console.log(csv)

Interestingly, I did get two items in the errors array for trailing quotes:

{type: "Quotes", code: "InvalidQuotes", message: "Trailing quote on quoted field is malformed", row: 9, index: 2066}

In the meta property there was a “delimiter” key set to a value of “,”

If a user clicks the upload button before selecting a file, you will get a runtime error: ""
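A simple guard before kicking off the read avoids that (a sketch; "fileInputRef" is an assumed React ref to the file input):

const handleUpload = () => {
  const file = fileInputRef.current?.files?.[0]
  if (!file) {
    // Nothing selected yet — bail out instead of crashing at runtime
    alert("Please choose a CSV file first.")
    return
  }
  reader.readAsText(file)
}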

Need to Format Using a Loop

"csv.data" is an array of objects. We need to loop over that data and return a new array of objects in our own format.

Validation On Import

How do we make sure a CSV file is chosen and not a text file, Excel file, or something else? That's likely FileReader's or Papaparse's department.

For everything else it's probably just more JavaScript on our end. We are essentially just using the parser to convert CSV to JSON. What we do with it from there is on the developer.

We need all of the security checks: no malicious code, no empty fields, no stray whitespace, etc. Certain special characters should be allowed, so we may need a sanitization package.
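A first-pass client-side check might look at both the MIME type and the extension, since browsers report CSV types inconsistently (a sketch, not a real validation layer):

const looksLikeCsv = (file: File): boolean => {
  // Some browsers report text/csv, others application/vnd.ms-excel or
  // an empty string for .csv files, so check the extension too
  const okTypes = ["text/csv", "application/vnd.ms-excel", ""]
  return okTypes.includes(file.type) && file.name.toLowerCase().endsWith(".csv")
}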

PapaParse Parsing

If an imported CSV file has a field with quotes, the quotes get escaped with a backslash. Special characters are allowed, so if you add an arrow function with an alert message, the quotes in the alert will be escaped.

Getting an Extra Object for the Header

Trying to resolve this with just the map array method, I still get a null object. Bringing in a filter resolved the issue.

// Drop the final row, which parses as a null/empty object
// Map each remaining row into the app's question format

const arrayOfQuestions = csv.data
  .filter((_item, index) => index !== csv.data.length - 1)
  .map(QRow => {
    return {
      category: QRow.CATEGORY,
      type: "multiple",
      difficulty: "easy",
      question: QRow.QUESTION,
      correct_answer: QRow.Correct_Answer,
      incorrect_answers: [
        QRow.Incorrect_Answer_01,
        QRow.Incorrect_Answer_02,
        QRow.Incorrect_Answer_03
      ]
    }
  })

Does adjusting the papa parse config option work?

const csv = Papa.parse(target.result, { header: false })

Nope. Makes it worse.

Using a conditional to check ".length - 1" or "QRow !== undefined", I can get the extra entry to come back as null, but this is still not ideal. The filter method was the best solution for now.

Note: need a test for what happens if a user uploads a file without a header row.

Uploading an Array of Objects

The app was originally set up to receive one question as an object, and now I have an array of multiple question objects.

I could:

1.) Have a different endpoint or function to handle multiple objects (seems like a lot of redundant code).

2.) Check the type of the argument passed to see if it is a single object or an array of objects.

3.) Always pass a question object in as an array and use the length property to handle it.

Option #1: Hard pass.

Option #2: Something along the lines of:

if (req.body instanceof Array) {
  res.status(200).json({ message: "An Array has been sent" })
  console.log("An Array has been sent")
} else {
  res.status(200).json({ message: "Some non-array thing sent" })
  console.log("An Object, or something other than an array, has been sent")
}

Option #3: Updated the "addQuestionForm" component to pass the new single object inside of an array. Configured the API route to receive 0, 1, or more than one question object inside an array. Server-side cleanup and validation run for each question object, with the validation function set up as a reusable utility and imported in. A for loop then runs the insert for each question object.

Note: likely a refactor opportunity to use the ".insertMany" MongoDB method instead of multiple inserts.
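That refactor might look something like this inside the API route (a sketch; "validateQuestion", "db", and the "questions" collection name are assumptions):

// Normalize the body so the handler always works with an array
const questions = Array.isArray(req.body) ? req.body : [req.body]

// Validate every question with the shared utility before touching the DB
if (!questions.every(validateQuestion)) {
  return res.status(400).json({ message: "One or more questions failed validation" })
}

// One round trip to MongoDB instead of a loop of single inserts
await db.collection("questions").insertMany(questions)
res.status(201).json({ inserted: questions.length })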

NextJS Related

Since the file upload component is a non-page component, it hits an internal API route that handles the array of question objects with server-side code.

Sources

Here

https://theroadtoenterprise.com/blog/how-to-download-csv-and-json-files-in-react