Extract Data From Evernote files

Evernote can export its notes as HTML documents with images embedded in them rather than linked. The advantage is that you do not have to keep track of where your images and other attachments are stored, nor make sure that the HTML file uses the right URLs to address them.

On the other hand, these HTML documents are huge because they contain the images (and possibly other documents) as base64-encoded text strings, which take up about a third more space than the same data stored as binary files on disk. So you might want to extract those files from the Evernote note(s) and then modify your HTML document to link to the external files.

This is nearly the inverse process of adding thumbnails to markdown files. The function extractDataURIs can be included in a larger script or adapted to your needs. For example, you might want to save, along with the extracted images and other documents, a modified HTML document that links to them.

As it stands now, extractDataURIs expects the path of the Evernote HTML file, a prefix to use for the newly created documents and the path to a folder where these documents are saved. The extracted documents are named “prefix-1.ext”, “prefix-2.ext”, and so on, where “ext” is the subtype given in the MIME type of the embedded document (png, jpeg, pdf etc.). If no prefix is specified, the base name (i.e. without extension) of the Evernote file is used, or the string “embedded-data” if the base name cannot be determined. Similarly, if no target folder is passed to extractDataURIs, the extracted documents are saved in the current user’s desktop folder.
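The naming rules can be tried out in plain JavaScript without touching any files. This is only a sketch: plannedFileNames is a hypothetical helper that is not part of the script, and the paths are made up.

```javascript
// Hypothetical helper (not part of extractDataURIs): lists the file names
// the naming scheme would produce for a given source path, an optional
// prefix and the MIME subtypes of the embedded documents.
function plannedFileNames(file, prefix, extensions) {
  // Last path component without its final extension
  const match = file.match(/([^\/]+?)(?:\.[^.\/]*)?$/);
  // Prefer the prefix, then the base name, then the fixed fallback
  const basename = prefix || (match ? match[1] : "embedded-data");
  return extensions.map((ext, index) => `${basename}-${index + 1}.${ext}`);
}

console.log(plannedFileNames("/Users/me/Notes/Recipe.html", "", ["png", "jpeg"]));
console.log(plannedFileNames("/Users/me/Notes/Recipe.html", "img", ["pdf"]));
```

The first call falls back to the base name “Recipe”, the second uses the explicit prefix “img”.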

The function extracts all data URIs from the file using a regular expression. For each one, it uses the ObjC method $.NSURL.URLWithString to build an NSURL object and $.NSData.dataWithContentsOfURLOptionsError to convert that URL to binary data. Note the Options and Error parameters of this method: if an error occurs while converting the data URI to binary data, it is written to the console.
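The matching step on its own can be tested in any JavaScript engine. The sketch below uses a made-up HTML snippet and a slightly different pattern from the script: negated character classes instead of lazy quantifiers, which also match across line breaks without the 's' flag.

```javascript
// Made-up sample standing in for an Evernote export
const html =
  '<p>Photo:</p>\n<img src="data:image/png;base64,iVBORw0KGgo=" alt="a">\n' +
  '<img src="data:image/jpeg;base64,/9j/4AAQ\nSkZJRg==">';

// Group 1: the complete data URI, group 2: the MIME subtype
const dataURIPattern = /src="(data:[^\/"]+\/([^;"]+);base64,[^"]*)"/g;

const found = [...html.matchAll(dataURIPattern)].map((m) => ({
  uri: m[1],
  extension: m[2],
}));

console.log(found.map((f) => f.extension)); // [ 'png', 'jpeg' ]
```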

ObjC.import("Foundation");
function extractDataURIs(file, prefix, targetFolder) {
  /* file: path to the source file with embedded data URIs
     prefix: prefix of the generated files; if not set, the
       file's base name is used, or "embedded-data" if that fails
     targetFolder: folder where the generated files are saved;
       if not set, the current user's desktop is used
  */

  const ca = Application.currentApplication();
  ca.includeStandardAdditions = true;
  const myDesktop = ca.pathTo("desktop").toString();

  /* Setup target folder and file name prefix */
  const targetDir = targetFolder ? targetFolder : myDesktop;
  const basename = (() => {
    if (prefix) {
      return prefix;
    } else {
      /* Get the file's basename w/o extension. If that fails
       use "embedded-data" as prefix */
      const fileDestruct = file.match(/.*\/([^.]+)(?:\..*)?/);
      return fileDestruct ? fileDestruct[1] : "embedded-data";
    }
  })();

  /* Read the file's raw data into 'data' */
  const fm = $.NSFileManager.defaultManager;
  const data = fm.contentsAtPath(file);

  /* Convert the raw data to a UTF-8 encoded JavaScript string */
  const txt = $.NSString.alloc.initWithDataEncoding(
    data,
    $.NSUTF8StringEncoding
  ).js;

  /* Assemble all data URIs in an array looking for 
     'src="data:doc type/extension;base64,...."'
     Note the usage of the 's' flag in matchAll to treat the whole 
     string as a single line.
     */
  const base64Matches = [
    ...txt.matchAll(/src="(data:(?:.*?)\/(.*?);base64,.*?)"/gs),
  ];

  /* Loop over all data URIs */
  base64Matches.forEach((match, index) => {
    /* The first capturing group contains the complete data URL
       'data:image/png;base64,...'.
       The second capturing group contains the MIME subtype,
       which serves as the file extension (jpeg, png etc.). */
    const fullMatch = match[1];
    const extension = match[2];

    /* Build an NSURL from the complete data URL. 
    Note: MUST URL-escape the raw data first! */
    const matchNSString = $.NSString.alloc.initWithString(fullMatch);
    const url = $.NSURL.URLWithString(
      matchNSString.stringByAddingPercentEscapesUsingEncoding(
        $.NSASCIIStringEncoding
      )
    );

    /* Convert the NSURL to binary data (NSData) */
    const error = $();
    const imageData = $.NSData.dataWithContentsOfURLOptionsError(
      url,
      0,
      error
    );

    /* If the data could not be created, log the error and
       skip this data URI */
    if (imageData.isNil()) {
      console.log(error.localizedDescription.js);
      return;
    }

    /* Build a new file name of the form
       targetDir/basename-number.extension
    */
    const newfile = `${targetDir}/${basename}-${index + 1}.${extension}`;

    /* Write the binary data to the file */
    imageData.writeToFileAtomically(newfile, true);
  });
}
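
The modified HTML document mentioned earlier, with links in place of the embedded data, could be produced along the following lines. This is a minimal sketch in plain JavaScript: linkExtractedFiles is a hypothetical companion function that is not part of the script above, and it assumes that the HTML file is saved in the same folder as the extracted files and that the same base name and numbering are used.

```javascript
// Hypothetical companion to extractDataURIs: replaces every embedded
// data URI with a relative link to the file it would be extracted to.
function linkExtractedFiles(html, basename) {
  let index = 0;
  return html.replace(
    /src="data:[^\/"]+\/([^;"]+);base64,[^"]*"/g,
    // Number the links in document order, like the extracted files
    (_whole, extension) => `src="${basename}-${++index}.${extension}"`
  );
}

const sample =
  '<img src="data:image/png;base64,AAAA"><img src="data:image/gif;base64,BBBB">';
console.log(linkExtractedFiles(sample, "note"));
// <img src="note-1.png"><img src="note-2.gif">
```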