
Loader

Before you can start indexing your documents, you need to load them into memory.

SimpleDirectoryReader


LlamaIndex.TS supports easy loading of files from folders using the SimpleDirectoryReader class.

It is a simple reader that reads all files from a directory and its subdirectories.

import { SimpleDirectoryReader } from "llamaindex/readers/SimpleDirectoryReader";
// or
// import { SimpleDirectoryReader } from 'llamaindex'

const reader = new SimpleDirectoryReader();
const documents = await reader.loadData("../data");

documents.forEach((doc) => {
  console.log(`document (${doc.id_}):`, doc.getText());
});

Currently, it supports reading .txt, .pdf, .csv, .md, .docx, .htm, .html, .jpg, .jpeg, .png and .gif files, but support for other file types is planned.

You can modify the reader in three different ways:

  • overrideReader overrides the reader for all file types, including unsupported ones.
  • fileExtToReader maps a reader to a specific file type. It can override the reader for an existing file type or add support for a new one.
  • defaultReader sets a fallback reader for files with unsupported extensions. By default, it is TextFileReader.
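
The precedence implied by the three options above can be sketched as follows. This is a minimal illustration, not the library's actual code: ReaderLike, ReaderOptions, extensionOf and pickReader are hypothetical names introduced here, and the sketch assumes that overrideReader wins over fileExtToReader, which in turn wins over defaultReader.

```typescript
// Hypothetical stand-in for a file reader; NOT the llamaindex FileReader type.
interface ReaderLike {
  name: string;
}

// Hypothetical option bag mirroring the three option names described above.
interface ReaderOptions {
  overrideReader?: ReaderLike;
  fileExtToReader: Record<string, ReaderLike>;
  defaultReader?: ReaderLike;
}

// Return the lower-cased extension of a bare file name without the dot,
// or "" when the name has no extension (for example ".gitignore" or "README").
function extensionOf(fileName: string): string {
  const dot = fileName.lastIndexOf(".");
  return dot <= 0 ? "" : fileName.slice(dot + 1).toLowerCase();
}

// Assumed precedence: overrideReader, then fileExtToReader, then defaultReader.
// Returns undefined when nothing matches and no fallback is configured.
function pickReader(
  fileName: string,
  options: ReaderOptions,
): ReaderLike | undefined {
  if (options.overrideReader) return options.overrideReader;
  const ext = extensionOf(fileName);
  if (Object.prototype.hasOwnProperty.call(options.fileExtToReader, ext)) {
    return options.fileExtToReader[ext];
  }
  return options.defaultReader;
}
```

Under these assumptions, a file named report.pdf is handled by whatever fileExtToReader maps to pdf, a file with an unmapped extension falls through to defaultReader, and setting overrideReader short-circuits both.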

SimpleDirectoryReader supports up to 9 concurrent requests. Use the numWorkers option to set the number of concurrent requests. By default, numWorkers is 1, so the reader runs sequentially.
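
To picture what a worker limit like numWorkers controls, here is a generic bounded-concurrency sketch. It is not taken from LlamaIndex.TS: mapWithWorkers is a hypothetical helper, and rejecting values outside 1 to 9 is an assumption modeled on the limit stated above.

```typescript
// Hypothetical helper, not part of llamaindex: run `task` over `items` with at
// most `numWorkers` tasks in flight, returning results in input order.
async function mapWithWorkers<T, R>(
  items: T[],
  numWorkers: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  // Assumption: mirror the documented 1..9 range by rejecting anything else.
  if (!Number.isInteger(numWorkers) || numWorkers < 1 || numWorkers > 9) {
    throw new RangeError("numWorkers must be an integer between 1 and 9");
  }

  const results = new Array<R>(items.length);
  let next = 0; // index of the next item that no worker has claimed yet

  // Each worker keeps claiming the next unprocessed index until none remain.
  const worker = async (): Promise<void> => {
    while (next < items.length) {
      const index = next++;
      results[index] = await task(items[index] as T);
    }
  };

  // Never start more workers than there are items to process.
  const workerCount = Math.min(numWorkers, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With numWorkers set to 1, the helper degenerates to a plain sequential loop, which matches the default behavior described above. Larger values trade memory and I/O pressure for throughput.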

Example

import type { Document, Metadata } from "llamaindex";
import { FileReader } from "llamaindex";
import {
  FILE_EXT_TO_READER,
  SimpleDirectoryReader,
} from "llamaindex/readers/SimpleDirectoryReader";
import { TextFileReader } from "llamaindex/readers/TextFileReader";

// Custom reader for .zip files; the parsing logic is left unimplemented here.
class ZipReader extends FileReader {
  loadDataAsContent(fileContent: Buffer): Promise<Document<Metadata>[]> {
    throw new Error("Implement me");
  }
}

const reader = new SimpleDirectoryReader();
const documents = await reader.loadData({
  directoryPath: "../data",
  defaultReader: new TextFileReader(),
  fileExtToReader: {
    ...FILE_EXT_TO_READER,
    zip: new ZipReader(),
  },
});

documents.forEach((doc) => {
  console.log(`document (${doc.id_}):`, doc.getText());
});
