📄️ acreom
acreom is a dev-first knowledge base with tasks
📄️ Airbyte CDK
Airbyte is a data integration
📄️ Airbyte Gong
Airbyte is a data integration
📄️ Airbyte Hubspot
Airbyte is a data integration
📄️ Airbyte JSON
Airbyte is a data integration
📄️ Airbyte Salesforce
Airbyte is a data integration
📄️ Airbyte Shopify
Airbyte is a data integration
📄️ Airbyte Stripe
Airbyte is a data integration
📄️ Airbyte Typeform
Airbyte is a data integration
📄️ Airbyte Zendesk Support
Airbyte is a data integration
📄️ Airtable
- Get your API key
📄️ Alibaba Cloud MaxCompute
[Alibaba Cloud
📄️ Amazon Textract
[Amazon
📄️ Apify Dataset
Apify Dataset is a
📄️ ArcGIS
This notebook demonstrates the use of the
📄️ Arxiv
arXiv is an open-access archive for 2 million
📄️ AssemblyAI Audio Transcripts
The AssemblyAIAudioTranscriptLoader allows you to transcribe audio files
📄️ Async Chromium
Chromium is one of the browsers supported by Playwright, a library used
📄️ AsyncHtml
AsyncHtmlLoader loads raw HTML from a list of URLs concurrently.
📄️ AWS S3 Directory
[Amazon Simple Storage Service (Amazon
📄️ AWS S3 File
[Amazon Simple Storage Service (Amazon
📄️ AZLyrics
AZLyrics is a large, legal, every day
📄️ Azure AI Data
Azure AI Studio provides the capability to
📄️ Azure Blob Storage Container
[Azure Blob
📄️ Azure Blob Storage File
[Azure
📄️ Azure AI Document Intelligence
Azure AI Document Intelligence (formerly known as Azure Form Recognizer)
📄️ BibTeX
BibTeX is a file format and reference management system commonly used
📄️ BiliBili
Bilibili is one of the most beloved
📄️ Blackboard
Blackboard Learn
📄️ Blockchain
Overview
📄️ Brave Search
Brave Search is a search
📄️ Browserless
Browserless is a service that allows you to run headless Chrome
📄️ ChatGPT Data
ChatGPT is an artificial intelligence (AI)
📄️ College Confidential
College Confidential gives
📄️ Concurrent Loader
Works just like the GenericLoader but concurrently for those who choose
📄️ Confluence
Confluence is a wiki
📄️ CoNLL-U
CoNLL-U is revised
📄️ Copy Paste
This notebook covers how to load a document object from something you
📄️ Couchbase
Couchbase is an award-winning distributed NoSQL
📄️ CSV
A [comma-separated values
📄️ Cube Semantic Layer
This notebook demonstrates the process of retrieving Cube’s data model
📄️ Datadog Logs
Datadog is a monitoring and analytics
📄️ Diffbot
Unlike traditional web scraping tools,
📄️ Discord
Discord is a VoIP and instant messaging social
📄️ Docugami
This notebook covers how to load documents from Docugami. It provides
📄️ Docusaurus
Docusaurus is a static-site generator which
📄️ Dropbox
Dropbox is a file hosting
📄️ DuckDB
DuckDB is an in-process SQL OLAP database
📄️ Email
This notebook shows how to load email (.eml) or Microsoft Outlook
📄️ EPub
EPUB is an e-book file format
📄️ Etherscan
Etherscan is the leading blockchain
📄️ EverNote
EverNote is intended for archiving and
📄️ Microsoft Excel
The UnstructuredExcelLoader is used to load Microsoft Excel files.
📄️ Facebook Chat
Messenger is an
📄️ Fauna
Fauna is a Document Database.
📄️ Figma
Figma is a collaborative web application for
📄️ Geopandas
Geopandas is an
📄️ Git
Git is a distributed version
📄️ GitBook
GitBook is a modern documentation
📄️ GitHub
This notebook shows how you can load issues and pull requests (PRs) for
📄️ Google BigQuery
Google BigQuery is a serverless
📄️ Google Cloud Storage Directory
[Google Cloud
📄️ Google Cloud Storage File
[Google Cloud
📄️ Google Drive
Google Drive is a file
📄️ Google Speech-to-Text Audio Transcripts
The GoogleSpeechToTextLoader allows you to transcribe audio files with the
📄️ Grobid
GROBID is a machine learning library for extracting, parsing, and
📄️ Gutenberg
Project Gutenberg is an online
📄️ Hacker News
Hacker News (sometimes
📄️ Huawei OBS Directory
The following code demonstrates how to load objects from the Huawei OBS
📄️ Huawei OBS File
The following code demonstrates how to load an object from the Huawei
📄️ HuggingFace dataset
The Hugging Face Hub is home
📄️ iFixit
iFixit is the largest, open repair community
📄️ Images
This covers how to load images such as JPG or PNG into a document
📄️ Image captions
By default, the loader utilizes the pre-trained [Salesforce BLIP image
📄️ IMSDb
IMSDb is the Internet Movie Script Database.
📄️ Iugu
Iugu is a Brazilian services and software as
📄️ Joplin
Joplin is an open-source note-taking app.
📄️ Jupyter Notebook
[Jupyter
📄️ lakeFS
lakeFS provides scalable version control
📄️ LarkSuite (FeiShu)
LarkSuite is an enterprise collaboration
📄️ Mastodon
Mastodon is a federated social media and
📄️ MediaWiki Dump
[MediaWiki XML
📄️ Merge Documents Loader
Merge the documents returned from a set of specified data loaders.
📄️ mhtml
MHTML is used both for emails and for archived webpages.
📄️ Microsoft OneDrive
Microsoft OneDrive (formerly
📄️ Microsoft PowerPoint
[Microsoft
📄️ Microsoft SharePoint
Microsoft SharePoint is a
📄️ Microsoft Word
Microsoft Word
📄️ Modern Treasury
Modern Treasury simplifies complex
📄️ MongoDB
MongoDB is a NoSQL, document-oriented
📄️ News URL
This covers how to load HTML news articles from a list of URLs into a
📄️ Notion DB 1/2
Notion is a collaboration platform with
📄️ Notion DB 2/2
Notion is a collaboration platform with
📄️ Nuclia
Nuclia automatically indexes your unstructured
📄️ Obsidian
Obsidian is a powerful and extensible
📄️ Open Document Format (ODT)
The [Open Document Format for Office Applications
📄️ Microsoft OneNote
This notebook covers how to load documents from OneNote.
📄️ Open City Data
Socrata
📄️ Org-mode
An Org Mode document is a
📄️ Pandas DataFrame
This notebook goes over how to load data from a
📄️ Polars DataFrame
This notebook goes over how to load data from a
📄️ Psychic
This notebook covers how to load documents from Psychic. See
📄️ PubMed
PubMed® by
📄️ PySpark
This notebook goes over how to load data from a
📄️ Quip
Quip is a collaborative productivity software
📄️ ReadTheDocs Documentation
Read the Docs is an open-sourced free
📄️ Recursive URL
We may want to load all URLs under a root directory.
📄️ Reddit
Reddit is an American social news
📄️ Roam
ROAM is a note-taking tool for networked
📄️ Rockset
Rockset is a real-time analytics database which enables queries on
📄️ rspace
This notebook shows how to use the RSpace document loader to import
📄️ RSS Feeds
This covers how to load HTML news articles from a list of RSS feed URLs
📄️ RST
A [reStructured Text
📄️ Sitemap
Extends from the WebBaseLoader, SitemapLoader loads a sitemap from a
📄️ Slack
Slack is an instant messaging program.
📄️ Snowflake
This notebook goes over how to load documents from Snowflake
📄️ Source Code
This notebook covers how to load source code files using a special
📄️ Spreedly
Spreedly is a service that allows you to
📄️ Stripe
Stripe is an Irish-American financial
📄️ Subtitle
[The SubRip file
📄️ Telegram
Telegram Messenger is a globally
📄️ Tencent COS Directory
[Tencent Cloud Object Storage
📄️ Tencent COS File
[Tencent Cloud Object Storage
📄️ TensorFlow Datasets
TensorFlow Datasets is a
📄️ 2Markdown
2markdown service transforms website content
📄️ TOML
TOML is a file format for
📄️ Trello
Trello is a web-based
📄️ TSV
A [tab-separated values
📄️ Twitter
Twitter is an online social media and social
📄️ Unstructured File
This notebook covers how to use the Unstructured package to load files of
📄️ URL
This covers how to load HTML documents from a list of URLs into a
📄️ Weather
OpenWeatherMap is an open-source
📄️ WebBaseLoader
This covers how to use WebBaseLoader to load all text from HTML
📄️ WhatsApp Chat
WhatsApp (also called
📄️ Wikipedia
Wikipedia is a multilingual free online
📄️ XML
The UnstructuredXMLLoader is used to load XML files. The loader
📄️ Xorbits Pandas DataFrame
This notebook goes over how to load data from a
📄️ YouTube audio
Building chat or QA applications on YouTube videos is a topic of high
📄️ YouTube transcripts
YouTube is an online video sharing and