Yamini Chaudhary

From HTML to Markdown: My Journey with the Froala Editor

May 11, 2025

When I first integrated the Froala editor into our project, KnowledgeKeeper, it felt like the right choice. As a developer building a platform that needed rich content creation, I found Froala’s WYSIWYG features a perfect fit: visually rich, easy to use, and well-documented.

But as the project scaled, the limitations of HTML content management became more apparent, especially when we started using AI to parse and manage user-generated content. This post isn’t just about switching from HTML to Markdown; it’s about solving a broader problem — managing documents from multiple platforms in a consistent, structured way, and making that content AI-friendly.

Project Context: Centralized AI-Driven Content Management

KnowledgeKeeper, the project we’ve been working on, aims to import documents from different platforms (such as Notion, Google Docs, Dropbox Paper, and Confluence) and manage them in one unified space using AI. Most of these platforms either support Markdown natively or allow easy conversion to Markdown. As we fed this content into our AI systems for summarization, tagging, or rewriting, we noticed that Markdown was much easier to handle than raw HTML.

Over time, we even started converting existing content to Markdown internally — not because we loved it initially, but because it just worked better for the job.
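To make the conversion step concrete, here is a minimal sketch of an HTML-to-Markdown pass using only Python’s standard library. It is an illustration rather than the converter we run in KnowledgeKeeper: the `MarkdownConverter` class and `to_markdown` function are names invented for this post, and it only handles headings, paragraphs, bold, italics, and list items.

```python
import re
from html.parser import HTMLParser

# Assumption: only this small subset of HTML is handled; everything else is unwrapped.
HEADINGS = {"h1": "# ", "h2": "## ", "h3": "### "}
INLINE = {"strong": "**", "b": "**", "em": "*", "i": "*"}
BLOCKS = {"p", "li", "h1", "h2", "h3"}


class MarkdownConverter(HTMLParser):
    """Collects text block by block and emits a Markdown string."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.blocks = []   # finished Markdown blocks
        self.current = []  # pieces of the block being built

    def handle_starttag(self, tag, attrs):
        # Attributes (style, class, ...) are ignored entirely.
        if tag in HEADINGS:
            self.current.append(HEADINGS[tag])
        elif tag == "li":
            self.current.append("- ")
        elif tag in INLINE:
            self.current.append(INLINE[tag])

    def handle_endtag(self, tag):
        if tag in INLINE:
            self.current.append(INLINE[tag])
        elif tag in BLOCKS:
            self._flush()

    def handle_data(self, data):
        # Collapse runs of whitespace, as a browser would when rendering.
        self.current.append(re.sub(r"\s+", " ", data))

    def _flush(self):
        text = "".join(self.current).strip()
        if text:
            self.blocks.append(text)
        self.current = []


def to_markdown(html: str) -> str:
    converter = MarkdownConverter()
    converter.feed(html)
    converter.close()
    converter._flush()  # keep any trailing text outside a block element
    # Every block, including each list item, is separated by a blank line.
    return "\n\n".join(converter.blocks)


if __name__ == "__main__":
    print(to_markdown("<h2>Notes</h2><p>Plain and <b>bold</b> text.</p>"))
```

A production pipeline would more likely rely on a dedicated converter library that also covers links, tables, code blocks, and nested lists; the point here is only that the target format is simple enough to reason about.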

Why HTML in Froala Became a Problem

1. HTML Bloat from Copy-Paste and Rich Content

Users often pasted content from Word, Google Docs, or email clients. Froala would dutifully retain every span, style, and font tag — but this created a tangled mess of unnecessary HTML. Even simple text blocks turned into bloated code filled with inline styles, making the content hard to parse programmatically or feed into AI models.

2. HTML Made AI Processing More Difficult

Here’s a key issue we faced: in our experience, GPT-style models produced noticeably better results with clean, predictable input. HTML introduced noise (deeply nested tags, redundant styles, inconsistent formatting), which made it harder to generate accurate outputs. With Markdown, the input was much cleaner and easier to parse semantically.

HTML also introduced inconsistencies across browsers and platforms, which further complicated the pipeline. Small rendering bugs in HTML didn’t just break the UI — they created inconsistencies in AI output.

3. Clean-Up Overhead and Maintenance Fatigue

Each time content was updated or edited in Froala, new divs, spans, and styles would creep in. Over time, this led to bloated, unstable HTML. Cleaning this up became a full-time chore. Instead of focusing on building features, we were debugging DOM structures and writing post-processing scripts just to make the content manageable.
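A clean-up pass of this kind can be sketched in a few lines of Python. This is an illustration, not the actual KnowledgeKeeper script and not a Froala API: the `PasteCleaner` class, the `clean_html` function, and the two allow/deny sets are assumptions made for this example. It unwraps `span` and `font` tags and drops every attribute except a short keep-list.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: tags that only carry presentation and can be unwrapped safely.
# <div> is left alone here because unwrapping it would merge separate blocks.
UNWRAP_TAGS = {"span", "font"}
# Assumption: attributes worth keeping; style, class, etc. are dropped.
KEEP_ATTRS = {"href", "src", "alt"}


class PasteCleaner(HTMLParser):
    """Drops presentational wrappers and attributes, keeps text and structure."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in UNWRAP_TAGS:
            return
        kept = "".join(
            f' {name}="{escape(value or "")}"'
            for name, value in attrs
            if name in KEEP_ATTRS
        )
        self.out.append(f"<{tag}{kept}>")

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> get a start tag only.
        self.handle_starttag(tag, attrs)

    def handle_endtag(self, tag):
        if tag not in UNWRAP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Re-escape text so characters like & and < stay valid HTML.
        self.out.append(escape(data, quote=False))


def clean_html(html: str) -> str:
    cleaner = PasteCleaner()
    cleaner.feed(html)
    cleaner.close()
    return "".join(cleaner.out)


if __name__ == "__main__":
    pasted = (
        '<p style="margin:0"><span style="font-family:Calibri">'
        '<font color="#000">Hello </font><b class="x">world</b></span></p>'
    )
    print(clean_html(pasted))
```

Even a small script like this has to make judgment calls (which tags are safe to unwrap, which attributes matter), and every new paste source tends to add another special case. That is where the maintenance fatigue came from.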

4. Poor Support for Custom Elements

Froala does allow customization, but embedding interactive widgets or structured data elements required fragile workarounds — often involving DOM hacking or custom JS injection. These didn’t play nicely with downstream AI processing, which preferred lightweight, semantically meaningful structures. Markdown (extended with custom tokens or metadata) was simply more predictable.

Why We Eventually Embraced Markdown

At first, Markdown seemed too “plain” for our use case. It lacked the visual editing power of a WYSIWYG editor and didn’t easily support complex layouts. But as we integrated more platforms and needed to standardize content for AI, Markdown started to shine.

It’s readable.
It’s portable.
It’s easy to version-control.
And most importantly, it’s clean and consistent, which made AI-driven content updates far more reliable.

We even built a Markdown pre-processor that allowed for custom tokens and structures, giving us the flexibility we needed without the clutter of HTML.
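To give a feel for what a pre-processor like this does, here is a minimal sketch in Python. The `{{name key=value}}` token syntax, the `Token` class, and the `@@TOKENn@@` placeholders are all made up for illustration and are not the syntax we actually use. The idea is to pull custom tokens out of the Markdown before the text goes to an AI model, then put them back afterwards.

```python
import re
from dataclasses import dataclass

# Hypothetical token syntax: {{name key=value key=value}}
TOKEN_RE = re.compile(
    r"\{\{\s*(?P<name>\w+)(?P<args>(?:\s+\w+=[^\s{}]+)*)\s*\}\}"
)


@dataclass
class Token:
    name: str
    attrs: dict


def extract_tokens(markdown: str):
    """Replace each custom token with a placeholder and return both parts."""
    tokens = []

    def replace(match):
        attrs = dict(pair.split("=", 1) for pair in match.group("args").split())
        tokens.append(Token(match.group("name"), attrs))
        return f"@@TOKEN{len(tokens) - 1}@@"

    return TOKEN_RE.sub(replace, markdown), tokens


def restore_tokens(text: str, tokens) -> str:
    """Swap the placeholders back for their (normalized) token text."""
    for index, token in enumerate(tokens):
        args = "".join(f" {key}={value}" for key, value in token.attrs.items())
        text = text.replace(f"@@TOKEN{index}@@", "{{" + token.name + args + "}}")
    return text


if __name__ == "__main__":
    source = "Intro\n\n{{chart id=42 type=bar}}\n\nOutro"
    plain, found = extract_tokens(source)
    print(plain)
    print(found)
    print(restore_tokens(plain, found))
```

Keeping the tokens as structured data means the surrounding Markdown stays plain text, while widgets and metadata travel alongside it instead of being buried in nested tags.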

That said, I’m currently using Markdown with ProseMirror, and I’ll admit, it does feel a bit boring compared to Froala’s rich WYSIWYG interface. Froala had the “wow” factor in terms of visuals and user experience. But when it comes to long-term maintainability and AI integration, Markdown + ProseMirror just gets the job done with way less drama.

Sometimes boring is better — especially when your pipeline depends on clean, predictable content.