
defuddle

Versions: 17
License:
Install Scripts: No
Provenance: Missing

Supply chain provenance

Status for the latest visible version.

No SLSA provenance · npm registry signatures · gitHead linked

Without SLSA provenance there is no cryptographic link between this tarball and the public source — the axios compromise (March 2026) relied on exactly this gap.
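The three status signals above can be read from a version's registry metadata. The following is a minimal sketch, assuming the metadata exposes `gitHead`, `dist.signatures`, and `dist.attestations.provenance` (field names are an assumption to verify against live registry output). `provenanceStatus` and the sample object are hypothetical, and the sample values are placeholders, not real data for this package.

```typescript
// Fields this sketch reads from a registry version document.
// Assumption: these live under `dist` and `gitHead`; check real metadata first.
interface VersionDist {
  signatures?: { keyid: string; sig: string }[];
  attestations?: { url?: string; provenance?: { predicateType?: string } };
}

interface VersionMetadata {
  name: string;
  version: string;
  gitHead?: string;
  dist?: VersionDist;
}

interface ProvenanceStatus {
  hasProvenance: boolean;
  hasRegistrySignature: boolean;
  hasGitHead: boolean;
}

// Reports which signals are present. This only detects presence;
// it does not cryptographically verify anything.
function provenanceStatus(meta: VersionMetadata): ProvenanceStatus {
  return {
    hasProvenance: Boolean(meta.dist?.attestations?.provenance?.predicateType),
    hasRegistrySignature: (meta.dist?.signatures?.length ?? 0) > 0,
    hasGitHead: typeof meta.gitHead === "string" && meta.gitHead.length > 0,
  };
}

// Placeholder metadata mirroring the status shown above
// (signature and gitHead present, no provenance attestation).
const sample: VersionMetadata = {
  name: "defuddle",
  version: "0.18.1",
  gitHead: "0000000000000000000000000000000000000000",
  dist: { signatures: [{ keyid: "example-key", sig: "example-sig" }] },
};

console.log(provenanceStatus(sample));
```

A presence check like this is only a first filter. Actual verification of signatures and attestations is better left to tooling such as `npm audit signatures`.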

Maintainers

kepano

Keywords

readabilitycontent-extractionarticle-extractionweb-scrapinghtml-cleanupcontent-parserarticle-parserdom

Versions (showing 17 of 17)

Version Deps Published
0.18.1 1 / 12
0.18.0 1 / 12
0.17.0 1 / 12
0.16.0 1 / 12
0.15.0 1 / 12
0.14.0 1 / 12
0.13.0 1 / 12
0.12.0 1 / 12
0.11.0 1 / 12
0.10.0 1 / 12
0.9.0 1 / 12
0.8.0 1 / 12
0.7.0 0 / 12
0.6.6 0 / 12
0.6.5 0 / 12
0.6.4 0 / 11
0.6.3 0 / 11

v0.18.1

1 finding
LOW No provenance attestation provenance

Package was published without Sigstore provenance. Only ~12% of npm packages have provenance, so this is common but not ideal.

v0.18.0

1 finding
LOW No provenance attestation provenance

Package was published without Sigstore provenance. Consider requesting the maintainer enable provenance via CI/CD.

v0.17.0

1 finding
LOW No provenance attestation provenance

Package was published without Sigstore provenance. Consider requesting the maintainer enable provenance via CI/CD.

v0.16.0

1 finding
LOW No provenance attestation provenance

Package was published without Sigstore provenance. Consider requesting the maintainer enable provenance via CI/CD.

v0.15.0

1 finding
LOW No provenance attestation provenance

Package was published without Sigstore provenance. Consider requesting the maintainer enable provenance via CI/CD.

v0.14.0

1 finding
LOW No provenance attestation provenance

Package was published without Sigstore provenance. Only ~12% of npm packages have provenance, so this is common but not ideal.

v0.13.0

1 finding
LOW No provenance attestation provenance

Package was published without Sigstore provenance. Consider requesting the maintainer enable provenance via CI/CD.

v0.12.0

1 finding
LOW No provenance attestation provenance

Package was published without Sigstore provenance. Consider requesting the maintainer enable provenance via CI/CD.

v0.11.0

1 finding
LOW No provenance attestation provenance

Package was published without Sigstore provenance. Consider requesting the maintainer enable provenance via CI/CD.

v0.10.0

1 finding
LOW No provenance attestation provenance

Package was published without Sigstore provenance. Consider requesting the maintainer enable provenance via CI/CD.

v0.9.0

1 finding
LOW No provenance attestation provenance

Package was published without Sigstore provenance. Only ~12% of npm packages have provenance, so this is common but not ideal.

v0.8.0

2 findings
LOW No provenance attestation provenance

Package was published without Sigstore provenance. Only ~12% of npm packages have provenance, so this is common but not ideal.

LOW GHSA-5mq8-78gm-pjmq: defuddle vulnerable to XSS via unescaped string interpolation in _findContentBySchemaText image tag osv

### Summary

The `_findContentBySchemaText` method in `src/defuddle.ts` interpolates image `src` and `alt` attributes directly into an HTML string without escaping:

```typescript
html += `<img src="${imageSrc}" alt="${imageAlt}">`;
```

An attacker can use a `"` in the `alt` attribute to break out of the attribute context and inject event handlers. This is a separate vulnerability from the sanitization bypass fixed in f154cb7: the injection happens during string construction, not in the DOM, so `_stripUnsafeElements` cannot catch it.

### Details

When `_findContentBySchemaText` finds a sibling image outside the matched content element, it reads the image's `src` and `alt` attributes via `getAttribute()` and interpolates them into a template literal. `getAttribute('alt')` returns the raw attribute value. If the alt contains `"`, it terminates the `alt` attribute in the interpolated HTML string, and subsequent content becomes new attributes (including event handlers).

The recently added `_stripUnsafeElements()` (commit f154cb7) strips `on*` attributes from DOM elements, but the `alt` attribute's name is `alt` (not `on*`), so it is preserved with its full value. The `onload` handler is created by the string interpolation, not present in the original DOM.

### PoC

Input HTML:

```html
<!DOCTYPE html>
<html>
<head>
  <title>PoC</title>
  <script type="application/ld+json">
  {"@type": "Article", "text": "Long article text repeated many times to exceed the extracted content word count. Long article text repeated many times to exceed the extracted content word count. Long article text repeated many times to exceed the extracted content word count."}
  </script>
</head>
<body>
  <article><p>Short.</p></article>
  <div class="post-container">
    <p>Extra text to inflate parent word count padding padding padding.</p>
    <div class="post-body">
      Long article text repeated many times to exceed the extracted content word count.
      Long article text repeated many times to exceed the extracted content word count.
      Long article text repeated many times to exceed the extracted content word count.
    </div>
    <img width="800" height="600" src="https://example.com/photo.jpg" alt='pwned" onload="alert(document.cookie)'>
  </div>
</body>
</html>
```

Output:

```html
<img src="https://example.com/photo.jpg" alt="pwned" onload="alert(document.cookie)">
```

The `onload` event handler is injected as a separate HTML attribute.

### Impact

XSS in any application that renders defuddle's HTML output (browser extensions, web clippers, reader modes). The attack requires crafted HTML with schema.org structured data that triggers the `_findContentBySchemaText` fallback, combined with a sibling image whose `alt` attribute contains a quote character followed by an event handler.

### Suggested Fix

Use DOM API instead of string interpolation:

```typescript
if (imageSrc) {
  const img = this.doc.createElement('img');
  img.setAttribute('src', imageSrc);
  img.setAttribute('alt', imageAlt);
  html += img.outerHTML;
}
```

This ensures attribute values are properly escaped by the DOM serializer.
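To make the break-out concrete without a DOM, the sketch below reproduces the vulnerable interpolation on the PoC's `alt` value and contrasts it with manual attribute escaping. This is a different technique from the DOM-based fix the advisory suggests, used here only because it runs standalone. `buildImgUnsafe`, `escapeAttr`, and `buildImgEscaped` are hypothetical names and are not part of defuddle.

```typescript
// Vulnerable pattern described in the advisory: raw interpolation into HTML.
function buildImgUnsafe(src: string, alt: string): string {
  return `<img src="${src}" alt="${alt}">`;
}

// Hypothetical helper (not defuddle's API): escape characters that are
// significant inside a double-quoted HTML attribute value.
function escapeAttr(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/"/g, "&quot;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

// Same string construction, but every interpolated value is escaped first.
function buildImgEscaped(src: string, alt: string): string {
  return `<img src="${escapeAttr(src)}" alt="${escapeAttr(alt)}">`;
}

// Values taken from the PoC above.
const src = "https://example.com/photo.jpg";
const alt = 'pwned" onload="alert(document.cookie)';

console.log(buildImgUnsafe(src, alt));
// <img src="https://example.com/photo.jpg" alt="pwned" onload="alert(document.cookie)">

console.log(buildImgEscaped(src, alt));
// <img src="https://example.com/photo.jpg" alt="pwned&quot; onload=&quot;alert(document.cookie)">
```

In the escaped output the quote can no longer close the `alt` attribute, so the payload stays inside the attribute value. Building the element through the DOM, as the advisory suggests, achieves the same result and avoids maintaining a hand-written escaper.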

v0.7.0

2 findings
LOW No provenance attestation provenance

Package was published without Sigstore provenance. Only ~12% of npm packages have provenance, so this is common but not ideal.

LOW GHSA-5mq8-78gm-pjmq: defuddle vulnerable to XSS via unescaped string interpolation in _findContentBySchemaText image tag osv


v0.6.6

2 findings
LOW No provenance attestation provenance

Package was published without Sigstore provenance. Consider requesting the maintainer enable provenance via CI/CD.

LOW GHSA-5mq8-78gm-pjmq: defuddle vulnerable to XSS via unescaped string interpolation in _findContentBySchemaText image tag osv


v0.6.5

2 findings
LOW No provenance attestation provenance

Package was published without Sigstore provenance. Consider requesting the maintainer enable provenance via CI/CD.

LOW GHSA-5mq8-78gm-pjmq: defuddle vulnerable to XSS via unescaped string interpolation in _findContentBySchemaText image tag osv


v0.6.4

2 findings
LOW No provenance attestation provenance

Package was published without Sigstore provenance. Only ~12% of npm packages have provenance, so this is common but not ideal.

LOW GHSA-5mq8-78gm-pjmq: defuddle vulnerable to XSS via unescaped string interpolation in _findContentBySchemaText image tag osv


v0.6.3

2 findings
LOW No provenance attestation provenance

Package was published without Sigstore provenance. Only ~12% of npm packages have provenance, so this is common but not ideal.

LOW GHSA-5mq8-78gm-pjmq: defuddle vulnerable to XSS via unescaped string interpolation in _findContentBySchemaText image tag osv
