Contributing to open source project JS-DOM
We use JSDOM for testing clientside applications in NodeJS. JSDOM lowers the complexity of writing tests for clientside code by omitting the browser and replacing it with a fake one: JSDOM.
However, there's one JSDOM dependency that concerned me a bit: request, with request-promise-native. Request has been deprecated, and request-promise-native does nasty things using stealthy-require. I'm not sure why anyone would use stealthy-require
, but I trust there's a good reason for to use it.
request
has already been a discussed to be replaced with something else in an issue #2792: Replace request with something better. Since there were no pull requests for the issue, I decided to see if I can help out and fix it myself. In this blog post, I'll describe my process.
Contributing to foreign projects
Changing code inside a foreign project is commonly quite the challenge. There is usually a lot of code and a lot of things to consider, many you just don't know about. This is why tests are really important.
For a complex project like JSDOM, without a comprehensive suite of tests, there is no way to be sure your changes might break something. Even with perfect code coverage, there is still no guarantee your changes don't break something, but you can still be pretty sure your code at least runs in the cases presented by the tests.
Fork & Clone.
I forked & cloned the repository, and created a new branch to start my experimental replacement.
git clone git@github.com:tobyhinloopen/jsdom.git
cd jsdom
git checkout -b 2792-replace-request-with-node-fetch
Now let's see if there are some tests I can run.
$ npm i
npm ERR! code EUNSUPPORTEDPROTOCOL
npm ERR! Unsupported URL Type "link:": link:./scripts/eslint-plugin
npm ERR! A complete log of this run can be found in:
npm ERR! /Users/hinloopen/.npm/_logs/2020-05-10T15_02_02_981Z-debug.log
Uh... ok. Let's consult the README first. There is a README.md
and Contributing.md
. Both might be relevant.
In Contributing.md
, it's already mentioned they're using yarn
. Eager to start, I ignore the rest and use yarn install
to install the dependencies.
Let's run some tests without consulting the readme or contributing guidelines and see if they run.
$ yarn test
# ...
1) "before all" hook: $mochaNoSugar in "{root}"
2) "after all" hook: $mochaNoSugar in "{root}"
0 passing (16ms)
2 failing
1) "before all" hook: $mochaNoSugar in "{root}":
Error: Host entries not present for web platform tests. See https://github.com/web-platform-tests/wpt#running-the-tests
at /Users/hinloopen/Projects/Github/jsdom/test/web-platform-tests/start-wpt-server.js:62:13
at async /Users/hinloopen/Projects/Github/jsdom/test/web-platform-tests/run-tuwpts.js:25:32
# ...
Looks like the tests require more setup. Let's consult the readme again. The readme refers to The web-platform-tests Project. It looks like this project allows you to run a test suite (that you have to provide yourself in some way) inside a set of browsers. You have to clone the repo and run the code.
I'll just assume this web-platform-tests project starts some kind of server and you have to open a page in a real browser. Since we're testing a fake browser (JSDOM), I also assume JSDOM somehow registers to WPT as a real browser, so it can the same tests in JSDOM, as if JSDOM was a browser. Let's try it out.
$ git clone https://github.com/web-platform-tests/wpt.git
# ...
$ cd wpt
$ ./wpt serve
# ...
CRITICAL:web-platform-tests:Failed to start HTTP server on port 59514; is something already using that port?
CRITICAL:web-platform-tests:Please ensure all the necessary WPT subdomains are mapped to a loopback device in /etc/hosts.
Right. RTFM. I added the setup instructions to .envrc
in the WPT project folder.
$ nano .envrc
python -m ensurepip --user
export PATH="$PATH:$HOME/Library/Python/2.7/bin"
pip install --user virtualenv
Additionally:
To get the tests running, you need to set up the test domains in your hosts file. (source)
Let's do that.
./wpt make-hosts-file | sudo tee -a /etc/hosts
# ...
I think that command fails when a password is asked. I used sudo ls
to make my system ask for a password so I can run another sudo command without asking for password. I'm sure there is a better way, but hey, it works.
After that, let's retry serve
:
$ ./wpt serve
# ...
INFO:web-platform-tests:Starting http server on web-platform.test:8000
INFO:web-platform-tests:Starting http server on web-platform.test:59632
INFO:web-platform-tests:Starting https server on web-platform.test:8443
Hey, it works! Let's open it with a browser!
Well that's not very interesting at all. Am I done now? Let's get back to JSDOM and run the tests.
yarn test
# ...
Cool! It's running tests. Thousands of them. While the tests are running and are heating my macbook, let's take a peak at our goal: Removing request
. Let's see where it is used.
Finding usages of request
The first, most naive way to find usages of request is to look for require("request")
and require("request-promise-native")
:
lib/jsdom/living/helpers/wrap-cookie-jar-for-request.js
"use strict";
const request = require("request");
module.exports = cookieJar => {
const jarWrapper = request.jar();
jarWrapper._jar = cookieJar;
return jarWrapper;
};
lib/jsdom/living/xhr/xhr-utils.js
// ...
const request = require("request");
// ...
const wrapCookieJarForRequest = require("../helpers/wrap-cookie-jar-for-request");
// ...
function doRequest() {
try {
const client = request(options);
if (hasBody && flag.formData) {
const form = client.form();
for (const entry of body) {
form.append(entry.name, entry.value, entry.options);
}
}
return client;
} catch (e) {
const client = new EventEmitter();
process.nextTick(() => client.emit("error", e));
return client;
}
}
/// ...
test/util.js
// ...
const request = require("request");
// ...
/**
* Reads a static fixture file as utf8.
* If running tests from node, the file will be read from the file system
* If running tests using karma, a http request will be performed to retrieve the file using karma's server.
* @param {string} relativePath Relative path within the test directory. For example "jsdom/files/test.html"
*/
exports.readTestFixture = relativePath => {
const useRequest = exports.inBrowserContext();
return exports.nodeResolverPromise(nodeResolver => {
if (useRequest) {
request.get(exports.getTestFixtureUrl(relativePath), { timeout: 5000 }, nodeResolver);
} else {
fs.readFile(path.resolve(__dirname, relativePath), { encoding: "utf8" }, nodeResolver);
}
})
// request passes (error, response, content) to the callback
// we are only interested in the `content`
.then(result => useRequest ? result[1] : result);
};
lib/jsdom/browser/resources/resource-loader.js
// ...
const request = require("request-promise-native");
const wrapCookieJarForRequest = require("../../living/helpers/wrap-cookie-jar-for-request");
// ...
fetch(urlString, options = {}) {
const url = parseURL(urlString);
// ...
switch (url.scheme) {
// ...
case "http":
case "https": {
const requestOptions = this._getRequestOptions(options);
return request(urlString, requestOptions);
}
// ...
}
}
test/web-platform-tests/start-wpt-server.js
// ...
const requestHead = require("request-promise-native").head;
// ...
function pollForServer(url) {
return requestHead(url, { strictSSL: false })
.then(() => {
// ...
}
Looks good! Looking for require('request')
yields no results, so I'll assume there is either a strict merge policy or some kind of linter ensuring double quoted strings are used everywhere.
There might be other ways request
or request-promise-native
is required. One could have aliased the require
to something else. Maybe someone used require("re" + "quest")
to mess with me. Maybe someone's using import
somewhere.
Instead of hunting other possible dependencies, let's try to fix the found dependencies first and re-run the tests.
Narrowing down the tests
Running all tests takes ages. However, I'm not sure how to narrow down the number of tests. While trying to figure out how to narrow down the number of tests, the test runner finally finished after 11 minutes.
Reading the contributing guidelines, it is mentioned you can run only JSDOM api tests, or even a set of tests for one specific function. Since the JSDOM API includes a fromUrl
function, I'll assume fromUrl
fetches the document using request
.
There is a test suite specifically for fromUrl
and based on the contributing guidelines, I can run it using yarn test-mocha test/api/from-url.js
. Let's try that.
$ yarn test-mocha test/api/from-url.js
yarn run v1.22.4
$ mocha test/api/from-url.js
API: JSDOM.fromURL()
✓ should return a rejected promise for a bad URL
✓ should return a rejected promise for a 404
✓ should return a rejected promise for a 500
✓ should use the body of 200 responses (54ms)
✓ should use the body of 301 responses
✓ should be able to handle gzipped bodies
✓ should send a HTML-preferring Accept header
✓ should send an Accept-Language: en header
user agent
✓ should use the default user agent as the User-Agent header when none is given
referrer
✓ should reject when passing an invalid absolute URL for referrer
✓ should not send a Referer header when no referrer option is given
✓ should use the supplied referrer option as a Referer header
✓ should canonicalize referrer URLs before using them as a Referer header
✓ should use the redirect source URL as the referrer, overriding a provided one
inferring options from the response
url
✓ should use the URL fetched for a 200
✓ should preserve full request URL
✓ should use the ultimate response URL after a redirect
✓ should preserve fragments when processing redirects
✓ should disallow passing a URL manually
contentType
✓ should use the content type fetched for a 200
✓ should use the ultimate response content type after a redirect
✓ should disallow passing a content type manually
cookie jar integration
✓ should send applicable cookies in a supplied cookie jar
✓ should store cookies set by the server in a supplied cookie jar
✓ should store cookies set by the server in a newly-created cookie jar
25 passing (234ms)
✨ Done in 1.09s.
Phew. That's better. One second. Let's first try to break these tests by changing the code that requires request
. I'm hoping these tests touches the request
-requires at some point.
The test messages also mention cookie jar. I'm hoping this is somehow related to lib/jsdom/living/helpers/wrap-cookie-jar-for-request.js
so we can test our changes in that file using this test.
Removing request from test/util.js
Before we can drop request
, we need a replacement. I'll be using node-fetch
. node-fetch
is a NodeJS implementation for the browser's Fetch API. I like the idea of using a library that implements an existing standard because even if you don't longer like or want to use the library, you can just replace the fetch library with some other fetch implementation.
Since JSDOM also runs in the browser, you can use the browser's Fetch implementation in the browser. Isn't that great?
npm install nod
-- oh right, we're using YARN now.
$ yarn install node-fetch
error `install` has been replaced with `add` to add new dependencies. Run "yarn add node-fetch" instead.
$ yarn add node-fetch
# ...
✨ Done in 7.80s.
Ok. Now, let's naively replace request with fetch somewhere. Let's start with test/util.js
, since I'll assume it's only used from tests. It is most likely the easiest one to replace.
test/util.js
// ...
const fetch = require("node-fetch");
// ...
exports.readTestFixture = relativePath => {
const useRequest = exports.inBrowserContext();
if (useRequest) {
const url = exports.getTestFixtureUrl(relativePath);
// timeout is a node-fetch specific extention.
fetch(url, { timeout: 5000 }).then((response) => {
if (!response.ok) {
throw new Error(`Unexpected status ${response.status} fetching ${url}`);
}
return response.text();
});
} else {
return exports.nodeResolverPromise(nodeResolver => {
fs.readFile(path.resolve(__dirname, relativePath), { encoding: "utf8" }, nodeResolver);
});
}
};
Looks fine, I suppose. Let's run the tests.
$ yarn test-mocha test/api/from-url.js
yarn run v1.22.4
$ mocha test/api/from-url.js
# ...
25 passing (234ms)
✨ Done in 1.02s.
All tests are passing, but I don't know if the tests even touch my changes. Let's just throw inside the method.
test/util.js
exports.readTestFixture = relativePath => {
const useRequest = exports.inBrowserContext();
if (useRequest) {
throw new Error("???");
// ...
$ yarn test-mocha test/api/from-url.js
yarn run v1.22.4
$ mocha test/api/from-url.js
# ...
25 passing (234ms)
✨ Done in 1.02s.
No thrown errors or failing tests, so it's still not touching my changes. Let's run all API tests for good measure. Otherwise, I'll have to run all tests.
yarn test-api
# ...
419 passing (4s)
✨ Done in 4.56s.
Still no error. Let's run all tests until something goes bad. While the tests are running forever, let's CMD+F for readTestFixture
.
It looks like all occurences are in test/to-port-to-wpts
. CMD+F for to-port-to-wpts
yields to this result in the readme:
Ideally we would only use Mocha for testing the JSDOM API itself, from the outside. Unfortunately, a lot of web platform features are still tested using Mocha, instead of web-platform-tests. These are located in the
to-port-to-wpts
subdirectory.
So maybe running all mocha tests will trigger my intentional failure. While the main test suite is running, I run the mocha tests using yarn test-mocha
, hoping it will run faster.
After a minute, I cancelled the mocha runner since it there seems to be no obvious speed improvement by invoking mocha this way.
What about yarn test-mocha test/to-port-to-wpts/*.js
?
$ yarn test-mocha test/to-port-to-wpts/*.js
379 passing (6s)
1 pending
✨ Done in 9.78s.
That runs the tests, but the tests aren't failing. Confused, I read the jsdoc comment above the function:
test/util.js
/**
* Reads a static fixture file as utf8.
* If running tests from node, the file will be read from the file system
* If running tests using karma, a http request will be performed to retrieve the file using karma's server.
* @param {string} relativePath Relative path within the test directory. For example "jsdom/files/test.html"
*/
exports.readTestFixture = relativePath => {
So my error will only be thrown when running from inside a browser. Well, I don't need node-fetch
inside a browser, do I? I can just use window.fetch
, but I won't get the timeout, since the timeout
option isn't supported on window.fetch
.
How did request
implement the timeout? I suppose it uses XMLHttpRequest in the background and aborts after a certain amount of time. Let's ignore that for now and see if we can run the tests inside a browser. The jsdoc mentions karma
. Let's CMD+F karma
in the readmes.
Contributing.md
The mocha test cases are executed in Chrome using karma. Currently, web platform tests are not executed in the browser yet.
To run all browser tests:
yarn test-browser
To run the karma tests in an iframe:
yarn test-browser-iframe
To run the karma tests in a web worker:
yarn test-browser-worker
Sure. Let's try that.
$ yarn test-browser
yarn run v1.22.4
$ yarn test-browser-iframe && yarn test-browser-worker
$ karma start test/karma.conf.js
[...]
HeadlessChrome 81.0.4044 (Mac OS X 10.15.4) ERROR
Uncaught Error: ???
at /var/folders/bf/29ljwt3s4dscb7tdd2z5zz0h0000gn/T/test/util.js:162:1 <- /var/folders/bf/29ljwt3s4dscb7tdd2z5zz0h0000gn/T/91efe4665a6210ee2f5edcae3a8f463c.browserify.js:293540:5
Error: ???
at exports.readTestFixture (/var/folders/bf/29ljwt3s4dscb7tdd2z5zz0h0000gn/T/test/util.js:162:1 <- /var/folders/bf/29ljwt3s4dscb7tdd2z5zz0h0000gn/T/91efe4665a6210ee2f5edcae3a8f463c.browserify.js:293540:11)
[...]
My ???
error is thrown! Now, let's retry without the intentional failure.
$ yarn test-browser
[...]
HeadlessChrome 81.0.4044 (Mac OS X 10.15.4) jsdom/namespaces should set namespaces in HTML documents created by jsdom.env() FAILED
TypeError: Cannot read property 'then' of undefined
[...]
HeadlessChrome 81.0.4044 (Mac OS X 10.15.4) jsdom/namespaces should set namespace-related properties in HTML documents created by innerHTML FAILED
TypeError: Cannot read property 'then' of undefined
[...]
HeadlessChrome 81.0.4044 (Mac OS X 10.15.4) jsdom/namespaces should set namespace-related properties in HTML-SVG documents created by jsdom.env() FAILED
TypeError: Cannot read property 'then' of undefined
[...]
HeadlessChrome 81.0.4044 (Mac OS X 10.15.4) jsdom/namespaces should set namespace-related properties in HTML-SVG documents created by innerHTML FAILED
TypeError: Cannot read property 'then' of undefined
[...]
HeadlessChrome 81.0.4044 (Mac OS X 10.15.4) jsdom/parsing real-world page with < inside a text node (GH-800) FAILED
TypeError: Cannot read property 'then' of undefined
[...]
HeadlessChrome 81.0.4044 (Mac OS X 10.15.4) jsdom/xml should ignore self-closing of tags in html docs FAILED
TypeError: Cannot read property 'then' of undefined
[...]
HeadlessChrome 81.0.4044 (Mac OS X 10.15.4) jsdom/xml should handle self-closing tags properly in xml docs FAILED
TypeError: Cannot read property 'then' of undefined
[...]
HeadlessChrome 81.0.4044 (Mac OS X 10.15.4): Executed 1209 of 2460 (7 FAILED) (skipped 1251) (7.437 secs / 6.708 secs)
TOTAL: 7 FAILED, 1202 SUCCESS
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
error Command failed with exit code 1.
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
Failures! TypeError: Cannot read property 'then' of undefined
? Oh... i forgot to return
. Oops.
test/util.js
if (useRequest) {
const url = exports.getTestFixtureUrl(relativePath);
// timeout is a node-fetch specific extension
return fetch(url, { timeout: 5000 }).then((response) => {
if (!response.ok) {
throw new Error(`Unexpected status ${response.status} fetching ${url}`);
}
return response.text();
});
}
$ yarn test-browser
[...]
HeadlessChrome 81.0.4044 (Mac OS X 10.15.4): Executed 1209 of 2460 (skipped 1251) SUCCESS (7.497 secs / 6.723 secs)
TOTAL: 1209 SUCCESS
That's great! Now, since it's run inside a browser, let's drop the node-fetch
requirement and use the browser's fetch
.
test/util.js
if (exports.inBrowserContext()) {
return fetch(exports.getTestFixtureUrl(relativePath)).then((response) => {
if (!response.ok) {
throw new Error(`Unexpected status ${response.status} fetching ${response.location}`);
}
return response.text();
});
}
$ yarn test-browser
[...]
HeadlessChrome 81.0.4044 (Mac OS X 10.15.4): Executed 1209 of 2460 (skipped 1251) SUCCESS (7.561 secs / 6.812 secs)
TOTAL: 1209 SUCCESS
Great. The best dependency is the one not being used, am I right?
Removing request from test/web-platform-tests/start-wpt-server.js
The second request
usage by tests is inside test/web-platform-tests/start-wpt-server.js
.
test/web-platform-tests/start-wpt-server.js
// ...
const requestHead = require("request-promise-native").head;
// ...
function pollForServer(url) {
return requestHead(url, { strictSSL: false })
.then(() => {
console.log(`WPT server at ${url} is up!`);
return url;
})
.catch(err => {
console.log(`WPT server at ${url} is not up yet (${err.message}); trying again`);
return new Promise(resolve => {
setTimeout(() => resolve(pollForServer(url)), 500);
});
});
}
Based on the name of the file and some of the error messages, it looks like this code is used to check whether WPT is running. This code is used at the start of the test runner. That should be easy enough to test. Let's replace request
with node-fetch
.
The strictSSL
option is no part of the Fetch standard, but stack overflow tells me I can use rejectUnauthorized: false
instead.
const fetch = require("node-fetch");
const https = require("https");
// ...
const httpsAgent = new https.Agent({
rejectUnauthorized: false,
});
function pollForServer(url) {
const agent = url.startsWith("https")
? new https.Agent({ rejectUnauthorized: false })
: null;
return fetch(url, { method: "HEAD", agent })
.then(({ ok, status }) => {
if (!ok) {
throw new Error(`Unexpected status=${status}`);
}
console.log(`WPT server at ${url} is up!`);
return url;
})
.catch(err => {
console.log(`WPT server at ${url} is not up yet (${err.message}); trying again`);
return new Promise(resolve => {
setTimeout(() => resolve(pollForServer(url)), 500);
});
});
}
I've added throw new Error("Foo")
(not shown above) to intentionally fail at first. Let's run the tests and see if they fail. I'll assume they fail early, so I'll run all tests.
$ yarn test
[...]
1) "before all" hook: $mochaNoSugar in "{root}"
2) "after all" hook: $mochaNoSugar in "{root}"
0 passing (22ms)
2 failing
1) "before all" hook: $mochaNoSugar in "{root}":
Error: foo
I was right. Let's kill it and retry without the intentional failure.
$ yarn test
[...]
The tests are running again. I let them run, but I assume my change is fine.
Removing request from lib/jsdom/browser/resources/resource-loader.js
Now that the test utilities are fixed, let's get our hands dirty on the lib code. There are only 2 files where request
is actually invoked. The 3rd is only a helper:
lib/jsdom/living/helpers/wrap-cookie-jar-for-request.js
"use strict";
const request = require("request");
module.exports = cookieJar => {
const jarWrapper = request.jar();
jarWrapper._jar = cookieJar;
return jarWrapper;
};
Since this helper is a dependency of the other 2 files, I'll look at the helper last. Let's first look at resource-loader
.
lib/jsdom/browser/resources/resource-loader.js
// ...
const request = require("request-promise-native");
const wrapCookieJarForRequest = require("../../living/helpers/wrap-cookie-jar-for-request");
// ...
_getRequestOptions({ cookieJar, referrer, accept = "*/*" }) {
const requestOptions = {
encoding: null,
gzip: true,
jar: wrapCookieJarForRequest(cookieJar),
strictSSL: this._strictSSL,
proxy: this._proxy,
forever: true,
headers: {
"User-Agent": this._userAgent,
"Accept-Language": "en",
Accept: accept
}
};
if (referrer && !IS_BROWSER) {
requestOptions.headers.referer = referrer;
}
return requestOptions;
}
// ...
fetch(urlString, options = {}) {
const url = parseURL(urlString);
// ...
switch (url.scheme) {
// ...
case "http":
case "https": {
const requestOptions = this._getRequestOptions(options);
return request(urlString, requestOptions);
}
// ...
}
}
Seems easy enough. Let's convert the request options to fetch options.
-
encoding: null
: This causes request to return a buffer. Withnode-fetch
, we might be able to useresponse.arrayBuffer()
for that. -
jar: wrapCookieJarForRequest(cookieJar)
: Somehow cookies are reused this way. ThecookieJar
variable is converted to a request-compatible cookie jar to allow keeping track of cookies. I don't know iffetch
has features like this. I suppose we can just manually read/write the cookies. -
strictSSL: this._strictSSL
: Just like before, use the HTTPS agent withrejectUnauthorized
. -
proxy: this._proxy
: Enables proxy. There is no obvious way to implement this innode-fetch
. I also don't know what's inthis._proxy
. I might need to usehttps-proxy-agent
for this. -
forever: true
: Sets keepAlive on the HTTPS agent. Since we're replacing the agent anyway, we might as well setkeepAlive: true
for both http and https agents.
Let's make a first attempt to implement resource-loader's fetch function using fetch instead of request. Because I don't know how to implement the proxy or cookies, I'll ignore those for now.
lib/jsdom/browser/resources/resource-loader.js
_getFetchOptions({ cookieJar, referrer, accept = "*/*" }) {
/** @type RequestInit */
const fetchOptions = {};
// I don't know what these variables hold exactly - let's log them!
console.log("cookieJar", cookieJar);
console.log("this._proxy", this._proxy);
fetchOptions.headers = {
"User-Agent": this._userAgent,
"Accept-Language": "en",
Accept: accept,
};
if (!IS_BROWSER) {
const httpAgent = new http.Agent({ keepAlive: true });
const httpsAgent = new https.Agent({ keepAlive: true, rejectUnauthorized: this._strictSSL });
fetchOptions.headers.referrer = referrer;
fetchOptions.agent = (url) => url.protocol == 'http:' ? httpAgent : httpsAgent;
}
return fetchOptions;
}
// ...
case "http":
case "https": {
return fetch(urlString, this._getFetchOptions(options))
.then((response) => {
if (!response.ok) {
throw new Error(`Unexpected status=${response.status} for ${urlString}`);
}
return response.arrayBuffer();
})
}
Let's run the tests and see the mess I've created. I get a lot of failures from the tests, as expected. Some are related to cookies. The console.log
s look like this:
cookieJar CookieJar { enableLooseMode: true, store: { idx: {} } }
this._proxy undefined
cookieJar CookieJar { enableLooseMode: true, store: { idx: {} } }
this._proxy http://127.0.0.1:51388
So the proxy is just a URL. I'm not sure how to implement the proxy from fetch, if it is even possible. I suppose I can use a proxy agent on the server, but I don't know an alternative for the browser.
The cookie jar is still a mystery. Since package.json
mentions tough-cookie
, I'll just assume the cookie jar is from that library. I'm just going to assume this is also only used server-side, since the browser's fetch handles cookies automatically.
To add tough-cookie
's cookie-jar to node-fetch
, I'm going to use a library called fetch-cookie
. fetch-cookie
has no other dependencies except for tough-cookie
so it can be used independently from Fetch implementations. fetch-cookie
is also pretty small: about 50 lines of code.
yarn add fetch-cookie
lib/jsdom/browser/resources/resource-loader.js
_getFetchOptions({ cookieJar, referrer, accept = "*/*" }) {
/** @type RequestInit */
const fetchOptions = {};
// I don't know what these variables hold exactly - let's log them!
console.log("cookieJar", cookieJar);
console.log("this._proxy", this._proxy);
fetchOptions.headers = {
"User-Agent": this._userAgent,
"Accept-Language": "en",
"Accept-Encoding": "gzip",
Accept: accept,
};
if (!IS_BROWSER) {
const httpAgent = new http.Agent({ keepAlive: true });
const httpsAgent = new https.Agent({ keepAlive: true, rejectUnauthorized: this._strictSSL });
fetchOptions.headers.referrer = referrer;
fetchOptions.agent = (url) => url.protocol == 'http:' ? httpAgent : httpsAgent;
}
return fetchOptions;
}
// ...
case "http":
case "https": {
const cookieJar = options.cookieJar;
cookieJar.__setCookie = cookieJar.setCookie;
cookieJar.setCookie = (...args) => {
if (args.length === 3) {
args.splice(2, 0, {});
}
if (args.length === 4) {
args[2].ignoreError = true;
}
return cookieJar.__setCookie(...args);
}
const targetFetch = fetchCookie(fetch, cookieJar);
const fetchOptions = this._getFetchOptions(options);
return targetFetch(urlString, fetchOptions)
.then((response) => {
if (!response.ok) {
throw new Error(`Unexpected status=${response.status} for ${urlString}`);
}
return response.arrayBuffer();
});
}
I got a lot of errors when handling the cookies. Turns out, when adding cookies, the request
library sets ignoreError
on true
by default (like a browser would do), but fetch-cookie
doesn't allow you to change the options when setting cookies.
To "fix" this, I hijacked the setCookie
function to silence the errors, only to get different errors. I'll find a proper fix later.
1) Cookie processing
document.cookie
reflects back cookies set from the server while requesting the page:
TypeError: Cannot read property 'headers' of undefined
at /Users/hinloopen/Projects/Github/jsdom/lib/api.js:138:28
at processTicksAndRejections (internal/process/task_queues.js:93:5)
Let's see what's inside lib/api.js
:
lib/api.js
const req = resourceLoaderForInitialRequest.fetch(url, {
accept: "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
cookieJar: options.cookieJar,
referrer: options.referrer
});
return req.then(body => {
const res = req.response;
options = Object.assign(options, {
url: req.href + originalHash,
contentType: res.headers["content-type"],
referrer: req.getHeader("referer")
});
return new JSDOM(body, options);
});
So that's interesting. Apparently, the promise returned by request-promise
not only has a .then
method, it also has a .response
property containing the response. I didn't know that, and I don't see it documented anywhere on the request-promise
readme. I would just have used resolveWithFullResponse
but whatever.
Let's see if we can replicate this behavior.
We need to return a promise-like object that has a .then
and a .catch
(like a promise), but it also needs to have a .response
getter, .href
getter, and a .getHeader
function.
Again, quick and dirty, let's make it work the easiest way possible.
lib/jsdom/browser/resources/resource-loader.js
const cookieJar = options.cookieJar;
cookieJar.__setCookie = cookieJar.setCookie;
cookieJar.setCookie = (...args) => { /* ... */ }
const targetFetch = fetchCookie(fetch, cookieJar);
const fetchOptions = this._getFetchOptions(options);
const fetchResult = targetFetch(urlString, fetchOptions);
let result;
result = {
response: null,
href: urlString,
then: fetchResult.then((response) => {
if (!response.ok) {
throw new Error(`Unexpected status=${response.status} for ${urlString}`);
}
result.response = response;
return response.arrayBuffer();
}).then.bind(fetchResult),
catch: fetchResult.catch.bind(fetchResult),
getHeader(name) {
return fetchOptions.headers[name];
}
};
return result;
The previously failing test now succeeds, but many others still fail. Let's fix the next one:
1) Cookie processing
should share cookies when a cookie jar is shared:
TypeError: Cannot read property 'innerHTML' of null
at /Users/hinloopen/Projects/Github/jsdom/test/api/cookies.js:288:75
at processTicksAndRejections (internal/process/task_queues.js:93:5)
test/api/cookies.js
it("should share cookies when a cookie jar is shared", () => {
const cookieJar = new CookieJar();
return JSDOM.fromURL(testHost + "/TestPath/set-cookie-from-server", { cookieJar }).then(() => {
return JSDOM.fromURL(testHost + "/TestPath/html-get-cookie-header", { cookieJar });
}).then(({ window }) => {
const cookieHeader = window.document.querySelector(".cookie-header").innerHTML;
assertCookies(cookieHeader, [
"Test1=Basic",
"Test2=PathMatch",
"Test6=HttpOnly",
"Test9=Duplicate",
"Test10={\"prop1\":5,\"prop2\":\"value\"}",
"Malformed"
]);
assertCookies(window.document.cookie, [
"Test1=Basic",
"Test2=PathMatch",
"Test9=Duplicate",
"Test10={\"prop1\":5,\"prop2\":\"value\"}",
"Malformed"
]);
});
});
So the .cookie-header
element couldn't be found in the /html-get-cookie-header
page. Maybe there is a hint somewhere in the document's HTML. Let's log window.document.body.innerHTML
using console.log({ html: window.document.body.innerHTML });
{ html: '[object Response]' }
I strongly suspect somewhere inside my new fetch implementation, the HTML body's toString
returns "[object Response]"
. Let's check our implementation again.
lib/jsdom/browser/resources/resource-loader.js
const fetchOptions = this._getFetchOptions(options);
const fetchPromise = targetFetch(urlString, fetchOptions);
let result;
const then = function(onfulfilled, onrejected) {
return fetchPromise.then((response) => {
if (!response.ok) {
throw new Error(`Unexpected status=${response.status} for ${urlString}`);
}
result.response = response;
return response.arrayBuffer();
}).then(onfulfilled, onrejected);
};
result = {
response: null,
href: urlString,
then,
catch: function(onrejected) { return then(undefined, onrejected) },
getHeader(name) {
return fetchOptions.headers[name];
}
};
return result;
Now we get, yet again, different errors. One includes The "buf" argument must be one of type Buffer, TypedArray, or DataView. Received type object
. I suspect this has to do with the ArrayBuffer
returned by node-fetch
: This is NOT the same as a NodeJS Buffer
. Let's make it a Buffer
for NodeJS only:
lib/jsdom/browser/resources/resource-loader.js
const then = function(onfulfilled, onrejected) {
return fetchPromise.then((response) => {
if (!response.ok) {
throw new Error(`Unexpected status=${response.status} for ${urlString}`);
}
result.response = response;
return response.arrayBuffer();
})
.then((arrayBuffer) => {
if (typeof Buffer === "undefined") {
return arrayBuffer;
} else {
return Buffer.from(arrayBuffer);
}
})
.then(onfulfilled, onrejected);
};
The next error I encounter is this one:
1) API: resource loading configuration
set to "usable"
canceling requests
should abort a script request (with no events) when stopping the window:
TypeError: openedRequest.abort is not a function
at RequestManager.close (lib/jsdom/browser/resources/request-manager.js:25:21)
at Window.stop (lib/jsdom/browser/Window.js:608:15)
at /Users/hinloopen/Projects/Github/jsdom/test/api/resources.js:559:20
at processTicksAndRejections (internal/process/task_queues.js:93:5)
.abort
is not a function. Is openedRequest
our fetch result?
lib/jsdom/browser/resources/request-manager.js
/**
* Manage all the request and it is able to abort
* all pending request.
*/
module.exports = class RequestManager {
// ...
close() {
for (const openedRequest of this.openedRequests) {
openedRequest.abort();
}
this.openedRequests = [];
}
// ...
};
Let's implement .abort
, make it do nothing, and see if the error changes.
lib/jsdom/browser/resources/resource-loader.js
result = {
response: null,
abort: () => { console.log("TODO ABORT"); },
href: urlString,
then,
catch: function(onrejected) { return then(undefined, onrejected) },
getHeader(name) {
return fetchOptions.headers[name];
}
};
TODO ABORT
Error: Could not load script: "http://127.0.0.1:58978/"
1) API: resource loading configuration
set to "usable"
canceling requests
should abort a script request (with no events) when stopping the window:
The error event must not fire
+ expected - actual
-true
+false
at /Users/hinloopen/Projects/Github/jsdom/test/api/resources.js:920:12
at async Promise.all (index 0)
at async /Users/hinloopen/Projects/Github/jsdom/test/api/resources.js:561:9
Right, time to properly implement .abort
. Can we even implement .abort
using the browser's Fetch API? According to MDN, it is experimental technology. Browser support might be incomplete, but I suspect it's only used in NodeJS anyway.
node-fetch
also supports aborting requests, and it is implemented in the same way! It requires an AbortController
implementation - abort-controller
is suggested.
sh
yarn add abort-controller
lib/jsdom/browser/resources/resource-loader.js
const AbortController = require("abort-controller");
// ...
const targetFetch = fetchCookie(fetch, cookieJar);
const fetchOptions = this._getFetchOptions(options);
const abortController = new AbortController();
fetchOptions.signal = abortController.signal;
const fetchPromise = targetFetch(urlString, fetchOptions);
let result;
const then = function(onfulfilled, onrejected) {
return fetchPromise.then((response) => {
if (!response.ok) {
throw new Error(`Unexpected status=${response.status} for ${urlString}`);
}
result.response = response;
return response.arrayBuffer();
})
.then((arrayBuffer) => typeof Buffer === "undefined" ? arrayBuffer : Buffer.from(arrayBuffer))
.then(onfulfilled, onrejected);
};
result = {
response: null,
abort: () => { abortController.abort(); },
href: urlString,
then,
catch: function(onrejected) { return then(undefined, onrejected) },
getHeader(name) {
return fetchOptions.headers[name];
}
};
Using abort still throws an error, causing the test to fail:
Error: Could not load script: "http://127.0.0.1:61567/"
# ...
type: 'aborted',
message: 'The user aborted a request.'
# ...
1) API: resource loading configuration
set to "usable"
canceling requests
should abort a script request (with no events) when stopping the window:
The error event must not fire
+ expected - actual
-true
+false
I'm not sure how request
would have handled the abort, but based on this failure, it wasn't by throwing an error. I can't find any documentation about it. The source seems to just cancel the request and destroy the response without throwing an error. Maybe the promise just never resolves?
Let's implement it that way, see if it works.
lib/jsdom/browser/resources/resource-loader.js
let aborted = false;
let result;
const then = function(onfulfilled, onrejected) {
return fetchPromise.then((response) => {
if (!response.ok) {
throw new Error(`Unexpected status=${response.status} for ${urlString}`);
}
result.response = response;
return response.arrayBuffer();
})
.then((arrayBuffer) => typeof Buffer === "undefined" ? arrayBuffer : Buffer.from(arrayBuffer))
.then((result) => { if (!aborted) return onfulfilled(result); })
.catch((error) => { if (!aborted) return onrejected(error); });
};
result = {
response: null,
abort: function() {
aborted = true;
abortController.abort();
},
href: urlString,
then,
catch: function(onrejected) {
return then(undefined, onrejected)
},
getHeader(name) {
return fetchOptions.headers[name];
}
};
A lot of green tests this round! Looking good. Still, there are tens of failing tests, some mentioning the proxy. Others mentioning the Referer
header.
It looks like I assigned the referrer to a header named Referrer
instead of Referer
. Let's fix that and look at the next error.
lib/jsdom/browser/resources/resource-loader.js
// inside _getFetchOptions
if (!IS_BROWSER) {
const httpAgent = new http.Agent({ keepAlive: true });
const httpsAgent = new https.Agent({ keepAlive: true, rejectUnauthorized: this._strictSSL });
if (referrer) {
fetchOptions.headers.referer = referrer;
}
fetchOptions.agent = (url) => url.protocol == 'http:' ? httpAgent : httpsAgent;
}
The other two errors are going to be a problem, and are related to redirects:
1) Cookie processing
sent with requests
should gather cookies from redirects (GH-1089):
AssertionError: expected [ 'Test3=Redirect3' ] to deeply equal [ Array(3) ]
+ expected - actual
[
+ "Test1=Redirect1"
+ "Test2=Redirect2"
"Test3=Redirect3"
]
at assertCookies (test/api/cookies.js:383:10)
at /Users/hinloopen/Projects/Github/jsdom/test/api/cookies.js:247:9
at processTicksAndRejections (internal/process/task_queues.js:93:5)
2) API: JSDOM.fromURL()
referrer
should use the redirect source URL as the referrer, overriding a provided one:
AssertionError: expected 'http://example.com/' to equal 'http://127.0.0.1:55863/1'
+ expected - actual
-http://example.com/
+http://127.0.0.1:55863/1
at /Users/hinloopen/Projects/Github/jsdom/test/api/from-url.js:135:14
at processTicksAndRejections (internal/process/task_queues.js:93:5)
fetch
uses transparent redirects, and it appears that fetch-cookie
doesn't store cookies around redirects. Reading the documentation, there is actually a fix for that. Let's apply that fix.
It looks like it is as easy as changing the require to const fetchCookie = require('fetch-cookie/node-fetch');
. Let's do that, and re-run the tests.
1) API: JSDOM.fromURL()
referrer
should use the redirect source URL as the referrer, overriding a provided one:
AssertionError: expected 'http://example.com/' to equal 'http://127.0.0.1:56188/1'
+ expected - actual
-http://example.com/
+http://127.0.0.1:56188/1
The other error is gone. Now let's see how we fix this one. I can make an educated guess what's being tested here, but let's look at the source.
it("should use the redirect source URL as the referrer, overriding a provided one", async () => {
const [requestURL] = await redirectServer("<p>Hello</p>", { "Content-Type": "text/html" });
const dom = await JSDOM.fromURL(requestURL, { referrer: "http://example.com/" });
assert.strictEqual(dom.window.document.referrer, requestURL);
});
So... it's checking document.referrer
. I've no idea where this is assigned and I don't want to find out. Instead, since this test is testing JSDOM.fromURL
specifically, let's see if JSDOM.fromURL
assigns the referrer
somewhere.
lib/api.js
static fromURL(url, options = {}) {
return Promise.resolve().then(() => {
// Remove the hash while sending this through the research loader fetch().
// It gets added back a few lines down when constructing the JSDOM object.
const parsedURL = new URL(url);
const originalHash = parsedURL.hash;
parsedURL.hash = "";
url = parsedURL.href;
options = normalizeFromURLOptions(options);
const resourceLoader = resourcesToResourceLoader(options.resources);
const resourceLoaderForInitialRequest = resourceLoader.constructor === NoOpResourceLoader ?
new ResourceLoader() :
resourceLoader;
const req = resourceLoaderForInitialRequest.fetch(url, {
accept: "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
cookieJar: options.cookieJar,
referrer: options.referrer
});
return req.then(body => {
const res = req.response;
options = Object.assign(options, {
url: req.href + originalHash,
contentType: res.headers["content-type"],
referrer: req.getHeader("referer")
});
return new JSDOM(body, options);
});
});
}
Interesting - it uses this req.getHeader("referer")
. req
is the object I'm returning, so it actually calls my getHeader
function. This function returns the header of the first request.
This is a problem: Because the request was redirected, a new request was started. However, my getHeader
fetches the header of the first request, not the last request in the redirect chain.
This is also an issue for req.href
, which returns the first request URL, not the last, but I haven't confirmed a failing test for this problem.
Let's see if we can peek into the redirect requests. Since fetch-cookie
also fixed this problem for assigning cookies, I bet their fix shows how you can peek into redirect requests. Let's take a look at fetch-cookie/node-fetch
fetch-cookie's node-fetch.js
module.exports = function nodeFetchCookieDecorator (nodeFetch, jar) {
const fetchCookie = require('./')(nodeFetch, jar)
return function nodeFetchCookie (url, userOptions = {}) {
const opts = Object.assign({}, userOptions, { redirect: 'manual' })
// Forward identical options to wrapped node-fetch but tell to not handle redirection.
return fetchCookie(url, opts)
.then(res => {
const isRedirect = (res.status === 303 || res.status === 301 || res.status === 302 || res.status === 307)
// Interpret the proprietary "redirect" option in the same way that node-fetch does.
if (isRedirect && userOptions.redirect !== 'manual' && userOptions.follow !== 0) {
const statusOpts = {
// Since the "follow" flag is not relevant for node-fetch in this case,
// we'll hijack it for our internal bookkeeping.
follow: userOptions.follow !== undefined ? userOptions.follow - 1 : undefined
}
if (res.status !== 307) {
statusOpts.method = 'GET'
statusOpts.body = null
}
const redirectOpts = Object.assign({}, userOptions, statusOpts)
return nodeFetchCookie(res.headers.get('location'), redirectOpts)
} else {
return res
}
})
}
}
So basically, their fix is to set the redirect-mode to manual
and just call fetch
again for every redirect. Because it calls fetch
for every redirect, the cookies can be assigned and extracted every request by fetch-cookie
.
The easiest way to keep track of all the redirect requests without also interfering with fetch-cookie
's fix is by wrapping the node-fetch
instance, keeping track of the last request.
Let's try that.
lib/jsdom/browser/resources/resource-loader.js
_getFetchOptions({ accept = "*/*" }) {
/** @type RequestInit */
const fetchOptions = {};
fetchOptions.headers = {
"User-Agent": this._userAgent,
"Accept-Language": "en",
"Accept-Encoding": "gzip",
Accept: accept,
};
if (!IS_BROWSER) {
const httpAgent = new http.Agent({ keepAlive: true });
const httpsAgent = new https.Agent({ keepAlive: true, rejectUnauthorized: this._strictSSL });
fetchOptions.agent = (url) => url.protocol == 'http:' ? httpAgent : httpsAgent;
}
return fetchOptions;
}
// inside fetch(urlString, options = {})
let lastUrl = options.referrer;
let lastOpts = null;
const myFetch = (url, opts) => {
if (lastUrl && !IS_BROWSER) {
opts.headers.referer = lastUrl;
}
lastUrl = url;
lastOpts = opts;
return fetch(url, opts);
};
const targetFetch = fetchCookie(myFetch, cookieJar);
const fetchOptions = this._getFetchOptions(options);
const abortController = new AbortController();
fetchOptions.signal = abortController.signal;
const fetchPromise = targetFetch(urlString, fetchOptions);
let aborted = false;
let result;
const then = function(onfulfilled, onrejected) {
return fetchPromise.then((response) => {
if (!response.ok) {
throw new Error(`Unexpected status=${response.status} for ${urlString}`);
}
result.response = response;
result.href = lastUrl;
return response.arrayBuffer();
})
.then((arrayBuffer) => typeof Buffer === "undefined" ? arrayBuffer : Buffer.from(arrayBuffer))
.then((result) => { if (!aborted) return onfulfilled(result); })
.catch((error) => {
if (!aborted) {
if (onrejected) {
return onrejected(error);
} else {
throw error;
}
}
});
};
result = {
response: null,
abort: function() {
aborted = true;
abortController.abort();
},
href: null,
then,
catch: function(onrejected) {
return then(undefined, onrejected)
},
getHeader(name) {
return lastOpts.headers[name];
}
};
return result;
So we now have fetch
, myFetch
and targetFetch
. Bad variable names aside, the redirect-related failures seem solved. Let's run the tests and look at the next errors.
# ...
with a Content-Type header specifying csiso88598e
1) should sniff no-bom-charset-http-equiv-no-quotes.html as ISO-8859-8
2) should sniff no-bom-charset-http-equiv-tis-620.html as ISO-8859-8
3) should sniff no-bom-charset-koi8.html as ISO-8859-8
4) should sniff no-bom-charset-utf-16.html as ISO-8859-8
5) should sniff no-bom-charset-utf-16be.html as ISO-8859-8
6) should sniff no-bom-charset-utf-16le.html as ISO-8859-8
7) should sniff no-bom-no-charset.html as ISO-8859-8
# ...
2) API: encoding detection
fromURL
with a Content-Type header specifying csiso88598e
should sniff no-bom-charset-http-equiv-tis-620.html as ISO-8859-8:
AssertionError: expected 'windows-874' to equal 'ISO-8859-8'
+ expected - actual
-windows-874
+ISO-8859-8
# ...
I have questions. Maybe the test provides some details.
test/api/encoding.js
describe("fromURL", { skipIfBrowser: true }, () => {
let server;
let host;
before(() => {
return createServer((req, res) => {
const [, fixture, query] = /^\/([^?]+)(\?.*)?$/.exec(req.url);
const headers = { "Content-Type": "text/html" };
if (query === "?charset=csiso88598e") {
headers["Content-Type"] = "text/html;charset=csiso88598e";
}
res.writeHead(200, headers);
fs.createReadStream(fixturePath(fixture)).pipe(res);
}).then(s => {
server = s;
host = `http://127.0.0.1:${s.address().port}`;
});
});
after(() => server.destroy());
describe("with no Content-Type header given", () => {
for (const encodingFixture of Object.keys(encodingFixtures)) {
const { name, body } = encodingFixtures[encodingFixture];
it(`should sniff ${encodingFixture} as ${name}`, () => {
return JSDOM.fromURL(`${host}/${encodingFixture}`).then(dom => {
assert.strictEqual(dom.window.document.characterSet, name);
assert.strictEqual(dom.window.document.body.textContent, body);
});
});
}
});
describe("with a Content-Type header specifying csiso88598e", () => {
for (const encodingFixture of Object.keys(encodingFixtures)) {
const { nameWhenOverridden, bodyWhenOverridden } = encodingFixtures[encodingFixture];
it(`should sniff ${encodingFixture} as ${nameWhenOverridden}`, () => {
return JSDOM.fromURL(`${host}/${encodingFixture}?charset=csiso88598e`).then(dom => {
assert.strictEqual(dom.window.document.characterSet, nameWhenOverridden);
assert.strictEqual(dom.window.document.contentType, "text/html"); // encoding should be stripped
if (bodyWhenOverridden) {
assert.strictEqual(dom.window.document.body.textContent, bodyWhenOverridden);
}
});
});
}
});
});
Looking at other tests, this csiso88598e
content-type is also tested when invoking the constructir directly, and the expectations are similar, and these tests are not failing:
constructor, given binary data
with a contentType option specifying csiso88598e
Buffer
✓ should sniff no-bom-charset-http-equiv-no-quotes.html as ISO-8859-8
✓ should sniff no-bom-charset-http-equiv-tis-620.html as ISO-8859-8
✓ should sniff no-bom-charset-koi8.html as ISO-8859-8
✓ should sniff no-bom-charset-utf-16.html as ISO-8859-8
✓ should sniff no-bom-charset-utf-16be.html as ISO-8859-8
✓ should sniff no-bom-charset-utf-16le.html as ISO-8859-8
✓ should sniff no-bom-no-charset.html as ISO-8859-8
✓ should sniff utf-8-bom.html as UTF-8
✓ should sniff utf-16be-bom.html as UTF-16BE
✓ should sniff utf-16le-bom.html as UTF-16LE
fromURL
with no Content-Type header given
✓ should sniff no-bom-charset-http-equiv-no-quotes.html as ISO-8859-5 (48ms)
✓ should sniff no-bom-charset-http-equiv-tis-620.html as windows-874
✓ should sniff no-bom-charset-koi8.html as KOI8-R
✓ should sniff no-bom-charset-utf-16.html as UTF-8
✓ should sniff no-bom-charset-utf-16be.html as UTF-8
✓ should sniff no-bom-charset-utf-16le.html as UTF-8
✓ should sniff no-bom-no-charset.html as windows-1252
✓ should sniff utf-8-bom.html as UTF-8
✓ should sniff utf-16be-bom.html as UTF-16BE
✓ should sniff utf-16le-bom.html as UTF-16LE
with a Content-Type header specifying csiso88598e
1) should sniff no-bom-charset-http-equiv-no-quotes.html as ISO-8859-8
2) should sniff no-bom-charset-http-equiv-tis-620.html as ISO-8859-8
3) should sniff no-bom-charset-koi8.html as ISO-8859-8
4) should sniff no-bom-charset-utf-16.html as ISO-8859-8
5) should sniff no-bom-charset-utf-16be.html as ISO-8859-8
6) should sniff no-bom-charset-utf-16le.html as ISO-8859-8
7) should sniff no-bom-no-charset.html as ISO-8859-8
Correctly handling this csiso88598e
content-type should be done by the constructor. Looking at the source and the tests, the constructor accepts a contentType
option that, when equal to csiso88598e
, parses the response as ISO-8859-8
.
Additionally, the test-server returns a Content-Type: text/html;charset=csiso88598e
header. This content-type should be passed to the JSDOM constructor from fromURL
:
lib/api.js
static fromURL(url, options = {}) {
return Promise.resolve().then(() => {
return req.then(body => {
const res = req.response;
options = Object.assign(options, {
url: req.href + originalHash,
contentType: res.headers["content-type"],
referrer: req.getHeader("referer")
});
return new JSDOM(body, options);
});
});
}
Let's take a look at res.headers
inside one of the failing tests using console.log(res.headers, res.headers["content-type"]);
:
Headers {
[Symbol(map)]: [Object: null prototype] {
'content-type': [ 'text/html;charset=csiso88598e' ],
date: [ 'Mon, 29 Jun 2020 20:44:07 GMT' ],
connection: [ 'keep-alive' ],
'transfer-encoding': [ 'chunked' ]
}
} undefined
So the content-type is there, but res.headers["content-type"]
is undefined. That's because res.headers
is not a regular object, but instead is a Headers object. Apperently, you cannot use the []
operator to access the Header
's properties. Instead, you should use .get
.
For backwards compatibility, let's change response
to have a headers
property that's just a plain JS object.
lib/jsdom/browser/resources/resource-loader.js
// inside `then`
const { ok, status } = response;
if (!ok) {
throw new Error(`Unexpected status=${status} for ${urlString}`);
}
const headers = {};
for (const [ key, value ] of response.headers) {
headers[key] = value;
}
result.response = {
status,
headers,
};
result.href = lastUrl;
return response.arrayBuffer();
All encoding-related tests are now green. Let's see what's next. There a lot less failures now, so waiting for a failing test now takes minutes.
There are some interesting failures. A common one is a maximum call stack size exceeded error in setCookie
:
RangeError: Maximum call stack size exceeded
at Array.values (<anonymous>)
at CookieJar.cookieJar.setCookie [as __setCookie] (/Users/hinloopen/Projects/Github/jsdom/lib/jsdom/browser/resources/resource-loader.js:148:28)
at CookieJar.cookieJar.setCookie [as __setCookie] (/Users/hinloopen/Projects/Github/jsdom/lib/jsdom/browser/resources/resource-loader.js:148:28)
at CookieJar.cookieJar.setCookie [as __setCookie] (/Users/hinloopen/Projects/Github/jsdom/lib/jsdom/browser/resources/resource-loader.js:148:28)
at CookieJar.cookieJar.setCookie [as __setCookie] (/Users/hinloopen/Projects/Github/jsdom/lib/jsdom/browser/resources/resource-loader.js:148:28)
at CookieJar.cookieJar.setCookie [as __setCookie] (/Users/hinloopen/Projects/Github/jsdom/lib/jsdom/browser/resources/resource-loader.js:148:28)
at CookieJar.cookieJar.setCookie [as __setCookie] (/Users/hinloopen/Projects/Github/jsdom/lib/jsdom/browser/resources/resource-loader.js:148:28)
at CookieJar.cookieJar.setCookie [as __setCookie] (/Users/hinloopen/Projects/Github/jsdom/lib/jsdom/browser/resources/resource-loader.js:148:28)
at CookieJar.cookieJar.setCookie [as __setCookie] (/Users/hinloopen/Projects/Github/jsdom/lib/jsdom/browser/resources/resource-loader.js:148:28)
at CookieJar.cookieJar.setCookie [as __setCookie] (/Users/hinloopen/Projects/Github/jsdom/lib/jsdom/browser/resources/resou
Another one is mentioning the proxy, which I have not yet implemented:
1) API: resource loading configuration
With a custom resource loader
should be able to customize the proxy option:
AssertionError: expected 1 to equal 3
+ expected - actual
-1
+3
A timeout:
2) web-platform-tests
cors
credentials-flag.htm:
Error: Error: test harness should not timeout: cors/credentials-flag.htm
And cookies being sent for preflight requests:
31) web-platform-tests
xhr
access-control-preflight-request-must-not-contain-cookie.htm:
Failed in "Preflight request must not contain any cookie header":
assert_unreached: Unexpected error. Reached unreachable code
There might also be some other errors in between, but the logs are full with the setCookie stacktraces, so let's first fix that one.
It seems that the cookieJar keeps being patched over and over again, which was not my intention. Fixing this should fix the stack-level-too-deep error, and it might also fix the timeout error.
Let's add a check to ensure the cookieJar is only patched once:
lib/jsdom/browser/resources/resource-loader.js
// inside `fetch(urlString, options = {})`
const cookieJar = options.cookieJar;
if (!cookieJar.__setCookie) {
cookieJar.__setCookie = cookieJar.setCookie;
cookieJar.setCookie = (...args) => {
if (args.length === 3) {
args.splice(2, 0, {});
}
if (args.length === 4) {
args[2].ignoreError = true;
}
return cookieJar.__setCookie(...args);
}
}
4917 passing (11m)
563 pending
1 failing
1) API: resource loading configuration
With a custom resource loader
should be able to customize the proxy option:
AssertionError: expected 1 to equal 3
+ expected - actual
-1
+3
at /Users/hinloopen/Projects/Github/jsdom/test/api/resources.js:666:16
at runMicrotasks (<anonymous>)
at processTicksAndRejections (internal/process/task_queues.js:93:5)
4917 passing tests, 1 failing. Only the proxy implementation remains.
Implementing proxy
It seems that one can replace the node-fetch
HTTP(s) agents with a proxy agent using https-proxy-agent as mentioned by jimliang.
Looking at the dependencies of https-proxy-agent
, it seems there are two: agent-base and debug.
I feel like this debug
dependency should have been optional, but who am I to judge. The agent-base
dependency seems sensible.
I also noticed there is a http-proxy-agent
variant, without the https
. I'm not sure if we need both. I'm hoping the https
one just supports both HTTP and HTTPS so I don't have to install both.
Let's try https-proxy-agent
.
yarn add https-proxy-agent
lib/jsdom/browser/resources/resource-loader.js
const HttpsProxyAgent = require("https-proxy-agent");
// _getFetchOptions({ accept = "*/*" }) {
if (!IS_BROWSER) {
const proxyAgent = this._proxy ? new HttpsProxyAgent(this._proxy) : null;
const httpAgent = new http.Agent({ keepAlive: true });
const httpsAgent = new https.Agent({ keepAlive: true, rejectUnauthorized: this._strictSSL });
fetchOptions.agent = (url) => proxyAgent ? proxyAgent : (url.protocol == 'http:' ? httpAgent : httpsAgent);
}
Let's run the tests, see if this works.
# (with .only on "should be able to customize the proxy option")
0 passing (6s)
1 failing
1) API: resource loading configuration
With a custom resource loader
should be able to customize the proxy option:
Error: Timeout of 2000ms exceeded. For async tests and hooks, ensure "done()" is called; if returning a Promise, ensure it resolves. (/Users/hinloopen/Projects/Github/jsdom/test/index.js)
at listOnTimeout (internal/timers.js:531:17)
at processTimers (internal/timers.js:475:7)
Timeout? That's not helpful at all. Since the proxy is HTTP, let's blindly try http-proxy-agent
. Additionally, the keepAlive
and rejectUnauthorized
options are not passed to the proxy agent. Let's add them. Both proxy agents accept either a URL or an object post
, hostname
, protocol
: The output of url.parse
. I'm assuming the remaining options are passed to http(s).Agent
.
Let's combine all my assumptions and see if we get anything other than a timeout. Let's also increase the timeout, just in case something is just being slow.
yarn add http-proxy-agent
lib/jsdom/browser/resources/resource-loader.js
const url = require("url");
const HttpProxyAgent = require("http-proxy-agent");
const HttpsProxyAgent = require("https-proxy-agent");
// _getFetchOptions({ accept = "*/*" }) {
if (!IS_BROWSER) {
const agentOpts = { keepAlive: true, rejectUnauthorized: this._strictSSL };
const proxyOpts = { ...agentOpts, ...(this._proxy ? url.parse(this._proxy) : {}) };
const httpAgent = this._proxy ? new HttpProxyAgent(proxyOpts) : new http.Agent(agentOpts);
const httpsAgent = this._proxy ? new HttpsProxyAgent(proxyOpts) : new https.Agent(agentOpts);
fetchOptions.agent = (url) => url.protocol == 'http:' ? httpAgent : httpsAgent;
}
# npm t -- --timeout 9999
# (with .only on "should be able to customize the proxy option")
this._proxy http://127.0.0.1:63767
this._proxy http://127.0.0.1:63767
✓ should be able to customize the proxy option (80ms)
1 passing (4s)
Success!
Let's do a minor cleanup to create agents on-demand, and re-run all tests to make sure everything still works.
lib/jsdom/browser/resources/resource-loader.js
/**
*
* @param {string} protocol "http:" or "https:"
*/
_getAgent(protocol) {
const isHttps = protocol === "https:";
const agentOpts = { keepAlive: true, rejectUnauthorized: this._strictSSL };
if (this._proxy) {
agentOpts.rejectUnauthorized = this._strictSSL;
const proxyOpts = { ...url.parse(this._proxy), ...agentOpts };
return isHttps ? new HttpsProxyAgent(proxyOpts) : new HttpProxyAgent(proxyOpts);
} else {
return isHttps ? new https.Agent(agentOpts) : new http.Agent(agentOpts);
}
}
// inside _getFetchOptions({ accept = "*/*" }) {
if (!IS_BROWSER) {
fetchOptions.agent = (url) => this._getAgent(url.protocol);
}
All tests are gean. Great. This is the final result. I intent to clean it up after the remaining request
dependencies are removed.
lib/jsdom/browser/resources/resource-loader.js
/**
*
* @param {string} protocol "http:" or "https:"
*/
_getAgent(protocol) {
const isHttps = protocol === "https:";
const agentOpts = { keepAlive: true, rejectUnauthorized: this._strictSSL };
if (this._proxy) {
agentOpts.rejectUnauthorized = this._strictSSL;
const proxyOpts = { ...url.parse(this._proxy), ...agentOpts };
return isHttps ? new HttpsProxyAgent(proxyOpts) : new HttpProxyAgent(proxyOpts);
} else {
return isHttps ? new https.Agent(agentOpts) : new http.Agent(agentOpts);
}
}
// inside _getFetchOptions({ accept = "*/*" }) {
case "http":
case "https": {
const cookieJar = options.cookieJar;
if (!cookieJar.__setCookie) {
cookieJar.__setCookie = cookieJar.setCookie;
cookieJar.setCookie = (...args) => {
if (args.length === 3) {
args.splice(2, 0, {});
}
if (args.length === 4) {
args[2].ignoreError = true;
}
return cookieJar.__setCookie(...args);
}
}
let lastUrl = options.referrer;
let lastOpts = null;
const myFetch = (url, opts) => {
if (lastUrl && !IS_BROWSER) {
opts.headers.referer = lastUrl;
}
lastUrl = url;
lastOpts = opts;
return fetch(url, opts);
};
const targetFetch = fetchCookie(myFetch, cookieJar);
const fetchOptions = this._getFetchOptions(options);
const abortController = new AbortController();
fetchOptions.signal = abortController.signal;
const fetchPromise = targetFetch(urlString, fetchOptions);
let aborted = false;
let result;
const then = function(onfulfilled, onrejected) {
return fetchPromise.then((response) => {
const { ok, status } = response;
if (!ok) {
throw new Error(`Unexpected status=${status} for ${urlString}`);
}
const headers = {};
for (const [ key, value ] of response.headers) {
headers[key] = value;
}
result.response = {
status,
headers,
};
result.href = lastUrl;
return response.arrayBuffer();
})
.then((arrayBuffer) => typeof Buffer === "undefined" ? arrayBuffer : Buffer.from(arrayBuffer))
.then((result) => { if (!aborted) return onfulfilled(result); })
.catch((error) => {
if (!aborted) {
if (onrejected) {
return onrejected(error);
} else {
throw error;
}
}
});
};
result = {
response: null,
abort: function() {
aborted = true;
abortController.abort();
},
href: null,
then,
catch: function(onrejected) {
return then(undefined, onrejected)
},
getHeader(name) {
return lastOpts.headers[name];
}
};
return result;
}
Because this post has become fairly large, I'll continue this post in a part 2. To be continued...
Top comments (0)