Welcome to Vigges Developer Community - Open, Learning, Share

node.js - Running Puppeteer in parallel with the Express router in Node.js: how to pass a page between routes while maintaining concurrency

app.post('/api/auth/check', async (req, res) => {
  try {
    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.goto('https://www.google.com');
    res.json({ message: 'Success' });
  } catch (e) {
    console.log(e);
    res.status(500).json({ message: 'Error' });
  }
});

app.post('/api/auth/register', async (req, res) => {
  console.log('register');
  // I need to reuse the current user's session here (the page and browser
  // created in the route above) and then perform actions on the same page.
  await page.waitForTimeout(1000);
  await browser.close();
});

Is it possible to pass page and browser from one route to another while keeping Puppeteer sessions concurrent? If I store them in global variables, each new request overwrites the previous page and browser, so multiple users cannot work at the same time.

1 Answer

One approach is to keep each client's page and browser instances in a lookup table keyed by a session token, so that every route can retrieve the same instances. Since HTTP is stateless, I assume you have some session/authentication management system that associates a user's session with a Puppeteer browser instance.

I've simplified your routes a bit and added a naive token management system that associates a user with a session, in the interest of making a complete, runnable example, but I don't think you'll have problems adapting it to your use case.

const express = require("express");
const puppeteer = require("puppeteer");

// Forward rejections from async handlers to Express's error middleware.
// https://stackoverflow.com/questions/51391080/handling-errors-in-express-async-middleware
const asyncHandler = fn => (req, res, next) =>
  Promise.resolve(fn(req, res, next)).catch(next)
;

// Launch a dedicated browser and open one page for a client.
const startPuppeteerSession = async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  return {browser, page};
};

// Naive in-memory store: token -> {browser, page}.
const sessions = {};

express()
  // Reject any request that does not carry a token.
  .use((req, res, next) =>
    req.query.token === undefined ? res.sendStatus(401) : next()
  )
  // Create a browser session and remember it under the client's token.
  .get("/start", asyncHandler(async (req, res) => {
    sessions[req.query.token] = await startPuppeteerSession();
    res.sendStatus(200);
  }))
  // Reuse the stored page. An unknown token throws here and ends up as a 500.
  .get("/navigate", asyncHandler(async (req, res) => {
    const {page} = sessions[req.query.token];
    await page.goto(req.query.to || "http://www.example.com");
    res.sendStatus(200);
  }))
  .get("/content", asyncHandler(async (req, res) => {
    const {page} = sessions[req.query.token];
    res.send(await page.content());
  }))
  // Close the browser and forget the session.
  .get("/kill", asyncHandler(async (req, res) => {
    const {browser} = sessions[req.query.token];
    await browser.close();
    delete sessions[req.query.token];
    res.sendStatus(200);
  }))
  // Any error forwarded by asyncHandler becomes a 500 response.
  .use((err, req, res, next) => res.sendStatus(500))
  .listen(8000, () => console.log("listening on port 8000"))
;

Sample usage from the client's perspective:

$ curl localhost:8000/start?token=1
OK
$ curl 'localhost:8000/navigate?to=https://stackoverflow.com/questions/66935883&token=1'
OK
$ curl localhost:8000/content?token=1 | grep 'apsenT'
        <a href="/users/15547056/apsent">apsenT</a><span class="d-none" itemprop="name">apsenT</span>
            <a href="/users/15547056/apsent">apsenT</a> is a new contributor to this site. Take care in asking for clarification, commenting, and answering.
                        <a href="/users/15547056/apsent">apsenT</a> is a new contributor. Be nice, and check out our <a href="/conduct">Code of Conduct</a>.
$ curl localhost:8000/kill?token=1
OK

You can see the client associated with token 1 has persisted a single browser session across multiple routes. Other clients can launch browser sessions and manipulate them simultaneously.
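
One thing the example does not cover is two requests arriving at the same time with the same token: both would act on the same page, and overlapping operations such as two navigations may interfere with each other. A minimal sketch of one way to handle this, assuming you want requests for one token to run one at a time while different tokens stay concurrent, is below. `createKeyedQueue` and `enqueue` are hypothetical names, not part of Express or Puppeteer.

```javascript
// Hypothetical helper: runs tasks for the same key one at a time, in
// arrival order, while tasks for different keys run concurrently.
const createKeyedQueue = () => {
  const tails = new Map(); // key -> promise that settles after the last queued task

  return (key, task) => {
    const previous = tails.get(key) || Promise.resolve();
    // Start this task only after the previous task for the key has settled.
    const run = previous.then(() => task());
    // The tail never rejects, so one failed task does not block later ones.
    const tail = run.catch(() => {});
    tails.set(key, tail);
    // Drop the entry once the queue for this key has drained.
    tail.then(() => {
      if (tails.get(key) === tail) tails.delete(key);
    });
    return run; // the caller still sees the task's own result or error
  };
};
```

Inside a route, a call such as `await enqueue(req.query.token, () => page.goto(url))` would then wait for any earlier operation on that client's page before starting.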

To reiterate, this is only a proof-of-concept of sharing a Puppeteer browser instance across routes. Using the code above, a user can just spam the start route and create browsers until the server crashes, so this is totally unfit for production without real authentication and session management/error handling.
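
As a rough illustration of what the missing session management might look like, here is a sketch of a token-keyed store that caps the number of live sessions and closes idle ones. `SessionStore`, its options, and the default limits are all hypothetical choices for this example, not part of Express or Puppeteer, and it does not replace real authentication.

```javascript
// Hypothetical helper: a token-keyed store that caps the number of
// live sessions and evicts sessions that have been idle for too long.
class SessionStore {
  // create: async () => session; close: async (session) => void
  constructor({ create, close, maxSessions = 5, maxIdleMs = 60000 }) {
    this.create = create;
    this.close = close;
    this.maxSessions = maxSessions;
    this.maxIdleMs = maxIdleMs;
    this.entries = new Map(); // token -> { promise, lastUsed }
  }

  // Return the session for a token, creating it if the cap allows.
  async start(token, now = Date.now()) {
    let entry = this.entries.get(token);
    if (!entry) {
      if (this.entries.size >= this.maxSessions) {
        throw new Error("session limit reached");
      }
      // Store the promise right away so concurrent calls for the same
      // token share one launch instead of starting two browsers.
      entry = { promise: this.create(), lastUsed: now };
      this.entries.set(token, entry);
      // Forget the entry if the launch fails.
      entry.promise.catch(() => {
        if (this.entries.get(token) === entry) this.entries.delete(token);
      });
    }
    entry.lastUsed = now;
    return entry.promise;
  }

  // Look up a session without creating one; undefined for unknown tokens.
  async get(token, now = Date.now()) {
    const entry = this.entries.get(token);
    if (!entry) return undefined;
    entry.lastUsed = now;
    return entry.promise;
  }

  // Close and forget a session; resolves to false for unknown tokens.
  async end(token) {
    const entry = this.entries.get(token);
    if (!entry) return false;
    this.entries.delete(token);
    await this.close(await entry.promise);
    return true;
  }

  // Close every session idle for longer than maxIdleMs and return
  // the tokens that were evicted.
  async evictIdle(now = Date.now()) {
    const stale = [...this.entries]
      .filter(([, entry]) => now - entry.lastUsed > this.maxIdleMs)
      .map(([token]) => token);
    await Promise.all(stale.map(token => this.end(token)));
    return stale;
  }
}
```

With the example above, `create` could be `startPuppeteerSession` and `close` could be `({browser}) => browser.close()`. The routes would call `start`, `get`, and `end` instead of touching the `sessions` object directly, and a periodic timer could call `evictIdle()`.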

Packages used: express ^4.17.1, puppeteer ^8.0.0.

