Web Scraping REST API with Node, Express and Puppeteer

Step 1. Create a Node project

Create a folder and open it in a terminal.
Type npm init and press Enter.
Answer the prompts, or press Enter to accept the defaults.
Your Node Project is ready.

Step 2. Install Puppeteer and Express 

Run npm install --save express in the terminal.
Run npm install --save puppeteer in the terminal.
This installs Puppeteer and also downloads a compatible browser build for it to control (Chromium in older releases, Chrome for Testing in newer ones).

Step 3. Create Web Scraping Program

Create a file named app.js
Add the following lines to app.js
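const express = require('express');

const scraper = require('./utils/scraper');

const app = express();
const port = process.env.PORT || 3000;

// GET /reviews?url=<product page URL> returns the scraped reviews as JSON
app.get('/reviews', (req, res) => {
  if (!req.query.url) {
    return res.status(400).json({ message: "Missing required query parameter: url" });
  }
  scraper.extractReviews(req.query.url)
    .then(data => res.status(200).json({ message: "success", data: data }))
    .catch(err => {
      console.error(err); // keep the cause visible in the server log
      res.status(500).json({ message: "Something went wrong. Could not fetch result." });
    });
});

app.listen(port, () => console.log(`Example app listening on port ${port}!`));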
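Since req.query.url comes directly from the caller, it may be worth checking that it is an absolute http(s) URL before handing it to the scraper. The sketch below is one way to do that with the URL class built into Node. The helper name isHttpUrl is made up for this example and is not part of the original app.js.

```javascript
// Hypothetical helper (not in the original app.js): returns true only for
// absolute http:// or https:// URLs passed as a single string.
function isHttpUrl(value) {
  if (typeof value !== 'string') return false; // missing or repeated query parameter
  try {
    const parsed = new URL(value); // throws a TypeError on invalid input
    return parsed.protocol === 'http:' || parsed.protocol === 'https:';
  } catch (err) {
    return false;
  }
}

console.log(isHttpUrl('http://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=3415697')); // true
console.log(isHttpUrl('file:///etc/passwd')); // false
console.log(isHttpUrl(undefined)); // false
```

Inside the route you could call it first and answer with res.status(400) when it returns false, so a bad request never launches a browser.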
Add a folder named utils
Add a file in utils named scraper.js
Add the following code in scraper.js
const puppeteer = require('puppeteer'); // headless browser automation

// Load a product page, read its review count, then reload it with every review on one page.
const extractReviews = async (url) => {
  const browser = await puppeteer.launch({ headless: true });
  try {
    const page = await browser.newPage();
    await page.goto(url, { waitUntil: 'networkidle2' });
    // The count lives in the element's content attribute; treat a missing element as 0.
    const reviewCount = await page.evaluate(() => {
      const el = document.querySelector('span[itemprop="reviewCount"]');
      return el ? Number(el.getAttribute('content')) : 0;
    });
    let reviewArray = [];
    if (reviewCount > 0) {
      url = url + "&pagenumber=0&RSort=1&csid=ITD&recordsPerPage=" + reviewCount + "&body=REVIEWS#CustomerReviewsBlock";
      await page.goto(url, { waitUntil: 'load' });
      reviewArray = await page.evaluate(() =>
        Array.from(document.querySelectorAll('.review')).map(review => ({
          reviewTitle: review.querySelector('.rightCol blockquote h6').textContent,
          reviewComment: review.querySelector('.rightCol blockquote p').textContent,
          reviewRating: +review.querySelector('.leftCol .itemReview dd .itemRating strong').textContent,
          reviewDate: review.querySelector('.leftCol .reviewer dd:nth-of-type(2)').textContent,
          reviewer: review.querySelector('.leftCol .reviewer dd:nth-of-type(1)').textContent
        })));
    }
    return { reviewCount: reviewCount, reviewArray: reviewArray, url: url };
  } finally {
    await browser.close(); // always release the browser, even if scraping throws
  }
};

module.exports.extractReviews = extractReviews;
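The string concatenation in scraper.js assumes the product URL already has a query string, because it appends parameters starting with &. As a rough alternative, the sketch below builds the same "all reviews" URL with the URL class that ships with Node, which also works when the input has no query string. The function name buildAllReviewsUrl is an assumption for this example. The parameter names are copied from the code above.

```javascript
// Hypothetical helper: sets the same query parameters and fragment that
// scraper.js appends by hand, using the built-in URL API.
function buildAllReviewsUrl(productUrl, reviewCount) {
  const target = new URL(productUrl);
  target.searchParams.set('pagenumber', '0');
  target.searchParams.set('RSort', '1');
  target.searchParams.set('csid', 'ITD');
  target.searchParams.set('recordsPerPage', String(reviewCount)); // ask for all reviews at once
  target.searchParams.set('body', 'REVIEWS');
  target.hash = 'CustomerReviewsBlock';
  return target.toString();
}

console.log(buildAllReviewsUrl(
  'http://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=3415697',
  12
));
// http://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=3415697&pagenumber=0&RSort=1&csid=ITD&recordsPerPage=12&body=REVIEWS#CustomerReviewsBlock
```

Because set() replaces an existing parameter instead of adding a second copy, calling the helper on a URL that already carries recordsPerPage would still produce a single value.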
Run node app.js in the terminal to start the server on localhost.
In a browser, open the following URL to test your scraping API: http://localhost:3000/reviews/?url=http://www.tigerdirect.com/applications/SearchTools/item-details.asp?EdpNo=3415697
Tadaaaaa!
If it did not work, let me know in the comments.
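One thing to watch for: the test URL above works unencoded because the product URL has only one query parameter. A product URL containing & would be split into several parameters of the API request. A minimal sketch of building the request with encodeURIComponent follows. The helper name buildApiRequest and the example.com product URL are made up for illustration. The port and route are the ones used in this post.

```javascript
// Hypothetical helper: wraps a product URL so it travels as ONE query parameter.
function buildApiRequest(productUrl, apiBase = 'http://localhost:3000/reviews') {
  return apiBase + '?url=' + encodeURIComponent(productUrl);
}

// Made-up product URL with two query parameters of its own.
const productUrl = 'http://example.com/item.asp?id=1&tab=reviews';
const requestUrl = buildApiRequest(productUrl);
console.log(requestUrl);
// http://localhost:3000/reviews?url=http%3A%2F%2Fexample.com%2Fitem.asp%3Fid%3D1%26tab%3Dreviews

// Decoding the parameter gives back the original URL, which is what the
// route should see in req.query.url once Express has parsed the query string.
console.log(new URL(requestUrl).searchParams.get('url') === productUrl); // true
```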
