There are many ways for a programmer to earn extra money in the time left over from a main job. This post covers only one of them: parsing websites using an innovative online service. The essence of the innovation is that the analyzed site is split into tags and loaded into MySQL tables, so you can parse the site with ordinary MySQL queries. If you know HTML markup and MySQL, this method will suit you. Parsing one site pays from several thousand to tens of thousands of rubles. Finding a customer is up to you: we give you a fishing rod, and your task is to find a fishing spot.
Now let's take a closer look at how the service works. The analyzed site is loaded into two tables: tags(id, name, innerHTML, parentTagName, parentId, childIndex) and attributes(id, TagID, name, value). As the names suggest, the tags table holds the tags and the attributes table holds their attributes. The two tables are linked by the relation tags.id -> attributes.TagID.
Description of the tags table fields:
id - internal tag number in MySQL,
name - tag name,
innerHTML - tag content,
parentTagName - the name of the parent tag,
parentId - internal number in MySQL of the parent tag,
childIndex - the ordinal number of the tag among its parent's child tags.
Description of the attributes table fields:
id - internal attribute number in MySQL,
TagID - internal number in MySQL of the tag to which this attribute belongs,
name - attribute name,
value - attribute value.
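To make the two-table layout concrete, here is a minimal Python sketch of how a page could be split into rows shaped like tags and attributes. It is not the service's loader, only an illustration built on the standard html.parser module. The names TableBuilder and build_rows are hypothetical, and the 0-based childIndex and the empty parent fields for top-level tags are assumptions, since the post does not specify them.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not stay "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TableBuilder(HTMLParser):
    """Collects rows shaped like the tags and attributes tables (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.tags = []        # one dict per tag row
        self.attributes = []  # one dict per attribute row
        self._open = []       # stack of currently open tag rows
        self._children = {}   # parent id (None at top level) -> children seen

    def _add_to_open(self, text):
        # Everything seen while a tag is open belongs to its innerHTML.
        for row in self._open:
            row["innerHTML"] += text

    def handle_starttag(self, tag, attrs):
        self._add_to_open(self.get_starttag_text())
        parent = self._open[-1] if self._open else None
        parent_id = parent["id"] if parent else None
        index = self._children.get(parent_id, 0)  # assumption: 0-based
        self._children[parent_id] = index + 1
        row = {
            "id": len(self.tags) + 1,
            "name": tag,
            "innerHTML": "",
            "parentTagName": parent["name"] if parent else None,
            "parentId": parent_id,
            "childIndex": index,
        }
        self.tags.append(row)
        for name, value in attrs:
            self.attributes.append({
                "id": len(self.attributes) + 1,
                "TagID": row["id"],  # link back to tags.id
                "name": name,
                "value": value,
            })
        if tag not in VOID_TAGS:
            self._open.append(row)

    def handle_endtag(self, tag):
        # Sketch only: closes the innermost tag if it matches, ignores strays.
        if self._open and self._open[-1]["name"] == tag:
            self._open.pop()
            self._add_to_open(f"</{tag}>")

    def handle_data(self, data):
        # Note: character references arrive already decoded here.
        self._add_to_open(data)


def build_rows(html):
    """Return (tags, attributes) row lists for an HTML string."""
    builder = TableBuilder()
    builder.feed(html)
    builder.close()
    return builder.tags, builder.attributes


tags, attributes = build_rows(
    '<div id="main"><a href="/a">First</a><a href="/b">Second</a></div>'
)
for row in tags:
    print(row)
for row in attributes:
    print(row)
```

Each attribute row carries the id of its tag in TagID, which is what later lets a query join the two tables.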
After loading the site into MySQL, you can write queries directly in the online service to extract the necessary information. For example, the following query will extract all links from the analyzed site:
SELECT A.value FROM tags AS T
LEFT JOIN attributes AS A ON A.TagID = T.id
WHERE T.name = 'a' AND A.name = 'href'
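A link-extraction query of this kind can be tried locally before running it in the service. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL, with a handful of made-up sample rows. The join condition A.TagID = T.id follows from the field descriptions above; the column types and the ORDER BY clause are assumptions added to make the example self-contained and its result order predictable.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the service's MySQL tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tags (
    id INTEGER PRIMARY KEY,
    name TEXT,
    innerHTML TEXT,
    parentTagName TEXT,
    parentId INTEGER,
    childIndex INTEGER
);
CREATE TABLE attributes (
    id INTEGER PRIMARY KEY,
    TagID INTEGER,
    name TEXT,
    value TEXT
);
""")

# Made-up sample data: one div containing two links and an image.
conn.executemany(
    "INSERT INTO tags VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "div", "...", None, None, 0),
        (2, "a", "First", "div", 1, 0),
        (3, "a", "Second", "div", 1, 1),
        (4, "img", "", "div", 1, 2),
    ],
)
conn.executemany(
    "INSERT INTO attributes VALUES (?, ?, ?, ?)",
    [
        (1, 1, "id", "main"),
        (2, 2, "href", "/a"),
        (3, 3, "href", "/b"),
        (4, 3, "class", "x"),
        (5, 4, "src", "/logo.png"),
    ],
)

# Keep only href attributes that belong to <a> tags.
LINKS_QUERY = """
SELECT A.value FROM tags AS T
LEFT JOIN attributes AS A ON A.TagID = T.id
WHERE T.name = 'a' AND A.name = 'href'
ORDER BY A.id
"""
links = [row[0] for row in conn.execute(LINKS_QUERY)]
print(links)
```

Because the WHERE clause filters on A.name, tags without an href attribute drop out of the result, so here the LEFT JOIN returns the same rows an INNER JOIN would.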
In addition, you can download a dump of the tables and analyze it in your own MySQL editor. The site has a video tutorial with a parsing example, as well as a technical support chat.