Towards an automated repository for indexing, analysis and characterization of municipal e-government websites in Mexico

by Sergio R. Coria, et al.

This article addresses a problem in the electronic government discipline of special interest in Mexico: the need for a consolidated and up-to-date information source about municipal e-government websites. One reason for this need is the lack of a complete and current database containing the electronic addresses (web domain names) of the municipal governments that have a website. For diverse reasons, not all Mexican municipalities have one, and a number of those that do present information corresponding not to the current government but to previous ones. The few official lists of municipal websites are not updated frequently enough, and manually determining which municipalities have an operating and valid website at a given moment is a time-consuming process. In addition, website contents do not always comply with legal requirements and are considerably heterogeneous. In turn, the level of development (evolution) of municipal websites is valuable information that can be harnessed for diverse theoretical and practical purposes in the public administration field. Obtaining all these pieces of information requires website content analysis. Therefore, this article investigates the need for, and the feasibility of, automating the implementation and updating of a digital repository to perform diverse analyses of these websites. Its technological feasibility is addressed by means of a literature review on web scraping and by proposing a preliminary manual methodology that takes into account known, proven techniques and software tools for web crawling and scraping. No new crawling or scraping techniques are proposed because the existing ones satisfy the current needs. Finally, software requirements are specified in order to automate the creation, updating, indexing, and analysis of the repository.




