Towards an automated repository for indexing, analysis and characterization of municipal e-government websites in Mexico

06/26/2020
by Sergio R. Coria et al.

This article addresses a problem in the discipline of electronic government, with special interest in Mexico: the need for a centralized, up-to-date source of information about municipal e-government websites. One reason for this need is the lack of a complete, current database of the electronic addresses (web domain names) of the municipal governments that have a website. For various reasons, not all Mexican municipalities have one, and some of those that do display information about previous administrations rather than the current one. The few official lists of municipal websites are not updated frequently enough, and manually determining which municipalities have an operating, valid website at a given moment is time-consuming. Moreover, website contents do not always comply with legal requirements and are considerably heterogeneous. In turn, the evolutionary development level of municipal websites is valuable information that can be harnessed for diverse theoretical and practical purposes in the field of public administration. Obtaining all of this information requires analyzing website content. Therefore, this article investigates the need for, and the feasibility of, automating the implementation and updating of a digital repository for performing diverse analyses of these websites. Technological feasibility is addressed through a literature review on web scraping and through a proposed preliminary manual methodology, which relies on known, proven techniques and software tools for web crawling and scraping. No new crawling or scraping techniques are proposed, because the existing ones satisfy current needs. Finally, software requirements are specified for automating the creation, updating, indexing, and analysis of the repository.
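To make the automation idea concrete, the following is a minimal sketch using only the Python standard library, not the authors' methodology or tooling. It assumes a site counts as "current" when its homepage returns HTTP 200 and its visible text names the present administration period (for example "2018-2021"); that heuristic, the `SiteRecord` fields, the User-Agent string and the `municipio.example` URL are all hypothetical. It fetches a homepage, strips scripts and styles to obtain visible text, flags whether the period is mentioned, and builds a simple inverted index over page texts.

```python
import re
import urllib.request
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import Dict, Optional, Set, Tuple
from urllib.error import HTTPError


class _TextExtractor(HTMLParser):
    """Collects visible text, skipping the contents of <script> and <style>."""

    def __init__(self) -> None:
        super().__init__()
        self._skip = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML page with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def fetch_homepage(url: str, timeout: float = 10.0) -> Tuple[Optional[int], str]:
    """Return (HTTP status, HTML); status is None if the site is unreachable."""
    # Hypothetical User-Agent identifying the survey crawler.
    req = urllib.request.Request(url, headers={"User-Agent": "muni-egov-survey/0.1"})
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.status, resp.read().decode(charset, errors="replace")
    except HTTPError as err:  # server answered with an error code (404, 500, ...)
        return err.code, ""
    except (OSError, ValueError):  # DNS failure, timeout, refused, malformed URL
        return None, ""


def mentions_period(text: str, period: str) -> bool:
    """Assumed heuristic: True if the text names the period, e.g. '2018-2021'."""
    start, end = period.split("-")
    # Accept a hyphen or an en dash, with optional surrounding spaces.
    return re.search(rf"{start}\s*[-\u2013]\s*{end}", text) is not None


@dataclass
class SiteRecord:
    """Hypothetical repository entry for one municipal website."""
    municipality: str
    url: str
    status: Optional[int]
    current: bool


def classify(municipality: str, url: str, status: Optional[int],
             html: str, period: str) -> SiteRecord:
    """Mark a site as current if it responds with 200 and names the period."""
    text = extract_text(html)
    current = status == 200 and mentions_period(text, period)
    return SiteRecord(municipality, url, status, current)


def build_index(pages: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: lower-cased word -> set of URLs whose text contains it."""
    index: Dict[str, Set[str]] = {}
    for url, text in pages.items():
        for token in re.findall(r"\w+", text.lower()):
            index.setdefault(token, set()).add(url)
    return index
```

In use, a list of candidate domains would be passed through `fetch_homepage` and `classify`, and the texts of the valid sites through `build_index`. Re-running that loop on a schedule is what would keep the repository up to date. Keeping the network call separate from the parsing and indexing functions also lets the analysis steps be tested on saved HTML without contacting any municipal server.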

research
02/10/2022

Leveraging Google's Publisher-specific IDs to Detect Website Administration

Digital advertising is the most popular way for content monetization on ...
research
11/08/2019

Accessibility of websites for visually impaired persons

Accessibility of websites for visually impaired persons is mishandled by...
research
12/18/2019

How India Censors the Web

One of the primary ways in which India engages in online censorship is b...
research
10/29/2018

Credibility of Automatic Appraisal of Domain Names

Both domain names and entire websites are increasingly frequently treate...
research
10/22/2020

What is Web Scraping: Introduction, Applications and Best Practices

Web scraping typically extracts large amounts of data from websites fo...
research
11/12/2020

Effective Notification Campaigns on the Web: A Matter of Trust, Framing, and Support

Misconfigurations and outdated software are a major cause of compromised...
research
01/18/2021

Leveraging AI to optimize website structure discovery during Penetration Testing

Dirbusting is a technique used to brute force directories and file names...