In many applications, unverified data is processed, which often leads to inconsistencies or errors. Applications therefore have a strong need to validate such data. Many types of data can be verified easily, but more complex user data such as full-text addresses are considerably harder to validate.
This paper proposes an exemplary solution for such a validation by describing the development of a web service that allows a research conference database1 to validate conference location strings. The validation consists of checking the plausibility of the location string, correcting and standardizing its spelling, classifying it (City, State, Country, etc.) and providing corresponding data such as latitude and longitude. For this purpose, the web service accesses the GeoNames database. Consequently, the conference database receives a rich response, which it can also use to provide further information, e.g., embedded maps or HTML5 Microdata markup.
Table of Contents
1 Introduction
2 Basics
2.1 Web Services
2.1.1 SOAP/WSDL
2.1.2 RESTful Web Services
2.1.3 Conclusion and Decision for RESTful Approach
2.1.4 Slim PHP Framework
2.2 GeoNames
3 System Design
3.1 Problem
3.2 Requirements and Structure
4 Implementation
4.1 Splitting the string
4.2 Validating the Country
4.3 Validating other Parts
4.4 Accessing GeoNames
4.5 Test-Application
5 Testing
6 Conclusion and Outlook
References
Appendix
1 Introduction
In many applications, unverified data is processed, which often leads to inconsistencies or errors. Applications therefore have a strong need to validate such data. Many types of data can be verified easily, but more complex user data such as full-text addresses are considerably harder to validate.
This paper proposes an exemplary solution for such a validation by describing the development of a web service that allows a research conference database1 to validate conference location strings. The validation consists of checking the plausibility of the location string, correcting and standardizing its spelling, classifying it (City, State, Country, etc.) and providing corresponding data such as latitude and longitude. For this purpose, the web service accesses the GeoNames database. Consequently, the conference database receives a rich response, which it can also use to provide further information, e.g., embedded maps or HTML5 Microdata markup.
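To make the listed validation steps more concrete, the following sketch shows one possible shape of a single validation response. The class name, the field names and the JSON layout are assumptions for illustration only; the actual response format of the service is not shown in this excerpt, and the coordinates are rounded, illustrative values.

```python
import json
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class LocationValidationResult:
    """Hypothetical shape of one validation response (field names are assumptions)."""
    input: str                      # raw location string as entered by the user
    plausible: bool                 # did the string match a known place?
    standardized: Optional[str]     # corrected, standardized spelling
    classification: Optional[str]   # e.g. "City", "State", "Country"
    latitude: Optional[float]
    longitude: Optional[float]

    def to_json(self) -> str:
        # Serialize to the kind of JSON document a web service could return.
        return json.dumps(asdict(self), ensure_ascii=False)


# Illustrative values only: a misspelled input corrected to a standard form.
result = LocationValidationResult(
    input="Muenster, Germny",
    plausible=True,
    standardized="Münster, Germany",
    classification="City",
    latitude=51.96,
    longitude=7.63,
)
print(result.to_json())
```

A flat record like this is easy for the conference database to consume, e.g., passing latitude and longitude on to an embedded map.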
The remainder of the paper is organized as follows. In Section 2, the basic technologies for the realization of the project are introduced, with a special focus on web services and current implementation approaches. In Section 3, the actual problem is broken down into technical requirements and a technical design. Afterwards, Section 4 explains the implementation. In Section 5, the validation accuracy is tested using sample data. Finally, Section 6 draws a conclusion and outlines further research opportunities.
2 Basics
For the realization described in the following sections, a basic understanding of the major technologies is necessary. First, an overview of the current web service implementation styles is given and a concrete PHP framework for the implementation is introduced. Afterwards, the GeoNames database is presented.
2.1 Web Services
According to [FGA+04] and [Alo04 pp. 124-125], web services consist of loosely coupled software components which communicate with each other over a wide area network such as the internet. Furthermore, as stated in [LLS06], they are not bound to one operating system or programming language.
“A Web service is a software system designed to support interoperable machine-to-machine interaction over a network”, World Wide Web Consortium (W3C)1
Two main approaches for implementing web services can currently be observed on the internet. On the one hand, there is the SOAP/WSDL approach, whose technologies were mainly established by Microsoft and IBM. On the other hand, ‘RESTful’ web services have gained great popularity in recent years, especially in the context of Web 2.0, and seem to have overtaken SOAP.
In the following, both implementation styles are introduced briefly.
2.1.1 SOAP/WSDL
The Simple Object Access Protocol (SOAP) was the first broadly accepted web service type and is well documented2. It was released in 1999 and became especially popular in the enterprise context. By using SOAP, one can serialize method calls together with their parameters into XML messages and thus make these methods available to clients.
Because SOAP works on an abstract layer, it is independent of the underlying protocol and transport layer [GCD05 p. 792]. This means that, while working on the web service, the developer is not affected by the protocol. Furthermore, the protocol can later be changed with little effort. In contrast to these advantages, as stated in [PZL08], a SOAP implementation might create overhead, because each abstraction layer increases complexity and traffic.
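The serialization of a method call into the SOAP abstraction layer can be sketched with Python's standard XML library. The operation name `validateLocation`, its parameter `location` and the service namespace are hypothetical; only the SOAP 1.1 envelope namespace is standard. The resulting bytes do not depend on how they are transported, which illustrates the protocol independence, and also show the extra markup that each call carries compared to a plain query string.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "urn:example:location-validation"          # hypothetical service namespace
ET.register_namespace("soap", SOAP_ENV)
ET.register_namespace("loc", SERVICE_NS)


def build_soap_request(operation: str, **params: str) -> bytes:
    """Serialize a method call and its arguments into a SOAP 1.1 envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The method name becomes an element; each argument becomes a child element.
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


request = build_soap_request("validateLocation", location="Münster, Germany")
print(request.decode("utf-8"))
```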
According to [FFG+04], SOAP supports ‘stateful’ interactions. In a stateful web service, the service provider stores information about the client/consumer over a series of requests and is able to respond to them coherently [Bir12 p. 213], [AG05 p. 29]. Especially with increasing complexity of the web service, this can be an advantage, because the client does not need to set a state in each request. In other words, the client does not have to ‘repeat’ itself. On the other hand, this might also lead to overhead for simple web services, as many needless states might be invoked, which the interface designer must then also consider.
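The difference between a stateful and a stateless interaction can be sketched in a few lines. Both the class and the functions are hypothetical and only mimic the idea: the stateful variant remembers the country per session so that later calls need not repeat it, at the cost of server-side session bookkeeping; the stateless variant expects every request to carry its full context.

```python
import uuid


class StatefulService:
    """Keeps per-client context on the server between requests (hypothetical)."""

    def __init__(self) -> None:
        self._sessions: dict[str, dict[str, str]] = {}

    def open_session(self) -> str:
        session_id = str(uuid.uuid4())
        self._sessions[session_id] = {}
        return session_id

    def set_country(self, session_id: str, country: str) -> None:
        # The country is remembered, so later calls need not repeat it.
        self._sessions[session_id]["country"] = country

    def validate_city(self, session_id: str, city: str) -> str:
        country = self._sessions[session_id].get("country", "?")
        return f"{city}, {country}"


def validate_city_stateless(city: str, country: str) -> str:
    """Stateless variant: every request carries all the context it needs."""
    return f"{city}, {country}"


service = StatefulService()
sid = service.open_session()
service.set_country(sid, "Germany")
print(service.validate_city(sid, "Münster"))          # context comes from the session
print(validate_city_stateless("Münster", "Germany"))  # context comes from the request
```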
Figure 1: Structure of SOAP/WSDL
As stated in [CDK+02] and illustrated in Figure 1, SOAP can be used together with the Web Services Description Language (WSDL) to specify the methods supported by the web service. WSDL can be seen as a contract of communication between the client and the web service. It describes the interface formally in XML and hence is machine-processable. Therefore, with WSDL, client applications can discover and understand a web service easily. Because interface designers try to make their web services easily adaptable and accessible for consumer applications, WSDL has become popular and is often used in combination with SOAP.
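The claim that WSDL is machine-processable can be illustrated by letting a program read the operations out of a contract. The WSDL 1.1 document below is hypothetical and heavily abbreviated (the types, message, binding and service sections a real contract needs are omitted); the sketch only shows how a client could discover which operations exist and which messages they exchange.

```python
import xml.etree.ElementTree as ET

NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}  # WSDL 1.1 namespace

# Hypothetical, abbreviated WSDL 1.1 contract for a location validation service.
WSDL_DOC = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
             xmlns:tns="urn:example:location-validation"
             targetNamespace="urn:example:location-validation"
             name="LocationValidation">
  <portType name="LocationValidationPortType">
    <operation name="validateLocation">
      <input message="tns:ValidateLocationRequest"/>
      <output message="tns:ValidateLocationResponse"/>
    </operation>
  </portType>
</definitions>"""


def list_operations(wsdl_text: str) -> list[tuple[str, str, str]]:
    """Return (operation, input message, output message) for each portType operation."""
    root = ET.fromstring(wsdl_text)
    operations = []
    for port_type in root.findall("wsdl:portType", NS):
        for op in port_type.findall("wsdl:operation", NS):
            inp = op.find("wsdl:input", NS)
            out = op.find("wsdl:output", NS)
            operations.append((
                op.get("name", ""),
                inp.get("message", "") if inp is not None else "",
                out.get("message", "") if out is not None else "",
            ))
    return operations


for name, request_msg, response_msg in list_operations(WSDL_DOC):
    print(f"{name}: {request_msg} -> {response_msg}")
```

SOAP toolkits typically go one step further and generate client stubs from such a contract.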
Moreover, SOAP/WSDL can be extended with many other specification standards, which are informally called the ‘WS-* stack’ [RR07, p. xv]. Because of the large number of modules and specifications in WS-*, SOAP/WSDL and WS-* are often called “big web services”. The wide variety and high quality of compatible WS-* extensions have the advantage that additional specification modules can be integrated on demand; thus, web services with high feature demands are also well supported by a SOAP approach.
2.1.2 RESTful Web Services
As claimed by [RR07, p. xv] and confirmed by Amazon2, RESTful web services have become a popular approach for developing web-based services in recent years.
In contrast to SOAP, a RESTful web service does not use additional abstraction layers. Instead, it is built directly on the Hypertext Transfer Protocol (HTTP).
Figure 2: Structure of RESTful interaction
Figure 2 illustrates a RESTful web service interaction following [Fie00]. The concept of Representational State Transfer (REST) is based on three basic principles:
1. Identification of resources via URIs (“nouns”). A resource is not necessarily a real-life entity but can also be something more abstract, such as a “search”. The main point is that URIs should be self-descriptive.
2. The same basic commands (“verbs”) exist for all resources: GET, PUT, POST and DELETE.
3. Relationships and state transitions are expressed via hyperlinks rather than being managed by the server.
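The three principles can be sketched as a tiny in-memory dispatcher. The resource URIs, the representations and the function names are hypothetical, and no real HTTP server is involved: resources are addressed by URI (principle 1), the same four verbs apply to all of them (principle 2), and representations contain links that the client follows to reach related resources (principle 3).

```python
import itertools
from typing import Callable, Optional

# Hypothetical in-memory store: URI path -> resource representation.
# Each representation carries hyperlinks to related resources (principle 3).
resources: dict[str, dict] = {
    "/locations/1": {"name": "Münster", "links": {"country": "/countries/de"}},
    "/countries/de": {"name": "Germany", "links": {}},
}
_next_id = itertools.count(2)  # "/locations/1" already exists

Response = tuple[int, Optional[dict]]  # (HTTP status code, body)


def get(uri: str, body: Optional[dict]) -> Response:
    return (200, resources[uri]) if uri in resources else (404, None)


def put(uri: str, body: Optional[dict]) -> Response:
    # PUT stores the representation under the URI chosen by the client.
    created = uri not in resources
    resources[uri] = body or {}
    return (201 if created else 200, resources[uri])


def post(uri: str, body: Optional[dict]) -> Response:
    # POST to a collection lets the server choose the new member's URI.
    new_uri = f"{uri}/{next(_next_id)}"
    resources[new_uri] = body or {}
    return (201, {"location": new_uri})


def delete(uri: str, body: Optional[dict]) -> Response:
    return (204, None) if resources.pop(uri, None) is not None else (404, None)


# Principle 2: one small, uniform set of verbs for every URI (principle 1).
VERBS: dict[str, Callable[[str, Optional[dict]], Response]] = {
    "GET": get, "PUT": put, "POST": post, "DELETE": delete,
}


def handle(method: str, uri: str, body: Optional[dict] = None) -> Response:
    handler = VERBS.get(method)
    return handler(uri, body) if handler else (405, None)


# The client navigates by following links instead of relying on server-side state.
status, location = handle("GET", "/locations/1")
status, country = handle("GET", location["links"]["country"])
print(status, country["name"])
```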
Because a RESTful web service builds directly on HTTP and does not require additional modules, it is very lightweight. This also results in reduced complexity and very intuitive handling. Nevertheless, the dependence on HTTP can also be seen as its major disadvantage, because a RESTful web service cannot be adapted to a new environment, e.g., a different protocol.
As all the technologies involved (HTTP, XML) are well established, the barriers for developers (of the service as well as of the client applications) are very low. Because the principles of HTTP are well known, using such a web service is intuitive and implementing it is quick.
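How little a client needs for a RESTful call can be seen with Python's standard library alone. As an example target, the sketch uses the GeoNames search web service (`searchJSON` with the parameters `q`, `maxRows` and `username`); the account name `demo` is only a placeholder, since GeoNames expects a registered user name, and which GeoNames endpoint the paper's own service queries is not shown in this excerpt. The request is only constructed here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

GEONAMES_SEARCH = "http://api.geonames.org/searchJSON"


def build_search_request(query: str, username: str, max_rows: int = 1) -> Request:
    """Build a plain HTTP GET request; all parameters travel in the URI."""
    params = urlencode({"q": query, "maxRows": max_rows, "username": username})
    return Request(f"{GEONAMES_SEARCH}?{params}", method="GET",
                   headers={"Accept": "application/json"})


req = build_search_request("Münster", username="demo")  # "demo" is a placeholder
print(req.get_method(), req.full_url)
# Sending it would be a single further call: urllib.request.urlopen(req)
```

Compared with the SOAP envelope, the whole call is one URI plus a standard HTTP method, which is where the low entry barrier comes from.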
As stated in [PZL08], unlike SOAP services, RESTful services are stateless. This means that the service provider does not transition into a different state after a request; instead, it always remains in a neutral default state. This reduces complexity, as the programmer does not have to consider follow-up states. Nevertheless, it is still possible to work statefully with REST by using the URI to express different states. In other words, the RESTful service itself is stateless, but the client can invoke states through its choice of URIs.
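Expressing state through the URI can be illustrated with paging. The collection, the `/locations` path and the `page` parameter are hypothetical. The handler keeps nothing between calls: the current page is read from the URI, and the next state is handed to the client as a link it may follow.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical collection of validated locations, served in pages.
LOCATIONS = ["Berlin", "Hamburg", "Münster", "Paris", "Rome", "Vienna", "Zurich"]


def handle_list(uri: str, page_size: int = 3) -> dict:
    """Stateless handler: everything it needs is taken from the URI itself."""
    query = parse_qs(urlsplit(uri).query)
    page = int(query.get("page", ["1"])[0])  # no input checking in this sketch
    start = (page - 1) * page_size
    response = {"page": page, "items": LOCATIONS[start:start + page_size]}
    if start + page_size < len(LOCATIONS):
        # The follow-up state is a link for the client, not data kept on the server.
        response["next"] = f"/locations?page={page + 1}"
    return response


uri: Optional[str] = "/locations"
while uri:
    resp = handle_list(uri)
    print(resp["items"])
    uri = resp.get("next")
```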
However, because URIs are used to identify the web service’s resources, one major limitation can be observed: the maximum length of a URI is restricted, depending on the web server3. For small and simple web services this limitation is insignificant, but for more complex web services that require many parameters, this restriction might become a serious obstacle.
Thus, RESTful web services seem to be a good choice for simple and lightweight services.
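The length restriction can be checked before a request is sent. The sketch assumes the Apache default of 8190 bytes from footnote 3, applied to the complete request line (method, request target and protocol version); other servers use other limits, and the exact boundary handling may differ. The path `/validate` and the parameter names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed limit: Apache's default request-line limit (footnote 3); servers differ.
APACHE_REQUEST_LINE_LIMIT = 8190


def request_line_length(path: str, params: dict) -> int:
    """Byte length of the HTTP/1.1 request line for a GET with query parameters."""
    target = f"{path}?{urlencode(params)}" if params else path
    return len(f"GET {target} HTTP/1.1".encode("ascii"))


def fits_in_get(path: str, params: dict, limit: int = APACHE_REQUEST_LINE_LIMIT) -> bool:
    return request_line_length(path, params) <= limit


# One location string fits easily; hundreds of parameters in one request do not.
short = {"q": "Münster, Germany"}
batch = {f"loc{i}": "Some Conference Venue, Some City, Some Country" for i in range(300)}
print(fits_in_get("/validate", short))
print(fits_in_get("/validate", batch))
```

Since the location validation service handles one location string per request, it stays far below such limits.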
2.1.3 Conclusion and Decision for RESTful Approach
Given the scope of the location validation web service, a RESTful implementation was chosen for several specific reasons.
First of all, SOAP is in general the better approach for complex web services, whereas REST is sufficient for simple services. Current research papers such as [PZL08] also recommend this decision strategy:
“The main conclusion from this comparison is to use RESTful services for tactical, ad hoc integration over the Web (à la Mashup) and to prefer WS-* Web services in professional enterprise application integration scenarios with a longer lifespan and advanced QoS requirements.”
Considering that the location validation web service will essentially provide a single functionality, this clearly favors the RESTful approach. Furthermore, because the web service will be running on the internet, the restriction of RESTful web services to HTTP does not pose an obstacle for the project. A further point is that location validation does not require any stateful interactions: all validation requests can be treated independently. In addition, there is a lack of current literature describing the implementation of RESTful services; therefore, choosing REST might contribute more value to the research community. Last but not least, the service will not require any further modules of the WS-* stack and therefore has no dependence on SOAP.
[...]
1 The conference database can be accessed here: http://dbis-group.uni-muenster.de/conferences/
2 Amazon offers both SOAP and RESTful web services and consequently provides an opportunity to compare the popularity of both web service styles. A brief comparison can be found at [3].
3 For example, according to [4], the Apache standard configuration currently limits the request line to 8190 bytes.