Thursday, November 25, 2010

Week 12 Notes

Due to the holiday I will not be posting notes for this week. 

Muddiest Point for Week 12

My question is still the one I posed in my notes from last week: how can more of the deep Web content get to the surface Web?  Once someone makes a specific request and retrieves content from the deep Web, does that content automatically become part of the surface Web?  Would that mean anyone could access that information now? 

Saturday, November 20, 2010

Week 11 Notes

"Web Search Engines: Part 1 and Part 2," by David Hawking
I felt like the information in this article went right over my head.  I was not fully grasping the definitions and concepts of crawling and indexing, and the graphs in the figures were not as helpful as I thought they were going to be.  What I did gain, if I understood it correctly, was that a good "seed" URL, such as Wikipedia, will link to numerous Web sites, and these "seeds" are what initialize the crawler.  After the crawler scans the content of a "seed" URL, it adds any links to other URLs to the queue and saves the Web page content for indexing.  Part 2 then goes on to explain indexing algorithms.  Basically, an inverted file, used by search engines, is a concatenation (the operation of joining character strings end-to-end) of the posting lists for each particular word or phrase, and each posting list contains the ID numbers of all the Web page documents that word appears in.  In the end, I enjoyed Part 2 more than Part 1, but I honestly understood the simpler explanation from Wikipedia better than I did this article. 
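
To make sure I understood the inverted file idea, I sketched it out for myself in Python below.  This is my own toy illustration, not code from Hawking's article, and the three little "documents" are made up: each word maps to a posting list of the IDs of the documents it appears in.

from collections import defaultdict

# A toy document collection: document ID -> page text
# (hypothetical data for illustration only)
docs = {
    1: "digital libraries index web pages",
    2: "search engines crawl the web",
    3: "crawlers fetch pages and index them",
}

# Build the inverted index: each word maps to its posting list,
# i.e. the sorted IDs of the documents the word appears in.
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.lower().split():
        index[word].add(doc_id)

postings = {word: sorted(ids) for word, ids in index.items()}

print(postings["web"])    # [1, 2]
print(postings["index"])  # [1, 3]

Concatenating those posting lists, word by word, is essentially what the inverted file described in the article stores.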

"Current Developments and Future Trends for the OAI Protocol for Metadata Harvesting," by Sarah L. Shreeves, et al. 
The Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) was released in 2001 as a "means to federate access to diverse e-print archives through metadata harvesting and aggregation."  Since its release, a wide variety of communities have begun to use the protocol for their own specific needs; a 2003 study found over 300 active data providers from an array of institutions and domains using OAI-PMH.  The article discusses the use of the protocol within these different communities as well as the challenges and future directions it faces.  My favorite part of the article was the three specific examples of communities using the protocol.  As a piano player, I was really interested in the Sheet Music Consortium, a collection of free digitized sheet music.  I am definitely intrigued to research more about it and to see how its search service has progressed. 
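
To picture what "harvesting" actually involves, here is a minimal sketch of an OAI-PMH request in Python.  The repository address is a hypothetical placeholder of my own, but the ListRecords verb and the oai_dc metadata prefix come from the protocol itself.

from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical base URL of an OAI-PMH data provider (placeholder only).
base_url = "https://example.org/oai"

# A harvester asks the data provider for records as XML; ListRecords
# and the oai_dc (Dublin Core) metadata prefix are defined by OAI-PMH.
params = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc"})

with urlopen(f"{base_url}?{params}") as response:
    xml = response.read().decode("utf-8")

# A service provider would now parse the returned XML and aggregate
# the harvested metadata into its own search index.
print(xml[:200])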

"The Deep Web: Surfacing Hidden Value," by Michael K. Bergman
This article was the most fascinating to me this week.  I never knew there was a "deep Web" and that what we mostly view is just the "surface Web."  I was captivated by the idea that there is additional content stored on the Web that can only be accessed by direct request, such as a query submitted to a database-backed site.  It made me wonder how more of the deep Web content could get to the surface Web.  I also enjoyed the study performed by BrightPlanet, which used the company's own search technology to quantify the size and importance of deep Web material.  I was most surprised by the finding that the deep Web is 400 to 550 times larger than the WWW (surface Web), and also by the finding that the "total quality content of the deep Web is 1,000 to 2,000 times greater than that of the surface web." 

Comments for Week 11

Comment 1

Comment 2

Friday, November 19, 2010

Muddiest Point for Week 11

I support the inclusion of an institutional repository within a digital library, but I know it's not required.  Within the field, which seems to be the more common decision: to include the repository, or to forgo it due to its high costs? 

Saturday, November 13, 2010

Week 10 Notes

“Digital Libraries: Challenges and Influential Work,” by William H. Mischo
This article provided background information on the evolution of digital library technologies, much of which I knew nothing about.  I never knew that most of the research and projects were federally funded and university-led.  There were six university-led projects, all focusing on different aspects of digital library research.  I found the University of Illinois at Urbana-Champaign DLI-1 project the most interesting (and the project I would have most liked to work on), for it researched the “development of document, representation, processing, indexing, search and discovery, and delivery and rendering protocols for full-text journals.”  I also enjoyed that the article highlighted the actual achievements that were born from these projects.  For example, Google grew out of the Stanford DLI-1 project, and the Cornell University and UK ePrint collaboration in DLI-2 contributed to the foundation of the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH).  It was nice to see that these government-funded projects led to successful, global programs and corporations. 

“Dewey Meets Turing: Librarians, Computer Scientists and the Digital Libraries Initiative,” by Andreas Paepcke et al.
This article discussed the collaboration between librarians and computer scientists on the Digital Library Initiative (DLI).  It was interesting to learn about the effect the World Wide Web had on both disciplines and on the DLI.  I learned that it was more difficult for librarians to integrate the Web than it was for computer scientists.  Computer scientists were thrilled to research and incorporate subdisciplines of computer science, such as machine learning and statistical approaches, into their work, while librarians felt the Web was threatening traditional pillars of librarianship, such as the reference interview.   The article also stated that the Web affected both communities by turning the retrieval of information into a more laissez-faire culture.  Another interesting point was the conflict between the two disciplines: librarians felt computer scientists were “stealing” money that should have been going into collection development, while computer scientists were frustrated with librarians’ emphasis on, and wariness about, metadata.  In the end, I felt the article was suggesting that the two disciplines need to find common ground on how to work together. 

“Institutional Repositories: Essential Infrastructure for Scholarship in the Digital Age,” by Clifford A. Lynch
In this information-packed article, Lynch discusses the definition, importance, cautions, benefits, and future developments of institutional repositories, specifically university-based repositories.  He first makes a point of defining a university-based institutional repository as a “set of services that a university offers to the members of its community for the management and dissemination of digital materials created by the institution and its community members.”  I support his opinion that a repository must contain faculty and student research and teaching materials along with documents about the institution’s past and recent events and the performances of its intellectual life.  He goes on to argue that faculty must take the lead in adopting this new form of scholarly communication and must make the shift to using this information network with its new distribution capabilities.  I also agreed with his three cautions about institutional repositories: that the institution should not try to assert control or ownership over works through the repository, should not overload the infrastructure with irrelevant policies, and should not make light of the seriousness and importance of the repository to the community and to the scholarly world.  My favorite recommendation of Lynch’s was his call to extend institutional repositories into community and public repositories.  I believe this is a brilliant idea, and if accomplished, it could lead to a wonderful collaboration among societal institutions, government, and members of the local community. 

Comments for Week 10

Comment 1

Comment 2

Friday, November 12, 2010

Muddiest Point for Week 10

I am still confused over child elements.  Can you only assign a single letter to represent the child element or can you assign a single word? Can you assign child elements to all elements in your document?  Is it really necessary to include child elements? 

Saturday, November 6, 2010

Comments for Week 9

Comment 1

Comment 2

Week 9 Notes

“An Introduction to the Extensible Markup Language (XML)” by Martin Bryan
“Extending Your Markup: An XML Tutorial” by Andre Bergholz
After reading these two articles I began to learn what XML is and how it works, though I still feel a little shaky on the specific, intricate details.  What I do understand is that XML is a language that lets you meaningfully annotate text, unlike HTML.  It does not have a single standardized way of coding text, and it does not have a predefined set of tags.  A Document Type Definition (DTD), the component that defines structure within an XML document, or as Bergholz put it, a “context-free grammar,” allows users to define their own tags, elements, and attributes.  This freedom to create the structural aspects your own way is a wonderful benefit of XML.  Since XML descriptions are structure-oriented rather than layout-oriented like HTML’s, I believe XML is easier to write and comprehend. 
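
To see what choosing your own tags looks like, I put together the small sketch below (my own made-up example, not one from either article), using Python's standard library to parse a tiny XML document whose tags I invented myself.

import xml.etree.ElementTree as ET

# A tiny XML document with tags I made up myself -- nothing here
# is predefined the way HTML tags are.
xml_doc = """
<recital date="2010-11-20">
    <piece composer="Chopin">Nocturne Op. 9 No. 2</piece>
    <piece composer="Debussy">Clair de Lune</piece>
</recital>
"""

root = ET.fromstring(xml_doc)
print(root.tag, root.attrib)     # recital {'date': '2010-11-20'}
for piece in root:               # the child elements of <recital>
    print(piece.get("composer"), "-", piece.text)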

“A Survey of XML Standards: Part 1” by Uche Ogbuji
In the article, Ogbuji discusses the most important XML technologies, or as he puts it, standards.  For technologies to become standards, they must be widely adopted by an array of vendors or respected organizations.  The article pointed out that most standards stem from W3C recommendations or from the International Organization for Standardization (ISO) and the Organization for the Advancement of Structured Information Standards (OASIS).  Ogbuji listed some very interesting standards, and the best part was that he included links to tutorials and other resources useful for understanding each one.  I really enjoyed learning about the XML schema languages, such as RELAX NG, the Schematron Assertion Language 1.5, and W3C XML Schema. 

XML Schema Tutorial
From what was stated in the Bergholz article, XML Schema is like DTD in that it defines “a grammar” for the document, but it is more expressive and uses XML syntax.  The tutorial provided a simple list of what an XML Schema does: its main purpose is to define the elements, the attributes, which elements are child elements, the order and number of child elements, and the datatypes and values for each element or attribute.  This tutorial was a lot easier to understand than the first two articles’ schema explanations, and the examples were extremely helpful in grasping the basic concepts.
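
The sketch below is my own attempt to tie those pieces together, assuming the third-party lxml package is available (the schema and documents are invented for illustration): the schema declares the elements, their order, an attribute, and their datatypes, and then two documents are validated against it.

# Assumes the third-party lxml package is installed.
from lxml import etree

xsd = etree.XMLSchema(etree.XML("""
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="piece">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="composer" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="year" type="xs:integer"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
"""))

good = etree.XML("<piece year='1832'><title>Nocturne</title>"
                 "<composer>Chopin</composer></piece>")
bad = etree.XML("<piece year='soon'><composer>Chopin</composer></piece>")

print(xsd.validate(good))  # True: child elements in order, integer year
print(xsd.validate(bad))   # False: missing <title>, non-integer year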