Download or read book Automatic Disambiguation of Author Names in Bibliographic Repositories written by Anderson A. Ferreira and published by Springer Nature. This book was released on 2022-06-01 with total page 126 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book deals with a hard problem that is inherent to human language: ambiguity. In particular, we focus on author name ambiguity, a type of ambiguity that exists in digital bibliographic repositories, which occurs when an author publishes works under distinct names or distinct authors publish works under similar names. This problem may be caused by a number of reasons, including the lack of standards and common practices, and the decentralized generation of bibliographic content. As a consequence, the quality of the main services of digital bibliographic repositories such as search, browsing, and recommendation may be severely affected by author name ambiguity. The focal point of the book is on automatic methods, since manual solutions do not scale to the size of the current repositories or the speed in which they are updated. Accordingly, we provide an ample view on the problem of automatic disambiguation of author names, summarizing the results of more than a decade of research on this topic conducted by our group, which were reported in more than a dozen publications that received over 900 citations so far, according to Google Scholar. We start by discussing its motivational issues (Chapter 1). Next, we formally define the author name disambiguation task (Chapter 2) and use this formalization to provide a brief, taxonomically organized, overview of the literature on the topic (Chapter 3). We then organize, summarize and integrate the efforts of our own group on developing solutions for the problem that have historically produced state-of-the-art (by the time of their proposals) results in terms of the quality of the disambiguation results. 
Thus, Chapter 4 covers HHC - Heuristic-based Clustering, an author name disambiguation method that is based on two specific real-world assumptions regarding scientific authorship. Then, Chapter 5 describes SAND - Self-training Author Name Disambiguator and Chapter 6 presents two incremental author name disambiguation methods, namely INDi - Incremental Unsupervised Name Disambiguation and INC - Incremental Nearest Cluster. Finally, Chapter 7 provides an overview of recent author name disambiguation methods that address new specific approaches such as graph-based representations, alternative predefined similarity functions, visualization facilities and approaches based on artificial neural networks. The chapters are followed by three appendices that cover, respectively: (i) a pattern matching function for comparing proper names that is used by some of the methods addressed in this book; (ii) a tool for generating synthetic collections of citation records for distinct experimental tasks; and (iii) a number of datasets commonly used to evaluate author name disambiguation methods. In summary, the book organizes a large body of knowledge and work in the area of author name disambiguation in the last decade, hoping to consolidate a solid basis for future developments in the field.
Download or read book Knowledge Graphs and Semantic Web written by Boris Villazón-Terrazas and published by Springer Nature. This book was released on 2022-11-12 with total page 355 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book constitutes the proceedings of the 4th Iberoamerican Conference and 3rd Indo-American Conference on Knowledge Graphs and Semantic Web, KGSWC 2022, which took place in Madrid, Spain, in November 2022. The 22 full and 3 short research papers presented in this volume were carefully reviewed and selected from 63 submissions. The papers cover topics related to software and its engineering, software creation and management, emerging technologies, analysis and design of emerging devices and systems, emerging tools and methodologies, and others.
Download or read book International Conference on Digital Libraries ICDL 2013 written by Shantanu Ganguly and published by The Energy and Resources Institute (TERI). This book was released on 2013-11-29 with total page 1230 pages. Available in PDF, EPUB and Kindle. Book excerpt: ICDL conferences are recognized as one of the most important platforms in the world where noted experts share their experiences. Many DL experts have contributed thought-provoking papers to ICDL 2013. These papers were reviewed and organized in the proceedings according to different areas of DL. The proceedings comprise two volumes and more than 1100 pages.
Download or read book Information Management and Big Data written by Juan Antonio Lossio-Ventura and published by Springer Nature. This book was released on 2021-05-11 with total page 563 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book constitutes the refereed proceedings of the 7th International Conference on Information Management and Big Data, SIMBig 2020, held in Lima, Peru, in October 2020.* The 32 revised full papers and 7 revised short papers presented were carefully reviewed and selected from 122 submissions. The papers address topics such as natural language processing and text mining; machine learning; image processing; social networks; data-driven software engineering; graph mining; and Semantic Web, repositories, and visualization. *The conference was held virtually.
Download or read book Understanding and Evaluating Search Experience written by Stone Maria and published by Springer Nature. This book was released on 2022-05-31 with total page 87 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book is intended for anyone interested in learning more about how search works and how it is evaluated. We all use search—it's a familiar utility. Yet, few of us stop and think about how search works, what makes search results good, and who, if anyone, decides what good looks like. Search has a long and glorious history, yet it continues to evolve, and with it, the measurement and our understanding of the kinds of experiences search can deliver continues to evolve, as well. We will discuss the basics of how search engines work, how humans use search engines, and how measurement works. Equipped with these general topics, we will then dive into the established ways of measuring search user experience, and their pros and cons. We will talk about collecting labels from human judges, analyzing usage logs, surveying end users, and even touch upon automated evaluation methods. After introducing different ways of collecting metrics, we will cover experimentation as it applies to search evaluation. The book will cover evaluating different aspects of search—from search user interface (UI), to results presentation, to the quality of search algorithms. In covering these topics, we will touch upon many issues in evaluation that became sources of controversy—from user privacy, to ethical considerations, to transparency, to potential for bias. We will conclude by contrasting measuring with understanding, and pondering the future of search evaluation.
Download or read book Word Association Thematic Analysis written by Mike Thelwall and published by Morgan & Claypool Publishers. This book was released on 2021-02-02 with total page 131 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book explains the word association thematic analysis method, with examples, and gives practical advice for using it. It is primarily intended for social media researchers and students, although the method is applicable to any collection of short texts. Many research projects involve analyzing sets of texts from the social web or elsewhere to get insights into issues, opinions, interests, news discussions, or communication styles. For example, many studies have investigated reactions to Covid-19 social distancing restrictions, conspiracy theories, and anti-vaccine sentiment on social media. This book describes word association thematic analysis, a mixed methods strategy to identify themes within a collection of social web or other texts. It identifies these themes in the differences between subsets of the texts, including female vs. male vs. nonbinary, older vs. newer, country A vs. country B, positive vs. negative sentiment, high scoring vs. low scoring, or subtopic A vs. subtopic B. It can also be used to identify the differences between a topic-focused collection of texts and a reference collection. The method starts by automatically finding words that are statistically significantly more common in one subset than another, then identifies the context of these words and groups them into themes. It is supported by the free Windows-based software Mozdeh for data collection or importing and for the quantitative analysis stages.
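The first step described in the excerpt above, finding words that are significantly more common in one subset of texts than in another, can be illustrated with a minimal sketch. It assumes a plain 2x2 chi-square test on the number of texts containing each word, with a fixed 5% threshold; the function names are hypothetical, and the test actually used by the book or by Mozdeh may differ.

```python
from collections import Counter

# Chi-square critical value for 1 degree of freedom at p = 0.05.
CHI2_CRITICAL_05 = 3.841


def chi_square_2x2(a, b, c, d):
    """Chi-square statistic for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so the word cannot discriminate.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def overrepresented_words(subset_a, subset_b):
    """Words found in a significantly higher share of texts in subset_a than subset_b."""
    # Count each word once per text (document frequency, not term frequency).
    docs_a = Counter(w for text in subset_a for w in set(text.lower().split()))
    docs_b = Counter(w for text in subset_b for w in set(text.lower().split()))
    results = []
    for word, in_a in docs_a.items():
        in_b = docs_b.get(word, 0)
        stat = chi_square_2x2(in_a, len(subset_a) - in_a,
                              in_b, len(subset_b) - in_b)
        more_common_in_a = in_a / len(subset_a) > in_b / len(subset_b)
        if stat > CHI2_CRITICAL_05 and more_common_in_a:
            results.append((word, round(stat, 2)))
    # Strongest associations first.
    return sorted(results, key=lambda pair: -pair[1])
```

This sketch tests every word independently; a real analysis would also need to account for the large number of simultaneous tests, and the later stages (reading the words in context and grouping them into themes) remain manual.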
Download or read book Question Answering for the Curated Web written by Rishiraj Saha Roy and published by Springer Nature. This book was released on 2022-05-31 with total page 172 pages. Available in PDF, EPUB and Kindle. Book excerpt: Question answering (QA) systems on the Web try to provide crisp answers to information needs posed in natural language, replacing the traditional ranked list of documents. QA, posing a multitude of research challenges, has emerged as one of the most actively investigated topics in information retrieval, natural language processing, and the artificial intelligence communities today. The flip side of such diverse and active interest is that publications are highly fragmented across several venues in the above communities, making it very difficult for new entrants to the field to get a good overview of the topic. Through this book, we make an attempt towards mitigating the above problem by providing an overview of the state-of-the-art in question answering. We cover the twin paradigms of curated Web sources used in QA tasks ‒ trusted text collections like Wikipedia, and objective information distilled into large-scale knowledge bases. We discuss distinct methodologies that have been applied to solve the QA problem in both these paradigms, using instantiations of recent systems for illustration. We begin with an overview of the problem setup and evaluation, cover notable sub-topics like open-domain, multi-hop, and conversational QA in depth, and conclude with key insights and emerging topics. We believe that this resource is a valuable contribution towards a unified view on QA, helping graduate students and researchers planning to work on this topic in the near future.
Download or read book Task Intelligence for Search and Recommendation written by Chirag Shah and published by Springer Nature. This book was released on 2022-06-01 with total page 140 pages. Available in PDF, EPUB and Kindle. Book excerpt: While great strides have been made in the field of search and recommendation, there are still challenges and opportunities to address information access issues that involve solving tasks and accomplishing goals for a wide variety of users. Specifically, we lack intelligent systems that can detect not only the request an individual is making (what), but also understand and utilize the intention (why) and strategies (how) while providing information and enabling task completion. Many scholars in the fields of information retrieval, recommender systems, productivity (especially in task management and time management), and artificial intelligence have recognized the importance of extracting and understanding people's tasks and the intentions behind performing those tasks in order to serve them better. However, we are still struggling to support them in task completion, e.g., in search and assistance, and it has been challenging to move beyond single-query or single-turn interactions. The proliferation of intelligent agents has unlocked new modalities for interacting with information, but these agents will need to be able to work understanding current and future contexts and assist users at task level. This book will focus on task intelligence in the context of search and recommendation. Chapter 1 introduces readers to the issues of detecting, understanding, and using task and task-related information in an information episode (with or without active searching). This is followed by presenting several prominent ideas and frameworks about how tasks are conceptualized and represented in Chapter 2. In Chapter 3, the narrative moves to showing how task type relates to user behaviors and search intentions. 
A task can be explicitly expressed in some cases, such as in a to-do application, but often it is unexpressed. Chapter 4 covers these two scenarios with several related works and case studies. Chapter 5 shows how task knowledge and task models can contribute to addressing emerging retrieval and recommendation problems. Chapter 6 covers evaluation methodologies and metrics for task-based systems, with relevant case studies to demonstrate their uses. Finally, the book concludes in Chapter 7, with ideas for future directions in this important research area.
Download or read book Word Association Thematic Analysis written by Michael Thelwall and published by Springer Nature. This book was released on 2022-05-31 with total page 111 pages. Available in PDF, EPUB and Kindle. Book excerpt: Many research projects involve analyzing sets of texts from the social web or elsewhere to get insights into issues, opinions, interests, news discussions, or communication styles. For example, many studies have investigated reactions to Covid-19 social distancing restrictions, conspiracy theories, and anti-vaccine sentiment on social media. This book describes word association thematic analysis, a mixed methods strategy to identify themes within a collection of social web or other texts. It identifies these themes in the differences between subsets of the texts, including female vs. male vs. nonbinary, older vs. newer, country A vs. country B, positive vs. negative sentiment, high scoring vs. low scoring, or subtopic A vs. subtopic B. It can also be used to identify the differences between a topic-focused collection of texts and a reference collection. The method starts by automatically finding words that are statistically significantly more common in one subset than another, then identifies the context of these words and groups them into themes. It is supported by the free Windows-based software Mozdeh for data collection or importing and for the quantitative analysis stages. This book explains the word association thematic analysis method, with examples, and gives practical advice for using it. It is primarily intended for social media researchers and students, although the method is applicable to any collection of short texts.
Download or read book Third Space Information Sharing and Participatory Design written by Preben Hansen and published by Springer Nature. This book was released on 2022-05-31 with total page 134 pages. Available in PDF, EPUB and Kindle. Book excerpt: Society faces many challenges in workplaces, everyday life situations, and education contexts. Within information behavior research, there are often calls to bridge inclusiveness and for greater collaboration, with user-centered design approaches and, more specifically, participatory design practices. Collaboration and participation are essential in addressing contemporary societal challenges, designing creative information objects and processes, as well as developing spaces for learning, and information and research interventions. The intention is to improve access to information and the benefits to be gained from that. This also applies to bridging the digital divide and to embracing artificial intelligence. With regard to research and practices within information behavior, it is crucial to consider that all users should be involved. Many information activities (i.e., activities falling under the umbrella terms of information behavior and information practices) manifest through participation, and thus, methods such as participatory design may help unfold both information behavior and practices as well as the creation of information objects, new models, and theories. Information sharing is one of its core activities. For participatory design with its value set of democratic, inclusive, and open participation towards innovative practices in a diversity of contexts, it is essential to understand how information activities such as sharing manifest themselves. For information behavior studies it is essential to deepen understanding of how information sharing manifests in order to improve access to information and the use of information. 
Third Space is a physical, virtual, cognitive, and conceptual space where participants may negotiate, reflect, and form new knowledge and worldviews working toward creative, practical and applicable solutions, finding innovative, appropriate research methods, interpreting findings, proposing new theories, recommending next steps, and even designing solutions such as new information objects or services. Information sharing in participatory design manifests in tandem with many other information interaction activities and especially information and cognitive processing. Although there are practices of individual information sharing and information encountering, information sharing mostly relates to collaborative information behavior practices, creativity, and collective decision-making. Our purpose with this book is to enable students, researchers, and practitioners within a multi-disciplinary research field, including information studies and Human–Computer Interaction approaches, to gain a deeper understanding of how the core activity of information sharing in participatory design, in which Third Space may be a platform for information interaction, is taking place when using methods utilized in participatory design to address contemporary societal challenges. This could also apply to information behavior studies using participatory design as methodology. We elaborate interpretations of core concepts such as participatory design, Third Space, information sharing, and collaborative information behavior, before discussing participatory design methods and processes in more depth. We also touch on information behavior, information practice, and other important concepts. Third Space, information sharing, and information interaction are discussed in some detail. A framework, with Third Space as a core intersecting zone, platform, and adaptive and creative space to study information sharing and other information behavior and interactions, is suggested. 
As a tool to envision information behavior and suggest future practices, participatory design serves as a set of methods and tools in which new interpretations of the design of information behavior studies and eventually new information objects are being initiated involving multiple stakeholders in future information landscapes. For this purpose, we argue that Third Space can be used as an intersection zone to study information sharing and other information activities, but more importantly it can serve as a Third Space Information Behavior (TSIB) study framework where participatory design methodology and processes are applied to information behavior research studies and applications such as information objects, systems, and services with recognition of the importance of situated awareness.
Download or read book Simulating Information Retrieval Test Collections written by David Hawking and published by Springer Nature. This book was released on 2022-06-01 with total page 162 pages. Available in PDF, EPUB and Kindle. Book excerpt: Simulated test collections may find application in situations where real datasets cannot easily be accessed due to confidentiality concerns or practical inconvenience. They can potentially support Information Retrieval (IR) experimentation, tuning, validation, performance prediction, and hardware sizing. Naturally, the accuracy and usefulness of results obtained from a simulation depend upon the fidelity and generality of the models which underpin it. The fidelity of emulation of a real corpus is likely to be limited by the requirement that confidential information in the real corpus should not be able to be extracted from the emulated version. We present a range of methods exploring trade-offs between emulation fidelity and degree of preservation of privacy. We present three different simple types of text generator which work at a micro level: Markov models, neural net models, and substitution ciphers. We also describe macro level methods where we can engineer macro properties of a corpus, giving a range of models for each of the salient properties: document length distribution, word frequency distribution (for independent and non-independent cases), word length and textual representation, and corpus growth. We present results of emulating existing corpora and for scaling up corpora by two orders of magnitude. We show that simulated collections generated with relatively simple methods are suitable for some purposes and can be generated very quickly. Indeed it may sometimes be feasible to embed a simple lightweight corpus generator into an indexer for the purpose of efficiency studies. Naturally, a corpus of artificial text cannot support IR experimentation in the absence of a set of compatible queries. 
We discuss and experiment with published methods for query generation and query log emulation. We present a proof-of-the-pudding study in which we observe the predictive accuracy of efficiency and effectiveness results obtained on emulated versions of TREC corpora. The study includes three open-source retrieval systems and several TREC datasets. There is a trade-off between confidentiality and prediction accuracy and there are interesting interactions between retrieval systems and datasets. Our tentative conclusion is that there are emulation methods which achieve useful prediction accuracy while providing a level of confidentiality adequate for many applications. Many of the methods described here have been implemented in the open source project SynthaCorpus, accessible at: https://bitbucket.org/davidhawking/synthacorpus/
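Of the micro-level text generators named in the excerpt above, the Markov model is the easiest to illustrate. The following is a minimal sketch of a first-order, word-level Markov generator; it is not the SynthaCorpus implementation, and the function names and the choice of a word-level bigram model are assumptions made for illustration.

```python
import random
from collections import defaultdict


def train_bigram_model(text):
    """Map each word to the list of words that follow it in the training text."""
    words = text.split()
    model = defaultdict(list)
    for current, following in zip(words, words[1:]):
        # Repeated successors are kept, so frequent transitions are sampled more often.
        model[current].append(following)
    return model


def generate(model, start, length, seed=0):
    """Generate up to `length` words by walking the chain from `start`."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    out = [start]
    while len(out) < length:
        choices = model.get(out[-1])
        if not choices:
            # The last word never had a successor in training; stop early.
            break
        out.append(rng.choice(choices))
    return " ".join(out)
```

Because every generated word pair also occurs in the training text, a low-order model like this can still leak phrases from a confidential corpus, which is one concrete form of the fidelity versus privacy trade-off the excerpt describes.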
Download or read book Trustworthy Communications and Complete Genealogies written by Reagan W. Moore and published by Springer Nature. This book was released on 2022-06-01 with total page 139 pages. Available in PDF, EPUB and Kindle. Book excerpt: Genealogies document relationships between persons involved in historical events. Information about the events is parsed from communications from the past. This book explores a way to organize information from multiple communications into a trustworthy representation of a genealogical history of the modern world. The approach defines metrics for evaluating the consistency, correctness, closure, connectivity, completeness, and coherence of a genealogy. The metrics are evaluated using a 312,000-person research genealogy that explores the common ancestors of the royal families of Europe. A major result is that completeness is defined by a genealogy symmetry property driven by two exponential processes, the doubling of the number of potential ancestors each generation, and the rapid growth of lineage coalescence when the number of potential ancestors exceeds the available population. A genealogy expands from an initial root person to a large number of lineages, which then coalesce into a small number of progenitors. Using the research genealogy, candidate progenitors for persons of Western European descent are identified. A unifying ancestry is defined to which historically notable persons can be linked.
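The first of the two exponential processes mentioned in the excerpt above, the doubling of potential ancestors each generation, can be made concrete with a short calculation. This is a minimal sketch; the function names and the example population figure are hypothetical and are not taken from the book.

```python
def potential_ancestors(generation):
    """Number of ancestor slots `generation` steps back (2 parents, 4 grandparents, ...)."""
    return 2 ** generation


def first_generation_exceeding(population):
    """Smallest generation whose ancestor slots outnumber the given population."""
    generation = 0
    while potential_ancestors(generation) <= population:
        generation += 1
    return generation


# Hypothetical available population of 100 million people.
print(first_generation_exceeding(100_000_000))  # prints 27
```

Beyond that generation the ancestor slots cannot all be filled by distinct people, so lineages must coalesce onto shared ancestors.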
Download or read book Bibliometrics and Citation Analysis written by Nicola De Bellis and published by Scarecrow Press. This book was released on 2009-03-09 with total page 451 pages. Available in PDF, EPUB and Kindle. Book excerpt: Can the methods of science be directed toward science itself? How did it happen that scientists, scientific documents, and their bibliographic links came to be regarded as mathematical variables in abstract models of scientific communication? What is the role of quantitative analyses of scientific and technical documentation in current science policy and management? Bibliometrics and Citation Analysis: From the Science Citation Index to Cybermetrics answers these questions through a comprehensive overview of theories, techniques, concepts, and applications in the interdisciplinary and steadily growing field of bibliometrics. Since citation indexes came into the limelight during the mid-1960s, citation networks have become increasingly important for many different research fields. The book begins by investigating the empirical, philosophical, and mathematical foundations of bibliometrics, including its beginnings with the Science Citation Index, the theoretical framework behind it, and its mathematical underpinnings. It then examines the application of bibliometrics and citation analysis in the sciences and science studies, especially the sociology of science and science policy. Finally it provides a view of the future of bibliometrics, exploring in detail the ongoing extension of bibliometric methods to the structure and dynamics of the World Wide Web. This book gives newcomers to the field of bibliometrics an accessible entry point to an entire research tradition otherwise scattered through a vast amount of journal literature. At the same time, it brings to the forefront the cross-disciplinary linkages between the various fields (sociology, philosophy, mathematics, politics) that intersect at the crossroads of citation analysis. 
Because of its discursive and interdisciplinary approach, the book is useful to those in every area of scholarship involved in the quantitative analysis of information exchanges, but also to science historians and general readers who simply wish to familiarize them
Download or read book Theories of Communication Networks written by Peter R. Monge and published by Oxford University Press. This book was released on 2003-03-27 with total page 431 pages. Available in PDF, EPUB and Kindle. Book excerpt: To date, most network research contains one or more of five major problems. First, it tends to be atheoretical, ignoring the various social theories that contain network implications. Second, it explores single levels of analysis rather than the multiple levels out of which most networks are comprised. Third, network analysis has made little use of the insights from contemporary complex systems analysis and computer simulations. Fourth, it typically uses descriptive rather than inferential statistics, thus robbing it of the ability to make claims about the larger universe of networks. Finally, almost all the research is static and cross-sectional rather than dynamic. Theories of Communication Networks presents solutions to all five problems. The authors develop a multitheoretical model that relates different social science theories with different network properties. This model is multilevel, providing a network decomposition that applies the various social theories to all network levels: individuals, dyads, triples, groups, and the entire network. The book then establishes a model from the perspective of complex adaptive systems and demonstrates how to use Blanche, an agent-based network computer simulation environment, to generate and test network theories and hypotheses. It presents recent developments in network statistical analysis, the p* family, which provides a basis for valid multilevel statistical inferences regarding networks. Finally, it shows how to relate communication networks to other networks, thus providing the basis in conjunction with computer simulations to study the emergence of dynamic organizational networks.
Download or read book Automatic Disambiguation of Author Names in Bibliographic Repositories written by Anderson A. Ferreira and published by Morgan & Claypool Publishers. This book was released on 2020-06-01 with total page 148 pages. Available in PDF, EPUB and Kindle. Book excerpt: This book deals with a hard problem that is inherent to human language: ambiguity. In particular, we focus on author name ambiguity, a type of ambiguity that exists in digital bibliographic repositories, which occurs when an author publishes works under distinct names or distinct authors publish works under similar names. This problem may be caused by a number of reasons, including the lack of standards and common practices, and the decentralized generation of bibliographic content. As a consequence, the quality of the main services of digital bibliographic repositories such as search, browsing, and recommendation may be severely affected by author name ambiguity. The focal point of the book is on automatic methods, since manual solutions do not scale to the size of the current repositories or the speed in which they are updated. Accordingly, we provide an ample view on the problem of automatic disambiguation of author names, summarizing the results of more than a decade of research on this topic conducted by our group, which were reported in more than a dozen publications that received over 900 citations so far, according to Google Scholar. We start by discussing its motivational issues (Chapter 1). Next, we formally define the author name disambiguation task (Chapter 2) and use this formalization to provide a brief, taxonomically organized, overview of the literature on the topic (Chapter 3). We then organize, summarize and integrate the efforts of our own group on developing solutions for the problem that have historically produced state-of-the-art (by the time of their proposals) results in terms of the quality of the disambiguation results. 
Thus, Chapter 4 covers HHC - Heuristic-based Clustering, an author name disambiguation method that is based on two specific real-world assumptions regarding scientific authorship. Then, Chapter 5 describes SAND - Self-training Author Name Disambiguator and Chapter 6 presents two incremental author name disambiguation methods, namely INDi - Incremental Unsupervised Name Disambiguation and INC - Incremental Nearest Cluster. Finally, Chapter 7 provides an overview of recent author name disambiguation methods that address new specific approaches such as graph-based representations, alternative predefined similarity functions, visualization facilities and approaches based on artificial neural networks. The chapters are followed by three appendices that cover, respectively: (i) a pattern matching function for comparing proper names that is used by some of the methods addressed in this book; (ii) a tool for generating synthetic collections of citation records for distinct experimental tasks; and (iii) a number of datasets commonly used to evaluate author name disambiguation methods. In summary, the book organizes a large body of knowledge and work in the area of author name disambiguation in the last decade, hoping to consolidate a solid basis for future developments in the field.
Download or read book Proceedings of the Sixth SIAM International Conference on Data Mining written by Joydeep Ghosh and published by SIAM. This book was released on 2006-04-01 with total page 662 pages. Available in PDF, EPUB and Kindle. Book excerpt: The Sixth SIAM International Conference on Data Mining continues the tradition of presenting approaches, tools, and systems for data mining in fields such as science, engineering, industrial processes, healthcare, and medicine. The datasets in these fields are large, complex, and often noisy. Extracting knowledge requires the use of sophisticated, high-performance, and principled analysis techniques and algorithms, based on sound statistical foundations. These techniques in turn require powerful visualization technologies; implementations that must be carefully tuned for performance; software systems that are usable by scientists, engineers, and physicians as well as researchers; and infrastructures that support them.
Download or read book Open Access written by Peter Suber and published by MIT Press. This book was released on 2012-07-20 with total page 255 pages. Available in PDF, EPUB and Kindle. Book excerpt: A concise introduction to the basics of open access, describing what it is (and isn't) and showing that it is easy, fast, inexpensive, legal, and beneficial. The Internet lets us share perfect copies of our work with a worldwide audience at virtually no cost. We take advantage of this revolutionary opportunity when we make our work “open access”: digital, online, free of charge, and free of most copyright and licensing restrictions. Open access is made possible by the Internet and copyright-holder consent, and many authors, musicians, filmmakers, and other creators who depend on royalties are understandably unwilling to give their consent. But for 350 years, scholars have written peer-reviewed journal articles for impact, not for money, and are free to consent to open access without losing revenue. In this concise introduction, Peter Suber tells us what open access is and isn't, how it benefits authors and readers of research, how we pay for it, how it avoids copyright problems, how it has moved from the periphery to the mainstream, and what its future may hold. Distilling a decade of Suber's influential writing and thinking about open access, this is the indispensable book on the subject for researchers, librarians, administrators, funders, publishers, and policy makers.