기본 콘텐츠로 건너뛰기

4월, 2015의 게시물 표시

Vector Space Model

Motivation When you want to find some information by using Search Engines, you have to make a query used for search. Unfortunately, since you don't know exactly what it means, your query will be ambiguous and not accurate. Therefore, Search Engines give you the information in a ranked list rather than the right position. Intuition In order to make a ranked list, you need to calculate the similarity between the query and documents based on terms or words. One of the calculation of similarity is dot product on a vector space. In the vector space, there are many documents with respect to word dimensions The first to rank is d2, because to see with eyes it's the most similarity with the query. Problem How do we plot those vectors wonderfully and nicely and very very fairly ? - How do we define the dimension ? - How do we place a document vector ? - How do we place a query vector ? - How do we match a similarity ? Consideration 1. The frequency of each word of Query. First, Score in

Text Retrieval and Search Engines

by ChengXiang "Cheng" Zhai CONTENT 1. Natural Language Content Analysis 2. Text Access 3. Text Retrieval Problem 4. Text Retrieval Methods 5. Vector Space Model (Retrieval Model l) 6. System Implementation 7. Evaluation 8. Probabilistic Model (Retrieval Model ll) 9. Feedback 10. Web Search Harnessing Big Text Data: Text Retrieval + Text Mining Course Summary 1. Natural Language Content Analysis 1.1 Natural Language Processing (NLP) Natural Language Processing (NLP) is a field of computer science, artificial intelligence, and computational linguisitc concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human-computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input, and others involve natural language generation. Computers can understand human language like that Koreans understand English. but, it's