Rubbing & Scrubbing My Data


Recent Posts

Interviews

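[ Geoffrey Hinton ] Four Questions For: Geoff Hinton (Jan 16, 2017) [ Original ][ Sum-up ]
* As recently as three years ago, few expected neural nets to acquire linguistic knowledge from raw data, so they were not used for machine translation. Today, however, NMT is the dominant approach.
* He is not afraid of AI bringing about a singularity. However, there must be an international agreement on using it for military purposes.
* He views AI's impact on the labor market positively, because it raises productivity and (as in the earlier industrial era) benefits everyone. This presupposes a fair and just political system: the technology is not the problem; an unfair political system that does not share the benefits with everyone is.
* Deep learning is now having a huge impact on many domains (e.g., image, speech, text), but we have been watching its potential (its "flowering") in basic neural networks for 20 years or more. That potential lies in better types of neurons, better architectures, better ways to train deep nets, and ways for models to look more closely at what matters in the input. This potential will keep developing.
* What he looks forward to most is neural networks truly understanding the content of documents. New types of temporary memory, one of the current hot topics, belong to this area.
* One problem is that neural networks do not generalize well from small amounts of data. His guess is that solving this will require developing an entirely new type of neuron.
* Deep learning...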

A Collection of Papers

[ Google Research ] Sequence to Sequence Learning with Neural Networks (2014) [ Dyed ] [ Sum-up ] 1, 2, 3, 4, 5, 6
[One-line summary] For the MT task, our method uses a multilayered LSTM to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector.
[Summary] A vanilla DNN needs fixed input/output lengths, and a vanilla RNN needs input and output lengths to correspond 1:1, so neither handles sequence learning well when input and output lengths vary. To get around this, the paper joins two LSTMs into a seq2seq model that is not tied to input/output length. Using LSTMs and reversing the input lets even long sentences be learned well, and the sentence embeddings show that semantically and structurally similar sentences lie close to each other.
[Keywords] a fixed dimensional vector, reversing the input sequence, sentence embedding
Distributed Representations of Sentences and Documents [ Dyed ] [ Total ]
On- and Off-Topic Classification and Semantic Annotation of User-Generated Software Requirements [ Sum-up ] [Impressions] This paper is for my internship research at Paderborn University, and the second classif...
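
The seq2seq summary rests on two ideas: an encoder LSTM that compresses a variable-length input into one fixed-dimensional vector, and feeding the source sentence in reverse. Below is a minimal pure-Python sketch of the encoder half only, assuming a single LSTM layer with random, untrained weights; the paper uses deep multilayer LSTMs plus a separate decoder LSTM, both omitted here. `LSTMCell`, `encode`, and the sizes are hypothetical names and values chosen for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class LSTMCell:
    """One LSTM cell with random, untrained weights (illustration only)."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        self.hidden_size = hidden_size
        cols = input_size + hidden_size
        # One weight matrix per gate: input (i), forget (f), output (o), candidate (c).
        # Each has hidden_size rows and acts on the concatenation [x_t ; h_{t-1}].
        self.W = {gate: [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                         for _ in range(hidden_size)]
                  for gate in "ifoc"}
        self.b = {gate: [0.0] * hidden_size for gate in "ifoc"}

    def _affine(self, gate, z):
        return [sum(w * v for w, v in zip(row, z)) + bias
                for row, bias in zip(self.W[gate], self.b[gate])]

    def step(self, x, h, c):
        z = list(x) + list(h)  # concatenate current input and previous hidden state
        i = [sigmoid(a) for a in self._affine("i", z)]
        f = [sigmoid(a) for a in self._affine("f", z)]
        o = [sigmoid(a) for a in self._affine("o", z)]
        cand = [math.tanh(a) for a in self._affine("c", z)]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new


def encode(cell, sequence, reverse=True):
    """Map a variable-length sequence of vectors to one fixed-size vector.

    With reverse=True the source tokens are fed last-to-first, as in the paper.
    """
    h = [0.0] * cell.hidden_size
    c = [0.0] * cell.hidden_size
    for x in (reversed(sequence) if reverse else sequence):
        h, c = cell.step(x, h, c)
    return h  # final hidden state: the fixed-dimensional summary


cell = LSTMCell(input_size=3, hidden_size=4)
short = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
long_ = short * 5  # 10 time steps
print(len(encode(cell, short)), len(encode(cell, long_)))  # 4 4
```

Whatever the input length, the encoder returns `hidden_size` numbers; in the full model this vector is what the decoder LSTM starts from.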

Deep Learning

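Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, 2016. [ Link ]
1. Introduction [ Sum-up ]
* AI aims to solve not the tasks computers are good at (formal tasks, e.g., numerical computation, rule-based work) but the tasks people do well by intuition (informal tasks, e.g., understanding language and images).
* Ironically, people do not know why they are good at the tasks they perform intuitively (informal tasks). We learn them from the world and from knowledge, but because they are highly subjective and intuitive, they are hard to formally describe.
* AI (deep learning) tries to understand the world and its problems as a 'hierarchy of concepts', so that it can solve on its own the intuitive problems people are good at (complicated concepts, informal tasks, informal knowledge), without a person having to formally specify those concepts.
* In AI, a knowledge base (hard-coded knowledge) requires people to design the knowledge by hand. As a result, inference through simple rules alone struggles to recognize the patterns of a complex world or problem. Machine learning, by contrast, acquires its own knowledge from (large amounts of) raw data rather than from people. Because that knowledge comes directly from data, it has 'reliability', which naturally grows with the amount of data.
* The problem with machine learning is that it depends 'heavily' on representations/features. In other words, designing features well requires a domain expert's h...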
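
The claim that machine learning depends heavily on representation can be made concrete with a toy case in the spirit of the book's own Cartesian-versus-polar example: points on two concentric rings cannot be split by any straight line in (x, y), since one ring encloses the other, but in polar coordinates a single threshold on the radius r separates them. The ring generator, the radii (1 and 3), and the threshold of 2 are hypothetical choices for this sketch.

```python
import math
import random


def make_rings(n, seed=0):
    """Hypothetical data: class 0 near radius 1, class 1 near radius 3."""
    rng = random.Random(seed)
    points = []
    for label, radius in ((0, 1.0), (1, 3.0)):
        for _ in range(n):
            angle = rng.uniform(0, 2 * math.pi)
            r = radius + rng.uniform(-0.3, 0.3)  # small radial noise
            points.append(((r * math.cos(angle), r * math.sin(angle)), label))
    return points


def to_polar(x, y):
    """Change of representation: (x, y) -> (r, theta)."""
    return math.hypot(x, y), math.atan2(y, x)


def classify_polar(x, y, threshold=2.0):
    # In (r, theta) space the decision rule is one vertical line r = threshold.
    r, _theta = to_polar(x, y)
    return 1 if r > threshold else 0


data = make_rings(100)
accuracy = sum(classify_polar(x, y) == label for (x, y), label in data) / len(data)
print(accuracy)  # 1.0
```

The classifier itself is trivial; all the work is done by choosing the representation, which is exactly what deep learning tries to learn instead of hand-design.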

Mathematical Tools for Computer Vision

University of Science and Technology, Electronics and Telecommunications Research Institute (ETRI), Vision System Research Team, Jae-Young LEE
March 1, 2016 ~ June 24, 2016
https://sites.google.com/site/roricljy/
1. Geometry: Vectors
- Vector & Scalar & Vector Equality, Addition, and Subtraction
- Euclidean Vector
- Trigonometry Review
- Polar Representation
- Dot product & Cross product
- Homework #1
- Homework #2
2. Geometry: Figures in Space
- Figures
- Equation of a Line
- Equation of a Plane
- Regions Defined by Inequalities
- Equations of Figures and Functions
- Homework #3
3. Matrix Operations
- Introduction
- Matrices and matrix algebra
- Matrices and systems of linear equations
4. Vector Spaces, Linear Systems
- Vector Spaces
- Basis and Dimension
- Rank of a Matrix and Systems of Linear Equations
- Other things to know in linear algebra
...
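
Several geometry topics in this outline (dot product, cross product, polar representation, equation of a plane) fit in a few lines of code. A minimal sketch in plain Python, assuming vectors are given as tuples; the function names are hypothetical and not taken from the course materials.

```python
import math


def dot(u, v):
    """Dot product: sum of component-wise products."""
    return sum(a * b for a, b in zip(u, v))


def cross(u, v):
    """Cross product of two 3-D vectors; the result is perpendicular to both."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def norm(u):
    return math.sqrt(dot(u, u))


def angle_between(u, v):
    """Angle from u . v = |u||v| cos(theta), in radians."""
    cos_t = dot(u, v) / (norm(u) * norm(v))
    return math.acos(max(-1.0, min(1.0, cos_t)))  # clamp against round-off


def to_polar(x, y):
    """2-D vector -> (magnitude r, direction theta in radians)."""
    return math.hypot(x, y), math.atan2(y, x)


def from_polar(r, theta):
    return r * math.cos(theta), r * math.sin(theta)


def plane_through(p, q, s):
    """Plane n . X = d through three points, with normal n = (q - p) x (s - p)."""
    pq = tuple(b - a for a, b in zip(p, q))
    ps = tuple(b - a for a, b in zip(p, s))
    n = cross(pq, ps)
    return n, dot(n, p)


print(dot((1, 2, 3), (4, 5, 6)))                         # 32
print(cross((1, 2, 3), (4, 5, 6)))                       # (-3, 6, -3)
print(plane_through((1, 0, 0), (0, 1, 0), (0, 0, 1)))    # ((1, 1, 1), 1)
```

The last line reads as the plane x + y + z = 1: the cross product supplies the normal vector, and the dot product with any point on the plane supplies the constant.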

Statistics in Medicine

by Kristin Sainani
CONTENTS
1. Descriptive statistics and looking at data
2. Review of study designs; Measures of disease risk and association
3. Probability, Bayes' Rule, Diagnostic Testing
4. Probability distributions
5. Statistical Inference
6. P-values (errors, statistical power, and pitfalls)
7. Statistical Tests
8. Regression Analysis
9. Logistic Regression, Cox Regression
1. Descriptive statistics and looking at data
1.1 Types of Data
1.1.1 Quantitative Variable
A quantitative variable takes numerical values (e.g., age, blood pressure, BMI, pulse) that you can add, subtract, multiply, and divide.
ㆍ Continuous (quantitative) variable: can theoretically take any value within a given range (e.g., height = 68.99955... inches)
ㆍ Discrete (quantitative) variable: can take only certain values (e.g., count data)
However, in the real world the distinction between continuous and discrete sometimes makes little practical difference. For example, when we analyze family size from discrete values (e...
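
One common illustration of the continuous/discrete point is that summaries built for continuous data, such as the mean, are routinely applied to discrete counts even when the result is not an attainable value. A minimal sketch with Python's standard `statistics` module; the eight family sizes are hypothetical data, not taken from the course.

```python
import statistics
from collections import Counter

# Hypothetical data: number of children in 8 households (discrete count data).
family_sizes = [1, 2, 2, 3, 3, 3, 4, 5]

mean = statistics.mean(family_sizes)      # 2.875: not itself a possible count
median = statistics.median(family_sizes)  # 3.0: average of the two middle values
mode = statistics.mode(family_sizes)      # 3: most frequent value
sd = statistics.stdev(family_sizes)       # sample standard deviation (divides by n - 1)
freq = Counter(family_sizes)              # frequency table, e.g. for a bar chart

print(mean, median, mode, round(sd, 3))
print(sorted(freq.items()))  # [(1, 1), (2, 2), (3, 3), (4, 1), (5, 1)]
```

Nobody has 2.875 children, yet the mean is still a useful description of a population of households, which is why discrete counts are often treated as if they were continuous.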

Logistic Regression

By Andrew Ng
1. Logistic Regression
1.1 Visualizing Data Part
└ plotData function
1.2 Advanced Optimization Part
└ mapFeature function
└ costFunctionReg function
└ fminunc function
1.3 Decision Boundary and Prediction Part
└ plotDecisionBoundary function
└ predict function
1. Logistic Regression
We'll implement regularized logistic regression to predict whether microchips from a fabrication plant pass quality assurance (QA). During QA, each microchip goes through various tests to ensure it is functioning correctly. Suppose you're the product manager of the factory and you have the test results for some microchips on two different tests. From...
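
The exercise outline (mapFeature, costFunctionReg, fminunc, predict) can be sketched in plain Python. This is a hedged translation, not the course's Octave code: `map_feature` assumes the usual degree-6 polynomial expansion of the two test scores (28 terms including the bias), `cost_function_reg` computes the regularized cross-entropy cost and gradient without penalizing `theta[0]`, plain batch gradient descent is swapped in for Octave's `fminunc`, and the eight data points are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def map_feature(x1, x2, degree=6):
    """Bias term 1 plus every monomial x1^(i-j) * x2^j for i = 1..degree, j = 0..i."""
    out = [1.0]
    for i in range(1, degree + 1):
        for j in range(i + 1):
            out.append((x1 ** (i - j)) * (x2 ** j))
    return out


def cost_function_reg(theta, X, y, lam):
    """Regularized cross-entropy cost J and its gradient; theta[0] is not penalized."""
    m = len(y)
    eps = 1e-12  # clamp h away from 0 and 1 so log() never receives 0
    h = [min(max(sigmoid(sum(t * x for t, x in zip(theta, row))), eps), 1 - eps)
         for row in X]
    J = -sum(yi * math.log(hi) + (1 - yi) * math.log(1 - hi)
             for yi, hi in zip(y, h)) / m
    J += lam / (2 * m) * sum(t * t for t in theta[1:])
    grad = [sum((hi - yi) * row[j] for hi, yi, row in zip(h, y, X)) / m
            for j in range(len(theta))]
    for j in range(1, len(theta)):
        grad[j] += lam / m * theta[j]
    return J, grad


def gradient_descent(theta, X, y, lam, alpha=0.1, iters=200):
    """Plain batch gradient descent, used here in place of fminunc."""
    for _ in range(iters):
        _, grad = cost_function_reg(theta, X, y, lam)
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


def predict(theta, X, threshold=0.5):
    """Label 1 when the predicted probability is at least the threshold."""
    return [1 if sigmoid(sum(t * x for t, x in zip(theta, row))) >= threshold else 0
            for row in X]


# Hypothetical toy data: (test 1, test 2) scores; label 1 = passed QA.
points = [(0.0, 0.0), (0.2, -0.1), (-0.3, 0.2), (0.1, 0.3),
          (0.9, 0.8), (-0.9, 0.7), (0.8, -0.9), (-0.8, -0.8)]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
X = [map_feature(a, b) for a, b in points]
theta0 = [0.0] * len(X[0])

J0, _ = cost_function_reg(theta0, X, labels, lam=1.0)  # log(2) when theta = 0
theta = gradient_descent(theta0, X, labels, lam=1.0)
J1, _ = cost_function_reg(theta, X, labels, lam=1.0)
print(len(X[0]), J1 < J0)  # 28 True
print(predict(theta, X))
```

With `theta` all zeros every prediction is 0.5, so the starting cost is exactly log(2) regardless of `lam`; leaving `theta[0]` out of the penalty keeps regularization from pulling the decision boundary toward the origin.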