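Study Notes on Natural Language Processing with Python (2nd Edition), Steven Bird et al.: Chapter 9, Building Feature-Based Grammars

Chapter 9: Building Feature-Based Grammars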

* 9.1 Grammatical Features <https://blog.csdn.net/weixin_43935926/article/details/86528015#91__10>
* Syntactic Agreement <https://blog.csdn.net/weixin_43935926/article/details/86528015#_92>
* Using Attributes and Constraints <https://blog.csdn.net/weixin_43935926/article/details/86528015#_104>
* Terminology <https://blog.csdn.net/weixin_43935926/article/details/86528015#_199>
* 9.2 Processing Feature Structures <https://blog.csdn.net/weixin_43935926/article/details/86528015#92__201>
* Subsumption and Unification <https://blog.csdn.net/weixin_43935926/article/details/86528015#_323>
* 9.3 Extending a Feature-Based Grammar <https://blog.csdn.net/weixin_43935926/article/details/86528015#93__467>
* Subcategorization <https://blog.csdn.net/weixin_43935926/article/details/86528015#_469>
* Heads Revisited <https://blog.csdn.net/weixin_43935926/article/details/86528015#_471>
* Auxiliary Verbs and Inversion <https://blog.csdn.net/weixin_43935926/article/details/86528015#_473>
* Unbounded Dependency Constructions <https://blog.csdn.net/weixin_43935926/article/details/86528015#_475>
* 9.4 Summary <https://blog.csdn.net/weixin_43935926/article/details/86528015#94__593>

import nltk
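* How can we extend the framework of context-free grammars with features, so as to gain more fine-grained control over grammatical categories and productions?
* What are the main formal properties of feature structures, and how do we use them computationally?
* What kinds of linguistic patterns and grammatical constructions can we capture with feature-based grammars?

9.1 Grammatical Features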

In the context of rule-based grammars, pairings of features and values are known as feature structures.

    kim = {'CAT': 'NP', 'ORTH': 'Kim', 'REF': 'k'}
    chase = {'CAT': 'V', 'ORTH': 'chased', 'REL': 'chase'}
    chase['AGT'] = 'sbj'
    chase['PAT'] = 'obj'

A simplifying assumption: the NPs immediately to the left and right of the verb are the subject and the object, respectively.

    sent = "Kim chased Lee"
    tokens = sent.split()
    lee = {'CAT': 'NP', 'ORTH': 'Lee', 'REF': 'l'}

    def lex2fs(word):
        # Look up the lexical entry whose orthography matches the word.
        for fs in [kim, lee, chase]:
            if fs['ORTH'] == word:
                return fs

    subj, verb, obj = lex2fs(tokens[0]), lex2fs(tokens[1]), lex2fs(tokens[2])
    verb['AGT'] = subj['REF']  # agent of 'chase' is Kim
    verb['PAT'] = obj['REF']   # patient of 'chase' is Lee
    for k in ['ORTH', 'REL', 'AGT', 'PAT']:  # check featstruct of 'chase'
        print("%-5s => %s" % (k, verb[k]))

    ORTH  => chased
    REL   => chase
    AGT   => k
    PAT   => l

    surprise = {'CAT': 'V', 'ORTH': 'surprised', 'REL': 'surprise',
                'SRC': 'sbj', 'EXP': 'obj'}
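The role-binding step above is written out by hand for 'chase'. As a minimal sketch of how the same idea could cover 'surprise' as well, the hypothetical helper `bind_roles` below (not part of the book or of NLTK) replaces every 'sbj' or 'obj' placeholder in a verb entry with the REF of the matching NP. It keeps the same simplifying assumption that a sentence is exactly subject, verb, object.

```python
# Lexical entries copied from the notes above.
kim = {'CAT': 'NP', 'ORTH': 'Kim', 'REF': 'k'}
lee = {'CAT': 'NP', 'ORTH': 'Lee', 'REF': 'l'}
chase = {'CAT': 'V', 'ORTH': 'chased', 'REL': 'chase',
         'AGT': 'sbj', 'PAT': 'obj'}
surprise = {'CAT': 'V', 'ORTH': 'surprised', 'REL': 'surprise',
            'SRC': 'sbj', 'EXP': 'obj'}

LEXICON = [kim, lee, chase, surprise]


def lex2fs(word):
    """Return the lexical entry whose ORTH matches the word, or None."""
    for fs in LEXICON:
        if fs['ORTH'] == word:
            return fs
    return None


def bind_roles(sentence):
    """Hypothetical helper: fill a verb's role slots from its neighbours.

    Assumes the sentence has exactly three tokens: subject, verb, object.
    """
    subj, verb, obj = (lex2fs(token) for token in sentence.split())
    bound = dict(verb)  # copy, so the lexicon entry is left untouched
    for feature, value in verb.items():
        if value == 'sbj':
            bound[feature] = subj['REF']
        elif value == 'obj':
            bound[feature] = obj['REF']
    return bound


result = bind_roles("Kim surprised Lee")
print(result['SRC'], result['EXP'])  # k l
```

Copying the verb entry before binding keeps the lexicon reusable across sentences, which the in-place assignment in the original listing does not.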
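Syntactic Agreement

The morphological properties of the verb co-vary with the syntactic properties of the subject noun phrase. This co-variance is called agreement.

Table 9-1. Agreement paradigm for English regular verbs

                  singular          plural
    1st person    I run             we run
    2nd person    you run           you run
    3rd person    he/she/it runs    they run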
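The paradigm in Table 9-1 comes down to one rule: only the third person singular takes the -s suffix. A minimal sketch of that rule follows; the function name `present_form` and the `SUBJECTS` table are illustrative assumptions, and the rule only covers regular verbs that take a plain -s.

```python
def present_form(stem, person, number):
    """Return the present-tense form of a regular verb (plain -s only)."""
    if person == 3 and number == 'sg':
        return stem + 's'
    return stem


# Subject pronouns for each (person, number) cell of the paradigm.
SUBJECTS = {
    (1, 'sg'): 'I', (1, 'pl'): 'we',
    (2, 'sg'): 'you', (2, 'pl'): 'you',
    (3, 'sg'): 'she', (3, 'pl'): 'they',
}

for person in (1, 2, 3):
    cells = ["%s %s" % (SUBJECTS[(person, number)],
                        present_form('run', person, number))
             for number in ('sg', 'pl')]
    print(person, cells)
```

Encoding agreement this way in ordinary code is exactly what a feature-based grammar avoids: the grammar states the constraint once, declaratively, instead of branching on person and number.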
Using Attributes and Constraints

Example 9-1. Example of a feature-based grammar.

    nltk.data.show_cfg('grammars/book_grammars/feat0.fcfg')

    % start S
    # ###################
    # Grammar Productions
    # ###################
    # S expansion productions
    S -> NP[NUM=?n] VP[NUM=?n]
    # NP expansion productions
    NP[NUM=?n] -> N[NUM=?n]
    NP[NUM=?n] -> PropN[NUM=?n]
    NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
    NP[NUM=pl] -> N[NUM=pl]
    # VP expansion productions
    VP[TENSE=?t, NUM=?n] -> IV[TENSE=?t, NUM=?n]
    VP[TENSE=?t, NUM=?n] -> TV[TENSE=?t, NUM=?n] NP
    # ###################
    # Lexical Productions
    # ###################
    Det[NUM=sg] -> 'this' | 'every'
    Det[NUM=pl] -> 'these' | 'all'
    Det -> 'the' | 'some' | 'several'
    PropN[NUM=sg]-> 'Kim' | 'Jody'
    N[NUM=sg] -> 'dog' | 'girl' | 'car' | 'child'
    N[NUM=pl] -> 'dogs' | 'girls' | 'cars' | 'children'
    IV[TENSE=pres, NUM=sg] -> 'disappears' | 'walks'
    TV[TENSE=pres, NUM=sg] -> 'sees' | 'likes'
    IV[TENSE=pres, NUM=pl] -> 'disappear' | 'walk'
    TV[TENSE=pres, NUM=pl] -> 'see' | 'like'
    IV[TENSE=past] -> 'disappeared' | 'walked'
    TV[TENSE=past] -> 'saw' | 'liked'
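To see the NUM constraint at work without the downloaded book grammars, the sketch below builds a cut-down grammar inline with `FeatureGrammar.fromstring` and parses it with `FeatureChartParser`. The reduced rule set and the helper `is_grammatical` are assumptions made for illustration; they are not the full feat0.fcfg.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# A cut-down version of feat0.fcfg (an assumption made for brevity):
# only determiners, nouns and intransitive verbs are kept.
MINI_GRAMMAR = """
% start S
S -> NP[NUM=?n] VP[NUM=?n]
NP[NUM=?n] -> Det[NUM=?n] N[NUM=?n]
VP[NUM=?n] -> IV[NUM=?n]
Det[NUM=sg] -> 'this'
Det[NUM=pl] -> 'these'
N[NUM=sg] -> 'dog'
N[NUM=pl] -> 'dogs'
IV[NUM=sg] -> 'walks'
IV[NUM=pl] -> 'walk'
"""

grammar = FeatureGrammar.fromstring(MINI_GRAMMAR)
parser = FeatureChartParser(grammar)


def is_grammatical(sentence):
    """Hypothetical helper: True if the parser finds at least one tree."""
    return any(True for _ in parser.parse(sentence.split()))


for sentence in ["this dog walks", "these dogs walk",
                 "these dog walks", "these dogs walks"]:
    print(sentence, "=>", is_grammatical(sentence))
```

The last two sentences are rejected because the variable ?n cannot be bound to both sg and pl within a single production.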
Example 9-2. Trace of a feature-based chart parser.

    tokens = 'Kim likes children'.split()
    from nltk import load_parser
    cp = load_parser('grammars/book_grammars/feat0.fcfg', trace=2)
    for tree in cp.parse(tokens):
        print(tree)

    |.Kim .like.chil.|
    Leaf Init Rule:
    |[----]    .    .| [0:1] 'Kim'
    |.    [----]    .| [1:2] 'likes'
    |.    .    [----]| [2:3] 'children'
    Feature Bottom Up Predict Combine Rule:
    |[----]    .    .| [0:1] PropN[NUM='sg'] -> 'Kim' *
    Feature Bottom Up Predict Combine Rule:
    |[----]    .    .| [0:1] NP[NUM='sg'] -> PropN[NUM='sg'] *
    Feature Bottom Up Predict Combine Rule:
    |[---->    .    .| [0:1] S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'sg'}
    Feature Bottom Up Predict Combine Rule:
    |.    [----]    .| [1:2] TV[NUM='sg', TENSE='pres'] -> 'likes' *
    Feature Bottom Up Predict Combine Rule:
    |.    [---->    .| [1:2] VP[NUM=?n, TENSE=?t] -> TV[NUM=?n, TENSE=?t] * NP[] {?n: 'sg', ?t: 'pres'}
    Feature Bottom Up Predict Combine Rule:
    |.    .    [----]| [2:3] N[NUM='pl'] -> 'children' *
    Feature Bottom Up Predict Combine Rule:
    |.    .    [----]| [2:3] NP[NUM='pl'] -> N[NUM='pl'] *
    Feature Bottom Up Predict Combine Rule:
    |.    .    [---->| [2:3] S[] -> NP[NUM=?n] * VP[NUM=?n] {?n: 'pl'}
    Feature Single Edge Fundamental Rule:
    |.    [---------]| [1:3] VP[NUM='sg', TENSE='pres'] -> TV[NUM='sg', TENSE='pres'] NP[] *
    Feature Single Edge Fundamental Rule:
    |[==============]| [0:3] S[] -> NP[NUM='sg'] VP[NUM='sg'] *
    (S[]
      (NP[NUM='sg'] (PropN[NUM='sg'] Kim))
      (VP[NUM='sg', TENSE='pres']
        (TV[NUM='sg', TENSE='pres'] likes)
        (NP[NUM='pl'] (N[NUM='pl'] children))))
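The node labels in the tree above are themselves feature structures, so the values bound during parsing can be read back programmatically. The sketch below assumes a reduced inline grammar (only the productions needed for this one sentence, not the full feat0.fcfg) and indexes each label by feature name.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Only the productions needed for 'Kim likes children' (an assumption
# made so the example runs without the downloaded book grammars).
grammar = FeatureGrammar.fromstring("""
% start S
S -> NP[NUM=?n] VP[NUM=?n]
NP[NUM=?n] -> PropN[NUM=?n]
NP[NUM=pl] -> N[NUM=pl]
VP[TENSE=?t, NUM=?n] -> TV[TENSE=?t, NUM=?n] NP
PropN[NUM=sg] -> 'Kim'
N[NUM=pl] -> 'children'
TV[TENSE=pres, NUM=sg] -> 'likes'
""")
parser = FeatureChartParser(grammar)

trees = list(parser.parse('Kim likes children'.split()))
tree = trees[0]

subject, vp = tree[0], tree[1]  # children of S: the NP and the VP
obj = vp[1]                     # second child of VP: the object NP

subject_num = subject.label()['NUM']
vp_num = vp.label()['NUM']
vp_tense = vp.label()['TENSE']
object_num = obj.label()['NUM']
print(subject_num, vp_num, vp_tense, object_num)
```

The subject and the VP share the same NUM value, while the object NP keeps its own: the VP production places no agreement constraint on its object.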
Terminology

9.2 Processing Feature Structures

Feature structures in NLTK are declared with the FeatStruct() constructor. Atomic feature values can be strings or integers.

    fs1 = nltk.FeatStruct(TENSE='past', NUM='sg')
    print(fs1)

    [ NUM   = 'sg'   ]
    [ TENSE = 'past' ]

    fs1 = nltk.FeatStruct(PER=3, NUM='pl', GND='fem')
    print(fs1['GND'])

    fem

    fs1['CASE'] = 'acc'
    fs2 = nltk.FeatStruct(POS='N', AGR=fs1)
    print(fs2)

    [       [ CASE = 'acc' ] ]
    [ AGR = [ GND  = 'fem' ] ]
    [       [ NUM  = 'pl'  ] ]
    [       [ PER  = 3     ] ]
    [                        ]
    [ POS = 'N'              ]

    print(fs2['AGR'])

    [ CASE = 'acc' ]
    [ GND  = 'fem' ]
    [ NUM  = 'pl'  ]
    [ PER  = 3     ]

    print(fs2['AGR']['PER'])

    3

    print(nltk.FeatStruct("[POS='N', AGR=[PER=3, NUM='pl', GND='fem']]"))

    [       [ GND = 'fem' ] ]
    [ AGR = [ NUM = 'pl'  ] ]
    [       [ PER = 3     ] ]
    [                       ]
    [ POS = 'N'             ]

    print(nltk.FeatStruct(NAME='Lee', TELNO='01 27 86 42 96', AGE=33))

    [ AGE   = 33               ]
    [ NAME  = 'Lee'            ]
    [ TELNO = '01 27 86 42 96' ]

    print(nltk.FeatStruct("""[NAME='Lee', ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
                             SPOUSE=[NAME='Kim', ADDRESS->(1)]]"""))

    [ ADDRESS = (1) [ NUMBER = 74           ] ]
    [               [ STREET = 'rue Pascal' ] ]
    [                                         ]
    [ NAME    = 'Lee'                         ]
    [                                         ]
    [ SPOUSE  = [ ADDRESS -> (1)  ]           ]
    [           [ NAME    = 'Kim' ]           ]

    print(nltk.FeatStruct("[A='a', B=(1)[C='c'], D->(1), E->(1)]"))

    [ A = 'a'             ]
    [                     ]
    [ B = (1) [ C = 'c' ] ]
    [                     ]
    [ D -> (1)            ]
    [ E -> (1)            ]
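The `(1)` tag and the `->(1)` pointer in the listings above mark a shared (re-entrant) value rather than two equal copies. A short sketch of what that means in practice: both paths lead to the same Python object, so a change made through one path is visible through the other. The tuple form of indexing used here follows a feature path from the root.

```python
import nltk

fs = nltk.FeatStruct("""[NAME='Lee', ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
                         SPOUSE=[NAME='Kim', ADDRESS->(1)]]""")

# A tuple index follows a path of features from the root.
street = fs['SPOUSE', 'ADDRESS', 'STREET']

# Re-entrancy: both paths reach one and the same object.
shared = fs['ADDRESS'] is fs['SPOUSE']['ADDRESS']

# Adding information through one path makes it visible through the other.
fs['ADDRESS']['CITY'] = 'Paris'
city_via_spouse = fs['SPOUSE', 'ADDRESS', 'CITY']

print(street, shared, city_via_spouse)
```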
Subsumption and Unification

    fs1 = nltk.FeatStruct(NUMBER=74, STREET='rue Pascal')
    fs2 = nltk.FeatStruct(CITY='Paris')
    print(fs2.unify(fs1))

    [ CITY   = 'Paris'      ]
    [ NUMBER = 74           ]
    [ STREET = 'rue Pascal' ]

    fs0 = nltk.FeatStruct(A='a')
    fs1 = nltk.FeatStruct(A='b')
    fs2 = fs0.unify(fs1)
    print(fs2)

    None

    fs0 = nltk.FeatStruct("""[NAME=Lee,
                              ADDRESS=[NUMBER=74, STREET='rue Pascal'],
                              SPOUSE= [NAME=Kim,
                                       ADDRESS=[NUMBER=74, STREET='rue Pascal']]]""")
    print(fs0)

    [ ADDRESS = [ NUMBER = 74           ]               ]
    [           [ STREET = 'rue Pascal' ]               ]
    [                                                   ]
    [ NAME    = 'Lee'                                   ]
    [                                                   ]
    [           [ ADDRESS = [ NUMBER = 74           ] ] ]
    [ SPOUSE  = [           [ STREET = 'rue Pascal' ] ] ]
    [           [                                     ] ]
    [           [ NAME    = 'Kim'                     ] ]

    fs1 = nltk.FeatStruct("[SPOUSE = [ADDRESS = [CITY = Paris]]]")
    print(fs1.unify(fs0))

    [ ADDRESS = [ NUMBER = 74           ]               ]
    [           [ STREET = 'rue Pascal' ]               ]
    [                                                   ]
    [ NAME    = 'Lee'                                   ]
    [                                                   ]
    [           [           [ CITY   = 'Paris'      ] ] ]
    [           [ ADDRESS = [ NUMBER = 74           ] ] ]
    [ SPOUSE  = [           [ STREET = 'rue Pascal' ] ] ]
    [           [                                     ] ]
    [           [ NAME    = 'Kim'                     ] ]

    fs2 = nltk.FeatStruct("""[NAME=Lee, ADDRESS=(1)[NUMBER=74, STREET='rue Pascal'],
                              SPOUSE=[NAME=Kim, ADDRESS->(1)]]""")
    print(fs1.unify(fs2))

    [               [ CITY   = 'Paris'      ] ]
    [ ADDRESS = (1) [ NUMBER = 74           ] ]
    [               [ STREET = 'rue Pascal' ] ]
    [                                         ]
    [ NAME    = 'Lee'                         ]
    [                                         ]
    [ SPOUSE  = [ ADDRESS -> (1)  ]           ]
    [           [ NAME    = 'Kim' ]           ]

    fs1 = nltk.FeatStruct("[ADDRESS1=[NUMBER=74, STREET='rue Pascal']]")
    fs2 = nltk.FeatStruct("[ADDRESS1=?x, ADDRESS2=?x]")
    print(fs2)

    [ ADDRESS1 = ?x ]
    [ ADDRESS2 = ?x ]

    print(fs2.unify(fs1))

    [ ADDRESS1 = (1) [ NUMBER = 74           ] ]
    [                [ STREET = 'rue Pascal' ] ]
    [                                          ]
    [ ADDRESS2 -> (1)                          ]
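The listings above exercise unification only. As a small sketch of the other half of this section, the example below checks subsumption with `FeatStruct.subsumes()` and relates it to `unify()`; the variable names `general` and `specific` are illustrative choices.

```python
import nltk

general = nltk.FeatStruct(NUMBER=74)
specific = nltk.FeatStruct(NUMBER=74, STREET='rue Pascal')

# The more general structure subsumes the more specific one, not vice versa.
general_subsumes = general.subsumes(specific)
specific_subsumes = specific.subsumes(general)

# Unifying a structure with one it subsumes adds no new information.
merged = general.unify(specific)
merged_equals_specific = (merged == specific)

# The order of the arguments does not change the result.
symmetric = (general.unify(specific) == specific.unify(general))

# Conflicting atomic values make unification fail with None.
clash = nltk.FeatStruct(NUMBER=74).unify(nltk.FeatStruct(NUMBER=75))

print(general_subsumes, specific_subsumes, merged_equals_specific,
      symmetric, clash)
```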
9.3 Extending a Feature-Based Grammar

Subcategorization

Heads Revisited

Auxiliary Verbs and Inversion

Unbounded Dependency Constructions

Example 9-3. Grammar with productions for inverted clauses and long-distance dependencies, making use of slash categories.

    nltk.data.show_cfg('grammars/book_grammars/feat1.fcfg')

    % start S
    # ###################
    # Grammar Productions
    # ###################
    S[-INV] -> NP VP
    S[-INV]/?x -> NP VP/?x
    S[-INV] -> NP S/NP
    S[-INV] -> Adv[+NEG] S[+INV]
    S[+INV] -> V[+AUX] NP VP
    S[+INV]/?x -> V[+AUX] NP VP/?x
    SBar -> Comp S[-INV]
    SBar/?x -> Comp S[-INV]/?x
    VP -> V[SUBCAT=intrans, -AUX]
    VP -> V[SUBCAT=trans, -AUX] NP
    VP/?x -> V[SUBCAT=trans, -AUX] NP/?x
    VP -> V[SUBCAT=clause, -AUX] SBar
    VP/?x -> V[SUBCAT=clause, -AUX] SBar/?x
    VP -> V[+AUX] VP
    VP/?x -> V[+AUX] VP/?x
    # ###################
    # Lexical Productions
    # ###################
    V[SUBCAT=intrans, -AUX] -> 'walk' | 'sing'
    V[SUBCAT=trans, -AUX] -> 'see' | 'like'
    V[SUBCAT=clause, -AUX] -> 'say' | 'claim'
    V[+AUX] -> 'do' | 'can'
    NP[-WH] -> 'you' | 'cats'
    NP[+WH] -> 'who'
    Adv[+NEG] -> 'rarely' | 'never'
    NP/NP ->
    Comp -> 'that'

    tokens = 'who do you claim that you like'.split()
    from nltk import load_parser
    cp = load_parser('grammars/book_grammars/feat1.fcfg')
    for tree in cp.parse(tokens):
        print(tree)

    (S[-INV]
      (NP[+WH] who)
      (S[+INV]/NP[]
        (V[+AUX] do)
        (NP[-WH] you)
        (VP[]/NP[]
          (V[-AUX, SUBCAT='clause'] claim)
          (SBar[]/NP[]
            (Comp[] that)
            (S[-INV]/NP[]
              (NP[-WH] you)
              (VP[]/NP[] (V[-AUX, SUBCAT='trans'] like) (NP[]/NP[] )))))))

    tokens = 'you claim that you like cats'.split()
    for tree in cp.parse(tokens):
        print(tree)

    (S[-INV]
      (NP[-WH] you)
      (VP[]
        (V[-AUX, SUBCAT='clause'] claim)
        (SBar[]
          (Comp[] that)
          (S[-INV]
            (NP[-WH] you)
            (VP[] (V[-AUX, SUBCAT='trans'] like) (NP[-WH] cats))))))

    tokens = 'rarely do you sing'.split()
    for tree in cp.parse(tokens):
        print(tree)

    (S[-INV]
      (Adv[+NEG] rarely)
      (S[+INV]
        (V[+AUX] do)
        (NP[-WH] you)
        (VP[] (V[-AUX, SUBCAT='intrans'] sing))))
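The slash mechanism can be tried on a shorter sentence without the downloaded grammar file. The sketch below keeps only the productions needed for 'you like cats' and 'who do you like'; this reduced grammar is an assumption for illustration, not the full feat1.fcfg. The empty production `NP/NP ->` licenses the gap, and the `/?x` variable passes the missing NP up to the point where 'who' fills it.

```python
from nltk.grammar import FeatureGrammar
from nltk.parse import FeatureChartParser

# Reduced slash grammar (an assumption: a subset of feat1.fcfg).
grammar = FeatureGrammar.fromstring("""
% start S
S[-INV] -> NP VP
S[-INV] -> NP S/NP
S[+INV]/?x -> V[+AUX] NP VP/?x
VP -> V[SUBCAT=trans, -AUX] NP
VP/?x -> V[SUBCAT=trans, -AUX] NP/?x
V[SUBCAT=trans, -AUX] -> 'like'
V[+AUX] -> 'do'
NP[-WH] -> 'you' | 'cats'
NP[+WH] -> 'who'
NP/NP ->
""")
parser = FeatureChartParser(grammar)

plain_trees = list(parser.parse('you like cats'.split()))
gap_trees = list(parser.parse('who do you like'.split()))

for tree in gap_trees:
    print(tree)
```

In the tree printed for 'who do you like', the categories along the path from the inverted clause down to the empty NP carry the slash, which is how the filler 'who' is linked to the object position of 'like'.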
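9.4 Summary

* The traditional categories of context-free grammar are atomic symbols. An important motivation for feature structures is to capture fine-grained distinctions that would otherwise require a massive multiplication of atomic categories.
* By using variables over feature values, we can express constraints in grammar productions, allowing the realization of different feature specifications to be inter-dependent.
* Typically we specify fixed values of features at the lexical level and constrain the values of features in phrases to unify with the corresponding values in their children.
* Feature values can be either atomic or complex. A particular sub-case of atomic value is the Boolean value, represented by convention as [+/- feat].
* Two features can share a value (either atomic or complex). Structures with shared values are said to be re-entrant. Shared values are represented by numerical indexes (or tags) in AVMs.
* A path in a feature structure is a tuple of features corresponding to the labels on a sequence of arcs from the root of the graph representation.
* Two paths are equivalent if they share a value.
* Feature structures are partially ordered by subsumption. FS0 subsumes FS1 when FS0 is more general (less informative) than FS1.
* The unification of two structures FS0 and FS1, if successful, is the feature structure FS2 that contains the combined information of both FS0 and FS1.
* If unification specializes a path π in FS, then it also specializes every path π' equivalent to π.
* We can use feature structures to build succinct analyses of a wide variety of linguistic phenomena, including verb subcategorization, inversion constructions, unbounded dependency constructions and case government.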
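The definitions of unification and subsumption in the summary can be made concrete with a small sketch over plain nested dictionaries. This is not NLTK's implementation: the functions `unify` and `subsumes` below are simplified illustrations that ignore re-entrancy and variables, and they treat any non-dict value as atomic.

```python
def unify(fs0, fs1):
    """Merge two nested-dict feature structures; return None on a clash."""
    result = dict(fs0)  # shallow copy: the inputs are never modified
    for feature, value1 in fs1.items():
        if feature not in result:
            result[feature] = value1          # new information from fs1
            continue
        value0 = result[feature]
        if isinstance(value0, dict) and isinstance(value1, dict):
            merged = unify(value0, value1)    # recurse into complex values
            if merged is None:
                return None
            result[feature] = merged
        elif value0 != value1:
            return None                       # conflicting values
    return result


def subsumes(fs0, fs1):
    """fs0 subsumes fs1 if unifying them adds nothing to fs1."""
    return unify(fs0, fs1) == fs1


address = {'NUMBER': 74, 'STREET': 'rue Pascal'}
city = {'CITY': 'Paris'}
print(unify(address, city))
print(unify({'A': 'a'}, {'A': 'b'}))   # None
print(subsumes({'NUMBER': 74}, address))  # True
```

Defining subsumption through unification mirrors the partial-order view in the summary: the more general structure contributes nothing that the more specific one does not already contain.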

Acknowledgements

Natural Language Processing with Python [1][2][3][4], by Steven Bird, Ewan Klein & Edward Loper, is a highly practical introductory text, with a first edition in 2009 and a second edition in 2015. These study notes draw on the versions above and extend some of the material with further study and exercises. I am sharing them here in the hope that they are useful. Feel free to add me on WeChat (verification: NLP) to study and discuss together; corrections are welcome.

References

[1] http://nltk.org/
[2] Steven Bird, Ewan Klein & Edward Loper, Natural Language Processing with Python, 2009
[3] Bird, Klein & Loper, 《Python自然语言处理》 (Chinese edition), Southeast University Press, 2010
[4] Steven Bird, Ewan Klein & Edward Loper, Natural Language Processing with Python, 2015
