Initial commit

2026-02-01 09:31:38 +01:00
commit e02db93960
4396 changed files with 1511612 additions and 0 deletions
--- a/backend/venv/Lib/site-packages/nltk/test/propbank.doctest
+++ b/backend/venv/Lib/site-packages/nltk/test/propbank.doctest
@@ -0,0 +1,176 @@
+.. Copyright (C) 2001-2025 NLTK Project
+.. For license information, see LICENSE.TXT
+
+========
+PropBank
+========
+
+The PropBank Corpus provides predicate-argument annotation for the
+entire Penn Treebank.  Each verb in the treebank is annotated by a single
+instance in PropBank, containing information about the location of
+the verb, and the location and identity of its arguments:
+
+    >>> from nltk.corpus import propbank
+    >>> pb_instances = propbank.instances()
+    >>> print(pb_instances)
+    [<PropbankInstance: wsj_0001.mrg, sent 0, word 8>,
+     <PropbankInstance: wsj_0001.mrg, sent 1, word 10>, ...]
+
+Each propbank instance defines the following member variables:
+
+  - Location information: `fileid`, `sentnum`, `wordnum`
+  - Annotator information: `tagger`
+  - Inflection information: `inflection`
+  - Roleset identifier: `roleset`
+  - Verb (aka predicate) location: `predicate`
+  - Argument locations and types: `arguments`
+
+The following examples show the types of these member variables:
+
+    >>> inst = pb_instances[103]
+    >>> (inst.fileid, inst.sentnum, inst.wordnum)
+    ('wsj_0004.mrg', 8, 16)
+    >>> inst.tagger
+    'gold'
+    >>> inst.inflection
+    <PropbankInflection: vp--a>
+    >>> infl = inst.inflection
+    >>> infl.form, infl.tense, infl.aspect, infl.person, infl.voice
+    ('v', 'p', '-', '-', 'a')
+    >>> inst.roleset
+    'rise.01'
+    >>> inst.predicate
+    PropbankTreePointer(16, 0)
+    >>> inst.arguments
+    ((PropbankTreePointer(0, 2), 'ARG1'),
+     (PropbankTreePointer(13, 1), 'ARGM-DIS'),
+     (PropbankTreePointer(17, 1), 'ARG4-to'),
+     (PropbankTreePointer(20, 1), 'ARG3-from'))
+
+The locations of the predicate and of the arguments are encoded using
+`PropbankTreePointer` objects, as well as `PropbankChainTreePointer`
+objects and `PropbankSplitTreePointer` objects.  A
+`PropbankTreePointer` consists of a `wordnum` and a `height`:
+
+    >>> print(inst.predicate.wordnum, inst.predicate.height)
+    16 0
+
+This identifies the tree constituent that is headed by the word that
+is the `wordnum`\ 'th token in the sentence, and whose span is found
+by going `height` nodes up in the tree.  This type of pointer is only
+useful if we also have the corresponding tree structure, since it
+includes empty elements such as traces in the word number count.  The
+trees for 10% of the standard PropBank Corpus are contained in the
+`treebank` corpus:
+
+    >>> tree = inst.tree
+
+    >>> from nltk.corpus import treebank
+    >>> assert tree == treebank.parsed_sents(inst.fileid)[inst.sentnum]
+
+    >>> inst.predicate.select(tree)
+    Tree('VBD', ['rose'])
+    >>> for (argloc, argid) in inst.arguments:
+    ...     print('%-10s %s' % (argid, argloc.select(tree).pformat(500)[:50]))
+    ARG1       (NP-SBJ (NP (DT The) (NN yield)) (PP (IN on) (NP (
+    ARGM-DIS   (PP (IN for) (NP (NN example)))
+    ARG4-to    (PP-DIR (TO to) (NP (CD 8.04) (NN %)))
+    ARG3-from  (PP-DIR (IN from) (NP (CD 7.90) (NN %)))
+
+Propbank tree pointers can be converted to standard tree locations,
+which are usually easier to work with, using the `treepos()` method:
+
+    >>> treepos = inst.predicate.treepos(tree)
+    >>> print(treepos, tree[treepos])
+    (4, 0) (VBD rose)
+
+In some cases, argument locations will be encoded using
+`PropbankChainTreePointer`\ s (for trace chains) or
+`PropbankSplitTreePointer`\ s (for discontinuous constituents).  Both
+of these objects contain a single member variable, `pieces`,
+containing a list of the constituent pieces.  They also define the
+method `select()`, which will return a tree containing all the
+elements of the argument.  (A new head node is created, labeled
+"*CHAIN*" or "*SPLIT*", since the argument is not a single constituent
+in the original tree.)  Instance #6 contains an example of an argument
+that is both discontinuous and contains a chain:
+
+    >>> inst = pb_instances[6]
+    >>> inst.roleset
+    'expose.01'
+    >>> argloc, argid = inst.arguments[2]
+    >>> argloc
+    <PropbankChainTreePointer: 22:1,24:0,25:1*27:0>
+    >>> argloc.pieces
+    [<PropbankSplitTreePointer: 22:1,24:0,25:1>, PropbankTreePointer(27, 0)]
+    >>> argloc.pieces[0].pieces
+    ...
+    [PropbankTreePointer(22, 1), PropbankTreePointer(24, 0),
+     PropbankTreePointer(25, 1)]
+    >>> print(argloc.select(inst.tree))
+    (*CHAIN*
+      (*SPLIT* (NP (DT a) (NN group)) (IN of) (NP (NNS workers)))
+      (-NONE- *))
+
+The PropBank Corpus also provides access to the frameset files, which
+define the argument labels used by the annotations, on a per-verb
+basis.  Each frameset file contains one or more predicates, such as
+'turn' or 'turn_on', each of which is divided into coarse-grained word
+senses called rolesets.  For each roleset, the frameset file provides
+descriptions of the argument roles, along with examples.
+
+    >>> expose_01 = propbank.roleset('expose.01')
+    >>> turn_01 = propbank.roleset('turn.01')
+    >>> print(turn_01)
+    <Element 'roleset' at ...>
+    >>> for role in turn_01.findall("roles/role"):
+    ...     print(role.attrib['n'], role.attrib['descr'])
+    0 turner
+    1 thing turning
+    m direction, location
+
+    >>> from xml.etree import ElementTree
+    >>> print(ElementTree.tostring(turn_01.find('example')).decode('utf8').strip())
+    <example name="transitive agentive">
+      <text>
+      John turned the key in the lock.
+      </text>
+      <arg n="0">John</arg>
+      <rel>turned</rel>
+      <arg n="1">the key</arg>
+      <arg f="LOC" n="m">in the lock</arg>
+    </example>
+
+Note that the standard corpus distribution contains only 10% of the
+treebank, so parse trees are not available for instances from index
+9353 onward:
+
+    >>> inst = pb_instances[9352]
+    >>> inst.fileid
+    'wsj_0199.mrg'
+    >>> print(inst.tree)
+    (S (NP-SBJ (NNP Trinity)) (VP (VBD said) (SBAR (-NONE- 0) ...))
+    >>> print(inst.predicate.select(inst.tree))
+    (VB begin)
+
+    >>> inst = pb_instances[9353]
+    >>> inst.fileid
+    'wsj_0200.mrg'
+    >>> print(inst.tree)
+    None
+    >>> print(inst.predicate.select(inst.tree))
+    Traceback (most recent call last):
+      . . .
+    ValueError: Parse tree not available
+
+However, if you supply your own version of the treebank corpus (by
+putting it before the NLTK-provided version on `nltk.data.path`, or
+by creating a `ptb` corpus directory and using the
+`propbank_ptb` corpus reader), then you can access the trees for all
+instances.
+
+A list of the verb lemmas contained in PropBank is returned by the
+`propbank.verbs()` method:
+
+    >>> propbank.verbs()
+    ['abandon', 'abate', 'abdicate', 'abet', 'abide', ...]