Skip to content

Add Typing to Features#1956

Merged
mscuthbert merged 9 commits into
masterfrom
typing_features
Jun 25, 2026
Merged

Add Typing to Features#1956
mscuthbert merged 9 commits into
masterfrom
typing_features

Conversation

@mscuthbert

Copy link
Copy Markdown
Member
  • Features are typed
  • Take many X|None attributes and make them X = '' or other falsey values.
  • Bring __init__: self.name = .... to be class variables.
  • Put defaults for dimensions, discrete, etc. and remove inits that mirror the default.
  • Fix grammar, capitalization, and punctuation.
  • Bug fix: add .stream() to values that were returning StreamIterator but which expectedSteams.

Type-annotate the whole music21/features/ directory (base, jSymbolic,
native, outputFormats): __init__/process return types, attribute and
signature annotations, with narrowing guards where typed bodies need them.

Move the dimensions=1 default into FeatureExtractor and drop the 122
redundant `self.dimensions = 1` lines from the extractors that kept it.

Fix docstring grammar, capitalization, and ending punctuation across the
class docstrings and description strings.

Behavior notes:
- Feature.vector now starts as [] rather than None (Changed in v11).
- native.QualityFeature: a single key with a non-major/minor mode now falls
  through to key analysis instead of writing None into the vector.

AI-assisted (Claude).
Make name, description, isSequential, dimensions, discrete, and normalize
class-level attributes on FeatureExtractor with non-None defaults
('', '', True, 1, True, False). FeatureExtractor.__init__ now only sets the
per-instance stream/data/feature. Feature's name/description default to ''
and isSequential/discrete to bool True. DataInstance.classLabel is now str=''
(falsy) rather than str|None, and setClassLabel takes a str.

Hoist each jSymbolic/native extractor's metadata to class attributes and
remove the now-empty __init__ methods (133 of them). Drop values that match
the new defaults; keep the non-default ones. FifthsPitchHistogramFeature's
mapping loop becomes a dict comprehension class attribute; LanguageFeature
keeps __init__ only for its LanguageDetector instance.

Net ~674 fewer lines. mypy/ruff/pylint clean; doctests and unit tests pass.

AI-assisted (Claude).
Define a SecondsMapEntry TypedDict describing the per-element dictionary
returned by Stream.secondsMap (offsetSeconds, durationSeconds,
endTimeSeconds, element, voiceIndex), and type _getSecondsMap to return
list[SecondsMapEntry], building each entry as a typed literal.

AI-assisted (Claude).
Type the boolean keyword arguments (includeClassLabel, includeId,
concatenateLists, prepareStream) as bool across DataSet, DataInstance,
StreamForms, and the OutputFormat classes, plus outputFmt/fp/format.
addFeatureExtractors now takes type[FeatureExtractor] or a Collection of
them; DataSet's featureExtractors arg is typed likewise.

Parameterize the histogram helpers' Counter returns (Counter[str], [float],
[int]) and type their pitches arguments as Iterable[pitch.Pitch]. Thread the
new SecondsMapEntry through formSecondsMap and formBeatHistogram.

In outputFormats, lineBreak defaults to '\n' (str) and the redundant
'if lineBreak is None' guards are removed; _getOutputFormatFromFilePath uses
rsplit(maxsplit=1) and a single _getOutputFormat call.

AI-assisted (Claude).
Add PEP 695 aliases (StreamOrPath, DataSource, ValueOrFunction) and use them
to type addData, addMultipleData, DataInstance.__init__, and setClassLabel.
addMultipleData's dataList accepts a Sequence of sources or a MetadataBundle.

Typing surfaced two cleanups: addData's local `s` was dead (assigned, never
read) so it's removed, and addMultipleData now builds normalized local lists
instead of reassigning its parameters to different types.

AI-assisted (Claude).
@coveralls

coveralls commented Jun 25, 2026

Copy link
Copy Markdown

Coverage Status

coverage: 93.295% (+0.2%) from 93.134% — typing_features into master

extractorsById discarded the results of featureId.replace('-', '') and
.replace(' ', ''), so hyphen/space variants such as 'p-20' or 'p 21'
silently returned no extractor. Assign the result back so the
normalization actually applies, and add a doctest covering it.

Also remove the unreachable None member from the format-membership list
in DataSet._getOutputFormat, since featureFormat is typed str.

AI-assisted (Claude)
…place no-op

Per the project typing convention (no t.Any; use real types or leave
untyped):

- Add a `ClassValue = str|float|int` alias and thread it through
  `ValueOrFunction`, `DataInstance._id`/`_classValue`, `getClassValue`,
  `getUniqueClassValues`, `_dataSetParallelSubprocess`, and
  `addMultipleData` (whose `classValues` is also loosened from
  `Sequence[str]` to `Sequence[ValueOrFunction]`, restoring the ability to
  pass per-item callables). `getClassValue`/`getId` are restructured to use
  a local so the callable union narrows without t.Any.
- Type the public utility functions' parameters: extractorById,
  extractorsById, vectorById, getIndex, allFeaturesAsList, and
  formMidiIntervalHistogram. extractorsById now narrows str-vs-iterable with
  isinstance (equivalent to common.isIterable, which treats str as a scalar)
  so the types check.
- Parameterize bare list/tuple returns and add return types to
  DataSet.write / OutputFormat.write / the OutputFormat.getHeaderLines
  overrides. DataInstance.__getitem__ is deliberately left untyped (its
  "form" return is genuinely heterogeneous).
- ARFF getHeaderLines now joins class values via str(v); the previous
  ','.join(values) raised TypeError on numeric class values.

Also fix an unrelated unassigned str.replace no-op in lily/lilyObjects.py
(_reprInternal discarded the newline-flattening result).

AI-assisted (Claude)
Covers the case where DataSet class values are ints/floats rather than
strings: OutputARFF.getHeaderLines must produce `@ATTRIBUTE class {3,4}`
without raising. Before the str(v) coercion, ','.join(values) raised
`TypeError: sequence item 0: expected str instance, int found`.

AI-assisted (Claude)
@mscuthbert

mscuthbert commented Jun 25, 2026

Copy link
Copy Markdown
Member Author

b299663, 8e45d1c, and 2954f69 were all from an adversarial review with another agent. Looking good.

@mscuthbert mscuthbert merged commit ef798e5 into master Jun 25, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants