Awesome-omni-skill numpy-string-ops
Vectorized string manipulation using the char module and modern string alternatives, including cleaning and search operations. Triggers: string operations, numpy.char, text cleaning, substring search.
install
source · Clone the upstream repo
git clone https://github.com/diegosouzapw/awesome-omni-skill
Claude Code · Install into ~/.claude/skills/
T=$(mktemp -d) && git clone --depth=1 https://github.com/diegosouzapw/awesome-omni-skill "$T" && mkdir -p ~/.claude/skills && cp -r "$T/skills/data-ai/numpy-string-ops-majiayu000" ~/.claude/skills/diegosouzapw-awesome-omni-skill-numpy-string-ops && rm -rf "$T"
manifest: skills/data-ai/numpy-string-ops-majiayu000/SKILL.md
source content
Overview
NumPy's `char` submodule provides vectorized versions of standard Python string operations. It allows for efficient processing of arrays containing `str_` or `bytes_` types. The module is now considered legacy: NumPy 2.0 introduced the `numpy.strings` module as the recommended alternative.
When to Use
- Cleaning large text datasets (e.g., stripping whitespace, normalization).
- Performing batch substring searches across thousands of records.
- Concatenating columns of text data using broadcasting.
- Converting character casing for entire datasets simultaneously.
Decision Tree
- Starting new development?
  - Use `numpy.strings` if available; `numpy.char` is legacy.
- Comparing strings with potential trailing spaces?
  - `numpy.char` comparison functions such as `np.char.equal` automatically strip trailing whitespace.
- Concatenating a constant prefix to an array of names?
  - Use `np.char.add(prefix, name_array)`.
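The first and last branches of the decision tree can be sketched together. This is a minimal illustration, not a prescribed pattern: the alias `text_ops` and the sample names are hypothetical, and the fallback assumes that an installation without `numpy.strings` still ships `numpy.char`.

```python
import numpy as np

# Prefer numpy.strings when the installed NumPy provides it;
# otherwise fall back to the legacy numpy.char module.
# `text_ops` is a hypothetical alias introduced for this sketch.
text_ops = getattr(np, "strings", np.char)

# Hypothetical sample data.
name_array = np.array(["ada", "grace", "linus"])
prefix = "user_"

# The scalar prefix is broadcast against every element of the array.
prefixed = text_ops.add(prefix, name_array)
print(prefixed)
```

Resolving the module once keeps the rest of the code independent of which namespace is available.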
Workflows
- Batch String Concatenation
  - Create two arrays of strings, A and B.
  - Use `np.char.add(A, B)` to join them element-wise.
  - Broadcasting applies if one array is a single string and the other is multidimensional.
- Cleaning Text Datasets
  - Identify an array of messy text.
  - Apply `np.char.strip(arr)` to remove whitespace.
  - Use `np.char.lower(arr)` to normalize casing across the entire dataset.
- Finding Substrings in Arrays
  - Use `np.char.find(text_array, 'target_word')`.
  - Identify elements with non-negative indices (where the word was found).
  - Filter the original array using boolean indexing based on the search result.
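The three workflows above can be sketched in one short script. All sample data and variable names here are made up for illustration. The calls use the legacy `np.char` functions named in the workflows.

```python
import numpy as np

# 1. Batch string concatenation (hypothetical sample arrays A and B).
A = np.array(["ada", "grace"])
B = np.array(["_lovelace", "_hopper"])
joined = np.char.add(A, B)  # element-wise join

# Broadcasting: a single string combined with a 2-D array.
grid = np.array([["a", "b"], ["c", "d"]])
tagged = np.char.add("x_", grid)  # result keeps the (2, 2) shape

# 2. Cleaning text: strip surrounding whitespace, then normalize casing.
messy = np.array(["  Hello ", "WORLD  ", " NumPy"])
clean = np.char.lower(np.char.strip(messy))

# 3. Finding substrings: find returns the lowest match index, or -1 if absent.
text_array = np.array(["red apple", "green pear", "apple pie"])
hits = np.char.find(text_array, "apple")
matches = text_array[hits >= 0]  # boolean indexing keeps only the matches

print(joined, tagged, clean, hits, matches, sep="\n")
```

Testing for `hits >= 0` rather than truthiness matters, because a match at position 0 is a valid hit.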
Non-Obvious Insights
- Legacy Status: The `char` module is considered legacy; future-proof code should look towards the `numpy.strings` alternative.
- Implicit Stripping: Unlike standard Python `==`, `char` module comparison operators strip trailing whitespace before evaluating equality.
- Vectorization Reality: While these operations are vectorized, string manipulation is inherently less performant than numeric math because strings have variable lengths and require more complex memory management.
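The implicit-stripping behavior is easy to miss, so here is a small sketch that contrasts the ordinary `==` operator on arrays with `np.char.equal`. The sample arrays are hypothetical.

```python
import numpy as np

# Hypothetical data: some entries carry trailing spaces.
padded = np.array(["abc ", "abc", "abd  "])
target = np.array(["abc", "abc", "abd"])

# Ordinary element-wise comparison treats trailing spaces as significant.
plain_eq = padded == target

# The char-module comparison strips trailing whitespace before comparing.
char_eq = np.char.equal(padded, target)

print(plain_eq)
print(char_eq)
```

If trailing whitespace is meaningful in a dataset, the ordinary comparison is the safer choice; if it is noise, stripping it explicitly with `np.char.strip` makes the intent visible.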
Evidence
- "Unlike the standard numpy comparison operators, the ones in the char module strip trailing whitespace characters before performing the comparison." Source
- "The numpy.char module provides a set of vectorized string operations for arrays of type numpy.str_ or numpy.bytes_." Source
Scripts
- `scripts/numpy-string-ops_tool.py`: Routines for batch text cleaning and search.
- `scripts/numpy-string-ops_tool.js`: Simulated string concatenation logic.
Dependencies
- `numpy` (Python)