Applications of Natural Language Processing (NLP) in industry
by tangbasky
In this chapter, we will introduce specific industrial application cases of NLP. It will elaborate on how NLP empowers search engines, customer service systems, medical diagnosis systems, and intelligent assistants. Through the explanation of these cases, readers will gain a clear understanding of how NLP is applied to actual business scenarios and begin to form a concept of the NLP design framework in their minds.
How NLP Empowers Search Engines
Search consists of two parts: the query and the documents. A query is the user’s input and expresses their search need, so NLP is required to understand it. Documents are the targets of the search. In search engines, target documents are usually individual web pages, and since queries are textual, documents in many cases also need to be converted into a semantic representation through NLP.
To meet user needs, the first step is to conduct query understanding on the user’s query. General NLP-based understanding includes operations such as word segmentation, key information extraction, semantic embedding, and query classification. Documents also undergo the same text understanding operations; in addition, documents involving videos, images, and audio will also be processed for content understanding.
After understanding the query and document, how can we find documents relevant to the query from a vast amount of documents? Here, we will use Figure 1 as an example for a simple illustration.
(1) Through entity recognition, we can identify that “Morocco” is a location and “Monday” is a time. This allows us to filter out documents that either lack a location entity or have a location entity other than “Morocco”; similarly, we can filter out documents that lack a time entity or have a time other than “Monday”. This gives us a filtered candidate set of documents, denoted as S.
(2) From the candidate set S, using the embeddings of the query and documents, we can directly calculate the text matching similarity between the query embedding and document embeddings, and select the top-k most similar documents as candidates.
(3) From the candidate set S, select documents categorized as “travel”.
(4) Combine the candidates obtained from steps (2) and (3), and use a fine-ranking model to output the top-n documents as the final result to present to the user.
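The four steps above can be sketched in code. This is a minimal toy illustration, not a production retrieval system: the documents, their entity fields, and the three-dimensional "embeddings" are all made up for demonstration, and the fine-ranking model is reduced to a cosine-similarity sort.

```python
import math

# Toy documents; in practice these fields come from NLP understanding
# (entity recognition, text classification, semantic embedding).
DOCS = [
    {"id": 1, "location": "Morocco", "time": "Monday", "category": "travel",
     "embedding": [0.9, 0.1, 0.0]},
    {"id": 2, "location": "Morocco", "time": "Friday", "category": "news",
     "embedding": [0.2, 0.8, 0.1]},
    {"id": 3, "location": "Spain",   "time": "Monday", "category": "travel",
     "embedding": [0.4, 0.4, 0.2]},
]

def filter_by_entities(docs, location, time):
    """Step (1): keep only documents whose entities match the query's."""
    return [d for d in docs if d["location"] == location and d["time"] == time]

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank(query_emb, docs, top_n=2):
    """Steps (2)-(4): score candidates by embedding similarity and rank."""
    return sorted(docs, key=lambda d: cosine(query_emb, d["embedding"]),
                  reverse=True)[:top_n]

candidates = filter_by_entities(DOCS, "Morocco", "Monday")  # candidate set S
results = rank([1.0, 0.0, 0.0], candidates)
print([d["id"] for d in results])  # only document 1 survives the entity filter
```

In a real engine the entity filter runs over an inverted index, and the final ordering comes from a learned ranking model rather than raw cosine similarity.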
From the above search case, we can see which NLP technologies are involved. First, both the query and the documents must be tokenized (word segmentation). Key information extraction relies on entity recognition. Embedding involves text semantic embedding techniques, and determining the category of the query uses text classification. Additionally, operations that enhance query capabilities, such as query rewriting and long-text summarization, also depend heavily on various NLP capabilities.
Therefore, search engines are deeply reliant on various NLP capabilities. Moreover, they do not depend on a single NLP capability but on an NLP system integrating multiple capabilities, where each task complements others to ultimately deliver a good user experience. This characteristic is not only evident in search engines but will also be deeply reflected in other products discussed later.
How NLP Empowers Customer Service Systems
AI customer service systems are currently widely used in banking, finance, telecommunications, and other industries, saving companies significant labor costs. They mainly consist of modules such as intent recognition, script design, knowledge understanding, knowledge matching, knowledge clarification, and response generation. Most AI customer service systems on the market are tailored to the specific business needs of their respective companies. Their main functions are twofold: product recommendation and after-sales service. In either case, a predefined script can be designed: the system identifies the user’s product needs, guides the user into the corresponding business script, and walks through the entire process to complete the transaction. Below is an explanation of how each module operates in an AI customer service system.
Intent Recognition
The primary function of intent recognition is to analyze the user’s query and determine which business category it belongs to, thereby transferring the service to the corresponding module. For example, if a user says, “I want to book a flight,” the system will automatically route them to the flight booking service process. Intent recognition is the starting point of the entire AI customer service system and directly affects the quality of service, so high accuracy is required. It involves NLP technologies such as automatic speech recognition (ASR) and text classification.
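As a rough illustration of routing a query to a business category, the sketch below scores intents by keyword overlap. The intent names and keyword lists are invented for demonstration; a production system would use a trained text classifier (and ASR upstream for voice input) rather than keyword matching.

```python
# Hypothetical intent keyword lists; a real system learns these boundaries
# from labeled data with a text classification model.
INTENT_KEYWORDS = {
    "book_flight": {"flight", "book", "ticket", "fly"},
    "refund": {"refund", "return", "money", "back"},
    "account": {"password", "login", "account"},
}

def recognize_intent(query: str) -> str:
    """Pick the intent whose keywords overlap most with the query;
    fall back to a default intent when nothing matches."""
    tokens = set(query.lower().split())
    scores = {intent: len(tokens & kws) for intent, kws in INTENT_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "fallback"

print(recognize_intent("I want to book a flight"))  # book_flight
```

The fallback branch matters in practice: misrouted users are expensive, so low-confidence queries are usually escalated to a human agent or a clarification question.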
Script Design
Script design is a crucial part in AI customer service systems, directly impacting the quality of user interaction. It mainly includes dialogue flows and corresponding response strategies designed based on business needs. These scripts are customized for different scenarios, such as pre-sales consultation, product recommendation, order processing, and after-sales service. Additionally, scripts need to account for various user reactions — for example, how to redirect users back to the script if they digress, or how to obtain key information through context or knowledge clarification when critical details are missing. The overall script framework should not be overly complex, but many details require careful consideration and design. In summary, an excellent script can not only accurately respond to user queries but also guide users to complete intended operations through natural and smooth dialogue.
Knowledge Understanding
This part is similar to document understanding and query understanding in search engines, so it will not be elaborated on here. Furthermore, algorithm engineers need to classify and organize business data in conjunction with specific operations, enabling users’ needs to be quickly matched with the most relevant business knowledge. Since customer service knowledge often involves professional Q&A, knowledge can be categorized by similarity and labeled with topic headings for easy retrieval. Meanwhile, common user questions can be compiled with corresponding answers to form a query-answer database, allowing relevant answers to be found through query-to-query matching.
Knowledge Matching
After intent recognition, knowledge understanding, and script design, the system selects the corresponding service script based on intent recognition and begins interacting with the user. During interaction, when the user raises business-related questions, the system understands the query and searches for answers in the knowledge base associated with the script. This process of finding answers is essentially knowledge matching. Common methods of knowledge matching include:
1). Query-to-query: Searching the query-answer database for queries related to the user’s query and using the documents corresponding to these related queries as candidate answers.
2). Query-to-document: Directly retrieving relevant documents from the knowledge database based on the user’s query.
3). Topic-to-topic: Finding documents with the same topic in the knowledge base based on the topic understood from the user’s query.
4). Query rewriting: Rewriting the query to enhance relevance before performing query-to-query or query-to-document searches.
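Method 1), query-to-query matching, can be sketched with a simple token-overlap (Jaccard) similarity against a query-answer database. The database entries and the threshold are illustrative; real systems typically use semantic embeddings rather than token overlap.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two queries."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Hypothetical query-answer database compiled from common user questions.
QA_DB = {
    "how do i reset my password": "Use the 'forgot password' link on the login page.",
    "what is the refund policy": "Refunds are processed within 7 days of the request.",
    "how to change delivery address": "Open order settings and edit the address field.",
}

def match_query(user_query: str, threshold: float = 0.3):
    """Find the stored query most similar to the user's and return its
    answer as a candidate; return None below the similarity threshold."""
    best_q = max(QA_DB, key=lambda q: jaccard(user_query, q))
    return QA_DB[best_q] if jaccard(user_query, best_q) >= threshold else None

print(match_query("how can I reset my password"))
```

Query-to-document and topic-to-topic matching follow the same pattern, just with documents or topic labels on the database side instead of stored queries.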
Knowledge Clarification
Sometimes, a user’s query may lack key information, making it difficult to complete the tasks outlined in the script. This requires knowledge clarification. The role of the knowledge clarification module is to ask for more details or confirm the user’s true intent when ambiguity arises, ensuring that subsequent services or answers best meet user needs. For example, if a user says, “I want to learn about your products,” the system may ask which specific product they are interested in to provide more accurate information.
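A common way to implement clarification is slot filling: the script defines the key information it needs, and the system asks for whichever slot is still empty. The slot names and questions below are hypothetical, using the flight-booking scenario as an example.

```python
# Hypothetical required slots for a flight-booking script; when a slot is
# missing, the system asks a clarification question instead of proceeding.
REQUIRED_SLOTS = {
    "departure": "Where will you depart from?",
    "destination": "Where would you like to fly to?",
    "date": "Which date do you want to travel on?",
}

def next_clarification(filled_slots: dict):
    """Return the clarification question for the first unfilled slot,
    or None when all key information has been collected."""
    for slot, question in REQUIRED_SLOTS.items():
        if slot not in filled_slots:
            return question
    return None

print(next_clarification({"departure": "Beijing"}))
```

Once `next_clarification` returns None, the script has everything it needs and can proceed to the next step of the business flow.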
Response Generation
After the previous steps, in each conversation, beyond the scripted responses, numerous relevant documents are retrieved as reserves for answering knowledge-related queries. Before the advent of large models, the common approach was to select the most relevant information to present to the user through a series of matching or ranking strategies. However, this approach had several issues: (1) The information presented by selecting only the most relevant segment might be insufficient. (2) Displaying too much information could include irrelevant content, impairing the user experience. (3) Presenting pre-prepared knowledge directly might result in rigid responses. The strong generative capability of large language models addresses these issues: they can accept large volumes of input, summarize content, and generate appropriate responses, making them the optimal choice for the final response module in Q&A systems.
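The large-model approach to the final response typically reduces to prompt assembly: retrieved knowledge plus the user's question are packed into one prompt, and the model summarizes rather than returning raw snippets. The sketch below shows only the assembly step; the actual LLM call is omitted, and the prompt wording is an assumption, not a fixed template.

```python
def build_prompt(user_query: str, retrieved_docs: list) -> str:
    """Assemble retrieved knowledge and the user's question into a single
    prompt for a large language model, which then generates a natural
    response instead of showing pre-prepared knowledge verbatim."""
    context = "\n".join(f"- {doc}" for doc in retrieved_docs)
    return (
        "Answer the customer's question using only the reference material.\n"
        f"Reference material:\n{context}\n"
        f"Question: {user_query}\nAnswer:"
    )

prompt = build_prompt(
    "How long does a refund take?",
    ["Refunds are processed within 7 business days.",
     "Refunds return to the original payment method."],
)
print(prompt)
```

This directly addresses issues (1)-(3) above: the model can take in multiple retrieved segments, discard irrelevant parts, and phrase the answer naturally.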
In summary, the various modules of AI customer service systems collaborate closely to provide users with efficient and convenient service experiences. With the continuous development of artificial intelligence technology, we can expect AI customer service systems to become more intelligent in the future. They will not only be able to understand complex language expressions but also continuously optimize their service capabilities through learning, creating greater value for enterprises and users.
How NLP Empowers Medical Systems
The application of Natural Language Processing (NLP) technology in the medical field, especially in integrating multiple examination reports to form a comprehensive doctor’s assessment report, holds enormous potential. NLP helps doctors quickly generate comprehensive and structured assessment reports by automatically extracting, analyzing, and summarizing valid information, then structuring and aggregating it. The entire process strongly relies on technologies such as text data preprocessing, key information extraction, and knowledge matching. Below are the key steps through which NLP empowers medical diagnosis:
Data Preprocessing
Text Cleaning and Format Unification:
(1). Collect patients’ examination reports from multiple sources, including scale assessment reports, laboratory test reports, imaging reports, pathological reports, and electronic health records (EHR/EMR).
(2). Convert reports from different sources into a unified text format, such as removing HTML tags and standardizing date and time formats.
(3). Remove irrelevant characters, duplicate content, and other noise to ensure the text is clean and consistent.
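A toy cleaning pass for step (2) might look like the following. The regex-based HTML stripping and the MM/DD/YYYY date pattern are simplifying assumptions; real pipelines use proper HTML parsers and locale-aware date handling.

```python
import re

def clean_report(raw: str) -> str:
    """Strip HTML tags, normalize MM/DD/YYYY dates to ISO format,
    and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", raw)  # remove HTML tags
    text = re.sub(
        r"(\d{1,2})/(\d{1,2})/(\d{4})",  # MM/DD/YYYY -> YYYY-MM-DD
        lambda m: f"{m.group(3)}-{int(m.group(1)):02d}-{int(m.group(2)):02d}",
        text,
    )
    return re.sub(r"\s+", " ", text).strip()

print(clean_report("<p>CT scan on 3/7/2024:</p>  no abnormality"))
```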
Word Segmentation and Annotation:
Split text into words or phrases to facilitate subsequent processing. Annotate the part-of-speech (noun, verb, etc.) for each word to aid in understanding sentence structure. Identify and annotate key entities in the text, such as disease names, drug names, and test results.
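The entity annotation step can be illustrated with a small dictionary-based tagger. The disease and drug lists here are hand-written placeholders; real systems use trained medical NER models and standard terminologies instead of lookup tables.

```python
import re

# Hypothetical entity dictionaries standing in for a trained NER model.
DISEASES = {"pneumonia", "fracture", "diabetes"}
DRUGS = {"amoxicillin", "metformin"}

def annotate(report: str):
    """Split a report into tokens and tag diseases, drugs, and
    numeric test values found in the text."""
    tokens = re.findall(r"[A-Za-z]+|\d+(?:\.\d+)?", report.lower())
    tags = []
    for tok in tokens:
        if tok in DISEASES:
            tags.append((tok, "DISEASE"))
        elif tok in DRUGS:
            tags.append((tok, "DRUG"))
        elif re.fullmatch(r"\d+(?:\.\d+)?", tok):
            tags.append((tok, "VALUE"))
    return tags

print(annotate("Diagnosis: pneumonia. Prescribed amoxicillin 500 mg."))
```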
Information Extraction
Key Information Extraction and Symptom Recognition:
Use pre-trained NLP models or rule engines to identify symptom descriptions in reports. Extract diagnostic conclusions from each report, such as “pneumonia” or “fracture”. Extract specific test results, such as routine blood test data and detailed descriptions from imaging examinations.
Data Structuring and Tabulation:
Organize extracted key information into tabular forms for easier subsequent analysis and aggregation. Map different terms to standard medical terminology sets (e.g., SNOMED CT) to ensure term consistency.
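Term normalization can be sketched as a synonym lookup. The mapping table below is invented for illustration; production pipelines map to a standard terminology set such as SNOMED CT through dedicated mapping services, not a hand-written dictionary.

```python
# Hypothetical synonym table standing in for a SNOMED CT mapping service.
TERM_MAP = {
    "heart attack": "myocardial infarction",
    "high blood pressure": "hypertension",
    "sugar": "glucose",
}

def normalize_terms(extracted):
    """Map extracted terms to a standard form so the same concept from
    different reports lands in the same table column."""
    return [TERM_MAP.get(term.lower(), term.lower()) for term in extracted]

print(normalize_terms(["Heart attack", "Hypertension", "sugar"]))
```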
Information Fusion and Analysis
Pattern Recognition and Correlation Analysis:
Identify potential correlations between different examination reports through big data analysis, such as the relationship between certain symptoms and specific test results. Compare results from multiple examinations to identify trends in the patient’s condition, such as increases or decreases in indicators.
Comprehensive Evaluation and Personalized Recommendations:
Calculate comprehensive scores (e.g., health risk scores, prognosis scores) based on multiple examination results. Provide personalized treatment recommendations by integrating factors such as the patient’s medical history and lifestyle.
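A comprehensive score can be illustrated as a weighted sum over out-of-range indicators. The reference ranges and weights below are entirely illustrative; clinically usable scores come from validated models, not ad-hoc formulas like this one.

```python
# Hypothetical reference ranges and weights, for illustration only.
REFERENCE = {"glucose": (3.9, 6.1), "systolic_bp": (90, 120)}
WEIGHTS = {"glucose": 0.6, "systolic_bp": 0.4}

def risk_score(results: dict) -> float:
    """Combine several test results into one weighted score: each
    indicator contributes its weight when outside its reference range."""
    score = 0.0
    for name, value in results.items():
        low, high = REFERENCE[name]
        if not (low <= value <= high):
            score += WEIGHTS[name]
    return round(score, 2)

print(risk_score({"glucose": 7.5, "systolic_bp": 110}))  # 0.6
```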
Report Generation
Automatic Summary and Key Point Compilation:
Extract the most important information from all examination reports to automatically generate concise and clear summaries. Ensure the generated assessment reports comply with hospital standards, improving their quality and consistency.
Visualization and Chart Display:
Intuitively present trends in test results and key data using charts and graphs. Generate interactive assessment reports, allowing doctors to view detailed information or original reports by clicking links.
Decision Support
(1). Treatment Recommendation (LLM, ranking models):
Treatment recommendations: Based on comprehensive evaluation results, recommend optimal treatment pathways and interventions.
Evidence support: Integrate the latest clinical guidelines and research findings to provide doctors with scientific basis for decision-making.
(2). Risk Early Warning:
Early warning: Predict potential future health issues through data analysis to provide early alerts.
Prognosis assessment: Evaluate the patient’s prognosis, predict treatment effects and recovery time, and provide references for doctors in formulating personalized treatment plans.
Continuous Learning and Improvement
- Data Feedback:
Learning mechanism: The system can continuously learn and optimize itself by receiving new case data, improving the accuracy and reliability of assessments.
- User Feedback:
Collect feedback from doctors and patients to continuously improve system performance and user experience.
Intelligent Assistants
Intelligent assistants represent the culmination of current NLP; their most fundamental role is to act as a “brain.” Beyond tasks like text classification and text extraction mentioned earlier, in the era of large language models, intelligent assistants can handle almost any text-based task: writing papers, booking flights, chatting, summarizing content, and more. Leveraging the powerful modeling and reasoning capabilities of large language models, intelligent assistants have emerged as the prime scenario for AI Agents. Through prompts tailored to different business tasks, each with its own specific objective, multiple tasks collaborate to form an intelligent assistant.
Currently, a leading example of such an intelligent assistant is Manus, an autonomous AI agent built around foundational models (primarily Claude 3.5/3.7 and Alibaba’s Qwen). It operates in a cloud-based virtual computing environment with full access to tools such as web browsers, shell commands, and code execution. A key innovation of the system is its ability to invoke external tools (e.g., using executable Python code) as part of its action mechanism, enabling it to independently perform complex operations. Its architecture includes an iterative agent loop (analyze → plan → execute → observe), complemented by specialized modules for planning, knowledge retrieval, and memory management. Manus uses file-based memory to track progress and store information across operations. The system can be replicated using open-source components, including CodeActAgent (a fine-tuned Mistral model), Docker for sandboxing, Playwright for web interaction, and LangChain for orchestration.
Let’s use Manus as an example to explain how a mature agent architecture of this kind is put together:
Foundational Model Architecture
Manus is built on top of powerful LLM backbones (e.g., fine-tuned versions of Claude and Qwen). In other words, Manus’ “brain” is a combination of existing LLMs.
Some reports suggest that Manus can even dynamically invoke multiple models to complete different subtasks (“dynamic multi-model invocation”), leveraging the strengths of each model — for instance, using Claude 3 for complex logical reasoning, GPT-4 for programming tasks, and Google’s Gemini for broad knowledge queries. Crucially, Manus acts as an orchestrator of top-tier LLMs rather than a single standalone model — a design that allows it to harness the best AI capabilities for each specific task.
Cloud Agents and Tool Sandboxes
Unlike typical text-only chatbots, Manus operates in a cloud-based virtual computing environment: a full Ubuntu Linux workspace with internet access. Manus can use a range of tools and software, much like an advanced user.
As defined in its system prompt, Manus has access to a shell with sudo privileges, a controllable web browser, a file system, and interpreters for programming languages such as Python and Node.js. It can even launch web servers and expose them to the internet. All operations occur server-side — Manus continues working even when the user’s device is turned off, distinguishing it from agents running in the user’s browser (e.g., OpenAI’s experimental “Operator”).
The sandboxed tool environment means Manus is not limited to natural language responses; it can take actions: browsing websites, filling out forms, writing and executing code, or autonomously calling APIs. This architecture makes Manus more akin to a digital worker in the cloud than merely a conversational bot.
Agent Loop and Orchestration
Manus achieves autonomy through an agent loop. At a high level, each phase of the loop includes:
Analyzing the current state and user request (from an event stream of recent interactions)
Planning/selecting an action (deciding which tool or operation to use next)
Executing the action in the sandbox
Observing the results and adding them to the event stream
This loop repeats until Manus determines the task is complete, at which point it outputs the final result to the user and enters an idle state. The design explicitly limits the agent to one tool operation per step — it must wait for the action to return results before deciding on the next step. This control flow prevents the model from running amok with unconfirmed operations and allows the system (and user) to monitor each step.
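The analyze → plan → execute → observe loop can be sketched as follows. This is a minimal illustration of the control flow described above, not Manus’ actual code: the tool table, the stand-in planner, and the one-tool-per-step limit are all reduced to toys.

```python
def run_agent(task, tools, max_steps=10):
    """Minimal agent loop: one tool call per iteration, with each
    observation appended to the event stream before the next decision."""
    event_stream = [("user", task)]
    for _ in range(max_steps):
        action = plan_next_action(event_stream)   # analyze + plan
        if action is None:                        # task judged complete
            return event_stream
        name, arg = action
        observation = tools[name](arg)            # execute exactly one tool
        event_stream.append(("observation", observation))  # observe
    return event_stream

def plan_next_action(event_stream):
    """Stand-in for the LLM's decision step: issue one shell command,
    then stop once its observation has arrived."""
    if any(kind == "observation" for kind, _ in event_stream):
        return None
    return ("shell", "echo hello")

tools = {"shell": lambda cmd: f"ran: {cmd}"}
events = run_agent("say hello", tools)
print(events)
```

The key property mirrored here is that the planner never issues a second action before seeing the result of the first, which is what keeps each step observable.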
Planning Module (Task Decomposition)
To manage complex tasks, Manus integrates a planning module that breaks down high-level goals into ordered lists of steps. When a user provides a goal or project, the planning module generates a plan resembling pseudocode or an enumerated list (with step numbers, descriptions, and statuses) and inserts it into Manus’ context as a “plan” event.
For example, if tasked with creating a data visualization, the planning module might generate: 1. Collect data, 2. Clean data, 3. Generate charts, 4. Save and send charts. Manus uses this as a roadmap, executing each step sequentially. The plan can be updated in real-time as tasks evolve. In each iteration, the agent references the plan, ensuring all steps are completed to finish the task.
This mechanism endows Manus with forward-looking, structured decision-making capabilities, rather than mere turn-by-turn reactivity. The concept mirrors how AutoGPT or BabyAGI maintain goal lists, ensuring the AI does not lose sight of the overarching objective while performing minor operations.
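A plan of numbered steps with statuses, as described above, might be represented like this. The class names and fields are assumptions made for illustration, not Manus’ internal schema.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    number: int
    description: str
    status: str = "pending"  # pending | done

@dataclass
class Plan:
    steps: list = field(default_factory=list)

    def next_step(self):
        """Return the first step not yet completed, or None."""
        return next((s for s in self.steps if s.status == "pending"), None)

    def complete(self, number):
        for s in self.steps:
            if s.number == number:
                s.status = "done"

plan = Plan([Step(1, "Collect data"), Step(2, "Clean data"),
             Step(3, "Generate charts"), Step(4, "Save and send charts")])
plan.complete(1)
print(plan.next_step().description)  # Clean data
```

On each loop iteration the agent would consult `next_step`, and the planning module could rewrite the step list in place as the task evolves.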
Knowledge and Data Modules
Manus does not rely solely on knowledge internalized by LLMs; it includes a knowledge module that provides relevant references or best practices from a knowledge base when needed. This information appears in the context as “knowledge” events, offering domain-specific or task-relevant insights (e.g., style guides or factual snippets for paper writing).
Additionally, Manus can use a data source module to fetch factual data via APIs. The agent has access to a pre-defined library of data APIs (e.g., for weather, finance) and their documentation. When using these APIs, the agent invokes them via Python code instead of web scraping, as the system prompt prioritizes authoritative data sources over general web information.
This approach integrates Retrieval-Augmented Generation (RAG): the agent proactively acquires external knowledge and data rather than relying on parameterized memory. Developers confirm that Manus “supports RAG,” combining external data retrieval with the model’s generative capabilities. All retrieved facts or data are injected into the event stream as read-only context, enabling the LLM to incorporate them into reasoning and outputs.
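The injection of retrieved facts as read-only context can be sketched as follows. The keyword-lookup retriever and the knowledge entries are toys standing in for Manus’ knowledge and data-source modules.

```python
# Hypothetical knowledge base; real retrieval would query data APIs or a
# vector store rather than matching topic keywords.
KNOWLEDGE_BASE = {
    "weather": "Use the weather API rather than scraping forecast pages.",
    "finance": "Prefer the finance data API for quotes and history.",
}

def retrieve(query: str):
    """Toy retriever: return facts whose topic keyword appears in the query."""
    return [fact for topic, fact in KNOWLEDGE_BASE.items()
            if topic in query.lower()]

def inject_knowledge(event_stream: list, query: str) -> list:
    """Append retrieved facts to the event stream as 'knowledge' events,
    where the LLM can read but not modify them."""
    for fact in retrieve(query):
        event_stream.append(("knowledge", fact))
    return event_stream

stream = inject_knowledge([("user", "Get today's weather report")], "weather report")
print(stream)
```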
Multi-Agent Collaboration
A standout feature of Manus’ architecture is its multi-agent (multi-module) design. Unlike a single agent handling all tasks, Manus is structured to let specialized sub-agents or components process different aspects of a task in parallel. For example, one sub-agent might focus on web browsing and information gathering, another on coding, and another on data analysis — each operating in its own sandbox. A high-level coordinator (Manus’ “main brain”) orchestrates these sub-agents, assigns tasks, and integrates results.
This design, inspired by distributed problem-solving, enhances efficiency by “employing” specialized agents to handle complex projects. It also boosts robustness: if one agent is busy or stuck, another can proceed with other subtasks. Ultimately, this multi-agent architecture enables Manus to deliver substantial outcomes requiring multiple steps and skills — such as generating formatted Excel reports or deploying websites — beyond mere text output. Behind the scenes, sub-agents might write code, launch servers, and validate web outputs, all as part of a single user request. From the user’s perspective, Manus seamlessly handles the entire project (notably, this complexity is hidden; users interact with a single AI assistant, while multi-agent coordination occurs internally).
In summary, building a powerful task-oriented intelligent assistant involves core steps: selecting a robust LLM, equipping it with rich prompts and/or fine-tuning to act as an agent, providing a secure tool environment (for code, web access, etc.), and implementing a loop to keep it on track. This section offers only a brief overview; detailed content will be explored in the AI Agent section.
In conclusion, with the development of NLP technologies, many complicated business tasks can now be solved quickly. However, a good NLP engineer must select an appropriate NLP system architecture for the business at hand, because the more complicated the task, the more complex the architecture. A simple NLP task such as text classification may require only a few computing resources and a single deployed model, whereas a complex business task such as a customer service assistant may demand considerable computing resources and various modules with different functions to maintain the robustness and reliability of the system. I will elaborate on design details tailored to different business scenarios in subsequent chapters.