自然言語処理

Azureに本好きを食わせる

Azureにも本好きを食わせてみた。しかし、本の頻度分布多すぎ。 import codecs import configparser from azure.core.credentials import AzureKeyCredential from azure.ai.textanalytics import TextAnalyticsClient config = configparser.ConfigParser() config.read('azure.config') endpoint = config['AZURE']['azure_endpoint'] key = config['AZURE']['azure_ai_key'] client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key)) ifs = codecs.open('N4830BU-1.txt', 'r', 'utf-8') lines = ifs.readlines() documents = [''.join(lines)] response = client.recognize_entities(documents, language = "ja") result = [doc for doc in response if not doc.is_error] for doc in result: for entity in doc.entities: print(entity.text, entity.category) プロローグ Organization 本須　麗乃 Person もとすうら Person 22歳 Quantity 本 Product 誰か PersonType 筆者 PersonType 本 Product 本屋 Location 図書館 Location 写真集 Product 外国 Location 本 Product 百科事典 Product 文学全集 Product 紙 Product 専門誌 Product 雑誌 Product 小説 Product ライトノベル Product 絵本 Product 日本 Location 素人が PersonType 同人誌 Product パラ Quantity 美酒 Product 図書館 Location 本 Product 書庫 Location 本 Product 本 Product 紙 Product インク Product そこに Location 本 Product 本 Product 書庫 Location 本 Product 本 Product 本 Product 畳 Product ベッド Product 本 Product わたし PersonType 大地震 Event 本 Product ぇ Person 司書 PersonType 大学図書館 Location 神様 PersonType 転生 Event 次 Quantity 本 Product 図書館 Location 司書 PersonType 本 Product 司書 PersonType 本 Product 本 Product 本 Product 本 Product 紙 Product インク Product 本 Product 神様 PersonType わたし PersonType 本 Product ifs.close()

GiNZAに本好きを食わせる

GiNZAに本好きの下克上のプロローグを食わせてエンティティ認識を試してみた。本須　麗乃が本と須でぶった切れた。なんか、国家安康君臣豊楽っぽくてなんだかなぁ。 from ginza import * import codecs import spacy nlp = spacy.load("ja_ginza") # GiNZAモデルの読み込み ents = [] with codecs.open('N4830BU-1.txt', 'r', 'utf-8') as text: for line in text: doc = None try: doc = nlp(line.strip()) except: pass if doc: for ent in doc.ents: ents.append(ent) for ent in ents: print(ent.text, ent.label_) 須　麗乃 Person 22歳 Age 三度 Frequency 顔 Animal_Part ニヨニヨ Doctrine_Method_Other 一冊 N_Product 目 Animal_Part 教育学 Academic 民俗学 Academic 数学 Academic 物理 Academic 化学 Academic 生物学 Academic 芸術 Academic 体育 Academic 人類 Mammal 一冊 N_Product 日本 Country 日光 Domestic_Region 肌 Animal_Part 司書資格 Position_Vocation 大学図書館 Facility_Other 司書 Position_Vocation 一日 Period_Day 司書 Position_Vocation 人間 Mammal

OpenAI Example on python

OpenAIのAPIが使えるようになったので、噂のGPT-3を試してみた。 import configparser import openai config = configparser.ConfigParser() config.read('openai.config') openai.api_key = config["OPENAI"]["API_KEY"] response = openai.Completion.create( engine="davinci", prompt="人類は多くの問題を抱えていた。大量の破壊兵器・増えつづけた人口・国際的なテロ・国家間の極端な貧富の差･･･これら問題を解決する為にある計画が実現に向かう。地球上すべての国家をある1つのコンピュータによって統括しようという大胆な計画。そしてその中央処理装置はMESIAと呼ばれていた。", temperature=0.7, max_tokens=60, top_p=1.0, frequency_penalty=0.0, presence_penalty=0.0 ) print(response) { "choices": [ { "finish_reason": "length", "index": 0, "logprobs": null, "text": " \u5927\u7fa9\u540d\u5206\u3068\u306f\u4f55\u304b\uff1f\u9053\u5fb3\u7684\u306b\u6b63\u5f53\u306a\u7406\u7531\u3068\u306f\uff1f\u7b54\u3048\u306f\u305f\u30601\u3064\u3002\u300c\u65b0\u3057\u3044\u751f\u547d\u4f53\u3092" } ], "created": 1616638002, "id": "cmpl-2h9gQIewmSimGqJ6AZgDAhqBLCSEh", "model": "davinci:2020-05-03", "object": "text_completion" } response_text = response["choices"][0]["text"] response_text ' 大義名分とは何か？道徳的に正当な理由とは？答えはただ1つ。「新しい生命体を'

Azure Text AnalyticsをPythonから呼び出す

import configparser from azure.core.credentials import AzureKeyCredential from azure.ai.textanalytics import TextAnalyticsClient import seaborn as sns import pandas as pd sns.set_style('white') config = configparser.ConfigParser() config.read('azure.config') endpoint = config['AZURE']['azure_endpoint'] key = config['AZURE']['azure_ai_key'] client = TextAnalyticsClient(endpoint=endpoint, credential=AzureKeyCredential(key)) documents = [ "iPhoneのマップやばいよ" ] result = client.analyze_sentiment(documents) docs = [doc for doc in result if not doc.is_error] doc = docs[0] confidience_scores = {key:value for key, value in doc.confidence_scores.items()} sentiment = pd.Series(confidience_scores) sentiment positive 0.07 neutral 0.90 negative 0.03 dtype: float64 sentiment.plot.bar() <AxesSubplot:>

Cognitive Servicesのテキスト分析にPowerShellから投げてみる

PowerShellからCognitive Servicesのネガポジ分析をたたいてみました。 $apiKey = “************************” $texts = @(“ColorfulのSSDめっちゃヤバいwww Galaxy S6や廃棄品のSSDから剥がしたフラッシュやIntelの偽物が搭載されているのが確認されてるらしいwww 安価なNVMe CN600もリマークチップを使ってるとのこと、、、最近めっちゃ秋葉原で売ってるけど買わない方が良さそうだな、、、”) $headers = @{ ‘Ocp-Apim-Subscription-Key’ = $apiKey } $endPoint = “https://eastasia.api.cognitive.microsoft.com/text/analytics/v2.0/sentiment” $texts | ForEach-Object { $text = $_ $obj = @{ documents = @(@{ language = “ja” id = “1” text = $text }) } $json = ConvertTo-Json $obj $result = Invoke-RestMethod -Uri $endPoint -Headers $headers -Body $json -ContentType ‘application/json; charset=utf-8’ -Method Post $items = $result.documents $item = $items[0] $item.score }